
Applying a function to specific columns of a DataFrame

Python

BIG陽 2022-06-02 15:19:06

def include_mean():
    if pd.isnull('Age'):
        if 'Pclass'==1:
            return 38
        elif 'Pclass'==2:
            return 30
        elif 'Pclass'==3:
            return 25
        else: return 'Age'

train['Age']=train[['Age','Pclass']].apply(include_mean(),axis=1)

Why does the code above give me a type error?

TypeError: ("'NoneType' object is not callable", 'occurred at index 0')

I now know that the correct code is

def impute_age(cols):
    Age = cols[0]
    Pclass = cols[1]
    if pd.isnull(Age):
        if Pclass == 1:
            return 37
        elif Pclass == 2:
            return 29
        else:
            return 24
    else:
        return Age

train['Age'] = train[['Age','Pclass']].apply(impute_age,axis=1)

Now I want to know why the change is needed, i.e. the exact reason behind it. What is 'cols' doing here?


3 Answers

慕雪6442864

Contributed 1812 experience points, received more than 5 likes

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html

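When you use the apply method on a pandas DataFrame, the function you pass to apply is called on each column (or on each row, depending on the axis parameter, which defaults to 0, the column axis). Your function must therefore accept a parameter for the row that apply passes to it.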

def include_mean():
    if pd.isnull('Age'):
        if 'Pclass'==1:
            return 38
        elif 'Pclass'==2:
            return 30
        elif 'Pclass'==3:
            return 25
        else: return 'Age'

There are several problems with this.

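'Pclass'==1 is guaranteed to be False, because you are comparing a string ('Pclass') with an integer (1), and the two can never be equal. What you want is to compare the value of the row's Pclass entry, which you can retrieve by indexing: col["Pclass"], or col[1] if Pclass is the second column.
If pd.isnull('Age') is False, the function returns None. Since the string 'Age' is not null, that is always the case. When you write d.apply(include_mean()), you are calling include_mean, which returns None, and then passing that value to apply. But apply expects a callable (for example, a function).
In the else clause, you return the string 'Age'. This means your DataFrame would end up with the value 'Age' in some cells.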
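The three problems listed above can be checked directly in an interpreter. The following is a minimal sketch that reproduces the question's function only to inspect its behavior; the variable names used for the checks are illustrative.

```python
import pandas as pd

def include_mean():
    if pd.isnull('Age'):
        if 'Pclass' == 1:
            return 38
        elif 'Pclass' == 2:
            return 30
        elif 'Pclass' == 3:
            return 25
        else:
            return 'Age'

# A string never equals an integer, so every branch test is False.
string_equals_int = ('Pclass' == 1)

# The literal string 'Age' is not a missing value.
age_literal_is_null = pd.isnull('Age')

# Calling the function runs it immediately; it falls through and returns None.
result = include_mean()

print(string_equals_int)     # False
print(age_literal_is_null)   # False
print(result)                # None
print(callable(result))      # False: apply() is handed nothing it can call
```

Because include_mean() evaluates to None before apply ever runs, apply has no function to invoke, which is consistent with the "'NoneType' object is not callable" message in the question.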

Your second example fixes these problems: the impute_age function now takes a parameter for the row (cols), looks up and compares the values of the Age and Pclass columns, and you pass the function to apply without calling it.
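As a rough illustration of the corrected pattern, the sketch below uses a small made-up DataFrame in place of the question's train data (the four sample rows are an assumption) and reads the row by column label rather than by position.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's `train` DataFrame.
train = pd.DataFrame({
    'Age': [22.0, np.nan, np.nan, np.nan],
    'Pclass': [3, 1, 2, 3],
})

def impute_age(row):
    """Return the row's Age, or a class-based default when Age is missing."""
    age = row['Age']
    pclass = row['Pclass']
    if pd.isnull(age):
        if pclass == 1:
            return 37
        elif pclass == 2:
            return 29
        else:
            return 24
    return age

# Pass the function object itself (no parentheses); apply calls it once per row.
train['Age'] = train[['Age', 'Pclass']].apply(impute_age, axis=1)
print(train)
```

Writing apply(impute_age, axis=1) hands apply the function, whereas apply(impute_age(), axis=1) would try to run it first with no arguments.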

2022-06-02

翻翻過去那場雪

Contributed 2065 experience points, received more than 14 likes

Welcome to Python. To answer your question: especially when you are starting out, sometimes you just need to open a new IPython notebook and try things out:

In [1]: import pandas as pd
   ...: def function(x):
   ...:     return x+1
   ...:
   ...: df = pd.DataFrame({'values':range(10)})
   ...: print(df)
   ...:
   values
0       0
1       1
2       2
3       3
4       4
5       5
6       6
7       7
8       8
9       9

In [2]: print(df.apply(function))
   values
0       1
1       2
2       3
3       4
4       5
5       6
6       7
7       8
8       9
9      10

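In your question, cols holds the values of the row currently being processed: with axis=1, apply calls your function once per row and passes that row in as cols.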
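To see what cols actually is, one option is to record what apply passes in. This is a small sketch on made-up data; the helper name inspect_row and the list seen are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Age': [22.0, 35.0], 'Pclass': [3, 1]})

seen = []

def inspect_row(cols):
    # Record the type and the labels of whatever apply hands over.
    seen.append((type(cols).__name__, list(cols.index)))
    return cols['Age']

df.apply(inspect_row, axis=1)

# Each recorded entry is a Series labelled by the column names.
for kind, labels in seen:
    print(kind, labels)
```

Since each row arrives as a Series labelled 'Age' and 'Pclass', cols['Age'] or cols.iloc[0] states the intent more explicitly than cols[0]; recent pandas versions appear to be deprecating positional lookup through plain integer keys on a label-indexed Series.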

2022-06-02

墨色風雨

Contributed 1853 experience points, received more than 6 likes

Don't use apply(axis=1). Instead, you should use .loc. For the top case (the first function), this is a simple mapping.

m = train.Age.isnull()
d = {1: 38, 2: 30, 3: 25}
train.loc[m, 'Age'] = train.loc[m, 'Pclass'].map(d)
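The mapping approach can be tried on a small frame. The sketch below assumes a four-row stand-in for train (the sample values are made up) and otherwise follows the three lines above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `train`.
train = pd.DataFrame({
    'Age': [22.0, np.nan, np.nan, np.nan],
    'Pclass': [3, 1, 2, 3],
})

m = train['Age'].isnull()      # boolean mask of rows with a missing Age
d = {1: 38, 2: 30, 3: 25}      # Pclass -> replacement Age

# Only the masked rows are touched; their Pclass is looked up in d.
train.loc[m, 'Age'] = train.loc[m, 'Pclass'].map(d)
print(train['Age'].tolist())   # [22.0, 38.0, 30.0, 25.0]
```

Rows whose Pclass is not a key of d would keep a missing Age, because map yields NaN for unmapped values.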

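For the bottom case (the second function), because of the else we can use np.select. The way it works is that we create a list of conditions that follows the order of the if, elif, else logic. Then we provide a list of choices, one per condition, and for each row the choice belonging to the first True condition is selected. Since you have nested logic, we first need to un-nest it so that it logically reads as

if age is null and pclass == 1
elif age is null and pclass == 2
elif age is null
else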

Sample data

import pandas as pd
import numpy as np

df = pd.DataFrame({'Age': [50, 60, 70, np.nan, np.nan, np.nan, np.nan],
                   'Pclass': [1, 1, 1, 1, 2, np.nan, 1]})
#    Age  Pclass
#0  50.0     1.0
#1  60.0     1.0
#2  70.0     1.0
#3   NaN     1.0
#4   NaN     2.0
#5   NaN     NaN
#6   NaN     1.0

m = df.Age.isnull()
conds = [m & df.Pclass.eq(1),
         m & df.Pclass.eq(2),
         m]
choices = [37, 29, 24]

df['Age'] = np.select(conds, choices, default=df.Age)
#                                      |
#                       Takes care of else, i.e. Age not null
print(df)
#    Age  Pclass
#0  50.0     1.0
#1  60.0     1.0
#2  70.0     1.0
#3  37.0     1.0
#4  29.0     2.0
#5  24.0     NaN
#6  37.0     1.0
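The order of the conditions matters because np.select takes, for each element, the choice of the first condition that is True. A tiny sketch on a made-up array shows the effect of swapping a broad and a narrow condition:

```python
import numpy as np

x = np.array([1, 5, 10])

# Broad condition first: it matches every element, so the second is never used.
broad_first = np.select([x > 0, x > 4], [1, 2], default=0)

# Narrow condition first: it wins wherever it is True.
narrow_first = np.select([x > 4, x > 0], [2, 1], default=0)

print(broad_first.tolist())   # [1, 1, 1]
print(narrow_first.tolist())  # [1, 2, 2]
```

This is why the catch-all condition m (Age is null, any Pclass) is placed last in conds.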

2022-06-02
