I have a data frame containing customer IDs and their expenses for 2014-2018. What I want is the mean of the 2014-2018 expenses for each ID in the data frame. There is one condition, though: if any cell in a row (2014-2018) is empty, NaN should be returned. So I only want the mean to be calculated when all 5 cells of the row in the 2014-2018 columns hold a numeric value.

Initial data frame:

     2014   2015  2016  2017   2018    ID
      100  122.0   324   632    NaN  12.0
      120  159.0    54   452  541.0  96.0
      NaN  164.0   687   165  245.0  20.0
      180  421.0   512   184  953.0  73.0
      110  654.0   913   173  103.0  84.0
      130    NaN   754   124  207.0  26.0
      170  256.0   843    97  806.0  87.0
      140  754.0    95   101  541.0  64.0
       80  985.0   184    84   90.0  11.0
       96   65.0   127   130  421.0  34.0

Desired output:

     2014   2015  2016  2017   2018    ID    mean
      100  122.0   324   632    NaN  12.0     NaN
      120  159.0    54   452  541.0  96.0  265.20
      NaN  164.0   687   165  245.0  20.0     NaN
      180  421.0   512   184  953.0  73.0  450.00
      110  654.0   913   173  103.0  84.0  390.60
      130    NaN   754   124  207.0  26.0     NaN
      170  256.0   843    97  806.0  87.0  434.40
      140  754.0    95   101  541.0  64.0  326.20
       80  985.0   184    84   90.0  11.0  284.60
       96   65.0   127   130  421.0  34.0  167.80

The code I tried is below. However, it just gives me the mean and ignores the NaN condition. Is there a short lambda function that would add the condition to the code?

    import pandas as pd
    import numpy as np

    data = pd.DataFrame({"ID": [12,96,20,73,84,26,87,64,11,34],
                         "2014": [100,120,np.nan,180,110,130,170,140,80,96],
                         "2015": [122,159,164,421,654,np.nan,256,754,985,65],
                         "2016": [324,54,687,512,913,754,843,95,184,127],
                         "2017": [632,452,165,184,173,124,97,101,84,130],
                         "2018": [np.nan,541,245,953,103,207,806,541,90,421]})

    print(data)

    fiveyear = ["2014", "2015", "2016", "2017", "2018"]
    # If a cell in these columns is empty (NaN) for a row, then NaN should be
    # in the new 'mean' column. I only want the mean when all 5 cells in the
    # row have a numeric value.

    data.loc[:, 'mean'] = data[fiveyear].mean(axis=1)

    print(data)
2 Answers

慕桂英4014372
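Use dropna to remove the incomplete rows before calculating the mean (df here is the question's data frame, called data there). pandas aligns on the index when the result is assigned, and since those rows were dropped, they have no matching entry and end up as NaN.
df['mean'] = df[fiveyear].dropna(how='any').mean(axis=1)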
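To make the index alignment visible, here is a minimal sketch that splits the one-liner into two steps. The three-row frame is only an illustration built from the first three rows of the question's data, and complete_means is a name introduced here.

```python
import numpy as np
import pandas as pd

# Illustrative frame: the first three rows of the question's data.
df = pd.DataFrame({
    "2014": [100, 120, np.nan],
    "2015": [122, 159, 164],
    "2016": [324, 54, 687],
    "2017": [632, 452, 165],
    "2018": [np.nan, 541, 245],
})
fiveyear = ["2014", "2015", "2016", "2017", "2018"]

# Only row 1 is complete, so the intermediate Series has a single entry.
complete_means = df[fiveyear].dropna(how="any").mean(axis=1)
print(complete_means.index.tolist())  # [1]

# Assignment aligns on the index: rows 0 and 2 have no match and become NaN.
df["mean"] = complete_means
print(df["mean"].tolist())  # [nan, 265.2, nan]
```

The frame keeps all of its rows; only the new column is NaN where a year is missing.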
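You can also use mask, so that only the rows without any null values keep a result. Note that axis has to be passed by keyword to any in recent pandas versions.
df['mean'] = df[fiveyear].mean(axis=1).mask(df[fiveyear].isnull().any(axis=1))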
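The mask approach can be read as three separate steps. The sketch below uses a small hypothetical frame with only three year columns; has_gap, row_mean and cols are names introduced for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame; the second row is missing its 2015 value.
df = pd.DataFrame({"2014": [100, 120], "2015": [122, np.nan], "2016": [324, 54]})
cols = ["2014", "2015", "2016"]

# True for every row that has at least one NaN in the selected columns.
has_gap = df[cols].isnull().any(axis=1)

# mean skips NaN by default, so row 1 would get the mean of its two values.
row_mean = df[cols].mean(axis=1)

# mask replaces the entries where the condition is True with NaN.
df["mean"] = row_mean.mask(has_gap)
print(df["mean"].tolist())  # [182.0, nan]
```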
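It is a bit of a hack, but since you know you need all 5 values, you can also use sum, because it supports the min_count parameter: any row with fewer than 5 valid values sums to NaN, and dividing by 5 keeps it NaN.
df['mean'] = df[fiveyear].sum(axis=1, min_count=len(fiveyear))/len(fiveyear)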
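As a rough check of the min_count idea, the sketch below compares it with passing skipna=False to mean. The skipna=False variant is not part of this answer; it is shown only as a cross-check, and the three-row frame again reuses the first rows of the question's data.

```python
import numpy as np
import pandas as pd

# Illustrative frame: the first three rows of the question's data.
df = pd.DataFrame({
    "2014": [100, 120, np.nan],
    "2015": [122, 159, 164],
    "2016": [324, 54, 687],
    "2017": [632, 452, 165],
    "2018": [np.nan, 541, 245],
})
fiveyear = ["2014", "2015", "2016", "2017", "2018"]
n = len(fiveyear)

# sum returns NaN for rows with fewer than n valid values; NaN / n stays NaN.
via_sum = df[fiveyear].sum(axis=1, min_count=n) / n

# Cross-check (an alternative, not from the answer): do not skip NaN at all.
via_skipna = df[fiveyear].mean(axis=1, skipna=False)

# Both give NaN for the incomplete rows and the same mean for the complete one.
print(np.allclose(via_sum, via_skipna, equal_nan=True))
```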

楊魅力
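This is the same as @ALollz's answer, but it picks up the year columns flexibly, no matter how many years are in df.
# get the year columns in a list (every column except 'ID')
yearsCols = [c for c in df if c != 'ID']
# calculate the mean over complete rows only; dropped rows become NaN on assignment
df['mean'] = df[yearsCols].dropna(how='any').mean(axis=1)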
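One thing to watch with selecting "everything except ID": once a mean column exists, rerunning the selection would pick it up as well. A minimal sketch of a variant that excludes both names follows; non_year and years_cols are hypothetical names, and the two-row frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with an ID column and two year columns.
df = pd.DataFrame({"ID": [12, 96], "2014": [100, 120], "2015": [122, np.nan]})

# Exclude the identifier and any previously computed result column.
non_year = {"ID", "mean"}
years_cols = [c for c in df.columns if c not in non_year]

df["mean"] = df[years_cols].dropna(how="any").mean(axis=1)

# Selecting again after 'mean' exists still yields only the year columns.
years_cols_again = [c for c in df.columns if c not in non_year]
print(years_cols_again)  # ['2014', '2015']
```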