BIG陽
2022-11-09 16:52:54
I have a DataFrame containing customer IDs and their expenses for 2014-2018. What I want is the mean of the 2014-2018 expenses for each ID in the DataFrame. There is one condition, though: if any cell in a row (2014-2018) is empty, NaN should be returned. So I only want the mean to be computed when all 5 cells in the 2014-2018 columns of a row hold numeric values.

Initial DataFrame:

 2014   2015  2016  2017   2018    ID
  100  122.0   324   632    NaN  12.0
  120  159.0    54   452  541.0  96.0
  NaN  164.0   687   165  245.0  20.0
  180  421.0   512   184  953.0  73.0
  110  654.0   913   173  103.0  84.0
  130    NaN   754   124  207.0  26.0
  170  256.0   843    97  806.0  87.0
  140  754.0    95   101  541.0  64.0
   80  985.0   184    84   90.0  11.0
   96   65.0   127   130  421.0  34.0

Expected output:

 2014   2015  2016  2017   2018    ID    mean
  100  122.0   324   632    NaN  12.0     NaN
  120  159.0    54   452  541.0  96.0  265.20
  NaN  164.0   687   165  245.0  20.0     NaN
  180  421.0   512   184  953.0  73.0  450.00
  110  654.0   913   173  103.0  84.0  390.60
  130    NaN   754   124  207.0  26.0     NaN
  170  256.0   843    97  806.0  87.0  434.40
  140  754.0    95   101  541.0  64.0  326.20
   80  985.0   184    84   90.0  11.0  284.60
   96   65.0   127   130  421.0  34.0  167.80

The code I tried is below. However, it just gives me the mean and ignores the NaN condition. Is there a short lambda function that could add the condition to the code?

import pandas as pd
import numpy as np

data = pd.DataFrame({"ID": [12,96,20,73,84,26,87,64,11,34],
                     "2014": [100,120,np.nan,180,110,130,170,140,80,96],
                     "2015": [122,159,164,421,654,np.nan,256,754,985,65],
                     "2016": [324,54,687,512,913,754,843,95,184,127],
                     "2017": [632,452,165,184,173,124,97,101,84,130],
                     "2018": [np.nan,541,245,953,103,207,806,541,90,421]})

print(data)

# If a cell in these columns is empty (NaN) for a row, the new 'mean' column
# should be NaN for that row. I only want the mean when all 5 cells in the
# row have a numeric value.
fiveyear = ["2014", "2015", "2016", "2017", "2018"]

# This computes the mean of every row but skips NaN instead of returning NaN.
data.loc[:, 'mean'] = data[fiveyear].mean(axis=1)

print(data)
2 Answers

慕桂英4014372
Contributed 1871 experience points, received more than 13 likes
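Use dropna to remove the incomplete rows before computing the mean. pandas aligns on the index when the result is assigned, and since those rows were dropped, they end up as NaN in the new column. (Here df stands for the question's DataFrame, data.)
df['mean'] = df[fiveyear].dropna(how='any').mean(axis=1)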
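To make the index alignment visible, here is a minimal sketch on three rows taken from the question's data. The intermediate name complete_means is only an illustrative assumption, not part of the answer.

```python
import numpy as np
import pandas as pd

# Three rows from the question's data; rows 0 and 2 each contain a NaN.
df = pd.DataFrame({
    "ID": [12, 96, 20],
    "2014": [100, 120, np.nan],
    "2015": [122, 159, 164],
    "2016": [324, 54, 687],
    "2017": [632, 452, 165],
    "2018": [np.nan, 541, 245],
})
fiveyear = ["2014", "2015", "2016", "2017", "2018"]

# dropna removes rows 0 and 2, so the resulting Series only carries label 1.
complete_means = df[fiveyear].dropna(how="any").mean(axis=1)
print(complete_means.index.tolist())

# Assignment aligns on the index: labels missing from complete_means get NaN.
df["mean"] = complete_means
print(df)
```

The dropped rows are never computed at all; they are NaN only because the assignment has nothing to align them with.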
You can also use mask, so that a result is kept only for rows with no null values:
df['mean'] = df[fiveyear].mean(axis=1).mask(df[fiveyear].isnull().any(axis=1))
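Split into steps, the mask approach could look like the sketch below, run on the full data from the question. The names has_gap and row_mean are assumptions introduced for readability.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [12, 96, 20, 73, 84, 26, 87, 64, 11, 34],
    "2014": [100, 120, np.nan, 180, 110, 130, 170, 140, 80, 96],
    "2015": [122, 159, 164, 421, 654, np.nan, 256, 754, 985, 65],
    "2016": [324, 54, 687, 512, 913, 754, 843, 95, 184, 127],
    "2017": [632, 452, 165, 184, 173, 124, 97, 101, 84, 130],
    "2018": [np.nan, 541, 245, 953, 103, 207, 806, 541, 90, 421],
})
fiveyear = ["2014", "2015", "2016", "2017", "2018"]

# True for every row that has at least one missing year.
has_gap = df[fiveyear].isna().any(axis=1)

# mean skips NaN by default, so every row gets a number here.
row_mean = df[fiveyear].mean(axis=1)

# mask replaces the values where the condition is True with NaN.
df["mean"] = row_mean.mask(has_gap)
print(df)
```

Unlike the dropna version, the mean is computed for every row first and the incomplete rows are blanked out afterwards.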
It's a bit of a hack, but since you know you need 5 values, you can also use sum, which supports the min_count parameter, so any row with fewer than 5 values becomes NaN:
df['mean'] = df[fiveyear].sum(axis=1, min_count=len(fiveyear))/len(fiveyear)
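A small sketch of what min_count changes, again on three rows from the question. The cross-check against mean(axis=1, skipna=False) is my own addition rather than something the answer proposes; the variable names are assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [12, 96, 20],
    "2014": [100, 120, np.nan],
    "2015": [122, 159, 164],
    "2016": [324, 54, 687],
    "2017": [632, 452, 165],
    "2018": [np.nan, 541, 245],
})
fiveyear = ["2014", "2015", "2016", "2017", "2018"]
n = len(fiveyear)

# Default sum skips NaN, so incomplete rows still produce a number.
default_sum = df[fiveyear].sum(axis=1)

# With min_count=n, a row needs n non-NaN values, otherwise the sum is NaN.
strict_sum = df[fiveyear].sum(axis=1, min_count=n)
df["mean"] = strict_sum / n

# Cross-check: a mean that does not skip NaN gives the same column.
cross_check = df[fiveyear].mean(axis=1, skipna=False)
print(df)
```

Dividing by len(fiveyear) is only safe because min_count guarantees that every surviving row has exactly that many values.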

楊魅力
Contributed 1811 experience points, received more than 6 likes
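This is the same as @ALollz's answer, but it flexibly picks up all the year columns, however many years the df contains:
# Collect the year columns in a list (every column except 'ID')
yearsCols = [c for c in df if c != 'ID']
# Compute the row mean; rows with any NaN are dropped and come back as NaN
df['mean'] = df[yearsCols].dropna(how='any').mean(axis=1)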
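One thing to watch with selecting "everything except ID": once the mean column exists, running the snippet again would treat mean as a year. A possible variation, assuming the year columns are the ones whose names are all digits, is sketched below. The helper add_row_mean is hypothetical and the three-year data is made up for brevity.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [12, 96, 20],
    "2014": [100, 120, np.nan],
    "2015": [122, 159, 164],
    "2016": [324, 54, 687],
})

def add_row_mean(frame):
    """Hypothetical helper: add a 'mean' column over the digit-named columns."""
    # Assumption: year columns are exactly the columns whose names are digits.
    year_cols = [c for c in frame.columns if str(c).isdigit()]
    out = frame.copy()
    # Rows with any NaN are dropped, then restored as NaN by index alignment.
    out["mean"] = out[year_cols].dropna(how="any").mean(axis=1)
    return out

once = add_row_mean(df)
twice = add_row_mean(once)  # 'mean' is not picked up as a year column
print(twice)
```

Because 'ID' and 'mean' are not digit-only names, the helper can be applied repeatedly without changing the result.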