您可以使用每個(gè)組的value_counts第一個(gè)索引或第一個(gè)值:modereplace
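def f(x):
    # drop the '**unknown**' rows and take the most frequent remaining value
    return x.replace('**unknown**', x[x.ne('**unknown**')].value_counts().index[0])
    # alternative:
    # return x.replace('**unknown**', x[x.ne('**unknown**')].mode().iat[0])

# transform keeps df's original row index, so the result can be assigned back directly
# (with apply, pandas >= 2.0 adds the City level to the index and the assignment fails)
df['Country'] = df.groupby('City')['Country'].transform(f)
print(df)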
      City Country
0  Newyork     USA
1  Newyork     USA
2  Newyork     USA
3  Newyork     USA
4    delhi   india
5    delhi   india
6    delhi   india
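The answer does not show the question's DataFrame. The sketch below assumes a small hypothetical frame (the rows are invented so that the result matches the output above) to run the first approach end to end. UNKNOWN and fill_unknown are illustrative names, not part of the original answer.

```python
import pandas as pd

UNKNOWN = '**unknown**'

# Hypothetical input: the question's real data is not shown in the answer
df = pd.DataFrame({
    'City': ['Newyork', 'Newyork', 'Newyork', 'Newyork', 'delhi', 'delhi', 'delhi'],
    'Country': ['USA', UNKNOWN, 'USA', UNKNOWN, 'india', UNKNOWN, 'india'],
})

def fill_unknown(x: pd.Series) -> pd.Series:
    # value_counts sorts by frequency (descending), so index[0] is the most common known value
    top = x[x.ne(UNKNOWN)].value_counts().index[0]
    # replace every placeholder in this group with that value
    return x.replace(UNKNOWN, top)

# transform returns a Series aligned with df's index, one value per original row
df['Country'] = df.groupby('City')['Country'].transform(fill_unknown)
print(df)
```

Note that value_counts().index[0] raises IndexError for a city whose rows are all **unknown**, because there is no known value to pick.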
Another solution is to replace **unknown** with missing values, get the most frequent value per group, and fill the gaps with fillna:
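import numpy as np

# turn the placeholder into a real missing value (NaN)
df['Country'] = df['Country'].replace('**unknown**', np.nan)
# most frequent non-missing value per city, broadcast back to every row
s = df.groupby('City')['Country'].transform(lambda x: x.value_counts().index[0])
# alternative
# s = df.groupby('City')['Country'].transform(lambda x: x.mode().iat[0])
df['Country'] = df['Country'].fillna(s)
print(df)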
      City Country
0  Newyork     USA
1  Newyork     USA
2  Newyork     USA
3  Newyork     USA
4    delhi   india
5    delhi   india
6    delhi   india
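Both versions assume every city has at least one known country: for an all-**unknown** group, value_counts().index[0] and mode().iat[0] both raise IndexError because there is nothing to select. A minimal sketch of a guarded variant of the fillna approach, using a hypothetical frame with an invented city Paris that has no known country, leaves such rows as NaN instead of failing. most_frequent is an illustrative helper name.

```python
import numpy as np
import pandas as pd

# Hypothetical input; 'Paris' has only placeholder values
df = pd.DataFrame({
    'City': ['Newyork', 'Newyork', 'delhi', 'delhi', 'Paris'],
    'Country': ['USA', '**unknown**', 'india', '**unknown**', '**unknown**'],
})

# turn the placeholder into a real missing value
df['Country'] = df['Country'].replace('**unknown**', np.nan)

def most_frequent(x: pd.Series):
    # mode() skips NaN, so an all-missing group yields an empty result
    m = x.mode()
    return m.iat[0] if not m.empty else np.nan

# transform broadcasts each group's scalar back to all rows of that group
s = df.groupby('City')['Country'].transform(most_frequent)
df['Country'] = df['Country'].fillna(s)
print(df)
```

Rows for Paris stay NaN, so they can be inspected or filled by some other rule afterwards.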