3 Answers

Contributor with 1752 experience points · 4+ upvotes
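First, the correct syntax is meds_df[['readcode_1', 'readcode_2','generic_name']], with the column names passed as a list inside the indexing brackets. Without the inner list, pandas treats the comma-separated names as a single tuple key, which is why you got a KeyError.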
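A minimal sketch of the difference, assuming a small made-up frame (only the column names come from the question; the values are hypothetical):

```python
import pandas as pd

# Hypothetical two-row frame; only the column names come from the question.
df = pd.DataFrame({
    "readcode_1": ["bxd1", "abc2"],
    "readcode_2": ["x1", "y2"],
    "generic_name": ["Simvastatin", "Aspirin"],
})

# A bare comma-separated key is a tuple, so pandas looks for ONE column
# labelled ('readcode_1', 'readcode_2') and fails.
try:
    df["readcode_1", "readcode_2"]
    raised = False
except KeyError:
    raised = True

# A list inside the brackets selects several columns at once.
subset = df[["readcode_1", "readcode_2", "generic_name"]]
print(raised, list(subset.columns))
```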
To answer your question, here is one way to do it:
# Updated to use tuple per David's suggestion
idx = pd.concat((med_df[col].astype(str).str.startswith(tuple(list_to_extract)) for col in ['readcode_1', 'readcode_2','generic_name']), axis=1).any(axis=1)
med_df.loc[idx]
Result:
ID readcode_1 readcode_2 generic_name
1 1001 bxd1 1.146785e+09 Simvastatin
3 1003 NaN NaN Pravastatin
5 1005 bxd4 4.543234e+07 NaN
10 1010 bxde NaN NaN
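For anyone who wants to run this approach end to end, here is a self-contained sketch. The sample rows are hypothetical (the question's original DataFrame is not reproduced here), and the names `cols_to_search`, `prefixes`, `mask`, and `matches` are illustrative choices rather than part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; not the DataFrame from the question.
df = pd.DataFrame({
    "ID": [1001, 1002, 1003, 1004, 1005],
    "readcode_1": ["bxd1", "aaa1", np.nan, "zzz9", "bxd4"],
    "readcode_2": [1146785342.0, np.nan, np.nan, 99999.0, 45432344.0],
    "generic_name": ["Simvastatin", "Aspirin", "Pravastatin", np.nan, np.nan],
})

list_to_extract = ["bxd", "Simvastatin", "1146785342", "45432344", "Pravastatin"]
cols_to_search = ["readcode_1", "readcode_2", "generic_name"]

# str.startswith accepts a tuple of prefixes, so one call covers all of them.
prefixes = tuple(list_to_extract)

# One boolean column per searched column, then keep rows where any is True.
mask = pd.concat(
    [df[col].astype(str).str.startswith(prefixes) for col in cols_to_search],
    axis=1,
).any(axis=1)

matches = df.loc[mask]
print(matches)
```

Converting each column with astype(str) is what lets the numeric readcode_2 column be searched by prefix at all; the price is that the comparison runs against the string form of each number.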

Contributor with 2012 experience points · 12+ upvotes
You can do this with apply:
list_to_extract = ["bxd", "Simvastatin", "1146785342", "45432344", "Pravastatin"]
bool_df = df[['readcode_1', 'readcode_2','generic_name']].apply(lambda x: x.str.startswith(tuple(list_to_extract), na=False), axis=1)
df.loc[bool_df[bool_df.any(axis=1)].index]
Output:
ID readcode_1 readcode_2 generic_name
1 1001 bxd1 1.146785e+09 Simvastatin
3 1003 NaN NaN Pravastatin
5 1005 bxd4 4.543234e+07 NaN
10 1010 bxde NaN NaN
Thanks to r.ook for spotting a small bug.
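The na=False argument in the answer above decides how missing cells are treated. A small sketch, assuming a made-up object-dtype column (the values and the names `codes` and `flags` are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical column with a missing value, forced to object dtype.
codes = pd.Series(["bxd1", np.nan, "zzz9"], dtype=object)
prefixes = ("bxd", "Simvastatin")

# na=False reports missing cells as "no match", so the result is a clean
# boolean Series that is safe to use as a row filter.
flags = codes.str.startswith(prefixes, na=False)
print(flags.tolist())
```

Without na=False, a missing cell may come back as a missing value rather than False, depending on the column dtype and pandas version, and that can get in the way of boolean indexing.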

Contributor with 1776 experience points · 12+ upvotes
Another solution, where the string processing happens in vanilla Python before the DataFrame is rebuilt:
list_to_extract = ["bxd", "Simvastatin", "1146785342", "45432344", "Pravastatin"]
cols_to_search = ['readcode_1', 'readcode_2','generic_name']
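# Keep a row if any searched cell, converted to str, starts with one of the prefixes
prefixes = tuple(list_to_extract)
output = [(ID, *searchbox)
          for ID, searchbox in zip(df.ID, df.filter(cols_to_search).to_numpy())
          if any(str(box).startswith(prefixes) for box in searchbox)]
# Name the columns explicitly so this also works when df has extra columns
pd.DataFrame(output, columns=['ID', *cols_to_search])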
ID readcode_1 readcode_2 generic_name
0 1001 bxd1 1.146785e+09 Simvastatin
1 1003 NaN NaN Pravastatin
2 1005 bxd4 4.543234e+07 NaN
3 1010 bxde NaN NaN