I have a Pandas DataFrame like this (author, title and year are not relevant, hence A, T and Y):

```
Author  Title  Year  Country
A       T      Y     UK. cat@mail.uk
A       T      Y     U.S.A.
A       T      Y     University of Cambridge
A       T      Y     United Kingdom
A       T      Y     somename@uconn.edu
```

What I want to achieve is a DataFrame with a "clean" Country column:

```
Author  Title  Year  Country
A       T      Y     UK
A       T      Y     USA
A       T      Y     UK
A       T      Y     UK
A       T      Y     USA
```

To do this, I created a dictionary (of lists):

```python
UK = ['UK.', 'Cambridge', 'United Kingdom']
USA = ['U.S.A.', 'conn.edu']
my_dict = {'UK': UK, 'USA': USA}
```

and passed it to the following cleaning function:

```python
def clean_country(country_dict):
    for key in country_dict:
        for value in country_dict[key]:
            if df['Country'].str.contains(value):
                df['Country'] = np.where(value, key, df['Country'].str.replace('-', ' '))
                return df
            else:
                continue

clean_country(my_dict)
```

But I get the following error:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in clean_country
  File "/Users/birgitte/PycharmProjects/text/venv/lib/python3.7/site-packages/pandas/core/generic.py", line 1555, in __nonzero__
    self.__class__.__name__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
```

When I use the following inside the function instead:

- `df['Country'].str.contains(value).all()`: False (not every field contains the value), so none of the Country fields are changed.
- `df['Country'].str.contains(value).any()`: True (some fields contain the value). This results in `ValueError: invalid literal for int() with base 10: 'UK'`.
- `df['Country'].str.contains(value).item()`: results in `ValueError: can only convert an array of size 1 to a Python scalar`.
- `df['Country'].str.contains(value).bool()`: results in `ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().`

Any help on how to get a "clean" Country column is very welcome.
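A minimal sketch of why the `if` fails and what a vectorized version could look like, assuming the example data above and treating every pattern as a literal substring: `str.contains` returns one boolean per row, so `if` (which calls `bool()` on it) cannot decide, and the usual fix is to use that Series as a row mask with `.loc` instead of branching on it. The two-argument `clean_country(df, country_dict)` and the names `mask`, `raised` and `cleaned` are hypothetical illustrations, not the asker's code.

```python
import pandas as pd

# The example data from the question
df = pd.DataFrame({
    'Author': ['A'] * 5,
    'Title': ['T'] * 5,
    'Year': ['Y'] * 5,
    'Country': ['UK. cat@mail.uk', 'U.S.A.', 'University of Cambridge',
                'United Kingdom', 'somename@uconn.edu'],
})
my_dict = {'UK': ['UK.', 'Cambridge', 'United Kingdom'],
           'USA': ['U.S.A.', 'conn.edu']}

# str.contains gives one True/False per row; `if mask:` calls bool(mask),
# which pandas rejects with a ValueError because the answer is ambiguous.
mask = df['Country'].str.contains('UK.', regex=False)
try:
    bool(mask)
    raised = False
except ValueError:
    raised = True

def clean_country(df, country_dict):
    """Return a copy of df whose Country values are mapped to dictionary keys."""
    out = df.copy()
    original = df['Country']  # always match against the unmodified column
    for key, values in country_dict.items():
        for value in values:
            # regex=False: the dots in 'U.S.A.' are matched literally
            mask = original.str.contains(value, regex=False)
            # Assign to the matching rows only, instead of branching with `if`
            out.loc[mask, 'Country'] = key
    return out

cleaned = clean_country(df, my_dict)
print(cleaned['Country'].tolist())  # ['UK', 'USA', 'UK', 'UK', 'USA']
```

Because every pattern is checked against the original column, a cell matching patterns from two keys would end up with the later key; the row-wise `apply` approach in the answer instead stops at the first match.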
1 Answer

滄海一幻覺
You can use `apply` to run a replacement function over the `Country` column of the DataFrame:
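```python
# Replacement logic: return the first key whose value occurs in x,
# otherwise keep x unchanged
def replace(x, country_dict):
    for key in country_dict:
        for value in country_dict[key]:
            if value in x:  # plain substring check, no regex
                return key
    return x

# use either way:
df['Country'] = df['Country'].apply(lambda x: replace(x, my_dict))
# or
df['Country'] = df['Country'].apply(replace, args=(my_dict,))
```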
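A self-contained sketch of the same `apply` approach, assuming the real data may contain missing values: `np.nan` is a float, so `value in x` would raise a `TypeError`, and the guard below simply passes non-strings through. The sample Series, including the unmatched `'France'` entry, is hypothetical test data.

```python
import numpy as np
import pandas as pd

my_dict = {'UK': ['UK.', 'Cambridge', 'United Kingdom'],
           'USA': ['U.S.A.', 'conn.edu']}

def replace(x, country_dict):
    # Missing values (NaN) are floats, so a substring test would fail on them
    if not isinstance(x, str):
        return x
    for key, values in country_dict.items():
        for value in values:
            if value in x:
                return key  # first matching key wins (dict insertion order)
    return x  # no pattern matched: keep the original value

countries = pd.Series(['UK. cat@mail.uk', 'U.S.A.', np.nan,
                       'somename@uconn.edu', 'France'])
cleaned = countries.apply(replace, args=(my_dict,))
print(cleaned.tolist())
```

Values that match no pattern (here `'France'`) are left untouched, which makes it easy to spot entries the dictionary does not cover yet.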
Update:
Use the replace function correctly and fix a copy-paste error in the check for whether the value occurs in the string.