3 Answers

Answerer: 1775 experience points, 11+ upvotes
This may be a bit of a cop-out, but you can simply filter the values out of the result you already have. So if you start with this:
>>> df2 = df.favorite_fruit.str.split(expand=True).stack()
>>> df2
0  0          apple
   1         banana
   2       cherries
1  0         banana
   1       cherries
   2    dragonfruit
2  0       cherries
   1    dragonfruit
3  0    dragonfruit
4  0          apple
   1     elderberry
dtype: object
You can use isin to restrict the data to the values in your target list:
>>> target = ['apple', 'banana']
>>> df2[df2.isin(target)].value_counts()
banana    2
apple     2
dtype: int64
Or you can even select the targets after counting, as in your original approach:
>>> df.favorite_fruit.str.split(expand=True).stack().value_counts().loc[target]
apple     2
banana    2
dtype: int64
If the problem is that the expand and stack operations are expensive on this much data, this may not be satisfactory. But I think it is probably still better than a loop-based answer?
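If the wide intermediate frame produced by expand=True is the concern, one possible alternative is to split into lists, explode, and filter with isin before counting. The following is only a sketch: the sample frame is rebuilt from the data shown elsewhere in this thread so the block runs on its own, explode needs pandas 0.25 or later, and whether it is actually faster on your data would have to be measured.

```python
import pandas as pd

# Sample data assumed from this thread.
df = pd.DataFrame({
    'name': ['Alpha', 'Bravo', 'Charlie', 'Delta', 'Echo'],
    'favorite_fruit': ['apple banana cherries', 'banana cherries dragonfruit',
                       'cherries dragonfruit', 'dragonfruit', 'apple elderberry'],
})
target = ['apple', 'banana']

# One fruit per row, without building the wide expand=True frame.
fruits = df['favorite_fruit'].str.split().explode()

# Keep only the target fruits, then count them.
target_counts = fruits[fruits.isin(target)].value_counts()

# reindex keeps the order of `target` and fills 0 for fruits that never occur.
target_counts = target_counts.reindex(target, fill_value=0)
print(target_counts)
```

Unlike .loc[target], the reindex step does not raise when a target fruit is absent from the data; it reports a count of 0 instead.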

Answerer: 1789 experience points, 8+ upvotes
Perhaps a somewhat roundabout way, but if your favorite_fruit column is always space-delimited, something like this should work:
import pandas as pd

# Renamed from `list` so the built-in is not shadowed; not used below.
fruit_names = ['apple', 'banana', 'cherries', 'dragonfruit', 'elderberry']
data = {'name': ['Alpha', 'Bravo', 'Charlie', 'Delta', 'Echo'],
        'favorite_fruit': ['apple banana cherries', 'banana cherries dragonfruit',
                           'cherries dragonfruit', 'dragonfruit', 'apple elderberry']}
df = pd.DataFrame(data, columns=['name', 'favorite_fruit'])

# Tally each fruit across all rows.
counts = {}
for i, row in df.iterrows():
    items = row['favorite_fruit'].split(' ')
    for item in items:
        counts[item] = counts.get(item, 0) + 1

# Turn the tallies into a two-column DataFrame (dicts keep insertion order).
new_df = pd.DataFrame({'fruit': list(counts.keys()),
                       'frequency': list(counts.values())})
print(new_df)
This prints the following:
         fruit  frequency
0        apple          2
1       banana          2
2     cherries          3
3  dragonfruit          3
4   elderberry          1
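The same tally can be written more compactly with collections.Counter from the standard library. This is a sketch under the same space-delimited assumption; the plain list favorite_fruit stands in for the DataFrame column and the variable names are illustrative.

```python
from collections import Counter

# Stand-in for df['favorite_fruit']; values copied from the sample data.
favorite_fruit = ['apple banana cherries', 'banana cherries dragonfruit',
                  'cherries dragonfruit', 'dragonfruit', 'apple elderberry']

# Counter.update adds one to the tally of every item in the iterable.
counts = Counter()
for s in favorite_fruit:
    counts.update(s.split())

# Sort by fruit name for a stable, readable listing.
rows = sorted(counts.items())
for fruit, frequency in rows:
    print(fruit, frequency)
```

With a DataFrame, iterating over df['favorite_fruit'] in place of the list gives the same counts.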

Answerer: 1779 experience points, 6+ upvotes
Try using explode after the split.
df.favorite_fruit.str.split().explode().value_counts()
cherries       3
dragonfruit    3
banana         2
apple          2
elderberry     1
Name: favorite_fruit, dtype: int64
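If a two-column DataFrame is wanted rather than a Series, the counts can be reshaped with rename_axis and reset_index. A minimal sketch, assuming the sample data from this thread; the column names fruit and frequency are chosen here for illustration.

```python
import pandas as pd

# Sample data assumed from this thread.
df = pd.DataFrame({
    'name': ['Alpha', 'Bravo', 'Charlie', 'Delta', 'Echo'],
    'favorite_fruit': ['apple banana cherries', 'banana cherries dragonfruit',
                       'cherries dragonfruit', 'dragonfruit', 'apple elderberry'],
})

counts = df['favorite_fruit'].str.split().explode().value_counts()

# Name the index 'fruit', then move it into a column next to the counts.
new_df = counts.rename_axis('fruit').reset_index(name='frequency')
print(new_df)
```

The order of rows with equal counts (for example cherries and dragonfruit) may vary between pandas versions.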