2 Answers

User contributed 1802 experience points · received over 10 likes
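So, as you mentioned, using a set is definitely the way to go here, because membership lookups in a set are much faster than in a list. If you want to know why, do a quick Google search on hashing. All you need to do to make this change is swap the square brackets around word_list for curly braces.
The real problem you need to deal with is that a headline is a string of multiple words, while word_list contains single words.
What you need to do is iterate over the individual words. I'm assuming headline_col is a list of headlines, where each headline is a string containing one or more words. We'll iterate over all the headlines, and then over each word in the headline.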
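To make the list-versus-set point concrete, here is a minimal sketch. The names word_list_as_list, word_list_as_set, big_list and big_set are made up for illustration, and the timings it prints will vary from machine to machine.

```python
import timeit

# The same words stored as a list (square brackets) and as a set (curly braces)
word_list_as_list = ["Slam", "Slams", "Slammed", "Slamming",
                     "Blast", "Blasts", "Blasting", "Blasted"]
word_list_as_set = {"Slam", "Slams", "Slammed", "Slamming",
                    "Blast", "Blasts", "Blasting", "Blasted"}

# Membership tests give the same answer for both containers
print("Blasted" in word_list_as_list, "Blasted" in word_list_as_set)

# With only eight words the speed difference is negligible, so time the
# lookup on a larger, purely hypothetical vocabulary instead
big_list = [f"word{i}" for i in range(100_000)]
big_set = set(big_list)

# Look up the last element: the list scans every item, the set hashes once
list_time = timeit.timeit(lambda: "word99999" in big_list, number=100)
set_time = timeit.timeit(lambda: "word99999" in big_set, number=100)
print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```

For a word list this small, either container works; the set mainly pays off as the vocabulary or the number of headlines grows.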
word_list = {"Slam", "Slams", "Slammed", "Slamming", "Blast", "Blasts", "Blasting", "Blasted"}
# Collect the matching headlines here
slam_list = []
# Iterate over each headline
for headline in headline_col:
    # Iterate over each word in the headline;
    # headline.split() breaks the headline into a list of words (splitting on whitespace)
    for word in headline.split():
        # If we've found one of our words
        if word in word_list:
            # Add the headline to our list
            slam_list.append(headline)
            # We're done with this headline, so break out of the inner for loop
            break
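One caveat with the loop above: split() only breaks on whitespace, so a token such as "Slams," (with a trailing comma) or "slams" (lowercase) would not match. The following is a rough sketch of one way to handle that, assuming you want case-insensitive matching and can ignore punctuation around each word. The function name find_matching_headlines and the sample headlines are invented for illustration and are not from the question.

```python
import string

def find_matching_headlines(headlines, keywords):
    """Return the headlines that contain at least one keyword as a whole word."""
    # Lowercase the keywords once so the comparison ignores case (an assumption)
    keyword_set = {k.lower() for k in keywords}
    matches = []
    for headline in headlines:
        # Strip surrounding punctuation from each whitespace-separated token
        tokens = {w.strip(string.punctuation).lower() for w in headline.split()}
        # isdisjoint() is False when the two sets share at least one element
        if not keyword_set.isdisjoint(tokens):
            matches.append(headline)
    return matches

word_list = {"Slam", "Slams", "Slammed", "Slamming",
             "Blast", "Blasts", "Blasting", "Blasted"}

# Hypothetical sample data
sample_headlines = [
    "Senator Slams New Budget Proposal",
    "Critics blast the film, audiences disagree",
    "Local Team Wins Championship",
    "Mayor 'blasted' over delays",
]

slam_list = find_matching_headlines(sample_headlines, word_list)
print(slam_list)
```

If you need the original case-sensitive behaviour, drop the two lower() calls.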

User contributed 1827 experience points · received over 4 likes
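Since you're reading a CSV, it may be easier to use pandas to achieve your goal here.
What you want to do is identify the column by its index, which looks like it's 2, and then find the rows where the value in that third column is in word_list. Note that isin compares the whole cell value against word_list, so it only finds exact matches.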
import pandas as pd
df = pd.read_csv("website_headlines.csv")
col = df.columns[2]
df.loc[df[col].isin(word_list), col]
Consider the following example:
import numpy as np
import pandas as pd
word_list = ["Slam", "Slams", "Slammed", "Slamming",
             "Blast", "Blasts", "Blasting", "Blasted"]
# add some extra characters to see if limited to exact matches
word_list_mutated = np.random.choice(word_list + [item + '_extra' for item in word_list], 10)
data = {'a': range(1, 11), 'b': range(1, 11), 'c': word_list_mutated}
df = pd.DataFrame(data)
col = df.columns[2]
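>>> df
    a   b               c
0   1   1           Slams
1   2   2           Slams
2   3   3   Blasted_extra
3   4   4          Blasts
4   5   5     Slams_extra
5   6   6  Slamming_extra
6   7   7            Slam
7   8   8     Slams_extra
8   9   9            Slam
9  10  10        Blasting
>>> df.loc[df[col].isin(word_list), col]
# Only the exact matches are kept (rows 0, 1, 3, 6, 8 and 9 in this random draw);
# the values ending in "_extra" are filtered out.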
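Because isin only matches whole cell values, it will not find a keyword inside a multi-word headline. One possible workaround, sketched here on the assumption that the third column holds full headline strings, is a word-boundary regular expression with Series.str.contains. The DataFrame contents and the names pattern, mask and slam_headlines are hypothetical.

```python
import re
import pandas as pd

word_list = ["Slam", "Slams", "Slammed", "Slamming",
             "Blast", "Blasts", "Blasting", "Blasted"]

# Hypothetical stand-in for pd.read_csv("website_headlines.csv")
df = pd.DataFrame({
    "id": [1, 2, 3],
    "source": ["a", "b", "c"],
    "headline": [
        "Senator Slams New Budget Proposal",
        "Local Team Wins Championship",
        "Critics Blast the Film",
    ],
})
col = df.columns[2]

# Build \b(?:Slam|Slams|...)\b so each keyword only matches as a whole word;
# the non-capturing group (?:...) avoids pandas' warning about match groups
pattern = r"\b(?:" + "|".join(re.escape(w) for w in word_list) + r")\b"

# na=False treats missing headlines as non-matches
mask = df[col].str.contains(pattern, regex=True, na=False)
slam_headlines = df.loc[mask, col].tolist()
print(slam_headlines)
```

The match is case-sensitive by default; passing case=False to str.contains would relax that.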