I have a DataFrame df_sentences and a list question_words, as shown below:

df_sentences:

sentence                          label
you will not forget this movie    0
will the novel ever die           1
why we drink alcohol              1
did trump win the election        1
ambiance is perfect               0

question_words = ['what', 'why', 'when', 'where', 'whose', 'which', 'whom', 'who', 'how', 'do', 'are', 'will', 'did', 'will', 'am', 'are', 'was', 'were', 'can', 'has', 'have']

I want to check whether the first word of each value in the sentence column is in the question_words list, and return the result in a new column, ques_word.

Expected output:

sentence                          label  ques_word
you will not forget this movie    0      0
will the novel ever die           1      1
why we drink alcohol              1      1
did trump win the election        1      1
the ambiance is perfect           0      0

So far I have tried .str.contains('|'.join(question_words)).astype(int), but as expected it flags every sentence that contains any of the question_words as a substring anywhere, not only as the first word.

2 Answers

慕村9548890
If you want a fast solution, use a list comprehension.
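# df here is the question's df_sentences
q_set = set(question_words)  # a set makes each membership test O(1)
df['ques_word'] = [
    # split(None, 1)[0] is the first whitespace-delimited word
    1 if w.split(None, 1)[0] in q_set else 0 for w in df.sentence
]
df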
                         sentence  label  ques_word
0  you will not forget this movie      0          0
1         will the novel ever die      1          1
2            why we drink alcohol      1          1
3      did trump win the election      1          1
4             ambiance is perfect      0          0
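
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same list-comprehension idea. The sample data is copied from the question. The helper name starts_with_question_word and the guard for empty sentences are my own additions, not part of the answer above (the original one-liner would raise an IndexError on an empty string).

```python
import pandas as pd

question_words = ['what', 'why', 'when', 'where', 'whose', 'which', 'whom',
                  'who', 'how', 'do', 'are', 'will', 'did', 'will', 'am',
                  'are', 'was', 'were', 'can', 'has', 'have']

df = pd.DataFrame({
    'sentence': ['you will not forget this movie',
                 'will the novel ever die',
                 'why we drink alcohol',
                 'did trump win the election',
                 'ambiance is perfect'],
    'label': [0, 1, 1, 1, 0],
})

q_set = set(question_words)  # duplicates in the list collapse here


def starts_with_question_word(sentence):
    # Hypothetical helper: split(None, 1) splits on any whitespace and
    # ignores leading whitespace. An empty or whitespace-only sentence
    # gives an empty list, so check before indexing.
    words = sentence.split(None, 1)
    return int(bool(words) and words[0] in q_set)


df['ques_word'] = [starts_with_question_word(s) for s in df['sentence']]
print(df)
```

Note that the comparison is case-sensitive, so a sentence starting with "Why" would get 0 unless you lowercase it first.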

揚帆大魚
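df_sentences['sentence'].str.split(" ").str[0].isin(question_words).astype(int)
should do the job. Note the .str[0] (a plain [0] would select the first row, not the first word of each row) and isin rather than contains, so the first word has to equal a question word instead of merely containing one.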
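
A minimal sketch of the vectorized approach using only pandas string methods, with the sample data from the question. It uses .str.split().str[0] to pull out the first word of every row and isin for exact membership; the variable names first_word and substring_hits are just for illustration. The last line reproduces the substring-matching behaviour described in the question for comparison.

```python
import pandas as pd

question_words = ['what', 'why', 'when', 'where', 'whose', 'which', 'whom',
                  'who', 'how', 'do', 'are', 'will', 'did', 'will', 'am',
                  'are', 'was', 'were', 'can', 'has', 'have']

df_sentences = pd.DataFrame({
    'sentence': ['you will not forget this movie',
                 'will the novel ever die',
                 'why we drink alcohol',
                 'did trump win the election',
                 'ambiance is perfect'],
    'label': [0, 1, 1, 1, 0],
})

# First whitespace-delimited word of every row (NaN for empty strings).
first_word = df_sentences['sentence'].str.split().str[0]

# Exact membership test against the list, converted to 0/1.
df_sentences['ques_word'] = first_word.isin(question_words).astype(int)

# For comparison: the regex approach from the question matches substrings
# anywhere, e.g. "will" in row 0 and "am" inside "ambiance" in row 4.
substring_hits = (
    df_sentences['sentence']
    .str.contains('|'.join(question_words))
    .astype(int)
)

print(df_sentences)
print(substring_hits.tolist())
```

On small frames the list comprehension is often just as fast or faster, so which one to use is mostly a matter of taste.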