3 Answers

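You can do:
import re

# assumes sent (the review string) and mydict (the keyword dict) come from the question
# clean the sentence: replace periods with a space, not '', so "date.Amazing" splits into two words
sent = re.sub(r'\.', ' ', sent)
# convert to a list of lowercase words
sent = sent.lower().split()
# '1' for each word that is a key of the dict, '0' otherwise
new_sent = ''.join('1' if x in mydict else '0' for x in sent)
print(new_sent)
0011000000000001000000000100000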
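For a self-contained version of this approach, here is a minimal sketch that tokenizes with re.findall(r'\w+', ...) instead of deleting periods, so any punctuation (not only '.') separates words. The function name encode_review is hypothetical, and the keyword set is only a few keys taken from the dict in the question; because `in` on a dict checks its keys, passing the full dict works the same way.

```python
import re

def encode_review(text, keywords):
    """Hypothetical helper: one '1'/'0' digit per word, '1' if the word is in keywords."""
    # \w+ extracts runs of letters/digits/underscores, so "date.Amazing"
    # yields "date" and "amazing" even without a space after the period
    words = re.findall(r'\w+', text.lower())
    return ''.join('1' if w in keywords else '0' for w in words)

review = ("Simply the best. I bought this last year. Still using. No problems "
          "faced till date.Amazing battery life. Works fine in darkness or broad "
          "daylight. Best gift for any book lover.")
# a few of the keys from the dict in the question (assumed subset)
keywords = {"amazing", "best", "i", "good", "nice"}
print(encode_review(review, keywords))  # 0011000000000001000000000100000
```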

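First, don't use dict as a variable name: dict is a Python built-in (a built-in type, not a reserved keyword), and reusing the name shadows it. Then use a list comprehension with dict.get to replace words that don't match with 0.
Note:
If the data contains something like date.Amazing, with no space after the punctuation, the punctuation has to be replaced with a space.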
import pandas as pd

df = pd.DataFrame({'reviews':['Simply the best. I bought this last year. Still using. No problems faced till date.Amazing battery life. Works fine in darkness or broad daylight. Best gift for any book lover.']})
d = {"amazing":"1","super":"1","good":"1","useful":"1","nice":"1","awesome":"1","quality":"1","resolution":"1","perfect":"1","revolutionary":"1","and":"1","purchase":"1","product":"1","impression":"1","watch":"1","weight":"1","stopped":"1","i":"1","easy":"1","read":"1","best":"1","better":"1","bad":"1"}
# replace punctuation with a space, then lowercase; regex=True is needed because pandas >= 2.0 defaults to regex=False
df['reviews'] = df['reviews'].str.replace(r'[^\w\s]+', ' ', regex=True).str.lower()
# d.get(y, '0') returns '1' for words in d and '0' for everything else
df['newreviews'] = [''.join(d.get(y, '0') for y in x.split()) for x in df['reviews']]
Alternatively:
df['newreviews'] = df['reviews'].apply(lambda x: ''.join(d.get(y, '0') for y in x.split()))
print(df)
reviews \
0 simply the best i bought this last year stil...
newreviews
0 0011000000000001000000000100000
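To see why the note about date.Amazing matters, here is a minimal sketch on a fragment of the sample review, using the same regex as the answer; the two-key dict is a hypothetical subset of d.

```python
import re

text = "No problems faced till date.Amazing battery life."

# Splitting on whitespace alone leaves punctuation glued to the words
print(text.lower().split())
# ['no', 'problems', 'faced', 'till', 'date.amazing', 'battery', 'life.']

# Replacing each run of punctuation with a space separates "date" and "amazing"
cleaned = re.sub(r'[^\w\s]+', ' ', text).lower()
print(cleaned.split())
# ['no', 'problems', 'faced', 'till', 'date', 'amazing', 'battery', 'life']

# dict.get(word, '0') falls back to '0' for words that are not keys
d = {"amazing": "1", "best": "1"}
encoded = ''.join(d.get(w, '0') for w in cleaned.split())
print(encoded)  # 00000100
```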

You can do it with:
df.replace(repl, regex=True, inplace=True)
df
where df is your DataFrame and repl is your dictionary.
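Be aware that with regex=True each key of repl is treated as a regular expression and, when the value is a string, matching substrings inside each cell are substituted. This rewrites the review text rather than building the 0/1 string from the other answers. Keys are also matched case-sensitively, and a short key such as 'i' matches inside other words. A minimal sketch, assuming a lowercased one-row frame and a hypothetical one-key repl, shows the effect and how \b word boundaries limit it:

```python
import pandas as pd

df = pd.DataFrame({'reviews': ['simply the best i bought']})

# Unanchored key: 'i' is also replaced inside "simply"
loose = df.replace({'i': '1'}, regex=True)
print(loose.loc[0, 'reviews'])  # s1mply the best 1 bought

# \b word boundaries: only the standalone word "i" is replaced
strict = df.replace({r'\bi\b': '1'}, regex=True)
print(strict.loc[0, 'reviews'])  # simply the best 1 bought
```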