1 Answer

You can make a small modification to your current code:
from pandas import DataFrame
import re

data = {'id': [11, 12, 13, 14, 15, 16],
        'term': ['Ford', 'EXpensive', 'TOYOTA', 'Mercedes Benz', 'electric', 'cars'],
        'sentence': ['F-FORD FORD/FORD is less expensive than Mercedes Benz.', 'toyota, hyundai mileage is good compared to ford', 'tesla is an electric-car', 'toyota too has electric cars', 'CARS', 'CArs are expensive.']
        }
# DataFrame creation
df = DataFrame(data, columns=['id', 'term', 'sentence'])
# Dictionary creation: upper-cased term -> id
dct = {}
l_term = list(df['term'])
l_id = list(df['id'])
for i, j in zip(l_term, l_id):
    dct[str(i).upper()] = j
# Build the pattern: case-insensitive, longest terms first, not preceded by "-" or a word character, not followed by a word character
pattern = r'(?i)(?<!-)(?<!\w)(?:{})(?!\w)'.format('|'.join(map(re.escape, sorted(df["term"], key=len, reverse=True))))
# Replace each match with "match|id" (regex=True is needed for a callable replacement in pandas 2.0+, where regex defaults to False)
df["sentence"] = df["sentence"].str.replace(pattern, lambda x: "{}|{}".format(x.group(), dct[x.group().upper()]), regex=True)
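The pattern above sorts the terms by length, longest first, before joining them with "|". A minimal sketch of why that ordering can matter, using only the standard re module; the helper build_pattern and the overlapping terms "Mercedes" and "Mercedes Benz" are hypothetical and not part of the question's data:

```python
import re

def build_pattern(terms, longest_first=True):
    # Hypothetical helper: same pattern shape as the answer's, with optional sorting.
    ordered = sorted(terms, key=len, reverse=True) if longest_first else list(terms)
    alternation = "|".join(map(re.escape, ordered))
    return re.compile(r"(?i)(?<!-)(?<!\w)(?:{})(?!\w)".format(alternation))

terms = ["Mercedes", "Mercedes Benz"]  # hypothetical overlapping terms
text = "a Mercedes Benz sedan"

# Unsorted: the shorter alternative is tried first and wins.
print(build_pattern(terms, longest_first=False).search(text).group())  # Mercedes
# Longest first: the full multi-word term is matched.
print(build_pattern(terms).search(text).group())  # Mercedes Benz
```

With the question's own terms no term is a prefix of another, so the sort is a safeguard there rather than a requirement.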
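Notes:
dict is the name of a Python built-in type, so do not use it as a variable name; use dct instead.
dct[str(i).upper()] = j adds upper-cased keys to the dictionary, which makes the term lookup case-insensitive.
The final line, df["sentence"] = df["sentence"].str.replace(pattern, lambda x: ...), does the main work. It uses Series.str.replace, which accepts a callable as the replacement argument: each match is passed to the lambda as a Match object x, x.group() returns the whole matched text, and dct[x.group().upper()] retrieves the corresponding id.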
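The same callable-replacement idea can be tried without pandas. The following is a minimal sketch using re.sub from the standard library; the names terms, tag and tag_terms are illustrative assumptions, and the data simply mirrors the terms and ids from the answer:

```python
import re

# Term -> id mapping mirroring the answer's DataFrame columns.
terms = {"Ford": 11, "EXpensive": 12, "TOYOTA": 13,
         "Mercedes Benz": 14, "electric": 15, "cars": 16}

# Upper-case the keys so lookups ignore the casing found in the text.
dct = {term.upper(): term_id for term, term_id in terms.items()}

# Longest terms first; skip matches preceded by "-" or glued to word characters.
alternation = "|".join(map(re.escape, sorted(terms, key=len, reverse=True)))
pattern = re.compile(r"(?i)(?<!-)(?<!\w)(?:{})(?!\w)".format(alternation))

def tag(match):
    # match.group() is the text as it appears; the upper-cased form is the dict key.
    return "{}|{}".format(match.group(), dct[match.group().upper()])

def tag_terms(sentence):
    # re.sub calls tag() once per match and inserts its return value.
    return pattern.sub(tag, sentence)

print(tag_terms("F-FORD FORD/FORD is less expensive than Mercedes Benz."))
# F-FORD FORD|11/FORD|11 is less expensive|12 than Mercedes Benz|14.
print(tag_terms("CArs are expensive."))
# CArs|16 are expensive|12.
```

The first "FORD" in "F-FORD" is left alone because it is preceded by a hyphen, which the (?<!-) lookbehind rejects.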