I have a list of 4,000 strings that I need to remove from a column of a pandas DataFrame. The code below works on the sample data, but when I run it on my DataFrame of 20k+ rows it takes a very long time. Any ideas on how to speed it up?

```python
import pandas as pd
import re

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "Hello Sam how is it going today? oh yeah",
            "Hello Jane how is it going today? oh yeah",
            "It is an Hello example how are you doing today?",
            "how is it going today?n[soldjgf ",
            "how is it going today Hello World",
        ],
    }
)

my_list = ['how is it going today?n[soldjgf', 'how are you doing today?']

# =============================================================================
p = re.compile('|'.join(map(re.escape, my_list)))
df['cleaned_text'] = [p.sub(' ', text) for text in df['name']]
```
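One detail worth checking in a pattern built from 4,000 phrases: if some entries of `my_list` are prefixes of others (an assumption about the real list, not something the question shows), the order of the alternatives changes the result. Python's `re` tries the alternatives from left to right and keeps the first one that matches, not the longest. The sketch below uses a hypothetical helper `build_pattern` and made-up phrases to show the difference and to sort the phrases longest first.

```python
import re


def build_pattern(phrases):
    """Hypothetical helper: join escaped phrases, longest first."""
    # re tries alternatives left to right and keeps the first one that
    # matches, so a short phrase that is a prefix of a longer one would
    # otherwise win and leave the rest of the longer phrase in the text.
    ordered = sorted(set(phrases), key=len, reverse=True)
    return re.compile("|".join(map(re.escape, ordered)))


# Hypothetical overlapping phrases, not taken from the question's list
phrases = ["how is it", "how is it going today?"]
text = "Hello Sam how is it going today? oh yeah"

naive = re.compile("|".join(map(re.escape, phrases)))
print(naive.sub(" ", text))                   # Hello Sam   going today? oh yeah
print(build_pattern(phrases).sub(" ", text))  # Hello Sam   oh yeah
```

Sorting only changes which phrase wins at a given position. It does not make the pattern faster.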
1 Answer

絕地無雙
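Use `.str.replace()` on the column instead of the list comprehension. Since pandas 2.0 the `regex` parameter defaults to `False`, and passing a compiled pattern with `regex=False` raises a `ValueError`, so pass `regex=True` explicitly:

```python
import re

p = re.compile('|'.join(map(re.escape, my_list)))
df['cleaned_text'] = df['name'].str.replace(p, ' ', regex=True)
```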
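If the column is still slow, much of the cost is likely the 4,000-way alternation itself. For the default object dtype, `.str.replace()` still runs the regex once per row, and at each position in a string `re` tries the alternatives in order. A commonly used alternative technique (different from the answer's plain alternation) is a trie-based regex: phrases that share a prefix are merged, so that prefix is matched only once. The sketch below assumes the phrases are plain literals. `trie_regex` is a hypothetical helper, not a pandas or `re` API, and no speed-up is claimed without measuring on the real data.

```python
import re

import pandas as pd


def trie_regex(phrases):
    """Hypothetical helper: compile literal phrases into one regex whose
    alternatives are nested by shared prefixes (a character trie)."""
    trie = {}
    for phrase in phrases:
        if not phrase:
            continue  # an empty phrase would match at every position
        node = trie
        for ch in phrase:
            node = node.setdefault(ch, {})
        node[""] = {}  # the "" key marks the end of a complete phrase

    def to_pattern(node):
        # Recursion depth equals phrase length; fine for sentence-length phrases.
        ends_here = "" in node
        branches = [re.escape(ch) + to_pattern(child)
                    for ch, child in sorted(node.items()) if ch]
        if not branches:
            return ""
        if len(branches) == 1 and not ends_here:
            return branches[0]
        group = "(?:" + "|".join(branches) + ")"
        # A phrase ends here, so longer continuations are optional; the
        # greedy "?" still prefers the longest phrase that matches.
        return group + "?" if ends_here else group

    if not trie:
        raise ValueError("need at least one non-empty phrase")
    return re.compile(to_pattern(trie))


df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "Hello Sam how is it going today? oh yeah",
            "Hello Jane how is it going today? oh yeah",
            "It is an Hello example how are you doing today?",
            "how is it going today?n[soldjgf ",
            "how is it going today Hello World",
        ],
    }
)
my_list = ['how is it going today?n[soldjgf', 'how are you doing today?']

p = trie_regex(my_list)
df["cleaned_text"] = df["name"].str.replace(p, " ", regex=True)
print(df["cleaned_text"].tolist())
```

At each position the trie pattern picks the same phrase as a longest-first alternation would, so the cleaned text should not change. Whether it is faster for a particular 4,000-phrase list is worth checking with `timeit` on the full 20k-row frame.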