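Is it possible to do fuzzy-match merging with Python pandas?

I have two DataFrames that I want to merge based on a column. However, because of alternate spellings, differing numbers of spaces, and the presence or absence of diacritical marks, I would like to be able to merge them as long as the values are similar to one another. Any similarity algorithm will do (Soundex, Levenshtein, difflib).

Say one DataFrame has the following data:

df1 = DataFrame([[1],[2],[3],[4],[5]], index=['one','two','three','four','five'], columns=['number'])

       number
one         1
two         2
three       3
four        4
five        5

df2 = DataFrame([['a'],['b'],['c'],['d'],['e']], index=['one','too','three','fours','five'], columns=['letter'])

      letter
one        a
too        b
three      c
fours      d
five       e

Then I would like to get the resulting DataFrame:

       number letter
one         1      a
two         2      b
three       3      c
four        4      d
five        5      e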
3 Answers

鴻蒙傳說
Contributed 1865 experience points · received 7+ likes
You can apply difflib's get_close_matches to df2's index and then use join:
In [23]: import difflib

In [24]: difflib.get_close_matches
Out[24]: <function difflib.get_close_matches>

In [25]: df2.index = df2.index.map(lambda x: difflib.get_close_matches(x, df1.index)[0])

In [26]: df2
Out[26]:
      letter
one        a
two        b
three      c
four       d
five       e

In [31]: df1.join(df2)
Out[31]:
       number letter
one         1      a
two         2      b
three       3      c
four        4      d
five        5      e
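If the names were in a column rather than the index, you could apply the same idea to that column and then use merge: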
df1 = DataFrame([[1,'one'],[2,'two'],[3,'three'],[4,'four'],[5,'five']], columns=['number', 'name'])
df2 = DataFrame([['a','one'],['b','too'],['c','three'],['d','fours'],['e','five']], columns=['letter', 'name'])
df2['name'] = df2['name'].apply(lambda x: difflib.get_close_matches(x, df1['name'])[0])
df1.merge(df2)
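
One caveat with the `[0]` indexing above: `difflib.get_close_matches` returns an empty list when nothing clears its similarity cutoff (0.6 by default), so `[0]` would raise an `IndexError` for such a name. A minimal sketch of a more defensive helper follows; the name `closest_or_none` and the sample value `'xyz'` are made up for illustration and are not part of the answer above.

```python
import difflib


def closest_or_none(name, candidates, cutoff=0.6):
    """Return the best difflib match for `name`, or None if nothing clears `cutoff`.

    Hypothetical helper for illustration only.
    """
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Names from the question, plus an assumed value ('xyz') that has no close match.
left_names = ['one', 'two', 'three', 'four', 'five']
right_names = ['one', 'too', 'three', 'fours', 'xyz']

# Map each right-hand name to its closest left-hand name (or None).
mapping = {name: closest_or_none(name, left_names) for name in right_names}
print(mapping)
```

With the DataFrames above, the same helper could be passed to apply, for example `df2['name'].apply(lambda x: closest_or_none(x, list(df1['name'])))`, so that unmatched rows end up with a missing key instead of stopping the whole operation.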

翻過高山走不出你
Contributed 1875 experience points · received 3+ likes
import jellyfish
import pandas

def get_closest_match(x, list_strings):
    # Return the string in list_strings with the highest Jaro-Winkler similarity to x.
    best_match = None
    highest_jw = 0
    for current_string in list_strings:
        # Note: newer jellyfish releases name this function jaro_winkler_similarity.
        current_score = jellyfish.jaro_winkler(x, current_string)
        if current_score > highest_jw:
            highest_jw = current_score
            best_match = current_string
    return best_match

df1 = pandas.DataFrame([[1],[2],[3],[4],[5]], index=['one','two','three','four','five'], columns=['number'])
df2 = pandas.DataFrame([['a'],['b'],['c'],['d'],['e']], index=['one','too','three','fours','five'], columns=['letter'])

# Replace each df2 index label with its closest match in df1's index, then join.
df2.index = df2.index.map(lambda x: get_closest_match(x, df1.index))
df1.join(df2)

       number letter
one         1      a
two         2      b
three       3      c
four        4      d
five        5      e
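
A best-score loop like the one above always returns some candidate whenever any score is above zero, even a poor one. The sketch below returns the score alongside the match so that weak matches can be rejected. To keep it runnable on the standard library alone it swaps in `difflib.SequenceMatcher.ratio()` for the Jaro-Winkler score, so the numbers will differ from jellyfish's. The function name `best_match_with_score` and the 0.6 threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher


def best_match_with_score(x, candidates):
    """Return (candidate, score) for the most similar candidate, or (None, 0.0).

    Hypothetical helper; uses SequenceMatcher.ratio() as a stand-in similarity,
    not Jaro-Winkler.
    """
    best, best_score = None, 0.0
    for candidate in candidates:
        score = SequenceMatcher(None, x, candidate).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


targets = ['one', 'two', 'three', 'four', 'five']
THRESHOLD = 0.6  # assumed cutoff; tune it for your own data

for name in ['too', 'fours', 'xyz']:
    match, score = best_match_with_score(name, targets)
    accepted = match if score >= THRESHOLD else None
    print(name, '->', accepted, round(score, 2))
```

Keeping the score around also makes it possible to inspect borderline pairs by hand before trusting the join.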