2 Answers

Contributor with 1866 experience points · more than 5 upvotes
difflib.get_close_matches() will at least help clean up your code, and it may run faster.
import difflib
p_names = ['BLUEAPPLE', 'GREENBUTTON20', '400100DUCK20']
i_names = ['BLUEAPPLE', 'GREENBUTTON', '100DUCK']
for p in p_names:
    print(difflib.get_close_matches(p, i_names))
>>>
['BLUEAPPLE']
['GREENBUTTON']
['100DUCK']
>>>
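It still performs a lot of comparisons: every string in p_names has to be matched against every string in i_names.
This is similar to your approach of finding matches with regular expressions: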
import re
for p in p_names:
    for i in i_names:
        # re.escape treats the item name as literal text, not as a pattern
        if re.search(re.escape(i), p):
            print(i)
            # stop looking
            break
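As a follow-up to the difflib.get_close_matches() suggestion above, here is a minimal sketch that keeps only the single best candidate per product. The helper name best_matches and the dictionary result are assumptions made for illustration; n and cutoff are the documented parameters of get_close_matches(), and 0.6 is its default cutoff.

```python
import difflib

p_names = ['BLUEAPPLE', 'GREENBUTTON20', '400100DUCK20']
i_names = ['BLUEAPPLE', 'GREENBUTTON', '100DUCK']

def best_matches(products, items, cutoff=0.6):
    """Map each product name to its closest item name, or None.

    Hypothetical helper, not part of difflib.
    """
    result = {}
    for p in products:
        # n=1 keeps only the best candidate; cutoff is the minimum
        # similarity ratio (0.0 to 1.0) a candidate must reach.
        close = difflib.get_close_matches(p, items, n=1, cutoff=cutoff)
        result[p] = close[0] if close else None
    return result

matches = best_matches(p_names, i_names)
for product, item in matches.items():
    print(product, '->', item)
```

Raising cutoff makes the matching stricter, which may help if unrelated names start pairing up. The number of comparisons stays the same as in the loop above.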

Contributor with 1796 experience points · more than 10 upvotes
Try this:
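import re

def remove_nums(product):
    # item_nums_list is assumed to be the list of known item names from your code
    if re.search(r'\d', product):
        for item in item_nums_list:
            if item in product:
                return item
        # no known item found: strip the digits instead
        return re.sub(r'\d+', '', product)
    else:
        return product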
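To try the function on its own, here is a self-contained sketch of the same idea. It assumes the list of known items is passed in as a parameter (item_names, a name chosen for illustration) instead of being read from a global, and the sample data, including 'REDKITE7', is made up.

```python
import re

def remove_nums(product, item_names):
    """Return the known item contained in product, else product without digits."""
    if not re.search(r'\d', product):
        # No digits at all: nothing to clean up.
        return product
    for item in item_names:
        if item in product:
            # A known item name occurs inside the product code.
            return item
    # Fall back to stripping every run of digits.
    return re.sub(r'\d+', '', product)

item_names = ['BLUEAPPLE', 'GREENBUTTON', '100DUCK']
products = ['BLUEAPPLE', 'GREENBUTTON20', '400100DUCK20', 'REDKITE7']
cleaned = [remove_nums(p, item_names) for p in products]
print(cleaned)
```

Note that the first item found wins, so if one item name is a substring of another, the order of item_names decides which one is returned.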
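Also, make sure you are using the plain python interpreter. IPython and other interpreters with debugging features can be much slower than the regular interpreter.
You may want to consider doing some set operations first, though. Here is a small example: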
product_set = set(product_list)
item_number_set = set(item_number_list)
# these are the ones that match straight away
product_matches = product_set & item_number_set
# now we can search through the substrings of ones that don't match
non_matches = product_set - item_number_set
for product in non_matches:
    for item_number in item_number_set:
        if item_number in product:
            product_matches.add(product)
            break
# product_matches is now a set of all unique codes contained in both lists by "fuzzy match"
print(product_matches)
You will probably lose the order in which they appear, but you may be able to find a way to adapt this to your needs.
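One possible way to keep the original order, sketched under the assumption that product_list and item_number_list are plain lists of strings: walk product_list in order and use sets only for the membership checks. The function name ordered_matches and the sample data are hypothetical.

```python
def ordered_matches(product_list, item_number_list):
    """Return matching products in their original order, without duplicates."""
    item_set = set(item_number_list)
    seen = set()      # products already examined
    ordered = []
    for product in product_list:
        if product in seen:
            continue
        seen.add(product)
        # Exact match first (cheap set lookup), then the substring search.
        if product in item_set or any(item in product for item in item_set):
            ordered.append(product)
    return ordered

products = ['400100DUCK20', 'BLUEAPPLE', 'NOPE', 'BLUEAPPLE', 'GREENBUTTON20']
items = ['BLUEAPPLE', 'GREENBUTTON', '100DUCK']
print(ordered_matches(products, items))
```

Like the set version, this collects the product codes rather than the item numbers they matched, and the substring search is still quadratic in the worst case.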