
Remove digits from list entries unless they are part of a substring found in another list

Python

一只萌萌小番薯 2021-08-05 16:11:04

Here is my situation: I have a list of product names, e.g. BLUEAPPLE, GREENBUTTON20, 400100DUCK20 (len = 9000), and a list of official item names, e.g. BLUEAPPLE, GREENBUTTON, 100DUCK (len = 2700). Because I apply fuzzy string matching between products and items, I want to strip unnecessary digits from the product names, but keep the digits that appear in the official item names.

I came up with a solution, but the problem is that it runs very slowly.

def remove_nums(product):
    if bool(re.search(r'\d', product)):
        for item in item_nums_list:
            if item in product:
                substrings = [u for x in product.split(item) for u in (x, item)][:-1]
                no_num_list = [re.sub(r'(\d+)', '', substring) if substring not in item else substring for substring in substrings]
                return ''.join(no_num_list)
        return re.sub(r'(\d+)', '', product)
    else:
        return product

Example:

product_name = '400100DUCK20'
item = '100DUCK'
substrings = ['400', '100DUCK', '20']
no_num_list = ['', '100DUCK', '']
returns '100DUCK'

This function is mapped so that it loops over every product in the product list. I have been trying to work out how to use lambdas, maps, applys and so on here, but I can't quite wrap my head around it. What is the most efficient way to do this with plain lists or pandas? Alternatively, I get these item and product lists from a postgres database, so if you think doing this in psql would be faster, I will go that route.


2 Answers

心有法竹

Contributed 1866 experience points, received more than 5 likes

difflib.get_close_matches() will at least help clean up your code, and it may run faster.

import difflib

p_names = ['BLUEAPPLE', 'GREENBUTTON20', '400100DUCK20']
i_names = ['BLUEAPPLE', 'GREENBUTTON', '100DUCK']

# For each product name, list the item names that are close matches.
for p in p_names:
    print(difflib.get_close_matches(p, i_names))

Output:

['BLUEAPPLE']
['GREENBUTTON']
['100DUCK']

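It will still do a lot of comparisons, since it has to match every string in p_names against every string in i_names.

Something similar to your approach, using a regular expression to find the match: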

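import re

for p in p_names:
    for i in i_names:
        # re.escape makes the item name match literally, even if it
        # contains regex metacharacters.
        if re.search(re.escape(i), p):
            print(i)
            # stop looking
            break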
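The nested loop above can also be folded into one compiled pattern, so that each product is scanned once instead of once per item. The following is only a sketch under some assumptions: it reuses the sample lists from this answer, assumes `i_names` is non-empty, and the pattern layout (item names as one alternation, plus a single-digit fallback) is an illustration rather than something from the original answer. Unlike the function in the question, it keeps every item name found in a product, not just the first. Whether it is actually faster on 9000 x 2700 names is worth measuring.

```python
import re

p_names = ['BLUEAPPLE', 'GREENBUTTON20', '400100DUCK20']
i_names = ['BLUEAPPLE', 'GREENBUTTON', '100DUCK']  # assumed non-empty

# Escape item names so they match literally, and try longer names first
# so the alternation prefers the most specific item.
alternatives = '|'.join(
    re.escape(i) for i in sorted(i_names, key=len, reverse=True)
)
# Group 1 captures an item name to keep; the fallback matches one stray digit.
# A single \d (not \d+) is used so a run of digits cannot swallow the start
# of an item name such as '100DUCK'.
pattern = re.compile(r'(' + alternatives + r')|\d')

def remove_nums(product):
    # Keep matched item names unchanged, drop every other digit.
    return pattern.sub(lambda m: m.group(1) or '', product)

cleaned = [remove_nums(p) for p in p_names]
print(cleaned)  # ['BLUEAPPLE', 'GREENBUTTON', '100DUCK']
```

Matching one digit at a time is the key design choice here: with `\d+`, the leading `400100` in `400100DUCK20` would be consumed as a whole and the item name would never be seen.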

Posted 2021-08-05

白衣染霜花

Contributed 1796 experience points, received more than 10 likes

Try this:

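import re

def remove_nums(product):
    if re.search(r'\d', product):
        # Return the first official item name contained in the product.
        for item in item_nums_list:
            if item in product:
                return item
        # No item matched: strip all digits.
        return re.sub(r'(\d+)', '', product)
    else:
        return product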

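Also, make sure you are using the plain Python interpreter. IPython and other interpreters with debugging features can be a lot slower than the regular one.

That said, you may want to consider doing some set operations first. Here is a small example: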

product_set = set(product_list)
item_number_set = set(item_number_list)

# these are the ones that match straight away
product_matches = product_set & item_number_set

# now we can search through the substrings of ones that don't match
non_matches = product_set - item_number_set

for product in non_matches:
    for item_number in item_number_set:
        if item_number in product:
            product_matches.add(product)
            break

# product_matches is now a set of all unique codes contained in both lists by "fuzzy match"
print(product_matches)

You may lose the order in which they appeared, but perhaps you can find a way to adapt this to your needs.
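If the original order matters, one possible adaptation is to keep the set only for the fast exact lookup and walk the product list itself. This is a minimal sketch, not part of the original answer: the sample data (including `REDKITE7`) and the helper name `has_item` are hypothetical.

```python
# Hypothetical sample data standing in for the two database columns.
product_list = ['BLUEAPPLE', 'GREENBUTTON20', '400100DUCK20', 'REDKITE7']
item_number_list = ['BLUEAPPLE', 'GREENBUTTON', '100DUCK']

item_number_set = set(item_number_list)

def has_item(product):
    # An exact hit is a constant-time set lookup; only products that
    # miss fall back to the slower substring scan over all items.
    return product in item_number_set or any(
        item in product for item in item_number_set
    )

# Iterating over the list (not a set) preserves the original order.
ordered_matches = [p for p in product_list if has_item(p)]
print(ordered_matches)  # ['BLUEAPPLE', 'GREENBUTTON20', '400100DUCK20']
```

Because the result is a list, duplicate product names are kept as well, which a set would collapse.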

Posted 2021-08-05
