
Python - Find the index of the unique substring contained in a list of strings, without iterating over all the items


慕妹3146593 2022-08-25 16:33:35

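I have a question that sounds like it has already been asked, but I actually could not find a really good answer. Every day I have a list with a few thousand strings. I also know that this list will always contain one item that contains the word "other". For example, one day I might have:

a = ['mark','george', .... , " ...other ...", 'matt','lisa', ... ]

Another day I might get:

a = ['karen','chris','lucas', ............................., '...other']

As you can see, the position of the item containing the substring "other" is random. My goal is to get the index of the item containing the substring "other" as fast as possible.

I found other answers here, and most of them suggest list comprehensions. For example: "Finding a substring within a list in Python" and "Check if a Python list item contains a string inside another string". They don't work for me because they are too slow. Other solutions suggest using any() to simply check whether "other" is contained in the list, but I need the index, not a boolean.

I believe regex might be a good potential solution, even though I am having a hard time figuring out how. So far I have only managed to do the following:

# any_other_value_available will tell me extremely quickly if 'other' is contained in list.
any_other_value_available = 'other' in str(list_unique_keys_in_dict).lower()

From here, I am not quite sure what to do. Any suggestions? Thanks.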
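One possible way to continue from the str()-based check, offered as a sketch and not as something proposed in the thread: join the items with a separator, locate the substring once with str.find, then count the separators that precede the hit. The helper name index_via_join and the newline separator are hypothetical, and the sketch assumes that no item contains the separator.

```python
def index_via_join(items, needle="other", sep="\n"):
    """Return the index of the first item containing `needle`, or -1.

    Assumption: no item (and not the needle) contains `sep`.
    """
    joined = sep.join(items)
    pos = joined.find(needle)
    if pos == -1:
        return -1
    # Each separator before the hit marks the end of one earlier item,
    # so the separator count equals the index of the matching item.
    return joined.count(sep, 0, pos)


a = ["mark", "george", "...other...", "matt", "lisa"]
print(index_via_join(a))  # 2
```

The search is case-sensitive as written; lowercasing the joined string first, as the str(...).lower() check does, would make it case-insensitive. Whether this beats the plain loop on real data would need to be measured.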


2 answers

30秒到達戰(zhàn)場

Contributed 1828 experience points, received over 6 likes

Methods explored

1. Generator method

next(i for i,v in enumerate(test_strings) if 'other' in v)

2. List comprehension method

[i for i,v in enumerate(test_strings) if 'other' in v]

3. Using index with a generator (suggested by @HeapOverflow)

test_strings.index(next(v for v in test_strings if 'other' in v))

4. Regular expression with a generator

re_pattern = re.compile('.*other.*')
next(test_strings.index(x) for x in test_strings if re_pattern.search(x))
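The four expressions above can be run side by side on a small list. This is a minimal sketch; the sample data is made up for illustration, and the last line shows the optional default argument of next(), which the answer does not use but which avoids a StopIteration when nothing matches.

```python
import re

# Illustrative data, not taken from the thread.
test_strings = ["mark", "george", "...other...", "matt", "lisa"]

# 1. Generator: stops at the first match.
gen_idx = next(i for i, v in enumerate(test_strings) if "other" in v)

# 2. List comprehension: scans the whole list and collects every match.
all_idx = [i for i, v in enumerate(test_strings) if "other" in v]

# 3. Generator finds the item, list.index then locates it.
index_idx = test_strings.index(next(v for v in test_strings if "other" in v))

# 4. Compiled regex with a generator.
re_pattern = re.compile(".*other.*")
regex_idx = next(test_strings.index(x) for x in test_strings if re_pattern.search(x))

# With a default, next() returns -1 here instead of raising StopIteration.
missing = next((i for i, v in enumerate(["a", "b"]) if "other" in v), -1)

print(gen_idx, all_idx, index_idx, regex_idx, missing)  # 2 [2] 2 2 -1
```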

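Conclusion

The index method (the approach @HeapOverflow suggested in the comments) has the fastest times for lists of 8 or more items; for shorter lists the list comprehension is marginally faster or tied in the results below.

Test code

Using Perfplot, which uses timeit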

import random
import string
import re
import perfplot

def random_string(N):
    # Uppercase letters and digits only, so a random string never contains 'other'.
    return ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(N))

def create_strings(length):
    M = length // 2
    random_strings = [random_string(5) for _ in range(length)]
    front = ['...other...'] + random_strings
    middle = random_strings[:M] + ['...other...'] + random_strings[M:]
    end_ = random_strings + ['...other...']
    return front, middle, end_

def search_list_comprehension(test_strings):
    return [i for i,v in enumerate(test_strings) if 'other' in v][0]

def search_generator(test_strings):
    return next(i for i,v in enumerate(test_strings) if 'other' in v)

def search_index(test_strings):
    return test_strings.index(next(v for v in test_strings if 'other' in v))

def search_regex(test_strings):
    re_pattern = re.compile('.*other.*')
    return next(test_strings.index(x) for x in test_strings if re_pattern.search(x))

# Each benchmark is run with the '...other...' placed in the front, middle and end of a random list of strings.
out = perfplot.bench(
    setup=lambda n: create_strings(n),  # create front, middle, end strings of length n
    kernels=[
        lambda a: [search_list_comprehension(x) for x in a],
        lambda a: [search_generator(x) for x in a],
        lambda a: [search_index(x) for x in a],
        lambda a: [search_regex(x) for x in a],
    ],
    labels=["list_comp", "generator", "index", "regex"],
    n_range=[2 ** k for k in range(15)],
    xlabel="length list",
    # More optional arguments with their default values:
    # title=None,
    # logx="auto",  # set to True or False to force scaling
    # logy="auto",
    # equality_check=numpy.allclose,  # set to None to disable "correctness" assertion
    # automatic_order=True,
    # colors=None,
    # target_time_per_measurement=1.0,
    # time_unit="s",  # set to one of ("auto", "s", "ms", "us", or "ns") to force plot units
    # relative_to=1,  # plot the timings relative to one of the measurements
    # flops=lambda n: 3*n,  # FLOPS plots
)

out.show()
print(out)

Results

length list       regex  list_comp  generator      index
        1.0     10199.0     3699.0     4199.0     3899.0
        2.0     11399.0     3899.0     4300.0     4199.0
        4.0     13099.0     4300.0     4599.0     4300.0
        8.0     16300.0     5299.0     5099.0     4800.0
       16.0     22399.0     7199.0     5999.0     5699.0
       32.0     34900.0    10799.0     7799.0     7499.0
       64.0     59300.0    18599.0    11799.0    11200.0
      128.0    108599.0    33899.0    19299.0    18500.0
      256.0    205899.0    64699.0    34699.0    33099.0
      512.0    403000.0   138199.0    69099.0    62499.0
     1024.0    798900.0   285600.0   142599.0   120900.0
     2048.0   1599999.0   582999.0   288699.0   239299.0
     4096.0   3191899.0  1179200.0   583599.0   478899.0
     8192.0   6332699.0  2356400.0  1176399.0   953500.0
    16384.0  12779600.0  4731100.0  2339099.0  1897100.0
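For readers without perfplot installed, a rough comparison of two of the methods can be sketched with the standard-library timeit module alone. This is an assumption-laden simplification of the benchmark, not a reproduction of it: the list size (4096), the repeat counts, and the choice to test only the worst case (target at the end) are arbitrary, and absolute timings will differ by machine.

```python
import random
import string
import timeit


def random_string(n):
    # Uppercase letters and digits only, so 'other' never appears by chance.
    return "".join(random.choice(string.ascii_uppercase + string.digits) for _ in range(n))


def search_generator(test_strings):
    return next(i for i, v in enumerate(test_strings) if "other" in v)


def search_index(test_strings):
    return test_strings.index(next(v for v in test_strings if "other" in v))


# Worst case for a linear scan: the target is the last item.
data = [random_string(5) for _ in range(4096)] + ["...other..."]

for fn in (search_generator, search_index):
    # Best of 5 repeats of 100 calls each, reported per call.
    best = min(timeit.repeat(lambda: fn(data), number=100, repeat=5)) / 100
    print(f"{fn.__name__}: {best * 1e6:.1f} us per call")
```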

Answered 2022-08-25

當年話下

Contributed 1890 experience points, received over 9 likes

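If you are looking for a substring, a regular expression is a good way to find it.

In your case, you are looking for all strings that contain "other". As you already mentioned, the elements in the list are in no particular order. The search for the desired element is therefore linear, and it would remain linear even if the list were ordered, because sort order does not reveal which item contains a given substring.

A regular expression that could describe your search is query='.*other.*'. From the documentation:

. (Dot.) In the default mode, this matches any character except a newline. If the DOTALL flag has been specified, this matches any character including a newline.

* Causes the resulting RE to match 0 or more repetitions of the preceding RE, as many repetitions as are possible. ab* will match 'a', 'ab', or 'a' followed by any number of 'b's.

The .* before and after other allows 0 or more repetitions of any character on either side.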
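The quoted rules for . and * can be checked on a small example. This is a minimal sketch; the sample string s is made up for illustration. It also shows that with re.search the surrounding .* is not needed to decide whether a string contains "other", since re.search already scans for a match at any position.

```python
import re

s = "first line\nsome other text"

# re.match anchors at the start; without DOTALL the dot cannot cross
# the newline, so '.*other.*' fails on this string.
m_default = re.match(".*other.*", s)

# With DOTALL the dot also matches the newline, so the match succeeds.
m_dotall = re.match(".*other.*", s, re.DOTALL)

# re.search tries every start position, so it matches the second line.
m_search = re.search(".*other.*", s)

# For a plain containment test, the bare substring pattern is enough.
m_plain = re.search("other", s)

print(m_default)         # None
print(m_search.group())  # some other text
print(m_plain.group())   # other
```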

For example

import re

list_of_variables = ['rossum', 'python', '..other..', 'random']
query = '.*other.*'
indices = [list_of_variables.index(x) for x in list_of_variables if re.search(query, x)]

This returns the indices of the elements that contain the query. In this example, indices will be [2], because '..other..' is the third element in the list.
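A variant of the example above, sketched with enumerate in place of list.index. This is a swapped-in technique, not the answerer's code: list.index rescans the list for every match and always reports the first occurrence, so duplicated strings yield repeated indices, while enumerate carries the position along. The duplicated '..other..' entry is added here only to show the difference.

```python
import re

# Same list as above, with a hypothetical duplicate appended.
list_of_variables = ['rossum', 'python', '..other..', 'random', '..other..']

# Compile once; re.search does not need the surrounding '.*'.
pattern = re.compile('other')

# enumerate supplies each item's own position.
indices = [i for i, x in enumerate(list_of_variables) if pattern.search(x)]

# list.index returns the first occurrence for both duplicates.
by_index = [list_of_variables.index(x) for x in list_of_variables if pattern.search(x)]

print(indices)   # [2, 4]
print(by_index)  # [2, 2]
```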

Answered 2022-08-25
