5 Answers

You should use spaCy to tokenize your text, because natural language tends to be tricky, with all its exceptions and edge cases:
from spacy.lang.en import English

nlp = English()
# English() comes with a tokenizer that already includes the default
# English punctuation rules and exceptions
tokenizer = nlp.tokenizer
with open("User_Input.txt", "r") as f:
    for line, txt_line in enumerate(f, start=1):
        # token.idx is the character offset of the token within the line
        for word in tokenizer(txt_line.rstrip("\n")):
            print(f"Word {word.text} found at line {line}; pos: {word.idx}")
Alternatively, you can use TextBlob like this:
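from textblob import TextBlob

with open("User_Input.txt", "r") as f:
    for line, txt_line in enumerate(f, start=1):
        blob = TextBlob(txt_line)
        # blob.words excludes punctuation; index is the word's position in the line
        for index, word in enumerate(blob.words):
            print(f"Word {word} found in position {index} at line {line}")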
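To see why tokenizing matters, compare a naive whitespace split with a simple regex split. This is a minimal standard-library sketch, not what spaCy or TextBlob do internally; the sample sentence is made up for illustration.

```python
import re

line = "Is test1 here? Yes, test1, and test2."

# Naive approach: split on whitespace, so punctuation stays glued to words
naive = line.split()
print(naive)  # ['Is', 'test1', 'here?', 'Yes,', 'test1,', 'and', 'test2.']

# Regex approach: keep only runs of word characters (letters, digits, underscore)
tokens = re.findall(r"\w+", line)
print(tokens)  # ['Is', 'test1', 'here', 'Yes', 'test1', 'and', 'test2']

# "test2." from the naive split no longer equals "test2"
print("test2" in naive, "test2" in tokens)  # False True
```

A plain regex still splits "don't" into "don" and "t"; handling cases like that is what spaCy's tokenizer exceptions are for.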

Use nltk to tokenize your text reliably. Also, keep in mind that words in your text may be in mixed case, so convert them to lowercase before searching.
import nltk
words = nltk.word_tokenize(txt.lower())  # txt: the text read from the file
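Following the advice to lowercase before searching, here is a standard-library sketch that lowercases both the text and the search terms and reports the line number of each hit. It uses re.findall(r"\w+", ...) as a stand-in for nltk.word_tokenize so it runs without NLTK; the function name find_words_ci and the sample text are hypothetical.

```python
import re

def find_words_ci(text, words_to_find):
    """Return (line_number, word) pairs for each search term found, ignoring case."""
    targets = {w.lower() for w in words_to_find}  # lowercase the search terms too
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        # lowercase the text before splitting it into words
        for token in re.findall(r"\w+", line.lower()):
            if token in targets:
                hits.append((line_no, token))
    return hits

sample = "Test1 is here\nnothing\nTEST2 and test1"
print(find_words_ci(sample, ["test1", "test2", "test3"]))
# [(1, 'test1'), (3, 'test2'), (3, 'test1')]
```

Lowercasing only one side is a common slip: if the search list contained "Test1", it would never match lowercased text, which is why the sketch normalizes both.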

Regular expressions in general, and the \b term specifically (meaning "word boundary"), are how I would separate words from other arbitrary characters. Here is an example:
import re

# words with arbitrary characters in between
data = """now is; the time for, all-good-men
to come\t to the, aid of
their... country"""

exp = re.compile(r"\b\w+")
pos = 0
while True:
    m = exp.search(data, pos)
    if not m:
        break
    print(m.group(0))
    pos = m.end(0)
Output:
now
is
the
time
for
all
good
men
to
come
to
the
aid
of
their
country
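The manual search/pos loop above can also be written with re.finditer, which walks over all matches for you. Each match object's start() gives a character offset, which may be what the question means by "pos". A minimal sketch reusing the same sample data, processed line by line so that each word also gets a line number:

```python
import re

data = """now is; the time for, all-good-men
to come\t to the, aid of
their... country"""

exp = re.compile(r"\b\w+")

found = []
for line_no, line in enumerate(data.splitlines(), start=1):
    # finditer replaces the manual search(data, pos) / pos = m.end(0) loop
    for m in exp.finditer(line):
        # m.start() is the 0-based column of the word within its line
        found.append((m.group(0), line_no, m.start()))

for word, line_no, pos in found:
    print(f"{word!r} at line {line_no}, pos {pos}")
```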

You can use a regular expression:
import re

words_to_find = ["test1", "test2", "test3"]  # converted this to a list to use `in`

with open("User_Input.txt", "r") as f:
    for line, txt in enumerate(f, start=1):  # read the file line by line
        rx = re.findall(r"(\w+)", txt)  # rx is a list containing all the words in `txt`
        # Option 1: iterate over every word in the line
        for word in rx:
            if word in words_to_find:
                print(f"{word} (line {line})")
        # Option 2: iterate over your search terms only
        # note that this reports each word in `words_to_find` at most once per line
        for word in words_to_find:  # `test1`, `test2`, `test3`...
            if word in rx:  # if `test1` is present in this line's list of words...
                print(f"{word} (line {line})")
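The code above applies the (\w+) regular expression to your text string and returns a list of matches. Here the regex matches any run of word characters (letters, digits and underscores), so words separated by spaces or by punctuation are both picked up.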
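Here is what that looks like on a made-up line: hyphens and punctuation split words too, and because rx holds whole words, a membership test does not fire on test10 (a plain substring test on the raw line would). The set intersection is shown as a compact alternative to the two loops, under the assumption that you only need to know which terms occur, not how often or in what order.

```python
import re

words_to_find = {"test1", "test2", "test3"}
txt = "test10, test2-test3; foo_bar"

rx = re.findall(r"(\w+)", txt)
print(rx)  # ['test10', 'test2', 'test3', 'foo_bar']

# Whole-word membership vs. substring search on the raw line
print("test1" in rx, "test1" in txt)  # False True

# Set intersection: which search terms occur in this line (order not preserved)
print(sorted(words_to_find & set(rx)))  # ['test2', 'test3']
```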

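If you are trying to find the words test1, test2 or test3 in a text file, you don't need to increment a line counter manually. Assuming each word in the text file is on its own line, the following code works: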
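words_to_find = ("test1", "test2", "test3")
with open("User_Input.txt", "r") as f:
    lines = f.readlines()
# enumerate gives the line number directly (file.index(line) would
# return the first matching line if the same line appears twice)
for line_no, line in enumerate(lines, start=1):
    txt = line.strip("\n")
    for word in words_to_find:
        if word in txt:
            print(f"Word: '{word}' found at line {line_no}, pos: {txt.index(word)}")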
I'm not sure what you mean by pos.
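Note that word in txt is a substring test, so test1 also matches inside test10, and txt.index(word) only reports the first occurrence on a line. If "pos" means the character offset of every match, one option is a \b-anchored regex built with re.escape. The helper name find_positions and the sample lines are hypothetical:

```python
import re

def find_positions(lines, words_to_find):
    """Yield (word, line_number, char_offset) for every whole-word match."""
    for line_no, line in enumerate(lines, start=1):
        txt = line.rstrip("\n")
        for word in words_to_find:
            # \b anchors keep 'test1' from matching inside 'test10'
            pattern = rf"\b{re.escape(word)}\b"
            for m in re.finditer(pattern, txt):
                yield word, line_no, m.start()

lines = ["test10\n", "test1 and test1 again\n", "test3\n"]
for word, line_no, pos in find_positions(lines, ("test1", "test2", "test3")):
    print(f"Word: '{word}' found at line {line_no}, pos: {pos}")
```

With a real file, you can pass the open file object as lines (iterating over it yields lines) and loop over the results inside the with block.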