1 Answer

The reason is that you are using a custom tokenizer together with the default stop_words='english', so when the features are extracted, a check is made for any inconsistency between stop_words and tokenizer.

If you dig into the code of sklearn/feature_extraction/text.py, you will find this snippet performing the consistency check:
def _check_stop_words_consistency(self, stop_words, preprocess, tokenize):
    """Check if stop words are consistent

    Returns
    -------
    is_consistent : True if stop words are consistent with the preprocessor
                    and tokenizer, False if they are not, None if the check
                    was previously performed, "error" if it could not be
                    performed (e.g. because of the use of a custom
                    preprocessor / tokenizer)
    """
    if id(self.stop_words) == getattr(self, '_stop_words_id', None):
        # Stop words were previously validated
        return None

    # NB: stop_words is validated, unlike self.stop_words
    try:
        inconsistent = set()
        for w in stop_words or ():
            tokens = list(tokenize(preprocess(w)))
            for token in tokens:
                if token not in stop_words:
                    inconsistent.add(token)
        self._stop_words_id = id(self.stop_words)

        if inconsistent:
            warnings.warn('Your stop_words may be inconsistent with '
                          'your preprocessing. Tokenizing the stop '
                          'words generated tokens %r not in '
                          'stop_words.' % sorted(inconsistent))
        return not inconsistent
    except Exception:
        # Failed to apply default preprocessing (e.g. because a custom
        # preprocessor / tokenizer was used)
        self._stop_words_id = id(self.stop_words)
        return 'error'
As you can see, it raises the warning whenever an inconsistency is found: it tokenizes every stop word with your tokenizer, and any resulting token that is not itself in the stop list counts as inconsistent.
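As an illustration (not part of the original answer), here is a minimal sketch that deliberately triggers the warning. The `strip_plural` tokenizer is a hypothetical, crude "stemmer": it turns the built-in English stop word "was" into "wa", which is not in the stop list, so fitting the vectorizer runs the consistency check above and emits the warning.

```python
import warnings

from sklearn.feature_extraction.text import CountVectorizer


def strip_plural(doc):
    # Hypothetical tokenizer: strip a trailing 's' from every token.
    # Tokenizing the stop word "was" now yields "wa", which is not in
    # scikit-learn's built-in English stop word list.
    return [w.rstrip('s') for w in doc.split()]


vec = CountVectorizer(tokenizer=strip_plural, stop_words='english')

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    # The consistency check runs when the analyzer is built during fit.
    vec.fit(["the cats was sleeping on the mats"])

messages = [str(w.message) for w in caught]
print(any('stop_words may be inconsistent' in m for m in messages))
```

If your tokenizer and stop list agree (for example, a tokenizer that only splits on whitespace), no such warning is raised, so this check is a quick way to confirm whether your preprocessing and stop_words are compatible.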