

Tokenizing the stop words generated tokens ['ha', 'le', 'u', 'wa'] not in stop_words

UYOU 2022-08-02 17:34:02
I am building a chatbot in Python. Code:

import nltk
import numpy as np
import random
import string

f = open('/home/hostbooks/ML/stewy/speech/chatbot.txt', 'r', errors='ignore')
raw = f.read()
raw = raw.lower()                         # converts to lowercase
sent_tokens = nltk.sent_tokenize(raw)     # converts to list of sentences
word_tokens = nltk.word_tokenize(raw)     # converts to list of words

lemmer = nltk.stem.WordNetLemmatizer()

def LemTokens(tokens):
    return [lemmer.lemmatize(token) for token in tokens]

remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)

def LemNormalize(text):
    return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))

GREETING_INPUTS = ("hello", "hi", "greetings", "sup", "what's up", "hey", "hii")
GREETING_RESPONSES = ["hi", "hey", "*nods*", "hi there", "hello",
                      "I am glad! You are talking to me"]

def greeting(sentence):
    for word in sentence.split():
        if word.lower() in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def response(user_response):
    robo_response = ''
    sent_tokens.append(user_response)
    TfidfVec = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')
    tfidf = TfidfVec.fit_transform(sent_tokens)
    vals = cosine_similarity(tfidf[-1], tfidf)
    idx = vals.argsort()[0][-2]
    flat = vals.flatten()
    flat.sort()
    req_tfidf = flat[-2]
    if req_tfidf == 0:
        robo_response = robo_response + "I am sorry! I don't understand you"
        return robo_response
    else:
        robo_response = robo_response + sent_tokens[idx]
        return robo_response

1 Answer

慕碼人2483693


The reason is that you are combining a custom tokenizer with the default stop_words='english', so when the features are extracted, scikit-learn checks whether there is any inconsistency between stop_words and the tokenizer.
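To see why tokens such as 'ha' and 'wa' appear, note that LemNormalize runs every token through WordNetLemmatizer, which strips a trailing 's' from unrecognized nouns ('was' becomes 'wa', 'has' becomes 'ha'), while the built-in English stop list still contains the unlemmatized forms. A minimal sketch of that mismatch, using a hypothetical strip-the-'s' rule in place of the real lemmatizer so it runs without NLTK data:

```python
# Toy stand-in for WordNetLemmatizer's noun rule: strip one trailing 's'.
def toy_lemmatize(token):
    return token[:-1] if len(token) > 2 and token.endswith('s') else token

def toy_tokenize(text):
    return [toy_lemmatize(t) for t in text.lower().split()]

# Tokenize each stop word the same way the vectorizer would, and collect
# every resulting token that is no longer in the stop list itself.
stop_words = {'was', 'has', 'the', 'a'}
inconsistent = sorted({t for w in stop_words
                       for t in toy_tokenize(w) if t not in stop_words})
print(inconsistent)  # → ['ha', 'wa']
```

This is exactly the shape of the token list reported in the warning.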


If you dig into the code, you will find the snippet in sklearn/feature_extraction/text.py that performs the consistency check:


def _check_stop_words_consistency(self, stop_words, preprocess, tokenize):
    """Check if stop words are consistent

    Returns
    -------
    is_consistent : True if stop words are consistent with the preprocessor
                    and tokenizer, False if they are not, None if the check
                    was previously performed, "error" if it could not be
                    performed (e.g. because of the use of a custom
                    preprocessor / tokenizer)
    """
    if id(self.stop_words) == getattr(self, '_stop_words_id', None):
        # Stop words were previously validated
        return None

    # NB: stop_words is validated, unlike self.stop_words
    try:
        inconsistent = set()
        for w in stop_words or ():
            tokens = list(tokenize(preprocess(w)))
            for token in tokens:
                if token not in stop_words:
                    inconsistent.add(token)
        self._stop_words_id = id(self.stop_words)

        if inconsistent:
            warnings.warn('Your stop_words may be inconsistent with '
                          'your preprocessing. Tokenizing the stop '
                          'words generated tokens %r not in '
                          'stop_words.' % sorted(inconsistent))
        return not inconsistent
    except Exception:
        # Failed to check stop words consistency (e.g. because a custom
        # preprocessor / tokenizer was used)
        self._stop_words_id = id(self.stop_words)
        return 'error'

As you can see, it raises a warning whenever it finds an inconsistency.
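The warning is harmless but worth fixing: run the stop list through the same tokenizer before handing it to TfidfVectorizer, so both sides agree. A sketch of that approach; the simple_lemmatize helper here is a hypothetical stand-in for the chatbot's LemNormalize, chosen so the example runs without NLTK data:

```python
import warnings
from sklearn.feature_extraction.text import TfidfVectorizer, ENGLISH_STOP_WORDS

def simple_lemmatize(token):
    # Hypothetical stand-in for WordNetLemmatizer: strip trailing 's' letters.
    return token.rstrip('s') or token

def tokenize(text):
    return [simple_lemmatize(t) for t in text.lower().split()]

# Pass the built-in stop list through the SAME tokenizer, so every stop
# word survives tokenization unchanged and the consistency check passes.
consistent_stops = sorted({t for w in ENGLISH_STOP_WORDS for t in tokenize(w)})

vec = TfidfVectorizer(tokenizer=tokenize, stop_words=consistent_stops)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    vec.fit(["she was reading books", "he has many books"])

# No "stop_words may be inconsistent" warning is emitted now.
assert not any('inconsistent' in str(w.message) for w in caught)
```

Alternatively, if you are happy to keep the default list, you can simply ignore the warning: it only tells you that a handful of stop-word variants ('ha', 'le', 'u', 'wa') will not be filtered.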


Answered 2022-08-02