The tutorial on the data structures used by NLTK's Bayes classifier is great. From a higher-level perspective, we can look at it like this:
We have input sentences with sentiment labels:
training_data = [('I love this sandwich.', 'pos'),
('This is an amazing place!', 'pos'),
('I feel very good about these beers.', 'pos'),
('This is my best work.', 'pos'),
("What an awesome view", 'pos'),
('I do not like this restaurant', 'neg'),
('I am tired of this stuff.', 'neg'),
("I can't deal with this", 'neg'),
('He is my sworn enemy!', 'neg'),
('My boss is horrible.', 'neg')]
Let's treat the features as individual words, so we extract from the training data a list of all possible words (call it the vocabulary), like this:
from nltk.tokenize import word_tokenize
from itertools import chain
vocabulary = set(chain(*[word_tokenize(i[0].lower()) for i in training_data]))
Essentially, `vocabulary` here is the same as @275365's `all_words`:
>>> all_words = set(word.lower() for passage in training_data for word in word_tokenize(passage[0]))
>>> vocabulary = set(chain(*[word_tokenize(i[0].lower()) for i in training_data]))
>>> print(vocabulary == all_words)
True
From each data point (i.e. each sentence and its pos/neg tag), we want to say whether a feature (i.e. a word from the vocabulary) is present:
>>> sentence = word_tokenize('I love this sandwich.'.lower())
>>> print({i: True for i in vocabulary if i in sentence})
{'this': True, 'i': True, 'sandwich': True, 'love': True, '.': True}
But we also want to tell the classifier which vocabulary words are absent from the sentence, so for each data point we list every possible word in the vocabulary and say whether each one is present or not:
>>> sentence = word_tokenize('I love this sandwich.'.lower())
>>> x = {i: True for i in vocabulary if i in sentence}
>>> y = {i: False for i in vocabulary if i not in sentence}
>>> x.update(y)
>>> print(x)
{'love': True, 'deal': False, 'tired': False, 'feel': False, 'is': False, 'am': False, 'an': False, 'good': False, 'best': False, '!': False, 'these': False, 'what': False, '.': True, 'amazing': False, 'horrible': False, 'sworn': False, 'ca': False, 'do': False, 'sandwich': True, 'very': False, 'boss': False, 'beers': False, 'not': False, 'with': False, 'he': False, 'enemy': False, 'about': False, 'like': False, 'restaurant': False, 'this': True, 'of': False, 'work': False, "n't": False, 'i': True, 'stuff': False, 'place': False, 'my': False, 'awesome': False, 'view': False}
但是,由于這兩次遍歷詞匯表,因此這樣做更有效:
>>> sentence = word_tokenize('I love this sandwich.'.lower())
>>> x = {i: (i in sentence) for i in vocabulary}
>>> print(x)
{'love': True, 'deal': False, 'tired': False, 'feel': False, 'is': False, 'am': False, 'an': False, 'good': False, 'best': False, '!': False, 'these': False, 'what': False, '.': True, 'amazing': False, 'horrible': False, 'sworn': False, 'ca': False, 'do': False, 'sandwich': True, 'very': False, 'boss': False, 'beers': False, 'not': False, 'with': False, 'he': False, 'enemy': False, 'about': False, 'like': False, 'restaurant': False, 'this': True, 'of': False, 'work': False, "n't": False, 'i': True, 'stuff': False, 'place': False, 'my': False, 'awesome': False, 'view': False}
因此,對(duì)于每個(gè)句子,我們想告訴每個(gè)句子的分類(lèi)器哪個(gè)詞存在,哪個(gè)詞不存在,并為其賦予pos / neg標(biāo)記。我們可以稱(chēng)其為feature_set,它是一個(gè)由x(如上所示)及其標(biāo)簽組成的元組。
>>> feature_set = [({i: (i in word_tokenize(sentence.lower())) for i in vocabulary}, tag) for sentence, tag in training_data]
>>> print(feature_set)
[({'this': True, 'love': True, 'deal': False, 'tired': False, 'feel': False, 'is': False, 'am': False, 'an': False, 'sandwich': True, 'ca': False, 'best': False, '!': False, 'what': False, '.': True, 'amazing': False, 'horrible': False, 'sworn': False, 'awesome': False, 'do': False, 'good': False, 'very': False, 'boss': False, 'beers': False, 'not': False, 'with': False, 'he': False, 'enemy': False, 'about': False, 'like': False, 'restaurant': False, 'these': False, 'of': False, 'work': False, "n't": False, 'i': False, 'stuff': False, 'place': False, 'my': False, 'view': False}, 'pos'), ...]
然后,我們將feature_set中的這些功能和標(biāo)簽提供給分類(lèi)器以對(duì)其進(jìn)行訓(xùn)練:
from nltk import NaiveBayesClassifier as nbc
classifier = nbc.train(feature_set)
Now you have a trained classifier, and when you want to tag a new sentence, you have to "featurize" the new sentence to see which of its words are in the vocabulary the classifier was trained on:
>>> test_sentence = "This is the best band I've ever heard! foobar"
>>> featurized_test_sentence = {i:(i in word_tokenize(test_sentence.lower())) for i in vocabulary}
NOTE: as you can see from the step above, the naive Bayes classifier cannot handle out-of-vocabulary words; the foobar token disappears after featurization.
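To make the out-of-vocabulary behavior concrete, here is a minimal self-contained sketch (it uses a plain whitespace tokenizer instead of `word_tokenize` so it runs without NLTK data, and the sentences are placeholders):

```python
# Simple whitespace tokenizer stands in for nltk's word_tokenize here.
def tokenize(text):
    return text.lower().split()

training_sentences = ['I love this sandwich.', 'My boss is horrible.']
vocabulary = set(tok for sent in training_sentences for tok in tokenize(sent))

test_sentence = "this boss rocks! foobar"
features = {word: (word in tokenize(test_sentence)) for word in vocabulary}

# Only vocabulary words ever become feature keys, so 'foobar' is
# silently dropped from the featurized sentence.
print('foobar' in features)  # False
print(features['boss'])      # True
```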
然后,您將特征化的測(cè)試句子輸入分類(lèi)器,并要求其進(jìn)行分類(lèi):
>>> classifier.classify(featurized_test_sentence)
'pos'
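Under the hood, a naive Bayes classifier scores each label by combining the label's prior probability with the likelihood of every observed feature value. The hand-rolled sketch below illustrates that idea on toy feature dicts with Laplace smoothing; it is only an illustration of the principle, not NLTK's actual implementation:

```python
import math

# Toy training set of (boolean feature dict, label) pairs, mimicking the
# feature_set structure built above.
training = [
    ({'love': True, 'horrible': False}, 'pos'),
    ({'love': True, 'horrible': False}, 'pos'),
    ({'love': False, 'horrible': True}, 'neg'),
]

labels = {tag for _, tag in training}

def score(features, label):
    docs = [f for f, tag in training if tag == label]
    log_prob = math.log(len(docs) / len(training))  # log prior of the label
    for name, value in features.items():
        # Laplace-smoothed likelihood of this feature value given the label.
        matches = sum(1 for f in docs if f.get(name) == value)
        log_prob += math.log((matches + 1) / (len(docs) + 2))
    return log_prob

def classify(features):
    return max(labels, key=lambda lab: score(features, lab))

print(classify({'love': True, 'horrible': False}))  # 'pos'
```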
Hopefully this gives a clearer picture of how data is fed into NLTK's naive Bayes classifier for sentiment analysis. Here is the full code without the comments and walkthrough:
from nltk import NaiveBayesClassifier as nbc
from nltk.tokenize import word_tokenize
from itertools import chain
training_data = [('I love this sandwich.', 'pos'),
('This is an amazing place!', 'pos'),
('I feel very good about these beers.', 'pos'),
('This is my best work.', 'pos'),
("What an awesome view", 'pos'),
('I do not like this restaurant', 'neg'),
('I am tired of this stuff.', 'neg'),
("I can't deal with this", 'neg'),
('He is my sworn enemy!', 'neg'),
('My boss is horrible.', 'neg')]
vocabulary = set(chain(*[word_tokenize(i[0].lower()) for i in training_data]))
feature_set = [({i:(i in word_tokenize(sentence.lower())) for i in vocabulary},tag) for sentence, tag in training_data]
classifier = nbc.train(feature_set)
test_sentence = "This is the best band I've ever heard!"
featurized_test_sentence = {i:(i in word_tokenize(test_sentence.lower())) for i in vocabulary}
print("test_sent:", test_sentence)
print("tag:", classifier.classify(featurized_test_sentence))