
TfidfVectorizer and SelectKBest error

www asked 2023-09-05 20:23:04
I am trying to do some sentiment analysis following this tutorial, and I am fairly sure my code is identical to it so far. However, my BOW values show a major discrepancy. https://www.tensorscience.com/nlp/sentiment-analysis-tutorial-in-python-classifying-reviews-on-movies-and-products

Here is my code so far:

import nltk
import pandas as pd
import string
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2

def openFile(path):
    # param path: path/to/file.ext (str)
    # Returns contents of file (str)
    with open(path) as file:
        data = file.read()
    return data

imdb_data = openFile('C:/Users/Flengo/Desktop/sentiment/data/imdb_labelled.txt')
amzn_data = openFile('C:/Users/Flengo/Desktop/sentiment/data/amazon_cells_labelled.txt')
yelp_data = openFile('C:/Users/Flengo/Desktop/sentiment/data/yelp_labelled.txt')

datasets = [imdb_data, amzn_data, yelp_data]
combined_dataset = []

# separate samples from each other
for dataset in datasets:
    combined_dataset.extend(dataset.split('\n'))

# separate each label from each sample
dataset = [sample.split('\t') for sample in combined_dataset]

df = pd.DataFrame(data=dataset, columns=['Reviews', 'Labels'])
df = df[df["Labels"].notnull()]
df = df.sample(frac=1)

labels = df['Labels']

vectorizer = TfidfVectorizer(min_df=15)
bow = vectorizer.fit_transform(df['Reviews'])
len(vectorizer.get_feature_names())

selected_features = SelectKBest(chi2, k=200).fit(bow, labels).get_support(indices=True)

vectorizer = TfidfVectorizer(min_df=15, vocabulary=selected_features)
bow = vectorizer.fit_transform(df['Reviews'])
bow

These are my results, and these are the tutorial's results. I have been trying to figure out what could be wrong, but I have not made any progress yet.

1 Answer

LEATH


The problem is that you are supplying column indices to the `vocabulary` parameter; try supplying the actual vocabulary terms instead.


Try this:


import numpy as np

selected_features = SelectKBest(chi2, k=200).fit(bow, labels).get_support(indices=True)

# map the selected column indices back to the actual terms
vocabulary = np.array(vectorizer.get_feature_names())[selected_features]

vectorizer = TfidfVectorizer(min_df=15, vocabulary=vocabulary)  # you need to supply a real vocab here

bow = vectorizer.fit_transform(df['Reviews'])

bow

<3000x200 sparse matrix of type '<class 'numpy.float64'>'

    with 12916 stored elements in Compressed Sparse Row format>

