Using Tf-Idf in a Keras Model


富國(guó)滬深 2022-07-26 10:22:30
I have read my training, test, and validation sentences into train_sentences, test_sentences, and val_sentences, and then applied a Tf-Idf vectorizer to them:

vectorizer = TfidfVectorizer(max_features=300)
vectorizer = vectorizer.fit(train_sentences)
X_train = vectorizer.transform(train_sentences)
X_val = vectorizer.transform(val_sentences)
X_test = vectorizer.transform(test_sentences)

My model looks like this:

model = Sequential()
model.add(Input(????))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(8, activation='sigmoid'))
model.summary()
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

Normally, in the word2vec case, we pass an embedding matrix to the Embedding layer. How should I use Tf-Idf features in a Keras model? Please give me a usage example. Thanks.
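For reference, TF-IDF vectors are already fixed-length feature vectors, so the simplest option is to feed them straight into Dense layers with no Embedding or Flatten at all. A minimal sketch, assuming (as in the model above) a 300-feature vectorizer and 8 independent sigmoid outputs:

```python
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# A TF-IDF vector is a fixed-length (here 300-dim) feature vector,
# so it can be the direct input to the first Dense layer.
inputs = Input(shape=(300,))
x = Dense(256, activation='relu')(inputs)
x = Dense(32, activation='relu')(x)
outputs = Dense(8, activation='sigmoid')(x)

model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```

Training would then be something like `model.fit(X_train.toarray(), y_train, ...)`, since `transform` returns a SciPy sparse matrix (`y_train` here is your own label array, not defined in the question).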

1 Answer

收到一只叮咚


I can't imagine a compelling reason to combine TF/IDF values with embedding vectors, but here is a possible solution: use the functional API, multiple Inputs, and the concatenate function.


To concatenate layer outputs, their shapes must line up (except along the axis being concatenated). One approach is to average the embeddings down to a single vector, then concatenate that with the vector of TF/IDF values.


Setup and some sample data


from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_20newsgroups

import numpy as np

import keras
from keras.models import Model
from keras.layers import Dense, Activation, concatenate, Embedding, Input
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

# some sample training data
bunch = fetch_20newsgroups()
all_sentences = []
for document in bunch.data:
    sentences = document.split("\n")
    all_sentences.extend(sentences)
all_sentences = all_sentences[:1000]

X_train, X_test = train_test_split(all_sentences, test_size=0.1)
len(X_train), len(X_test)

vectorizer = TfidfVectorizer(max_features=300)
vectorizer = vectorizer.fit(X_train)
df_train = vectorizer.transform(X_train)

tokenizer = Tokenizer()
tokenizer.fit_on_texts(X_train)

maxlen = 50
sequences_train = tokenizer.texts_to_sequences(X_train)
sequences_train = pad_sequences(sequences_train, maxlen=maxlen)
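One detail worth noting before the model definition: `vectorizer.transform` returns a SciPy sparse matrix, while `pad_sequences` returns a dense NumPy array, and depending on the Keras version `fit` may not accept sparse input directly, so it is common to densify first. A tiny self-contained illustration (the sentences are made up):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = ["the cat sat on the mat", "the dog barked", "cats and dogs play"]

vec = TfidfVectorizer().fit(sentences)
X = vec.transform(sentences)   # scipy.sparse CSR matrix, shape (n_sentences, vocab size)
X_dense = X.toarray()          # dense ndarray that Keras can consume

print(X_dense.shape)
```

The same `toarray()` call applies to `df_train` above when it is time to train.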

Model definition


vocab_size = len(tokenizer.word_index) + 1
embedding_size = 300

input_tfidf = Input(shape=(300,))
input_text = Input(shape=(maxlen,))

embedding = Embedding(vocab_size, embedding_size, input_length=maxlen)(input_text)

# this averaging method taken from:
# https://stackoverflow.com/a/54217709/1987598
mean_embedding = keras.layers.Lambda(lambda x: keras.backend.mean(x, axis=1))(embedding)

concatenated = concatenate([input_tfidf, mean_embedding])

dense1 = Dense(256, activation='relu')(concatenated)
dense2 = Dense(32, activation='relu')(dense1)
dense3 = Dense(8, activation='sigmoid')(dense2)

model = Model(inputs=[input_tfidf, input_text], outputs=dense3)
model.summary()
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

Model summary output


Model: "model_2"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_11 (InputLayer)           (None, 50)           0                                            
__________________________________________________________________________________________________
embedding_5 (Embedding)         (None, 50, 300)      633900      input_11[0][0]                   
__________________________________________________________________________________________________
input_10 (InputLayer)           (None, 300)          0                                            
__________________________________________________________________________________________________
lambda_1 (Lambda)               (None, 300)          0           embedding_5[0][0]                
__________________________________________________________________________________________________
concatenate_4 (Concatenate)     (None, 600)          0           input_10[0][0]                   
                                                                 lambda_1[0][0]                   
__________________________________________________________________________________________________
dense_5 (Dense)                 (None, 256)          153856      concatenate_4[0][0]              
__________________________________________________________________________________________________
dense_6 (Dense)                 (None, 32)           8224        dense_5[0][0]                    
__________________________________________________________________________________________________
dense_7 (Dense)                 (None, 8)            264         dense_6[0][0]                    
==================================================================================================
Total params: 796,244
Trainable params: 796,244
Non-trainable params: 0
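To train this model, the two inputs are passed as a list in the same order as `inputs=[input_tfidf, input_text]`. As a sanity check, here is a self-contained toy run of the same two-input architecture on random data (all sizes here are made up; `GlobalAveragePooling1D` is used in place of the `Lambda` above and computes the same per-sequence mean):

```python
import numpy as np
from tensorflow.keras.layers import (Input, Dense, Embedding,
                                     GlobalAveragePooling1D, concatenate)
from tensorflow.keras.models import Model

n_samples, maxlen, vocab_size = 32, 50, 100

# two inputs: the TF-IDF vector and the padded token sequence
input_tfidf = Input(shape=(300,))
input_text = Input(shape=(maxlen,))

embedding = Embedding(vocab_size, 300)(input_text)
mean_embedding = GlobalAveragePooling1D()(embedding)  # average over time steps

concatenated = concatenate([input_tfidf, mean_embedding])
output = Dense(8, activation='sigmoid')(Dense(32, activation='relu')(concatenated))

model = Model(inputs=[input_tfidf, input_text], outputs=output)
model.compile(optimizer='adam', loss='binary_crossentropy')

# random stand-ins for df_train.toarray(), sequences_train, and 8 binary labels
X_tfidf = np.random.rand(n_samples, 300).astype('float32')
X_seq = np.random.randint(0, vocab_size, size=(n_samples, maxlen))
y = np.random.randint(0, 2, size=(n_samples, 8)).astype('float32')

model.fit([X_tfidf, X_seq], y, epochs=1, batch_size=8, verbose=0)
```

With the real data above, the fit call would be `model.fit([df_train.toarray(), sequences_train], y_train, ...)`, where `y_train` is your own label array.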


Answered 2022-07-26