
How to find "connections" between words in order to cluster sentences

Python

浮云間 2023-08-15 18:45:22

I need to connect words such as 4G and mobile phones or Internet so that sentences about technology are clustered together. I have the following sentences:

    4G is the fourth generation of broadband network.
    4G is slow.
    4G is defined as the fourth generation of mobile technology
    I bought a new mobile phone.

I need these sentences to end up in the same cluster. Currently they do not, probably because no relationship is found between 4G and mobile. I first tried wordnet.synsets to look for synonyms connecting 4G to Internet or mobile phone, but unfortunately it did not find any connection.

This is how I am clustering the sentences:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans
    from sklearn.metrics import adjusted_rand_score
    import numpy

    texts = ["4G is the fourth generation of broadband network.",
             "4G is slow.",
             "4G is defined as the fourth generation of mobile technology",
             "I bought a new mobile phone."]

    # vectorization of the sentences
    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(texts)
    # get_feature_names() was removed in scikit-learn 1.2
    words = vectorizer.get_feature_names_out()
    print("words", words)

    n_clusters = 3
    number_of_seeds_to_try = 10
    max_iter = 300
    # n_jobs (originally 2, to distribute the seeds) is no longer a
    # KMeans parameter; it was removed in scikit-learn 1.0
    model = KMeans(n_clusters=n_clusters, max_iter=max_iter,
                   n_init=number_of_seeds_to_try).fit(X)

    labels = model.labels_
    # indices of the preferred words in each cluster
    ordered_words = model.cluster_centers_.argsort()[:, ::-1]

    print("centers:", model.cluster_centers_)
    print("labels", labels)
    print("inertia:", model.inertia_)

    # count how many texts fall into each cluster
    texts_per_cluster = numpy.zeros(n_clusters)
    for i_cluster in range(n_clusters):
        for label in labels:
            if label == i_cluster:
                texts_per_cluster[i_cluster] += 1

    print("Top words per cluster:")
    for i_cluster in range(n_clusters):
        print("Cluster:", i_cluster, "texts:", int(texts_per_cluster[i_cluster]))
        for term in ordered_words[i_cluster, :10]:
            print("\t" + words[term])

    print("\n")
    print("Prediction")

Any help with this would be greatly appreciated.
