2 Answers

Contributor: 1,895 experience points · 7+ upvotes
In fact, I have found a way to solve this problem.
In the gensim.models.keyedvectors module, inside the class WordEmbeddingKeyedVectors, we can change init_sims from:
def init_sims(self, replace=False):
    """Precompute L2-normalized vectors."""
    if getattr(self, 'vectors_norm', None) is None or replace:
        logger.info("precomputing L2-norms of word weight vectors")
        self.vectors_norm = _l2_norm(self.vectors, replace=replace)
to:
def init_sims(self, replace=False):
    """Precompute L2-normalized vectors."""
    if getattr(self, 'vectors_norm', None) is None or replace:
        logger.info("precomputing L2-norms of word weight vectors")
        self.vectors_norm = _l2_norm(self.vectors, replace=replace)
    elif len(self.vectors_norm) == len(self.vectors):
        # all of the added vectors have already been pre-computed into L2-normalized vectors
        pass
    else:
        # vectors were added but have not yet been pre-computed into L2-normalized vectors
        logger.info("adding L2-norm vectors for new documents")
        diff = len(self.vectors) - len(self.vectors_norm)
        self.vectors_norm = vstack((self.vectors_norm, _l2_norm(self.vectors[-diff:])))
Essentially, what the original function does is: if self.vectors_norm does not exist yet, compute it by L2-normalizing self.vectors. However, if self.vectors contains newly added vectors that have not yet been pre-computed into L2-normalized vectors, we should pre-compute just those and append them to self.vectors_norm.
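For illustration, here is a minimal standalone numpy sketch of that incremental idea (my own example, not gensim's actual code; the array names are made up): only the rows appended since the last normalization pass get normalized and stacked onto the existing cache.

import numpy as np

def l2_normalize(rows):
    # scale each row to unit length
    return rows / np.linalg.norm(rows, axis=1, keepdims=True)

vectors = np.random.rand(5, 4).astype(np.float32)    # existing word vectors
vectors_norm = l2_normalize(vectors)                  # cache built on first query

new_rows = np.random.rand(2, 4).astype(np.float32)    # vectors added later
vectors = np.vstack((vectors, new_rows))

if len(vectors_norm) < len(vectors):                  # cache is stale
    diff = len(vectors) - len(vectors_norm)
    vectors_norm = np.vstack((vectors_norm, l2_normalize(vectors[-diff:])))

assert vectors_norm.shape == vectors.shape            # cache covers all vectors again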
I will post this as a comment on your bug report, @gojomo, and open a pull request! Thanks :)

Contributor: 1,942 experience points · 3+ upvotes
It looks like the add() operation does not clear the cache of unit-length-normalized vectors that is created and reused by operations such as most_similar(). Before or after calling add(), you can explicitly discard that cache with:

del test.vectors_norm

After that, test.most_similar('3') should work without the IndexError.
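As a hedged end-to-end sketch of that workaround (assuming a gensim 3.x KeyedVectors, whose add() method and vectors_norm cache this thread refers to; the test and '3' names come from the question, and the sample vectors are made up):

import numpy as np
from gensim.models import KeyedVectors

test = KeyedVectors(vector_size=4)
test.add(['1', '2', '3'], np.random.rand(3, 4))    # initial vectors (made-up data)

test.most_similar('1')                             # builds and caches test.vectors_norm

test.add(['4'], np.random.rand(1, 4))              # new vector added after the cache exists

# Drop the stale cache so the next query recomputes norms for every vector,
# including the newly added one, instead of raising IndexError.
if getattr(test, 'vectors_norm', None) is not None:
    del test.vectors_norm

print(test.most_similar('3'))

Deleting the attribute forces init_sims() to rebuild the full cache on the next similarity query, which avoids the length mismatch that the patched init_sims() above handles incrementally.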