1 Answer

The reason is that you are using a custom tokenizer together with the default stop_words='english', so when the features are extracted, a check is made for any inconsistency between stop_words and tokenizer.

If you dig into the code of sklearn/feature_extraction/text.py, you will find this snippet performing the consistency check:
def _check_stop_words_consistency(self, stop_words, preprocess, tokenize):
    """Check if stop words are consistent

    Returns
    -------
    is_consistent : True if stop words are consistent with the preprocessor
                    and tokenizer, False if they are not, None if the check
                    was previously performed, "error" if it could not be
                    performed (e.g. because of the use of a custom
                    preprocessor / tokenizer)
    """
    if id(self.stop_words) == getattr(self, '_stop_words_id', None):
        # Stop words were previously validated
        return None

    # NB: stop_words is validated, unlike self.stop_words
    try:
        inconsistent = set()
        for w in stop_words or ():
            tokens = list(tokenize(preprocess(w)))
            for token in tokens:
                if token not in stop_words:
                    inconsistent.add(token)
        self._stop_words_id = id(self.stop_words)

        if inconsistent:
            warnings.warn('Your stop_words may be inconsistent with '
                          'your preprocessing. Tokenizing the stop '
                          'words generated tokens %r not in '
                          'stop_words.' % sorted(inconsistent))
        return not inconsistent
    except Exception:
        # Failed to apply default preprocessing (e.g. because a custom
        # preprocessor / tokenizer was used)
        self._stop_words_id = id(self.stop_words)
        return 'error'
As you can see, it raises the warning whenever an inconsistency is found: it tokenizes every stop word with your tokenizer, and any resulting token that is not itself in the stop list counts as inconsistent.
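As an illustration (not part of the original answer), here is a minimal sketch that deliberately triggers the warning. The `strip_plural` tokenizer is a hypothetical, crude "stemmer": it turns the built-in English stop word "was" into "wa", which is not in the stop list, so fitting the vectorizer runs the consistency check above and emits the warning.

```python
import warnings

from sklearn.feature_extraction.text import CountVectorizer


def strip_plural(doc):
    # Hypothetical tokenizer: strip a trailing 's' from every token.
    # Tokenizing the stop word "was" now yields "wa", which is not in
    # scikit-learn's built-in English stop word list.
    return [w.rstrip('s') for w in doc.split()]


vec = CountVectorizer(tokenizer=strip_plural, stop_words='english')

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    # The consistency check runs when the analyzer is built during fit.
    vec.fit(["the cats was sleeping on the mats"])

messages = [str(w.message) for w in caught]
print(any('stop_words may be inconsistent' in m for m in messages))
```

If your tokenizer and stop list agree (for example, a tokenizer that only splits on whitespace), no such warning is raised, so this check is a quick way to confirm whether your preprocessing and stop_words are compatible.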