1 Answer

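One possibility is to separate punctuation from words with spaces. I do this with a preprocessing function, pad_punctuation. After that, I create the Tokenizer with filters='' so that it no longer strips punctuation itself.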
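import re
import string
from tensorflow.keras.preprocessing.text import Tokenizer

# re.escape keeps characters such as ], \, ^ and - literal inside the character class.
def pad_punctuation(s):
    return re.sub(f"([{re.escape(string.punctuation)}])", r" \1 ", s)

S = ["The quick brown fox jumped over the lazy dog."]
S = [pad_punctuation(s) for s in S]

# filters='' disables the Tokenizer's default punctuation filtering.
t = Tokenizer(filters='')
t.fit_on_texts(S)
print(t.word_index)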
Result:
{'the': 1, 'quick': 2, 'brown': 3, 'fox': 4, 'jumped': 5, 'over': 6, 'lazy': 7, 'dog': 8, '.': 9}
The pad_punctuation function works for every character in string.punctuation, that is, all ASCII punctuation characters.
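To see what the padding does on its own, here is a minimal sketch that needs only the standard library. The sample sentence and the names padded and tokens are illustrative assumptions, not part of the original answer, and str.split stands in for the whitespace splitting that the Tokenizer performs.

```python
import re
import string

def pad_punctuation(s):
    # Surround each ASCII punctuation character with spaces.
    return re.sub(f"([{re.escape(string.punctuation)}])", r" \1 ", s)

# Hypothetical sample text containing several kinds of punctuation.
padded = pad_punctuation("Hello, world! (It's 3.14?)")

# Splitting on whitespace mimics how the padded text is later tokenized.
tokens = padded.split()
print(tokens)
# ['Hello', ',', 'world', '!', '(', 'It', "'", 's', '3', '.', '14', '?', ')']
```

Note that this also splits contractions (It's) and decimal numbers (3.14) into separate tokens. If that is not wanted, one option is to pad a narrower set of characters instead of all of string.punctuation.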