I have text strings that I clean with the string functions below. Now I want to scale this up and apply it to a DataFrame. The problem is that it does not work on the DataFrame. I also tried applying it to a NumPy array, but the result was an empty string. The DataFrame has a single column containing strings similar to the line variable shown below:

                                                   0
0  Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US...
1  Mozilla/5.0 (Windows NT 5.1; rv:2.0.1) Gecko/2...
2  Mozilla/5.0 (iPod; U; CPU iPhone OS 4_1 like M...
3  Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/201...
4  Mozilla/4.0 (compatible; MSIE 7.0; Windows NT ...

    line = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; handyCafeCln/3.3.21)"
    re_print = re.compile('[^%s]' % re.escape(string.printable))
    remove_digits = str.maketrans('', '', digits)
    remove_punc = str.maketrans('', '', string.punctuation)
    line = line.translate(remove_digits)
    line = line.translate(remove_punc)
    line = line.split()

Result:

    ['Mozilla', 'compatible', 'MSIE', 'Windows', 'NT', 'NET', 'CLR', 'handyCafeCln']

I tried packaging the same steps in a function, but I cannot apply it to the DataFrame and get the following error: 'Series' object has no attribute 'translate'

    def clean_pairs(lines):
        re_print = re.compile('[^%s]' % re.escape(string.printable))
        remove_digits = str.maketrans('', '', digits)
        remove_punc = str.maketrans('', '', string.punctuation)
        lines.translate(remove_digits)
        lines.translate(remove_punc)
        lines.split()

    df.apply(clean_pairs)
1 Answer

POPMUISE
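    import re
    import string
    import pandas as pd

    def clean_pairs(lines):
        # Kept from the question; this pattern is not used in the steps below.
        re_print = re.compile('[^%s]' % re.escape(string.printable))
        remove_digits = str.maketrans('', '', string.digits)
        remove_punc = str.maketrans('', '', string.punctuation)
        # str.translate() returns a new string, so reassign each result.
        lines = lines.translate(remove_digits)
        lines = lines.translate(remove_punc)
        lines = lines.split()
        return lines

    # line is the sample string from the question.
    line = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; handyCafeCln/3.3.21)"
    df = pd.DataFrame([line])
    # Apply to the column (a Series of strings), not to the whole DataFrame.
    print(df[0].apply(clean_pairs))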
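As a possible simplification, the two translation tables can be merged into one, so each string is scanned only once. The sketch below uses only the standard library. The names REMOVE_CHARS and clean_text are made up for illustration and do not come from the question or the answer above.

```python
import string

# Hypothetical name: a single table that deletes digits and punctuation in one pass.
REMOVE_CHARS = str.maketrans('', '', string.digits + string.punctuation)

def clean_text(text):
    """Strip digits and punctuation from one string, then split on whitespace."""
    return text.translate(REMOVE_CHARS).split()

# Sample string taken from the question.
line = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; handyCafeCln/3.3.21)"
tokens = clean_text(line)
print(tokens)
# ['Mozilla', 'compatible', 'MSIE', 'Windows', 'NT', 'NET', 'CLR', 'handyCafeCln']
```

Because clean_text takes a single string, it should work with df[0].apply(clean_text) in the same way as clean_pairs. Calling df.apply(...) on the whole DataFrame instead passes each column to the function as a Series, which is where the 'Series' object has no attribute 'translate' error comes from.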