
Grouping .txt data into a DataFrame

Python

守著一只汪 2021-10-19 14:51:35

I have a .txt file that contains data like this:

[12.06.17, 13:18:36] Name1: Test test test
[12.06.17, 13:20:20] Name2 ??: blabla
[12.06.17, 13:20:44] Name2 ??: words words words
words
words
words
[12.06.17, 13:29:03] Name1: more words more words
[12.06.17, 13:38:52] Name3 Surname Nickname: ????
[12.06.17, 13:40:37] Name1: message?

Note that the sender before a message can consist of several names, and a message can also span several lines.

Over the past few days I have tried many ways to split the data into the groups "date", "time", "name" and "message". I worked out that the regular expression

(.)(\d+\.\d+\.\d+)(,)(\s)(\d+:\d+:\d+)(.)(\s)([^:]+)(:)

captures everything before the message (see: https://regex101.com/r/hQlgeM/3). But I can't figure out how to add the message so that the extra lines of a multi-line message are grouped into the previous message.

Finally: if I can capture each group from the .txt with a regex, how do I actually put each group into a separate column? I have spent the last three days looking at examples and still can't work out how to build this DataFrame.

Code I tried:

df = pd.read_csv('chat.txt', names = ['raw'])
data = df.iloc[:,0]
re.match(r'\[([^]]+)\] ([^:]+):(.*)', data)

Another attempt that didn't work:

input_file = open("chat.txt", "r", encoding='utf-8')
content = input_file.read()
df = pd.DataFrame(content, columns = ['raw'])
df['date'] = df['raw'].str.extract(r'^(.)(\d+\.\d+\.\d+)', expand=True)
df['time'] = df['raw'].str.extract(r'(\s)(\d+:\d+:\d+)', expand=True)
df['name'] = df['raw'].str.extract(r'(\s)([^:]+)(:)', expand=True)
df['message'] = df['raw'].str.extract(r'^(.)(?<=:).*$', expand=True)
df
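One way to keep multi-line messages together without a single large regex is to parse the file line by line: a line that starts with the "[date, time] Sender:" header opens a new message, and any other line is appended to the previous message. This is a different technique from the single-regex approach in the answer. The sketch assumes every message header starts at the beginning of a line in exactly the format shown above; HEADER, group_messages and the sample text are hypothetical names used only for illustration.

```python
import re

# Assumption: every new message starts on its own line with a header of the form
# "[dd.mm.yy, hh:mm:ss] Sender: text"; the sender may contain spaces but no colon.
HEADER = re.compile(
    r'^\[(?P<date>\d+\.\d+\.\d+), (?P<time>\d+:\d+:\d+)\] (?P<name>[^:]+): ?(?P<message>.*)$'
)


def group_messages(lines):
    """Group chat lines into one dict per message (hypothetical helper).

    A line matching HEADER starts a new record; any other line is treated
    as a continuation of the previous message.
    """
    records = []
    for line in lines:
        line = line.rstrip('\n')
        match = HEADER.match(line)
        if match:
            # New message: keys are the named groups date, time, name, message
            records.append(match.groupdict())
        elif records:
            # Continuation line: append it to the previous message
            records[-1]['message'] += '\n' + line
        # Lines before the first header are ignored (assumption)
    return records


sample = """[12.06.17, 13:18:36] Name1: Test test test
[12.06.17, 13:20:44] Name2: words words words
words
words
[12.06.17, 13:38:52] Name3 Surname Nickname: more words"""

for record in group_messages(sample.splitlines()):
    print(record)
```

With a real file, the open file object can be passed directly, since iterating it yields lines and the trailing newline is stripped inside the loop. Passing the resulting list of dicts to pandas.DataFrame gives one column per key (date, time, name, message).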


1 Answer

湖上湖


這是我認(rèn)為適用于我的情況的解決方案。問題是我在使用 read_csv() 時(shí)它是 txt 數(shù)據(jù)。我還需要在傳入熊貓之前使用正則表達(dá)式來構(gòu)建我的格式：

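import re

import pandas as pd

# Read the whole file as one string so that multi-line messages stay together
with open('chat.txt', encoding='utf-8') as f:
    chat = f.read()

# (?s) lets "." match newlines; each message runs up to the next "[date, time]" header or the end of the text
pattern = r'(?s)\[(?P<date>\d+(?:\.\d+){2}),\s(?P<time>\d+(?::\d+){2})]\s(?P<name>[^:]+):(?P<message>.*?)(?=\[\d+\.\d+\.\d+,\s\d+:\d+:\d+]|\Z)'

# re.findall returns one (date, time, name, message) tuple per message
df = pd.DataFrame(re.findall(pattern, chat), columns=['date', 'time', 'name', 'message'])

# Remove the space after "Name:" and the newline before the next header
df['message'] = df['message'].str.strip()

print(df.tail())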
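To see what the answer's pattern produces, the sketch below applies the same regex with re.finditer and Match.groupdict() to a few lines of the sample chat from the question, stripping the leading space and trailing newline from each message. parse_chat and the inline sample string are illustrative assumptions, not part of the original answer.

```python
import re

# Same pattern as in the answer: (?s) lets "." span newlines, and the lookahead
# ends each message just before the next "[date, time]" header or at the end of the text.
PATTERN = re.compile(
    r'(?s)\[(?P<date>\d+(?:\.\d+){2}),\s(?P<time>\d+(?::\d+){2})]\s'
    r'(?P<name>[^:]+):(?P<message>.*?)(?=\[\d+\.\d+\.\d+,\s\d+:\d+:\d+]|\Z)'
)


def parse_chat(text):
    """Return one dict per message, keyed by the regex's named groups (hypothetical helper)."""
    rows = []
    for match in PATTERN.finditer(text):
        row = match.groupdict()
        # Drop the space after "Name:" and the newline before the next header
        row['message'] = row['message'].strip()
        rows.append(row)
    return rows


chat = """[12.06.17, 13:18:36] Name1: Test test test
[12.06.17, 13:20:44] Name2: words words words
words
words
[12.06.17, 13:29:03] Name1: more words more words"""

for row in parse_chat(chat):
    print(row)
```

Because each row is a dict keyed by the group names, passing the list to pandas.DataFrame yields the columns date, time, name and message without listing them by hand.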

2021-10-19
