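If you want to split the rows first and then extract the values into columns, note that you can use str.extract. Use named groups in your regular expression and the group names will automatically become the column names of the resulting DataFrame.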
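import pandas as pd

# Split on the whitespace before each "YYYY-MM-DD HH:MM:SS" timestamp; the lookahead leaves the timestamp in place.
split_line = r"\s+(?=\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})"
# Each named group becomes a column of the extracted DataFrame.
extract_values = r"(?P<date>\d{4}-\d{2}-\d{2})\s(?P<time>\d{2}:\d{2}:\d{2})\s(?P<value_one>.*?)\s(?P<value_two>.*?)\s(?P<value_three>.*?)$"
df = pd.DataFrame([{
    "value": "2020-05-12 10:00:00 12.07 13 11.56 2020-06-12 11:00:00 13.07 16 11.16 2020-05-12 10:00:01 11.49 17 5.67",
}, {
    "value": "2020-05-13 10:00:00 14.07 13 15.56 2020-05-16 10:00:02 11.51 18 5.69",
}])
# Split into records, put one record on each row, then extract the named groups.
df = df["value"].str.split(split_line).explode().str.extract(extract_values, expand=True)
print(df)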
#          date      time  value_one  value_two  value_three
# 0  2020-05-12  10:00:00      12.07         13        11.56
# 0  2020-06-12  11:00:00      13.07         16        11.16
# 0  2020-05-12  10:00:01      11.49         17         5.67
# 1  2020-05-13  10:00:00      14.07         13        15.56
# 1  2020-05-16  10:00:02      11.51         18         5.69
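To see what the two patterns do on their own, here is a minimal sketch that uses only the standard-library re module on a single string. The names line, records, and rows are illustrative and not part of the original answer. The lookahead (?=...) only asserts that a timestamp follows, so re.split removes just the separating whitespace and every record keeps its date.

```python
import re

# Same patterns as in the answer above.
split_line = r"\s+(?=\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})"
extract_values = (
    r"(?P<date>\d{4}-\d{2}-\d{2})\s(?P<time>\d{2}:\d{2}:\d{2})\s"
    r"(?P<value_one>.*?)\s(?P<value_two>.*?)\s(?P<value_three>.*?)$"
)

# Illustrative input: two records packed into one string.
line = "2020-05-12 10:00:00 12.07 13 11.56 2020-06-12 11:00:00 13.07 16 11.16"

# Only whitespace that is directly followed by a timestamp is a split point.
records = re.split(split_line, line)

# groupdict() maps each named group to its match, much as str.extract
# maps named groups to column names.
rows = [re.match(extract_values, record).groupdict() for record in records]

for row in rows:
    print(row)
```

The values come back as strings in both the sketch and the pandas version, so they may need converting to numeric or datetime types before further analysis.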
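If you do not know how many values follow the date and time, split each record on spaces instead of extracting with a regular expression. I would suggest something like this: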
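import pandas as pd

# Split on the whitespace before each "YYYY-MM-DD HH:MM:SS" timestamp.
split_line = r"\s+(?=\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})"
df = pd.DataFrame([{
    "value": "2020-05-12 10:00:00 12.07 13 11.56 2020-06-12 11:00:00 13.07 16 11.16 2020-05-12 10:00:01 11.49 17 5.67",
}, {
    "value": "2020-05-13 10:00:00 14.07 13 14 15 15.56 2020-05-16 10:00:02 11.51 18 5.69",
}])
# One record per row, renumbered from 0 (reset_index keeps the records in the "value" column).
df = df["value"].str.split(split_line).explode().reset_index()
# Split each record on single spaces; shorter records are padded with NaN.
df = df['value'].str.split(" ").apply(pd.Series)
# Rename the positional columns 0, 1, 2, ... to col_0, col_1, col_2, ...
df.columns = [f"col_{col}" for col in df.columns]
print(df)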
#         col_0     col_1  col_2  col_3  col_4  col_5  col_6
# 0  2020-05-12  10:00:00  12.07     13  11.56    NaN    NaN
# 1  2020-06-12  11:00:00  13.07     16  11.16    NaN    NaN
# 2  2020-05-12  10:00:01  11.49     17   5.67    NaN    NaN
# 3  2020-05-13  10:00:00  14.07     13     14     15  15.56
# 4  2020-05-16  10:00:02  11.51     18   5.69    NaN    NaN
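The NaN cells appear because some records have fewer fields than the widest one. The padding idea can be sketched without pandas. This only illustrates the logic and is not how pandas implements it; the names records, padded, and table are hypothetical, and None stands in for NaN.

```python
# Illustrative records with different numbers of values after the timestamp.
records = [
    "2020-05-12 10:00:00 12.07 13 11.56",
    "2020-05-13 10:00:00 14.07 13 14 15 15.56",
]

# Split every record on single spaces; rows may differ in length.
rows = [record.split(" ") for record in records]

# Pad the short rows with None so every row is as wide as the longest one.
width = max(len(row) for row in rows)
padded = [row + [None] * (width - len(row)) for row in rows]

# Name the columns by position, following the col_<n> scheme used above.
columns = [f"col_{i}" for i in range(width)]
table = [dict(zip(columns, row)) for row in padded]

for entry in table:
    print(entry)
```

Because the values are aligned by position, the last value of a short record (11.56 in col_4) does not land in the same column as the last value of a longer record (15.56 in col_6). This is worth keeping in mind if the trailing value has a fixed meaning.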