2 回答

TA貢獻(xiàn)1784條經(jīng)驗(yàn) 獲得超7個(gè)贊
無法以相同的方式解析所有日期
加載數(shù)據(jù)框
如果尚未將該
dates
列轉(zhuǎn)換為 a,則將其轉(zhuǎn)換為 a。str
使用布爾索引選擇不同的日期類型
假設(shè)常規(guī)日期包含
/
假設(shè) Excel 序列日期不包含
/
根據(jù)日期時(shí)間類型分別修復(fù)每個(gè)數(shù)據(jù)幀
將數(shù)據(jù)幀重新連接在一起。
import pandas as pd
from datetime import datetime
# load data
df = pd.DataFrame({'dates': ['09/01/2020', '05/15/1985', '06/07/2013', '33233', '26299', '29428']})
# display(df)
? ? ? ? dates
0? 09/01/2020
1? 05/15/1985
2? 06/07/2013
3? ? ? ?33233
4? ? ? ?26299
5? ? ? ?29428
# set the column type as a str if it isn't already
df.dates = df.dates.astype('str')
# create a date mask based on the string containing a /
date_mask = df.dates.str.contains('/')
# split the dates out for excel
df_excel = df[~date_mask].copy()
# split the regular dates out
df_reg = df[date_mask].copy()
# convert reg dates to datetime
df_reg.dates = pd.to_datetime(df_reg.dates)
# convert excel dates to datetime; the column needs to be cast as ints
df_excel.dates = pd.TimedeltaIndex(df_excel.dates.astype(int), unit='d') + datetime(1900, 1, 1)
# combine the dataframes
df = pd.concat([df_reg, df_excel])
顯示(df)
? ? ? ?dates
0 2020-09-01
1 1985-05-15
2 2013-06-07
3 1990-12-28
4 1972-01-03
5 1980-07-28

TA貢獻(xiàn)2051條經(jīng)驗(yàn) 獲得超10個(gè)贊
pd.TimedeltaIndex(dates_in_excel_serial_format, 單位='d') + pd.datetime(1900,1,1)
演示:
> dates_in_excel_serial_format = [29428]
> pd.TimedeltaIndex(dates_in_excel_serial_format, unit='d') + pd.datetime(1900,1,1)
< DatetimeIndex(['1980-07-28'], dtype='datetime64[ns]', freq=None)
添加回答
舉報(bào)