I have a Pandas DataFrame with 4909144 rows, with time as the index and the columns source_name, dest_address, and tvalue, where tvalue is identical to the time index. I sorted the df by source_name, dest_address, and tvalue using the following, so that rows are grouped in that order and then by time:

    df = df.sort_values(by=['sourcehostname','destinationaddress','tvalue'])

This gives me:

                             source_name  dest_address  tvalue
    time
    2019-02-06 15:00:54.000  source_1     72.21.215.90  2019-02-06 15:00:54.000
    2019-02-06 15:01:00.000  source_1     72.21.215.90  2019-02-06 15:01:00.000
    2019-02-06 15:30:51.000  source_1     72.21.215.90  2019-02-06 15:30:51.000
    2019-02-06 15:30:51.000  source_1     72.21.215.90  2019-02-06 15:30:51.000
    2019-02-06 15:00:54.000  source_1     131.107.0.89  2019-02-06 15:00:54.000
    2019-02-06 15:01:14.000  source_1     131.107.0.89  2019-02-06 15:01:14.000
    2019-02-06 15:03:02.000  source_2     69.63.191.1   2019-02-06 15:03:02.000
    2019-02-06 15:08:02.000  source_2     69.63.191.1   2019-02-06 15:08:02.000

I want the difference between consecutive times, so I then use:

    #Create delta
    df['delta'] = (df['tvalue']-df['tvalue'].shift()).fillna(0)

This gives me:

                             source_name  dest_address  tvalue                   delta
    time
    2019-02-06 15:00:54.000  source_1     72.21.215.90  2019-02-06 15:00:54.000  00:00:00
    2019-02-06 15:01:00.000  source_1     72.21.215.90  2019-02-06 15:01:00.000  00:00:06
    2019-02-06 15:30:51.000  source_1     72.21.215.90  2019-02-06 15:30:51.000  00:29:51
    2019-02-06 15:30:51.000  source_1     72.21.215.90  2019-02-06 15:30:51.000  00:00:00

But I want to group by source_name and dest_address and take the difference in tvalue within each group, so that the first entry of a new group does not get a delta like -1 days +23:30:00, or a delta like 00:01:48 for source_2 when it should be 00:00:00.

I am trying:

    df.groupby(['sourcehostname','destinationaddress'])['tvalue'].diff().fillna(0)

But this takes a very long time, and may not give me the result I am looking for.
1 Answer

森欄
import pandas as pd

# Flag rows whose source or destination differs from the row above.
# After sorting, these are the first rows of each (source, destination) group.
source_changed = df['sourcehostname'] != df['sourcehostname'].shift()
dest_changed = df['destinationaddress'] != df['destinationaddress'].shift()
change_occurred = (source_changed | dest_changed)

# Row-to-row difference over the whole column (assumes tvalue is a datetime column).
time_diff = df['tvalue'].diff()
zero_delta = pd.Timedelta(0)

df['time_diff'] = time_diff
df['change_occurred'] = change_occurred

# Where change_occurred is True, set delta to zero_delta;
# otherwise keep the value from time_diff.
df['delta'] = df['time_diff'].mask(df['change_occurred'], zero_delta)
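To check the idea on a small scale, here is a minimal sketch that rebuilds the eight rows from the question and compares the boundary-flag approach with the groupby version the asker tried. The names sample, keys, is_group_start, delta_mask and delta_groupby are made up for this illustration, and it assumes the rows of each (source, destination) pair are contiguous, as they are after the sort in the question.

```python
import pandas as pd

# Sample rows copied from the question; column names follow the question's code.
rows = [
    ("source_1", "72.21.215.90", "2019-02-06 15:00:54"),
    ("source_1", "72.21.215.90", "2019-02-06 15:01:00"),
    ("source_1", "72.21.215.90", "2019-02-06 15:30:51"),
    ("source_1", "72.21.215.90", "2019-02-06 15:30:51"),
    ("source_1", "131.107.0.89", "2019-02-06 15:00:54"),
    ("source_1", "131.107.0.89", "2019-02-06 15:01:14"),
    ("source_2", "69.63.191.1", "2019-02-06 15:03:02"),
    ("source_2", "69.63.191.1", "2019-02-06 15:08:02"),
]
sample = pd.DataFrame(rows, columns=["sourcehostname", "destinationaddress", "tvalue"])
sample["tvalue"] = pd.to_datetime(sample["tvalue"])

keys = ["sourcehostname", "destinationaddress"]

# Boundary-flag approach: a row starts a new group if either key
# differs from the row above (the very first row always counts).
is_group_start = (sample[keys] != sample[keys].shift()).any(axis=1)
delta_mask = sample["tvalue"].diff().mask(is_group_start, pd.Timedelta(0))

# groupby approach from the question, with a Timedelta fill value, for comparison.
delta_groupby = sample.groupby(keys, sort=False)["tvalue"].diff().fillna(pd.Timedelta(0))

sample["delta"] = delta_mask
print(sample)
```

On this sample both versions give the same deltas, with a zero at the first row of every group. The flag version only looks at neighbouring rows, so it is only valid when the frame is already sorted by the grouping keys.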