我認(rèn)為最好的方法是首先旋轉(zhuǎn)數(shù)據(jù)框,這樣每個(gè)傳感器都有一個(gè)時(shí)間序列列:
df.pivot(columns="location", values="temperature")
location Garage bedroom outside1 outside2
timestamp
2019-08-22 21:28:56 23.54 NaN NaN NaN
2019-08-22 21:29:44 NaN 23.33 NaN NaN
2019-08-22 21:29:53 23.40 NaN NaN NaN
2019-08-23 22:21:06 NaN NaN 25.0 NaN
2019-08-23 22:21:33 NaN NaN NaN 24.12
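To make this step reproducible, here is a minimal sketch that rebuilds the five sample readings from the table in long format (one row per reading) and pivots them. The column names timestamp, location and temperature come from the answer; the way the DataFrame is constructed is an assumption about what the original data looks like.

```python
import pandas as pd

# Assumed long-format input: one row per reading, sensor name in "location".
df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime([
            "2019-08-22 21:28:56",
            "2019-08-22 21:29:44",
            "2019-08-22 21:29:53",
            "2019-08-23 22:21:06",
            "2019-08-23 22:21:33",
        ]),
        "location": ["Garage", "bedroom", "Garage", "outside1", "outside2"],
        "temperature": [23.54, 23.33, 23.40, 25.0, 24.12],
    }
).set_index("timestamp")

# With no index= argument, pivot() keeps the existing DatetimeIndex as rows
# and turns each distinct "location" value into its own column.
wide = df.pivot(columns="location", values="temperature")
print(wide)
```

Each row still holds only one real reading, which is why the other sensors show NaN at that timestamp.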
Then you can fill in the missing data by interpolation:
df.pivot(columns="location", values="temperature").interpolate(method="time", limit_direction="both")
location Garage bedroom outside1 outside2
timestamp
2019-08-22 21:28:56 23.540000 23.33 25.0 24.12
2019-08-22 21:29:44 23.422105 23.33 25.0 24.12
2019-08-22 21:29:53 23.400000 23.33 25.0 24.12
2019-08-23 22:21:06 23.400000 23.33 25.0 24.12
2019-08-23 22:21:33 23.400000 23.33 25.0 24.12
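The sketch below reproduces the interpolation on a hand-built wide frame (values copied from the pivoted table; the frame itself is hypothetical). It shows why Garage at 21:29:44 becomes 23.422105: method="time" interpolates linearly in elapsed time, so 48 of the 57 seconds between 23.54 and 23.40 have passed. Sensors with a single reading are simply held constant, because limit_direction="both" also fills values before the first and after the last reading.

```python
import numpy as np
import pandas as pd

nan = np.nan
idx = pd.to_datetime([
    "2019-08-22 21:28:56",
    "2019-08-22 21:29:44",
    "2019-08-22 21:29:53",
    "2019-08-23 22:21:06",
    "2019-08-23 22:21:33",
])
# Hypothetical wide frame matching the pivoted table: one column per sensor.
wide = pd.DataFrame(
    {
        "Garage": [23.54, nan, 23.40, nan, nan],
        "bedroom": [nan, 23.33, nan, nan, nan],
        "outside1": [nan, nan, nan, 25.0, nan],
        "outside2": [nan, nan, nan, nan, 24.12],
    },
    index=idx,
)
# method="time" weights by the time gaps in the DatetimeIndex, not by row position.
# limit_direction="both" also fills leading/trailing NaNs, holding each column's
# first/last valid value constant.
filled = wide.interpolate(method="time", limit_direction="both")
print(filled)
```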
現(xiàn)在你應(yīng)該讓所有數(shù)據(jù)點(diǎn)在時(shí)間上對(duì)齊,你可以重新采樣到一個(gè)恒定的采樣率,比方說“1 分鐘”
df.pivot(columns="location", values="temperature").interpolate(method="time", limit_direction="both").resample("1 min").mean()
location Garage bedroom outside1 outside2
timestamp
2019-08-22 21:28:00 23.540000 23.33 25.0 24.12
2019-08-22 21:29:00 23.411053 23.33 25.0 24.12
2019-08-22 21:30:00 NaN NaN NaN NaN
2019-08-22 21:31:00 NaN NaN NaN NaN
2019-08-22 21:32:00 NaN NaN NaN NaN
... ... ... ... ...
2019-08-23 22:17:00 NaN NaN NaN NaN
2019-08-23 22:18:00 NaN NaN NaN NaN
2019-08-23 22:19:00 NaN NaN NaN NaN
2019-08-23 22:20:00 NaN NaN NaN NaN
2019-08-23 22:21:00 23.400000 23.33 25.0 24.12
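A small sketch of what resample("1min").mean() does, using only the interpolated Garage column from the table above (the Series is built by hand for illustration). Each 1-minute bin averages the rows that fall into it, so 21:29 gets (23.422105 + 23.40) / 2, and every bin with no rows at all comes out as NaN.

```python
import pandas as pd

# Interpolated Garage values copied from the table above (hypothetical Series).
s = pd.Series(
    [23.54, 23.422105, 23.40, 23.40, 23.40],
    index=pd.to_datetime([
        "2019-08-22 21:28:56",
        "2019-08-22 21:29:44",
        "2019-08-22 21:29:53",
        "2019-08-23 22:21:06",
        "2019-08-23 22:21:33",
    ]),
    name="Garage",
)

# One bin per minute between the first and last reading; bins are labelled
# by their left edge and empty bins produce NaN.
per_minute = s.resample("1min").mean()
print(per_minute.dropna())
print(f"{len(per_minute)} bins, {per_minute.isna().sum()} of them empty")
```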
你顯然有很多丟失的數(shù)據(jù),采樣間隔這么小,數(shù)據(jù)點(diǎn)稀疏,我猜你的實(shí)際數(shù)據(jù)集中有更多(理想情況下,你希望在每個(gè)重采樣間隔中至少有一個(gè)數(shù)據(jù)點(diǎn))。
現(xiàn)在由您和您的實(shí)際數(shù)據(jù)決定如何進(jìn)行。.nearest()您可以使用而不是填充缺失的數(shù)據(jù).mean()。如果缺少的項(xiàng)目只是少數(shù),您可以用滾動(dòng)平均值填充它們。