
How to handle incoming real-time data with Python pandas

Python

躍然一笑 2021-03-17 17:32:23

What is the most recommended/Pythonic way of handling live incoming data with pandas?

Every few seconds I receive a data point in the following format:

{'time' :'2013-01-01 00:00:00', 'stock' : 'BLAH', 'high' : 4.0, 'low' : 3.0, 'open' : 2.0, 'close' : 1.0}

I would like to append it to an existing DataFrame and then run some analysis on it.

The problem is that simply appending rows with DataFrame.append can cause performance problems, because every call copies the whole DataFrame.

Things I've tried:

A few people suggested preallocating a big DataFrame and updating it as the data comes in:

In [1]: index = pd.date_range(start='2013-01-01 00:00:00', freq='s', periods=5)

In [2]: columns = ['high', 'low', 'open', 'close']

In [3]: df = pd.DataFrame(index=index, columns=columns)

In [4]: df
Out[4]:
                    high  low open close
2013-01-01 00:00:00  NaN  NaN  NaN   NaN
2013-01-01 00:00:01  NaN  NaN  NaN   NaN
2013-01-01 00:00:02  NaN  NaN  NaN   NaN
2013-01-01 00:00:03  NaN  NaN  NaN   NaN
2013-01-01 00:00:04  NaN  NaN  NaN   NaN

In [5]: data = {'time' :'2013-01-01 00:00:02', 'stock' : 'BLAH', 'high' : 4.0, 'low' : 3.0, 'open' : 2.0, 'close' : 1.0}

In [6]: data_ = pd.Series(data)

In [7]: df.loc[data['time']] = data_

In [8]: df
Out[8]:
                    high  low open close
2013-01-01 00:00:00  NaN  NaN  NaN   NaN
2013-01-01 00:00:01  NaN  NaN  NaN   NaN
2013-01-01 00:00:02    4    3    2     1
2013-01-01 00:00:03  NaN  NaN  NaN   NaN
2013-01-01 00:00:04  NaN  NaN  NaN   NaN

Another alternative is building a list of dicts: simply append the incoming data to a list and slice it into smaller DataFrames to do the work.

In [9]: ls = []

In [10]: for n in range(5):
   .....:     # Naive stuff ahead =)
   .....:     time = '2013-01-01 00:00:0' + str(n)
   .....:     d = {'time' : time, 'stock' : 'BLAH', 'high' : np.random.rand()*10, 'low' : np.random.rand()*10, 'open' : np.random.rand()*10, 'close' : np.random.rand()*10}
   .....:     ls.append(d)

In [11]: df = pd.DataFrame(ls[1:3]).set_index('time')

In [12]: df
Out[12]:
                        close      high       low      open stock
time
2013-01-01 00:00:01  3.270078  1.008289  7.486118  2.180683  BLAH
2013-01-01 00:00:02  3.883586  2.215645  0.051799  2.310823  BLAH

Or something along those lines, perhaps with a bit more processing of the input.
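For context, DataFrame.append returned a new DataFrame on every call (it was deprecated in pandas 1.4 and removed in 2.0), so adding n rows one at a time copies on the order of n² values. A minimal sketch of the list-of-dicts idea, assuming rows arrive as dicts in the format above; the sample rows and the `to_frame` helper are hypothetical. The dicts are buffered in a plain list (cheap appends) and a DataFrame is built only when you actually want to analyse a slice.

```python
import pandas as pd

# Hypothetical incoming rows, in the format from the question.
incoming = [
    {'time': '2013-01-01 00:00:00', 'stock': 'BLAH', 'high': 4.0, 'low': 3.0, 'open': 2.0, 'close': 1.0},
    {'time': '2013-01-01 00:00:01', 'stock': 'BLAH', 'high': 4.5, 'low': 3.1, 'open': 2.2, 'close': 1.1},
    {'time': '2013-01-01 00:00:02', 'stock': 'BLAH', 'high': 4.2, 'low': 3.3, 'open': 2.1, 'close': 1.4},
]

rows = []                      # plain list: appending does not copy earlier rows
for d in incoming:             # stands in for "a new data point arrived"
    rows.append(d)


def to_frame(rows):
    """Build one DataFrame from buffered dicts, indexed by parsed timestamps."""
    df = pd.DataFrame(rows)
    df['time'] = pd.to_datetime(df['time'])
    return df.set_index('time')


df = to_frame(rows)            # everything received so far
window = to_frame(rows[-2:])   # or just the most recent points, like ls[1:3]
```

The DataFrame is rebuilt from the list each time it is needed, which costs one copy per analysis rather than one copy per incoming row.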


3 Answers

阿波羅的戰車


I would use HDF5/pytables as follows:

1. Keep the data as a Python list "as long as possible".
2. Append your results to that list.
3. When it gets "big":
   - push it to an HDF5 store using pandas IO (and an appendable table);
   - clear the list.
4. Repeat.

In practice, the functions I define keep a separate list per "key", so that you can store multiple DataFrames to the HDF5 store from the same process.

We define a function that you call with each row d:

import pandas as pd  # pd.HDFStore also needs the PyTables package ("tables") installed

CACHE = {}
STORE = 'store.h5'  # Note: another option is to keep the actual file open

def process_row(d, key, max_len=5000, _cache=CACHE):
    """
    Append row d to the store 'key'.

    When the number of items in the key's cache reaches max_len,
    append the list of rows to the HDF5 store and clear the list.
    """
    # keep the rows for each key separate.
    lst = _cache.setdefault(key, [])
    if len(lst) >= max_len:
        store_and_clear(lst, key)
    lst.append(d)

def store_and_clear(lst, key):
    """
    Convert key's cache list to a DataFrame and append that to HDF5.
    """
    df = pd.DataFrame(lst)
    with pd.HDFStore(STORE) as store:
        store.append(key, df)  # append() writes the appendable "table" format
    lst.clear()

注意：我們使用with語句在每次寫入后自動關(guān)閉存儲。它可以更快地保持開放，但即便如此我們建議您定期刷新（收盤刷新）。還要注意，使用collection deque而不是列表可能更易讀，但是列表的性能在這里會稍好一些。

To use it, call:

process_row({'time' :'2013-01-01 00:00:00', 'stock' : 'BLAH', 'high' : 4.0, 'low' : 3.0, 'open' : 2.0, 'close' : 1.0},
            key="df")

Note: "df" is the key under which the data is stored in the pytables store.

作業(yè)完成后，請確保您store_and_clear剩余的緩存：

for k, lst in CACHE.items():  # you can instead use .iteritems() in python 2
    store_and_clear(lst, k)

Now your complete DataFrame is available via:

with pd.HDFStore(STORE) as store:
    df = store["df"]  # other keys will be store[key]
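Because `process_row` stores the incoming dicts unchanged, the 'time' column comes back from the store as plain strings. A sketch of preparing the frame for analysis, assuming that layout; the sample rows are a hypothetical stand-in for `store["df"]`, and the per-stock summary is only one example of the kind of analysis the question mentions.

```python
import pandas as pd

# Stand-in for `store["df"]`: rows as process_row would have stored them,
# with 'time' still a string (hypothetical sample values).
df = pd.DataFrame([
    {'time': '2013-01-01 00:00:00', 'stock': 'BLAH', 'high': 4.0, 'low': 3.0, 'open': 2.0, 'close': 1.0},
    {'time': '2013-01-01 00:00:01', 'stock': 'BLAH', 'high': 4.4, 'low': 2.9, 'open': 1.0, 'close': 1.5},
    {'time': '2013-01-01 00:00:02', 'stock': 'OTHER', 'high': 9.0, 'low': 8.0, 'open': 8.5, 'close': 8.8},
])

# Parse the timestamps once and index by them, in chronological order.
df['time'] = pd.to_datetime(df['time'])
df = df.set_index('time').sort_index()

# Example analysis: most recent close and highest high for each stock.
summary = df.groupby('stock').agg(
    last_close=('close', 'last'),
    max_high=('high', 'max'),
)
```

Converting before the rows are written (inside `store_and_clear`) would avoid repeating this step on every read; doing it at read time keeps the writer path as simple as possible.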

Answered 2021-03-26

慕斯王


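You are really trying to solve two problems: capturing the real-time data and analyzing that data. The first can be solved with Python's logging module, which is designed for exactly this purpose. The second can then be solved by reading that same log file.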
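A minimal sketch of that two-step idea, assuming one JSON object is written per log line; the logger name `ticks`, the temporary file location and `on_tick` are hypothetical. The capturing side only emits log records; the analysis side, which could just as well be a separate process, reads the same file back with `pd.read_json(..., lines=True)`.

```python
import json
import logging
import os
import tempfile

import pandas as pd

# Hypothetical log location; any writable path works.
log_path = os.path.join(tempfile.mkdtemp(), 'ticks.log')

# 1) Capture: each record is the bare message, i.e. one JSON object per line.
logger = logging.getLogger('ticks')
logger.setLevel(logging.INFO)
logger.propagate = False                 # keep ticks out of the root logger
handler = logging.FileHandler(log_path)
handler.setFormatter(logging.Formatter('%(message)s'))
logger.addHandler(handler)


def on_tick(d):
    """Call this for every incoming data point."""
    logger.info(json.dumps(d))


on_tick({'time': '2013-01-01 00:00:00', 'stock': 'BLAH', 'high': 4.0, 'low': 3.0, 'open': 2.0, 'close': 1.0})
on_tick({'time': '2013-01-01 00:00:01', 'stock': 'BLAH', 'high': 4.5, 'low': 3.5, 'open': 2.5, 'close': 1.5})
handler.flush()                          # make sure both lines are on disk

# 2) Analyse: read the same file back as a DataFrame.
df = pd.read_json(log_path, lines=True)
df['time'] = pd.to_datetime(df['time'])  # works whether read_json parsed it or not
```

For a long-running collector, a rotating handler such as `logging.handlers.TimedRotatingFileHandler` keeps individual files small, and the analysis side can then read one file per period.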

Answered 2021-03-26
