2 Answers

Contributor with 1876 experience points, 7+ upvotes received
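Extracting the information iteratively is faster than using pandas.json_normalize. As the sample data shows, the value of data is of type str and must be converted to a dict. The main task is to extract each key/value pair from 'bid' and 'ask' to create a separate record; a list comprehension handles creating those separate records.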
import json
import pandas as pd

# list of tuples, where the value of 'data' is a JSON string
transaction_data = [('1599324732926-0', {'data': '{"timestamp":1599324732.767, "receipt_timestamp":1599324732.9256856, "delta":true, "bid":{"338.9":0.06482,"338.67":3.95535}, "ask":{"339.12":2.47578,"339.13":6.43172}}'}),
                    ('1599324732926-1', {'data': '{"timestamp":1599324732.767, "receipt_timestamp":1599324732.9256856, "delta":true, "bid":{"338.9":0.06482,"338.67":3.95535}, "ask":{"339.12":2.47578,"339.13":6.43172}}'}),
                    ('1599324732926-2', {'data': '{"timestamp":1599324732.767, "receipt_timestamp":1599324732.9256856, "delta":true, "bid":{"338.9":0.06482,"338.67":3.95535}, "ask":{"339.12":2.47578,"339.13":6.43172}}'})]

# create a list of lists for the transaction data (the first row is the header);
# each side's price/size pair becomes a separate row
data_key_list = [['timestamp', 'receipt_timestamp', 'delta', 'side', 'price', 'size']]
for _, payload in transaction_data:  # iterate through each transaction
    data = json.loads(payload['data'])  # convert the JSON string to a dict
    for side in ['bid', 'ask']:  # extract each price/size pair as a separate record
        data_key_list += [[data['timestamp'], data['receipt_timestamp'], data['delta'], side, float(price), size]
                          for price, size in data[side].items()]

# create a dataframe, using the first row as the column names
df = pd.DataFrame(data_key_list[1:], columns=data_key_list[0])
# display(df.head())
     timestamp  receipt_timestamp  delta side   price     size
0  1.59932e+09        1.59932e+09   True  bid   338.9  0.06482
1  1.59932e+09        1.59932e+09   True  bid  338.67  3.95535
2  1.59932e+09        1.59932e+09   True  ask  339.12  2.47578
3  1.59932e+09        1.59932e+09   True  ask  339.13  6.43172
4  1.59932e+09        1.59932e+09   True  bid   338.9  0.06482
Convert to a list of dicts:
df.to_dict(orient='records')
[out]:
[{'timestamp': 1599324732.767,
'receipt_timestamp': 1599324732.9256856,
'delta': True,
'side': 'bid',
'price': 338.9,
'size': 0.06482},
{'timestamp': 1599324732.767,
'receipt_timestamp': 1599324732.9256856,
'delta': True,
'side': 'bid',
'price': 338.67,
'size': 3.95535},
{'timestamp': 1599324732.767,
'receipt_timestamp': 1599324732.9256856,
'delta': True,
'side': 'ask',
'price': 339.12,
'size': 2.47578},
{'timestamp': 1599324732.767,
'receipt_timestamp': 1599324732.9256856,
'delta': True,
'side': 'ask',
'price': 339.13,
'size': 6.43172},
...]
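If the DataFrame is only an intermediate step and the list of dicts is the real goal, one option is to build the dicts directly and skip pd.DataFrame(...) and to_dict. This is a minimal sketch, assuming the same (id, {'data': <JSON string>}) layout as transaction_data; the generator name to_records is hypothetical, and no claim is made about how its speed compares.

```python
import json

# Hypothetical single-transaction input in the question's (id, {'data': <JSON string>}) shape
transaction_data = [
    ('1599324732926-0', {'data': '{"timestamp":1599324732.767, "receipt_timestamp":1599324732.9256856, "delta":true, "bid":{"338.9":0.06482,"338.67":3.95535}, "ask":{"339.12":2.47578,"339.13":6.43172}}'}),
]


def to_records(rows):
    """Yield one dict per (side, price level), using the DataFrame's column names as keys."""
    for _, payload in rows:
        data = json.loads(payload['data'])  # the 'data' value is a JSON string
        for side in ('bid', 'ask'):
            for price, size in data[side].items():
                yield {
                    'timestamp': data['timestamp'],
                    'receipt_timestamp': data['receipt_timestamp'],
                    'delta': data['delta'],
                    'side': side,
                    'price': float(price),  # price levels are JSON object keys, i.e. strings
                    'size': size,
                }


records = list(to_records(transaction_data))
print(records[0])
```

The first record printed has the same keys and values as the first dict in the to_dict output.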

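Contributor with 1784 experience points, 2+ upvotes received
This isn't exactly an answer to your question, since it isn't a pandas or numpy implementation, but I think it should meet your needs.
Take a look at multiprocessing.pool.Pool.map.
Suppose you have a function that receives a tuple from the original list and returns the data dict you want. Its signature might look like this: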
import json

def tuple_to_dict(item):
    # conversion code goes here; for example, parse the JSON string in 'data'
    result_dict = json.loads(item[1]['data'])
    return result_dict
Then you can use multiprocessing.Pool() like this:
import multiprocessing

if __name__ == '__main__':
    input_list = [...]  # your input list
    with multiprocessing.Pool() as pool:
        result_list = pool.map(tuple_to_dict, input_list)
        print(result_list)
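For this particular data, here is a minimal runnable sketch of the Pool approach, assuming the worker should return the same flat per-side records as the first answer. The names tuple_to_records, flatten_all and SAMPLE are hypothetical. Because each input tuple expands into several records, the worker returns a list and the per-tuple lists are flattened afterwards. Whether this beats a single-process loop depends on the data size, since every tuple and result has to be pickled between processes.

```python
import json
import multiprocessing
from itertools import chain

# Hypothetical sample input in the question's (id, {'data': <JSON string>}) shape
SAMPLE = [
    ('1599324732926-0', {'data': '{"timestamp":1599324732.767, "receipt_timestamp":1599324732.9256856, "delta":true, "bid":{"338.9":0.06482,"338.67":3.95535}, "ask":{"339.12":2.47578,"339.13":6.43172}}'}),
    ('1599324732926-1', {'data': '{"timestamp":1599324732.767, "receipt_timestamp":1599324732.9256856, "delta":true, "bid":{"338.9":0.06482,"338.67":3.95535}, "ask":{"339.12":2.47578,"339.13":6.43172}}'}),
]


def tuple_to_records(item):
    """Turn one (id, {'data': json_str}) tuple into one dict per bid/ask price level."""
    _, payload = item
    data = json.loads(payload['data'])  # the 'data' value is a JSON string
    return [
        {
            'timestamp': data['timestamp'],
            'receipt_timestamp': data['receipt_timestamp'],
            'delta': data['delta'],
            'side': side,
            'price': float(price),  # price levels are JSON object keys, i.e. strings
            'size': size,
        }
        for side in ('bid', 'ask')
        for price, size in data[side].items()
    ]


def flatten_all(rows, processes=None):
    """Run tuple_to_records over rows in a process pool and flatten the lists."""
    with multiprocessing.Pool(processes=processes) as pool:
        # map() blocks until all results are back, so the terminate() that
        # runs when the with-block exits does not cut any work short
        per_tuple = pool.map(tuple_to_records, rows)
    return list(chain.from_iterable(per_tuple))


if __name__ == '__main__':
    records = flatten_all(SAMPLE)
    print(len(records))  # 2 tuples x 4 price levels = 8
    print(records[0])
```

pool.map returns results in input order, so the flattened list keeps the original transaction order.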
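Notes:
Create the Pool() object inside an if __name__ == "__main__" block, or inside a function called (directly or indirectly) from there. Otherwise you'll get a RuntimeError on platforms where child processes re-import the main module, e.g. with the "spawn" start method, which is the default on Windows and macOS.
Put the with ... as ... there so that the Pool object is cleaned up when the work finishes or fails (on exit, the with block calls pool.terminate()). If you don't use the "with / as" syntax, use the pool inside a try/finally block and call pool.close() in its finally block to make sure the pool is shut down.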
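The try/finally alternative might look like the minimal sketch below. The worker parse_payload and the one-row input are hypothetical; the worker only parses the 'data' JSON string. pool.join() is an addition beyond the note above, used here to wait for the worker processes to exit after close().

```python
import json
import multiprocessing


def parse_payload(item):
    """Hypothetical worker: return the 'data' JSON string of one tuple as a dict."""
    _, payload = item
    return json.loads(payload['data'])


def parse_all(rows):
    pool = multiprocessing.Pool()
    try:
        return pool.map(parse_payload, rows)
    finally:
        pool.close()  # accept no new tasks; workers exit once the queue is drained
        pool.join()   # wait for the worker processes to finish


if __name__ == '__main__':
    rows = [('id-0', {'data': '{"delta": true, "bid": {"338.9": 0.06482}}'})]
    print(parse_all(rows))
```

The finally block runs whether pool.map returns normally or raises, so the pool is always closed.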