2 Answers

User has contributed 1864 experience points and received more than 2 likes
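You have to do all of the processing one line at a time. Right now you parse a line into the y variable and process it, but instead of writing it to the output file you store it in the data list. Naturally you end up with all of the data in memory, and in deserialized form: once the JSON strings have been turned into Python objects, they can take hundreds of GB of memory.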
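To get a feel for why the parsed objects are larger than the text they came from, here is a rough sketch. The `deep_size` helper and the sample record are illustrative assumptions, not part of the original code, and `sys.getsizeof` only gives an approximation (small integers and interned strings are shared between objects, for example).

```python
import json
import sys


def deep_size(obj):
    """Roughly sum sys.getsizeof over a parsed JSON value (hypothetical helper)."""
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_size(k) + deep_size(v) for k, v in obj.items())
    elif isinstance(obj, list):
        size += sum(deep_size(item) for item in obj)
    return size


# A made-up record shaped like the ones in the question.
line = '{"payload": {"first": 1, "cleared": 2, "last": 3}}'
record = json.loads(line)

text_bytes = len(line.encode("utf-8"))   # size of the serialized text
object_bytes = deep_size(record)         # approximate size of the Python objects
print(text_bytes, object_bytes)
```

The exact numbers depend on the Python version, but the object form is typically several times larger than the text, which is why accumulating every parsed record in a list gets expensive on a large file.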
If your code already works on a small sample, change it to write out each line as soon as it is processed:
#!/usr/bin/python
import sys
import json as simplejson
from collections import OrderedDict

first_arg = sys.argv[1]

# Read the input and write the output one line at a time, so only a single
# record is held in memory at any moment.
with open(first_arg, "rt") as jsonFile, open("jl2json_new.json", "wt") as write_file:
    for line in jsonFile:
        y = simplejson.loads(line, object_pairs_hook=OrderedDict)
        payload = y['payload']
        first = payload.get('first', None)
        clearedAt = payload.get('cleared')
        last = payload.get('last')
        lst = [x for x in (first, clearedAt, last) if x is not None]
        # max() raises ValueError on an empty sequence, so only set the
        # timestamp when at least one of the three fields is present.
        if lst:
            y['payload']['timestamp'] = max(lst)
        write_file.write(simplejson.dumps(y) + "\n")
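If you want to check the per-line logic without touching the big file, the same transformation can be sketched as a generator. The names `add_timestamp` and `transform_lines` are made up for this illustration. Skipping blank lines, and skipping records that have none of the three fields, are assumptions about the data rather than something the question states.

```python
import json
from collections import OrderedDict


def add_timestamp(record):
    """Set payload['timestamp'] to the largest of first/cleared/last that is present."""
    payload = record["payload"]
    candidates = [payload.get(key) for key in ("first", "cleared", "last")]
    candidates = [value for value in candidates if value is not None]
    if candidates:  # assumption: leave the record unchanged if all three are missing
        payload["timestamp"] = max(candidates)
    return record


def transform_lines(lines):
    """Lazily yield one output line per input line; nothing is accumulated."""
    for line in lines:
        if not line.strip():  # assumption: blank lines can be ignored
            continue
        record = json.loads(line, object_pairs_hook=OrderedDict)
        yield json.dumps(add_timestamp(record)) + "\n"


# Tiny in-memory sample standing in for the real file.
sample = [
    '{"payload": {"first": 10, "cleared": 30, "last": 20}}\n',
    '{"payload": {"first": 5}}\n',
]
result = list(transform_lines(sample))
print("".join(result), end="")
```

Because `transform_lines` is a generator, passing an open file to it and handing the result to `write_file.writelines(...)` should keep memory use at roughly one record at a time.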

User has contributed 1906 experience points and received more than 10 likes
The mmap module lets you map a file into memory. That saves you from having to read the whole thing in up front.
import mmap
import json
from collections import OrderedDict

with open("test.json", "r+b") as f:
    # memory-map the file; size 0 means the whole file
    mm = mmap.mmap(f.fileno(), 0)
    # read the content through the map, which supports the standard file methods
    # (passing bytes to json needs Python 3.6+; json.load still builds the
    # whole parsed document in memory)
    json_dict = json.load(mm, object_pairs_hook=OrderedDict)
    print(json_dict)
    # close the map
    mm.close()
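Since the input in the question appears to hold one JSON object per line, a memory map can also be walked line by line, so that only one record is parsed at a time. This is a minimal sketch under that assumption. `iter_records` and the temporary demo file are made up for the example.

```python
import json
import mmap
import os
import tempfile


def iter_records(path):
    """Yield one parsed record per non-empty line of a memory-mapped file."""
    with open(path, "rb") as f:
        # length 0 maps the whole file; ACCESS_READ keeps the map read-only
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            # mm.readline() returns b"" at end of file, which stops the iteration
            for raw in iter(mm.readline, b""):
                if raw.strip():
                    yield json.loads(raw.decode("utf-8"))
        finally:
            mm.close()


# Demo on a small temporary file (stand-in for the real input).
fd, demo_path = tempfile.mkstemp(suffix=".jsonl")
with os.fdopen(fd, "w") as tmp:
    tmp.write('{"id": 1}\n{"id": 2}\n')

ids = [rec["id"] for rec in iter_records(demo_path)]
os.remove(demo_path)
print(ids)  # prints [1, 2]
```

Note that mapping an empty file raises ValueError, so a real script would need to handle that case separately.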
This Stack Overflow thread, about reading JSON data one large chunk at a time, may be another thing to try.
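The linked thread is not reproduced here, so the following is only one possible way to read JSON in chunks, not necessarily what it recommends. It reads fixed-size chunks of text and uses `json.JSONDecoder.raw_decode` to pull complete values out of a buffer. The name `iter_json_objects` and the chunk size are assumptions, and the sketch assumes every top-level value is an object or array (a bare number split across two chunks would be misread).

```python
import io
import json


def iter_json_objects(fp, chunk_size=65536):
    """Yield top-level JSON values from a text stream, reading chunk_size characters at a time."""
    decoder = json.JSONDecoder()
    buffer = ""
    while True:
        chunk = fp.read(chunk_size)
        buffer += chunk
        while True:
            buffer = buffer.lstrip()  # raw_decode does not skip leading whitespace
            if not buffer:
                break
            try:
                obj, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # value is incomplete so far; read another chunk
            yield obj
            buffer = buffer[end:]
        if not chunk:  # end of input
            if buffer.strip():
                raise ValueError("trailing data is not valid JSON")
            return


# Demo with a deliberately tiny chunk size so that values span several chunks.
stream = io.StringIO('{"a": 1}\n{"b": [1, 2]}\n{"c": {"d": "x"}}')
parsed = list(iter_json_objects(stream, chunk_size=5))
print(parsed)
```

Retrying the decode from the start of the buffer after every chunk is simple but wasteful for very large single values. For line-delimited input, reading line by line as in the first answer is the more direct route.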