
Update every item/line in a 90G JSON-format file (doesn't have to be Python)

Python

慕田峪7331174 2022-10-11 21:13:21

I have a 90G file made up of JSON items. Below is a sample with only 3 lines:

    {"description":"id1","payload":{"cleared":"2020-01-31T10:23:54Z","first":"2020-01-31T01:29:23Z","timestamp":"2020-01-31T09:50:47Z","last":"2020-01-31T09:50:47Z"}}
    {"description":"id2","payload":{"cleared":"2020-01-31T11:01:54Z","first":"2020-01-31T02:45:23Z","timestamp":"2020-01-31T09:50:47Z","last":"2020-01-31T09:50:47Z"}}
    {"description":"id3","payload":{"cleared":"2020-01-31T5:33:54Z","first":"2020-01-31T01:29:23Z","timestamp":"2020-01-31T07:50:47Z","last":"2019-01-31T04:50:47Z"}}

The end goal is, for each line, to take the maximum of first, cleared, and last, and to update timestamp with that maximum. After that, all items should be sorted by timestamp. Ignore the sorting for now.

I initially converted the file into a single JSON document and used the following code:

    #!/usr/bin/python
    import json as simplejson
    from collections import OrderedDict

    with open("input.json", "r") as jsonFile:
        data = simplejson.load(jsonFile, object_pairs_hook=OrderedDict)

    for x in data:
        maximum = max(x['payload']['first'], x['payload']['cleared'], x['payload']['last'])
        x['payload']['timestamp'] = maximum

    data_sorted = sorted(data, key=lambda x: x['payload']['timestamp'])

    with open("output.json", "w") as write_file:
        simplejson.dump(data_sorted, write_file)

The code above works for a small test file, but the script was killed when I ran it on the 90G file.

I then decided to process the file line by line with the following code:

    #!/usr/bin/python
    import sys
    import json as simplejson
    from collections import OrderedDict

    first_arg = sys.argv[1]
    data = []

    with open(first_arg, "r") as jsonFile:
        for line in jsonFile:
            y = simplejson.loads(line, object_pairs_hook=OrderedDict)
            payload = y['payload']
            first = payload.get('first', None)
            clearedAt = payload.get('cleared')
            last = payload.get('last')
            lst = [first, clearedAt, last]
            maximum = max((x for x in lst if x is not None))
            y['payload']['timestamp'] = maximum
            data.append(y)

    with open("jl2json_new.json", "w") as write_file:
        simplejson.dump(data, write_file, indent=4)

It was still killed. So what is the best way to solve this problem? I tried the following, but it didn't help: https://stackoverflow.com/a/21709058/322541


2 Answers

慕斯王


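You have to do all of the processing for each line as you go. Right now you parse a line into the y variable, process it, and then, instead of writing it to the output file, store it in the data list. So of course you end up with all of the data in memory (and once deserialized from JSON strings into Python objects, it will take hundreds of GB of memory).

If your code already works for a small sample, change it to write out each line as it is processed: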

    #!/usr/bin/python
    import sys
    import json as simplejson
    from collections import OrderedDict

    first_arg = sys.argv[1]

    with open(first_arg, "rt") as jsonFile, open("jl2json_new.json", "wt") as write_file:
        for line in jsonFile:
            y = simplejson.loads(line, object_pairs_hook=OrderedDict)
            payload = y['payload']
            first = payload.get('first', None)
            clearedAt = payload.get('cleared')
            last = payload.get('last')
            lst = [first, clearedAt, last]
            maximum = max((x for x in lst if x is not None))
            y['payload']['timestamp'] = maximum
            write_file.write(simplejson.dumps(y) + "\n")
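One detail worth checking in the loop above: max() compares the timestamps as strings, which matches chronological order only when every field is zero-padded. The sample value "2020-01-31T5:33:54Z" is not, so a string comparison ranks it above "2020-01-31T10:23:54Z". Below is a minimal sketch of comparing parsed values instead. The format string is inferred from the sample lines, and parse_ts and latest_timestamp are hypothetical helper names, not part of the code above.

```python
from datetime import datetime, timezone

# Assumption: all timestamps look like the sample lines (UTC, trailing "Z").
TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_ts(value):
    """Parse a timestamp string into a timezone-aware datetime (hypothetical helper)."""
    return datetime.strptime(value, TS_FORMAT).replace(tzinfo=timezone.utc)


def latest_timestamp(payload):
    """Return the chronologically latest of first/cleared/last, as the original string."""
    candidates = [payload.get(key) for key in ("first", "cleared", "last")]
    # Compare by parsed value, but keep the original string as the result.
    return max((c for c in candidates if c is not None), key=parse_ts)


# Illustrative payload built from values in the sample lines.
example = {"cleared": "2020-01-31T5:33:54Z", "first": "2020-01-31T10:23:54Z"}
print(max(example.values()))      # plain string comparison
print(latest_timestamp(example))  # comparison on parsed datetimes
```

In the loop, the max(...) line could then become y['payload']['timestamp'] = latest_timestamp(payload). If every timestamp in the real file is zero-padded, the plain string comparison is already correct and is cheaper.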

2022-10-11

隔江千里


The mmap module lets you map a file into memory, so you don't have to read the whole thing at once.

    import mmap
    import json
    from collections import OrderedDict

    with open("test.json", "r+b") as f:
        # memory-map the file, size 0 means whole file
        mm = mmap.mmap(f.fileno(), 0)
        # read one JSON line at a time from the map;
        # json.load(f) would parse the whole file into memory at once
        for line in iter(mm.readline, b""):
            json_dict = json.loads(line, object_pairs_hook=OrderedDict)
            print(json_dict)
        # close the map
        mm.close()

This Stack Overflow answer, about reading a large JSON file one chunk at a time, may be another thing to try.
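Reading in chunks also gives one way to handle the sort the question set aside. The sketch below is an external merge sort, not something either answer proposes. It reads a fixed number of lines at a time, sorts each chunk in memory, writes it to a temporary file, and then merges the sorted chunks with heapq.merge. The names external_sort, sort_key, and chunk_lines are hypothetical, and the sketch assumes one JSON item per line, each with a payload.timestamp whose values sort correctly as strings.

```python
import heapq
import itertools
import json
import os
import tempfile


def sort_key(line):
    # Assumption: every line has payload.timestamp, and the values
    # order correctly under plain string comparison.
    return json.loads(line)["payload"]["timestamp"]


def external_sort(in_path, out_path, chunk_lines=100_000):
    """Sort a line-delimited JSON file by payload.timestamp in bounded memory."""
    chunk_paths = []
    try:
        with open(in_path, "rt") as src:
            while True:
                # Take the next chunk_lines lines; skip blanks and make sure
                # each kept line ends with a newline.
                chunk = [
                    line if line.endswith("\n") else line + "\n"
                    for line in itertools.islice(src, chunk_lines)
                    if line.strip()
                ]
                if not chunk:
                    break
                chunk.sort(key=sort_key)
                fd, path = tempfile.mkstemp(suffix=".jsonl")
                with os.fdopen(fd, "wt") as tmp:
                    tmp.writelines(chunk)
                chunk_paths.append(path)

        # Merge the sorted chunk files lazily, one line from each at a time.
        files = [open(path, "rt") for path in chunk_paths]
        try:
            with open(out_path, "wt") as dst:
                dst.writelines(heapq.merge(*files, key=sort_key))
        finally:
            for handle in files:
                handle.close()
    finally:
        for path in chunk_paths:
            os.remove(path)
```

A call might look like external_sort("jl2json_new.json", "sorted.json"). The merge step keeps every chunk file open at once, so chunk_lines should be large enough that the number of chunks stays below the operating system's open-file limit.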

2022-10-11
