I'm having trouble freeing memory in Python. The situation is basically this: I have a large dataset split across 4 files. Each file contains a list of 5000 numpy arrays of shape (3072, 412). I'm trying to extract columns 10 through 20 of each array into a new list.

What I want to do is read each file in turn, extract the data I need, and free the memory I'm using before moving on to the next file. However, deleting the object, setting it to None, and setting it to 0, followed by a call to gc.collect(), all seem to have no effect. Here is the code snippet I'm using:

    import gc
    import joblib
    import psutil

    num_files = 4
    start = 10
    end = 20
    fields = []
    for j in range(num_files):
        print("Working on file ", j)
        source_filename = base_filename + str(j) + ".pkl"
        print("Memory before: ", psutil.virtual_memory())
        partial_db = joblib.load(source_filename)
        print("GC tracking for partial_db is ", gc.is_tracked(partial_db))
        print("Memory after loading partial_db:", psutil.virtual_memory())
        for x in partial_db:
            fields.append(x[:, start:end])
        print("Memory after appending to fields: ", psutil.virtual_memory())
        print("GC Counts before del: ", gc.get_count())
        partial_db = None
        print("GC Counts after del: ", gc.get_count())
        gc.collect()
        print("GC Counts after collection: ", gc.get_count())
        print("Memory after freeing partial_db: ", psutil.virtual_memory())

Here is the output after a few files:

    Working on file 0
    Memory before: svmem(total=67509161984, available=66177449984, percent=2.0, used=846712832, free=33569669120, active=27423051776, inactive=5678043136, buffers=22843392, cached=33069936640, shared=15945728)
    GC tracking for partial_db is True
    Memory after loading partial_db: svmem(total=67509161984, available=40785944576, percent=39.6, used=26238181376, free=8014237696, active=54070542336, inactive=4540620800, buffers=22892544, cached=33233850368, shared=15945728)
    Memory after appending to fields: svmem(total=67509161984, available=40785944576, percent=39.6, used=26238181376, free=8014237696, active=54070542336, inactive=4540620800, buffers=22892544, cached=33233850368, shared=15945728)
    GC Counts before del: (0, 7, 3)
    GC Counts after del: (0, 7, 3)
    GC Counts after collection: (0, 0, 0)

If I just let it keep running, it uses up all the memory and raises a MemoryError exception. Does anyone know what I can do to make sure the data held by partial_db is actually released?
1 Answer

慕桂英3389331
The problem is here:
    for x in partial_db:
        fields.append(x[:, start:end])
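The reason slicing a numpy array (unlike a normal Python list) takes almost no time and wastes no space is that it does not make a copy; it only creates another view onto the array's memory. Usually that's great. But here it means the memory behind each x stays alive even after you release x itself, because the slices stored in fields still reference it and you never release those slices. Each slice covers only 10 of the 412 columns, yet it pins the entire (3072, 412) buffer.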
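To make the view behaviour concrete, here is a minimal sketch, assuming only numpy is installed. The array is a stand-in filled with zeros (not the asker's data), and the names `view`, `shares_buffer`, and `pinned_bytes` are illustrative only.

```python
import numpy as np

# Stand-in for one array from the dataset (same shape as in the question).
x = np.zeros((3072, 412))

# Basic slicing returns a view, not a copy.
view = x[:, 10:20]

# The view records the array that owns the underlying buffer.
view_base_is_x = view.base is x
shares_buffer = np.shares_memory(view, x)

# Dropping the name x does not free the buffer: the view still
# holds a reference to the full parent array through .base.
del x
pinned_bytes = view.base.nbytes   # size of the whole parent buffer
visible_bytes = view.nbytes       # size of the 10 columns actually used

print(view_base_is_x, shares_buffer, pinned_bytes > visible_bytes)
```

The point of the comparison at the end is that the bytes kept alive are those of the whole parent array, not just the columns the slice exposes.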
There are other ways around this, but the simplest is to append copies of the slices instead:
    for x in partial_db:
        fields.append(x[:, start:end].copy())
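The difference between keeping a view and keeping a copy can be checked with a weak reference to the parent array. This is a rough sketch under two assumptions: it runs on CPython, where an object is freed as soon as its last reference goes away, and the zero-filled arrays stand in for the real data. All variable names here are hypothetical.

```python
import weakref
import numpy as np

start, end = 10, 20

# Case 1: keep a view. The parent array survives `del`.
big = np.zeros((3072, 412))
big_ref = weakref.ref(big)          # lets us observe whether big is still alive
kept_view = big[:, start:end]
del big
view_keeps_parent = big_ref() is not None

# Case 2: keep a copy. The parent array can be freed.
big2 = np.zeros((3072, 412))
big2_ref = weakref.ref(big2)
kept_copy = big2[:, start:end].copy()
del big2
copy_frees_parent = big2_ref() is None   # relies on CPython reference counting

# The copy owns its own small buffer and has no parent.
copy_owns_data = kept_copy.base is None

print(view_keeps_parent, copy_frees_parent, copy_owns_data)
```

With copies in fields, setting partial_db to None drops the last references to the large arrays, so their memory can be returned without relying on gc.collect().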