
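You have the basic idea. Be careful when you say "save to memory", though. NumPy arrays are held in memory (RAM). HDF5 data is saved on disk (not in memory/RAM!) and then accessed from there; how much memory is used depends on how you access it. In the first step, you create chunks of data and write them to disk. In the second step, you read the data back from disk chunk by chunk. A working example is provided at the end.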
When reading data with h5py, there are 2 ways to access it:
1) Return a NumPy array:
myArrayNP = myArray[:,:,:]
2) Return an h5py Dataset object, which behaves like a NumPy array:
myArrayDS = myArray
The difference: an h5py Dataset object is not read into memory all at once, so you can slice it as needed. Continuing from above, this is a valid operation to get a subset of the data:
myArrayChunkNP = myArrayDS[(i*chunkSize):((i+1)*chunkSize), :, :]
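A small self-contained sketch of that difference, assuming a throwaway file in a temporary directory (the file name `demo.h5` and the 12x2x2 test data are made up for illustration). Opening the dataset only gives you a handle; indexing it is what actually pulls data from disk into a NumPy array.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file used only for this demonstration
path = os.path.join(tempfile.mkdtemp(), "demo.h5")

with h5py.File(path, "w") as f:
    f.create_dataset("myArray", data=np.arange(12 * 2 * 2, dtype="f8").reshape(12, 2, 2))

with h5py.File(path, "r") as f:
    myArrayDS = f["myArray"]        # h5py.Dataset handle: no array data read yet
    print(type(myArrayDS).__name__, myArrayDS.shape, myArrayDS.dtype)

    myArrayNP = myArrayDS[:, :, :]  # reads the whole dataset into RAM as a NumPy array
    chunkNP = myArrayDS[4:8, :, :]  # reads just this selection into a NumPy array
    print(type(myArrayNP).__name__, myArrayNP.shape)
    print(type(chunkNP).__name__, chunkNP.shape)
```

The NumPy arrays stay usable after the file is closed; the Dataset handle does not.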
My example also fixes one small error in your chunk-index equation. You had:
myArray[(i*chunkSize):(i*(chunkSize+1)),:,:] = myArrayChunk
You want:
myArray[(i*chunkSize):((i+1)*chunkSize), :, :] = myArrayChunk
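To see why the original bound is wrong, here is a plain-Python sketch (the helper names are hypothetical) that only computes the `start:stop` pairs each formula produces for `chunkSize=4` and 3 chunks. The original formula yields slices of length 0, 1 and 2 instead of 4, so assigning a 4-row chunk into them fails with a shape mismatch, and most rows would never be written.

```python
def wrong_bounds(i, chunk_size):
    # Original formula: stop = i*(chunkSize+1)
    return i * chunk_size, i * (chunk_size + 1)


def right_bounds(i, chunk_size):
    # Corrected formula: stop = (i+1)*chunkSize
    return i * chunk_size, (i + 1) * chunk_size


number_of_chunks, chunk_size = 3, 4
wrong = [wrong_bounds(i, chunk_size) for i in range(number_of_chunks)]
right = [right_bounds(i, chunk_size) for i in range(number_of_chunks)]
print("wrong:", wrong)  # [(0, 0), (4, 5), (8, 10)]
print("right:", right)  # [(0, 4), (4, 8), (8, 12)]
```

With the corrected formula each chunk starts exactly where the previous one stopped, so the chunks tile rows 0 to 11 with no gaps or overlaps.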
Working example (write, then read):
import h5py
import numpy as np

numberOfChunks = 3
chunkSize = 4

# Make the file
with h5py.File("SO_61173314.h5", "w") as h5w:
    print('WRITING %d chunks with chunkSize=%d' % (numberOfChunks, chunkSize))
    # Create the dataset on disk (gzip compression makes h5py store it in HDF5 chunks)
    h5Array = h5w.create_dataset("myArray", (numberOfChunks*chunkSize, 2, 2), compression="gzip")
    for i in range(numberOfChunks):
        # Build one chunk in RAM...
        h5ArrayChunk = np.random.random(chunkSize*2*2).reshape(chunkSize, 2, 2)
        print(h5ArrayChunk)
        # ...and write it to its slot in the on-disk dataset
        h5Array[(i*chunkSize):((i+1)*chunkSize), :, :] = h5ArrayChunk

with h5py.File("SO_61173314.h5", "r") as h5r:
    print('\nREADING %d chunks with chunkSize=%d\n' % (numberOfChunks, chunkSize))
    # Access the myArray dataset - Note: this is NOT a NumPy array
    myArray = h5r['myArray']
    for i in range(numberOfChunks):
        # Read one chunk into memory (as a NumPy array)
        myArrayChunk = myArray[(i*chunkSize):((i+1)*chunkSize), :, :]
        # ... do some calculation on myArrayChunk
        print(myArrayChunk)
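If the number of rows is not an exact multiple of `chunkSize`, the read loop above would need to handle a shorter final block. A minimal sketch of one way to do that, assuming only that the object has `.shape` and NumPy-style slicing (true of both NumPy arrays and `h5py.Dataset`); `iter_blocks` is a hypothetical helper name, not part of h5py:

```python
import numpy as np


def iter_blocks(data, block_size):
    """Yield (start, block) pairs along axis 0.

    Works for anything with .shape and NumPy-style slicing, e.g. a NumPy
    array or an h5py.Dataset. For a Dataset, each slice reads only that
    block from disk and returns it as a NumPy array.
    """
    n_rows = data.shape[0]
    for start in range(0, n_rows, block_size):
        stop = min(start + block_size, n_rows)  # the last block may be shorter
        yield start, data[start:stop]


if __name__ == "__main__":
    # In-memory stand-in for h5r['myArray']; 10 rows is not a multiple of 4
    demo = np.arange(10 * 2 * 2, dtype=float).reshape(10, 2, 2)
    for start, block in iter_blocks(demo, 4):
        print(start, block.shape)
```

With a real file you would call it as `for start, block in iter_blocks(h5r['myArray'], chunkSize): ...` inside the `with h5py.File(...)` block, so each block is read from disk only when it is needed. If you also control how the file is written, passing `chunks=(chunkSize, 2, 2)` to `create_dataset` may help, since it lines up HDF5's storage chunks with this read pattern.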