Reading and processing data with pandas is very common, but it runs into memory problems. I can read a large file with pandas:

    import pandas as pd
    df = pd.read_csv('mydata.csv.gz', sep=';')

However, when I do the same thing with Dask, I get an error:

    import dask.dataframe as dd
    df_base = dd.read_csv('CoilsSampleFiltered.csv.gz', sep=';')

Traceback:

    ---------------------------------------------------------------------------
    UnicodeDecodeError                        Traceback (most recent call last)
    <ipython-input-7-abc513f2a657> in <module>()
    ----> 1 df_base = dd.read_csv('CoilsSampleFiltered.csv.gz', sep=';')

    ~\AppData\Local\Continuum\Anaconda3\lib\site-packages\dask\dataframe\io\csv.py in read(urlpath, blocksize, collection, lineterminator, compression, sample, enforce, assume_missing, storage_options, **kwargs)
        424                     enforce=enforce, assume_missing=assume_missing,
        425                     storage_options=storage_options,
    --> 426                     **kwargs)
        427     read.__doc__ = READ_DOC_TEMPLATE.format(reader=reader_name,
        428                                             file_type=file_type)

    ~\AppData\Local\Continuum\Anaconda3\lib\site-packages\dask\dataframe\io\csv.py in read_pandas(reader, urlpath, blocksize, collection, lineterminator, compression, sample, enforce, assume_missing, storage_options, **kwargs)
        324
        325     # Use sample to infer dtypes
    --> 326     head = reader(BytesIO(b_sample), **kwargs)
        327
        328     specified_dtypes = kwargs.get('dtype', {})

I am trying to figure out what the problem is. The file was written by R, and R uses UTF-8 by default.
Dask cannot read the file, while Pandas can
慕無忌1623718
2021-05-06 18:53:50
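One possible explanation, which the truncated traceback does not confirm, is that the sample Dask uses to infer dtypes (the `b_sample` passed to `reader(BytesIO(b_sample), **kwargs)`) still holds gzip-compressed bytes rather than text. The stdlib-only sketch below shows why that would raise a `UnicodeDecodeError` even though the underlying data is valid UTF-8. The CSV content and the helper names `is_gzip` and `try_decode` are hypothetical, made up for illustration.

```python
import gzip

# Hypothetical stand-in for the real file: a tiny semicolon-separated CSV,
# encoded as UTF-8 and then gzip-compressed in memory.
csv_text = "id;name\n1;café\n2;coil\n"
raw = gzip.compress(csv_text.encode("utf-8"))


def is_gzip(data: bytes) -> bool:
    """Return True if the bytes start with the gzip magic number 0x1f 0x8b."""
    return data[:2] == b"\x1f\x8b"


def try_decode(data: bytes) -> str:
    """Try to decode the bytes as UTF-8 and report what happened."""
    try:
        data.decode("utf-8")
        return "ok"
    except UnicodeDecodeError:
        return "UnicodeDecodeError"


# Decoding the compressed bytes directly fails: the second magic byte, 0x8b,
# is not a valid way to start a UTF-8 character.
compressed_result = try_decode(raw)

# Decompressing first and then decoding succeeds.
decompressed_result = try_decode(gzip.decompress(raw))

print(is_gzip(raw), compressed_result, decompressed_result)
```

If this turns out to be the cause, one thing worth trying is to tell Dask about the compression explicitly, for example `dd.read_csv('CoilsSampleFiltered.csv.gz', sep=';', compression='gzip', blocksize=None)`. Both `compression` and `blocksize` appear in the `read` signature in the traceback; `blocksize=None` is used because a gzip file cannot be split into independent blocks. If the error persists, the file may simply not be UTF-8, in which case passing an `encoding=` argument would be the next thing to check.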