I'm trying to read a CSV file with PySpark using this code:

    tr_df = spark.read.csv("/data/file.csv", header=True, inferSchema=True)
    tr_df.head(5)

But I get this error:

    ValueError                                Traceback (most recent call last)
    <ipython-input-53-03432bbf269d> in <module>
    ----> 1 tr_df.head(5)

    ~/anaconda3/envs/naboo-env/lib/python3.6/site-packages/pyspark/sql/dataframe.py in head(self, n)
       1250             rs = self.head(1)
       1251             return rs[0] if rs else None
    -> 1252         return self.take(n)
       1253
       1254     @ignore_unicode_prefix

    ~/anaconda3/envs/naboo-env/lib/python3.6/site-packages/pyspark/sql/dataframe.py in take(self, num)
        569         [Row(age=2, name=u'Alice'), Row(age=5, name=u'Bob')]
        570         """
    --> 571         return self.limit(num).collect()
        572
        573     @since(1.3)

    ~/anaconda3/envs/naboo-env/lib/python3.6/site-packages/pyspark/sql/dataframe.py in collect(self)
        532         with SCCallSiteSync(self._sc) as css:
        533             sock_info = self._jdf.collectToPython()
    --> 534         return list(_load_from_socket(sock_info, BatchedSerializer(PickleSerializer())))
        535
        536     @ignore_unicode_prefix

    ~/anaconda3/envs/naboo-env/lib/python3.6/site-packages/pyspark/serializers.py in load_stream(self, stream)
        145         while True:
        146             try:
    --> 147                 yield self._read_with_length(stream)
        148             except EOFError:
        149                 return

    ~/anaconda3/envs/naboo-env/lib/python3.6/site-packages/pyspark/serializers.py in _read_with_length(self, stream)
        170         if len(obj) < length:
        171             raise EOFError
    --> 172         return self.loads(obj)
        173
        174     def dumps(self, obj):

    ~/anaconda3/envs/naboo-env/lib/python3.6/site-packages/pyspark/serializers.py in loads(self, obj, encoding)
        578     if sys.version >= '3':
        579         def loads(self, obj, encoding="bytes"):
    --> 580             return pickle.loads(obj, encoding=encoding)
        581     else:
        582         def loads(self, obj, encoding=None):

Can anyone help me solve this?
1 Answer

慕尼黑的夜晚無繁華
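It looks like there is a problem with the data type of one of your columns, and that is why the error is thrown. Remove the inferSchema=True option when reading the file. Once the data is loaded, inspect the data types, make whatever corrections are needed, and then apply your own schema.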
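The steps above could be sketched roughly as follows. This is only an illustration, not a confirmed fix: the column names and types in COLUMNS are hypothetical placeholders (the real ones depend on your file), and to_ddl and load_csv are helper names made up for this example. It assumes a PySpark version whose spark.read.csv accepts a DDL-formatted string for the schema argument, and it expects an existing spark session like the one in the question.

```python
# Hypothetical columns: replace these with the real names and types from your file.
COLUMNS = [("id", "INT"), ("name", "STRING"), ("amount", "DOUBLE")]


def to_ddl(columns):
    """Build a DDL schema string such as 'id INT, name STRING'."""
    return ", ".join("{} {}".format(name, dtype) for name, dtype in columns)


def load_csv(spark, path, columns):
    """Inspect the raw data first, then re-read it with an explicit schema."""
    # Without inferSchema every column is read as a string,
    # so this step does not try to guess any types.
    raw_df = spark.read.csv(path, header=True)
    raw_df.printSchema()
    raw_df.show(5, truncate=False)

    # Re-read the file with the schema you decided on after inspecting the data.
    return spark.read.csv(path, header=True, schema=to_ddl(columns))
```

In the notebook from the question, where spark is already defined, this might be called as tr_df = load_csv(spark, "/data/file.csv", COLUMNS), followed by tr_df.head(5). Looking at the all-string output first should make it easier to spot which column holds values that do not match the type you expect.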