Explode in PySpark

Python

牧羊人nacy 2019-10-21 10:18:06

I would like to transform a DataFrame that contains lists of words into a DataFrame with each word in its own row. How do I explode a column in a DataFrame?

Here are some of my attempts. You can uncomment each line of code to get the error listed in the comment on that line. I am using PySpark on Python 2.7 with Spark 1.6.1.

    from pyspark.sql.functions import split, explode

    DF = sqlContext.createDataFrame([('cat \n\n elephant rat \n rat cat', )], ['word'])
    print 'Dataset:'
    DF.show()

    print '\n\n Trying to do explode: \n'
    DFsplit_explode = (
        DF
        .select(split(DF['word'], ' '))
    #   .select(explode(DF['word']))  # AnalysisException: u"cannot resolve 'explode(word)' due to data type mismatch: input to function explode should be array or map type, not StringType;"
    #   .map(explode)  # AttributeError: 'PipelinedRDD' object has no attribute 'show'
    #   .explode()  # AttributeError: 'DataFrame' object has no attribute 'explode'
    ).show()

    # Trying without split
    print '\n\n Only explode: \n'
    DFsplit_explode = (
        DF
        .select(explode(DF['word']))  # AnalysisException: u"cannot resolve 'explode(word)' due to data type mismatch: input to function explode should be array or map type, not StringType;"
    ).show()

Please advise.
