I'm working from an old pyspark script and trying to convert the dataframe df to an rdd.

# Importing the required libraries
import pandas as pd
from pyspark.sql.types import *
from pyspark.ml.regression import RandomForestRegressor
from pyspark.mllib.util import MLUtils
from pyspark.ml import Pipeline
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.linalg import Vectors
from pyspark.ml import Pipeline
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.mllib.fpm import *
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Python Spark") \
    .config("spark.some.config.option", "some-value")

# read the data
df = pd.read_json("events.json")

df = (df.rdd
      .map(lambda x: (x[1], [x[0]]))
      .reduceByKey(lambda x, y: x + y)
      .sortBy(lambda k_v: (k_v[0], sorted(k_v[1], key=lambda x: x[1], reverse=True)))
      .collect())

Here's the error output:

AttributeError: 'DataFrame' object has no attribute 'rdd'

What am I missing? How do I convert the dataframe to an rdd? I have Anaconda 3.6.1 and Spark 2.3.1 installed.
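I suspect the problem is that pd.read_json returns a pandas DataFrame, which has no .rdd attribute, rather than a Spark DataFrame. A minimal sketch of the direction I think I need (reading events.json through the SparkSession instead, and assuming the file is line-delimited JSON) would be:

from pyspark.sql import SparkSession

# getOrCreate() actually returns the SparkSession; without it, spark is only a Builder
spark = (SparkSession
         .builder
         .appName("Python Spark")
         .getOrCreate())

# Reading with Spark instead of pandas yields a Spark DataFrame,
# which does expose an .rdd attribute
# (assumes events.json has one JSON object per line)
sdf = spark.read.json("events.json")

grouped = (sdf.rdd
           .map(lambda row: (row[1], [row[0]]))
           .reduceByKey(lambda a, b: a + b)
           .collect())

Alternatively, if I keep the pandas read, spark.createDataFrame(df) should turn the pandas frame into a Spark DataFrame first. Is that the right approach?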