In Jupyter, I use the following commands to extract the data and load it into a Spark data frame:

    !7z x stackoverflow.com-Posts.7z -oposts

    # load xml file into spark data frame.
    posts = spark.read.format("xml").option("rowTag", "row").load("./posts/Posts.xml")

This produces the following error:

    Py4JJavaError: An error occurred while calling o532.load.
    : java.lang.ClassNotFoundException: Failed to find data source: xml. Please find packages at http://spark.apache.org/third-party-projects.html
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
        at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
1 Answer

絕地無雙
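The xml data source is not bundled with Spark; it comes from the separate spark-xml package, so you need to pass its jar to the SparkContext when you start PySpark:

    pyspark --jars /home/Downloads/spark_jars/spark-xml_2.11-0.9.0.jar

Then read the file using the package's full data source name:

    df = spark.read.format("com.databricks.spark.xml").option("rowTag", "row").load("./posts/Posts.xml")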
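If the notebook kernel is not started through the pyspark command, the same jar can probably be attached from inside Jupyter by setting the PYSPARK_SUBMIT_ARGS environment variable before the first SparkSession is created. The following is only a sketch under that assumption: build_submit_args and load_posts are hypothetical helper names, the app name is arbitrary, and the jar path is the one from the answer above, which you will likely need to adjust.

    ```python
    import os

    # Path taken from the answer above; adjust to wherever the jar actually lives.
    SPARK_XML_JAR = "/home/Downloads/spark_jars/spark-xml_2.11-0.9.0.jar"


    def build_submit_args(jar_path):
        """Return a PYSPARK_SUBMIT_ARGS value that adds one jar (hypothetical helper)."""
        # The value has to end with "pyspark-shell" so PySpark can launch its JVM gateway.
        return "--jars {} pyspark-shell".format(jar_path)


    def load_posts(xml_path="./posts/Posts.xml"):
        """Start a session with the spark-xml jar attached and read the posts (hypothetical helper)."""
        # Only takes effect if no SparkContext has been created in this kernel yet.
        os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args(SPARK_XML_JAR)

        # Imported lazily so build_submit_args can be used without Spark installed.
        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("posts-xml").getOrCreate()
        return (
            spark.read.format("com.databricks.spark.xml")
            .option("rowTag", "row")
            .load(xml_path)
        )
    ```

Calling posts = load_posts() in a fresh kernel should then behave like the pyspark --jars invocation. If a Spark session is already running, restart the kernel first, because the JVM only picks up the jar at startup.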