I am trying to scrape an XML file and create a DataFrame from the tags in it. I am using PySpark on Databricks.

XML file:

    <?xml version="1.0" encoding="UTF-8"?>
    <note>
      <shorttitle>shorttitle_1</shorttitle>
      <shorttitle>shorttitle_2</shorttitle>
      <shorttitle>shorttitle_3</shorttitle>
      <shorttitle>shorttitle_4</shorttitle>
    </note>

My code appears to scrape the XML from the page and build a list from the tags, but when I create my DataFrame and try to load that list into it, I only see a DataFrame containing null values.

Code:

    from pyspark.sql.types import *
    from pyspark.sql.functions import *
    import requests
    from bs4 import BeautifulSoup

    res = requests.get("http://files.fakeaddress.com/files01.xml")
    soup = BeautifulSoup(res.content, 'html.parser')
    short_title = soup.find_all('shorttitle')[0:2]

    field = [StructField("Short_Title", StringType(), True)]
    schema = StructType(field)
    df = spark.createDataFrame(short_title, schema)

Output:

    +-----------+
    |Short_Title|
    +-----------+
    |       null|
    |       null|
    +-----------+

Desired output:

    +-------------+
    |Short_Title  |
    +-------------+
    |shorttitle_1 |
    |shorttitle_2 |
    +-------------+
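One likely cause, offered as a sketch and not a confirmed diagnosis: `find_all` returns BeautifulSoup `Tag` objects, while `spark.createDataFrame` with a one-column schema expects each row to be a tuple or list holding a plain string. The example below shows the row shape that should work. To keep it runnable without extra packages it swaps BeautifulSoup for the standard library's `xml.etree.ElementTree` and inlines the sample XML instead of downloading it. The names `XML_BYTES`, `extract_rows`, and `to_dataframe` are hypothetical helpers introduced for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import islice

# Sample document from the question, inlined as bytes (like res.content)
# so the sketch runs without network access.
XML_BYTES = b"""<?xml version="1.0" encoding="UTF-8"?>
<note>
  <shorttitle>shorttitle_1</shorttitle>
  <shorttitle>shorttitle_2</shorttitle>
  <shorttitle>shorttitle_3</shorttitle>
  <shorttitle>shorttitle_4</shorttitle>
</note>"""


def extract_rows(xml_bytes, tag, limit):
    """Return the text of the first `limit` <tag> elements as 1-tuples."""
    root = ET.fromstring(xml_bytes)
    # Each row is a one-element tuple of str, matching a one-column schema.
    return [(el.text,) for el in islice(root.iter(tag), limit)]


def to_dataframe(spark, rows):
    """Build the DataFrame; requires an active SparkSession (e.g. Databricks)."""
    # Imported lazily so the extraction step can be run without pyspark.
    from pyspark.sql.types import StringType, StructField, StructType

    schema = StructType([StructField("Short_Title", StringType(), True)])
    return spark.createDataFrame(rows, schema)


rows = extract_rows(XML_BYTES, "shorttitle", 2)
print(rows)  # [('shorttitle_1',), ('shorttitle_2',)]
```

On Databricks, `to_dataframe(spark, rows).show()` would then be the call to try. If you prefer to keep BeautifulSoup, the equivalent change is to build the rows with `[(t.get_text(),) for t in short_title]` before passing them to `createDataFrame`.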