You can join two DataFrames in PySpark on a shared column like this:
>>> df1.show()
+---+---------+
| ID|     Role|
+---+---------+
|  1|   Author|
|  1|   Editor|
|  2|   Author|
|  2|Publisher|
|  3|   Editor|
|  3|Assistant|
+---+---------+
>>> df2.show()
+---+-----------+
| ID|       Name|
+---+-----------+
|  1| John Smith|
|  2|   John Doe|
|  3|Bob Jim Bob|
+---+-----------+
>>> df3 = df2.join(df1, "ID")
>>> df3.show()
+---+-----------+---------+
| ID|       Name|     Role|
+---+-----------+---------+
|  1| John Smith|   Author|
|  1| John Smith|   Editor|
|  2|   John Doe|   Author|
|  2|   John Doe|Publisher|
|  3|Bob Jim Bob|   Editor|
|  3|Bob Jim Bob|Assistant|
+---+-----------+---------+
Note: I'm assuming "ID" is the foreign key linking the two DataFrames. If anything is unclear, please leave a comment.
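`df2.join(df1, "ID")` defaults to an inner join, so each Name row is repeated once per matching Role row, and any ID that appears in only one DataFrame is dropped. To make that behavior concrete without a Spark session, here is a minimal plain-Python sketch of an inner equi-join using a simple hash join (index one side by ID, then probe it with the other). This is only an illustration of the semantics, not Spark's implementation; the names `roles`, `names`, and `inner_join` are hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-ins for the rows of df1 and df2, as (ID, value) tuples.
roles = [(1, "Author"), (1, "Editor"), (2, "Author"),
         (2, "Publisher"), (3, "Editor"), (3, "Assistant")]
names = [(1, "John Smith"), (2, "John Doe"), (3, "Bob Jim Bob")]


def inner_join(left, right):
    """Inner equi-join on the first field of each tuple.

    Returns (ID, left_value, right_value) rows, one per matching pair,
    ordered by the left input and then by the right input.
    """
    # Build phase: index the right-hand rows by their key.
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    # Probe phase: one output row per match; left keys with no match are dropped.
    result = []
    for key, left_value in left:
        for right_value in index.get(key, []):
            result.append((key, left_value, right_value))
    return result


for row in inner_join(names, roles):
    print(row)
```

Running it prints the same six (ID, Name, Role) rows as `df3.show()` above. Spark picks its own physical strategy (for example a broadcast hash join or a sort-merge join) and does not guarantee row order, but the set of rows an inner join returns is the same.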