You can do a LIKE-style join with F.expr(). In your case, you need to use it with an inner join. Try this:
#%%
import pyspark.sql.functions as F
test1 = sqlContext.createDataFrame(
    [("Mike", "apple,greenbeans,redwine,the little prince 70th anniversary gift set (book/cd/downloadable audio)"),
     ("kate", "Whitewine,greenbeans,pineapple"),
     ("Ben", "Water,Spaghetti")],
    schema=["name", "groceries"])
test2 = sqlContext.createDataFrame(
    [("001", "redwine"), ("002", "greenbeans"), ("003", "cd")], schema=["id", "item"])
#%%
# Inner join that keeps rows where groceries matches item as a regular expression
test_join = test1.join(test2, F.expr("groceries rlike item"), how="inner")
Result:
test_join.show(truncate=False)
+----+-------------------------------------------------------------------------------------------------+---+----------+
|name|groceries |id |item |
+----+-------------------------------------------------------------------------------------------------+---+----------+
|Mike|apple,greenbeans,redwine,the little prince 70th anniversary gift set (book/cd/downloadable audio)|001|redwine |
|Mike|apple,greenbeans,redwine,the little prince 70th anniversary gift set (book/cd/downloadable audio)|002|greenbeans|
|Mike|apple,greenbeans,redwine,the little prince 70th anniversary gift set (book/cd/downloadable audio)|003|cd |
|kate|Whitewine,greenbeans,pineapple |002|greenbeans|
+----+-------------------------------------------------------------------------------------------------+---+----------+
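For your more complex dataset, where items contain regex metacharacters such as parentheses, the contains() function should work, because it does a literal substring match: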
import pyspark.sql.functions as F
test1 = spark.createDataFrame(
    [("Mike", "apple, oranges, red wine,green beans"),
     ("Kate", "Whitewine, green beans waterrr, pineapple, red wine"),
     ("Leah", "red wine, juice, rice, grapes, green beans"),
     ("Ben", "Water,Spaghetti, the little prince 70th anniversary gift set (book/cd/downloadable audio)")],
    schema=["name", "groceries"])
test2 = spark.createDataFrame(
    [("001", "red wine"), ("002", "green beans waterrr"),
     ("003", "the little prince 70th anniversary gift set (book/cd/downloadable audio)")],
    schema=["id", "item"])
#%%
test_join = test1.join(test2, F.col("groceries").contains(F.col("item")), how="inner")
Result:
+----+-----------------------------------------------------------------------------------------+---+------------------------------------------------------------------------+
|name|groceries |id |item |
+----+-----------------------------------------------------------------------------------------+---+------------------------------------------------------------------------+
|Mike|apple, oranges, red wine,green beans |001|red wine |
|Kate|Whitewine, green beans waterrr, pineapple, red wine |001|red wine |
|Kate|Whitewine, green beans waterrr, pineapple, red wine |002|green beans waterrr |
|Leah|red wine, juice, rice, grapes, green beans |001|red wine |
|Ben |Water,Spaghetti, the little prince 70th anniversary gift set (book/cd/downloadable audio)|003|the little prince 70th anniversary gift set (book/cd/downloadable audio)|
+----+-----------------------------------------------------------------------------------------+---+------------------------------------------------------------------------+
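To see why the second dataset calls for contains() rather than rlike, here is a minimal plain-Python sketch. It assumes Python's re module as a stand-in for the Java regular expressions that Spark's rlike uses; the two agree on how parentheses are treated, but this is an illustration rather than Spark code. The variable names are only for this example.

```python
import re

# One groceries value and one item value taken from the second dataset above.
groceries = "Water,Spaghetti, the little prince 70th anniversary gift set (book/cd/downloadable audio)"
item = "the little prince 70th anniversary gift set (book/cd/downloadable audio)"

# Regex match (analogous to rlike): the parentheses in item form a group
# instead of matching literal "(" and ")", so the pattern does not match.
regex_match = re.search(item, groceries) is not None

# Literal substring test (analogous to contains()).
substring_match = item in groceries

# Escaping the metacharacters makes the regex behave like a literal match.
escaped_match = re.search(re.escape(item), groceries) is not None

print(regex_match, substring_match, escaped_match)  # prints: False True True
```

The unescaped regex misses the row for Ben, while the literal substring test finds it, which is the behavior the contains() join relies on.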