pyspark: search for keywords with a regular expression, then join with another dataframe

一只名叫tom的貓 2023-02-22 13:53:43
I have two dataframes.

DataFrame A:

    name       groceries
    Mike       apple, orange, banana, noodle, red wine
    Kate       white wine, green beans, extra pineapple hawaiian pizza
    Leah       red wine, juice, rice, grapes, green beans
    Ben        water, spaghetti

DataFrame B:

    id       item
    0001     red wine
    0002     green beans

I go through B row by row and use a regular expression to check whether each item appears in the groceries column of A:

    import re
    import pyspark.sql.functions as F

    df = None
    for keyword in B.select('item').rdd.flatMap(lambda x : x).collect():
        if keyword is None:
            continue
        # Build one lookahead per word, e.g. (?i)^(?=.*\bred\b)(?=.*\bwine\b).*$
        pattern = '(?i)^'
        start = '(?=.*\\b'
        end = '\\b)'
        for word in re.split('\\s+', keyword):
            pattern = pattern + start + word + end
        pattern = pattern + '.*$'

        if df is None:
            df = A.filter(A['groceries'].rlike(pattern)).withColumn('item', F.lit(keyword))
        else:
            df = df.unionAll(A.filter(A['groceries'].rlike(pattern)).withColumn('item', F.lit(keyword)))

The output I want is the rows of A that contain an item from B, with the item keyword added as a new column:

    name       groceries                                                     item
    Mike       apple, orange, banana, noodle, red wine                       red wine
    Leah       red wine, juice, rice, grapes, green beans                    red wine
    Kate       white wine, green beans, extra pineapple hawaiian pizza       green beans
    Leah       red wine, juice, rice, grapes, green beans                    green beans

The actual output is not what I want, and I don't understand what is wrong with this approach. I would also like to know whether there is a way to join A and B directly with rlike, so that rows are joined only when an item from B appears in the groceries of A. Thanks!

1 Answer

慕尼黑的夜晚無繁華

An rlike join is possible using F.expr(). In your case, you need to use it with an inner join. Try this:


    import pyspark.sql.functions as F
    from pyspark.sql import SparkSession

    # Reuse the active session, or start one if none exists.
    spark = SparkSession.builder.getOrCreate()

    test1 = spark.createDataFrame(
        [("Mike", "apple,greenbeans,redwine,the little prince 70th anniversary gift set (book/cd/downloadable audio)"),
         ("kate", "Whitewine,greenbeans,pineapple"),
         ("Ben", "Water,Spaghetti")],
        schema=["name", "groceries"])

    test2 = spark.createDataFrame(
        [("001", "redwine"), ("002", "greenbeans"), ("003", "cd")],
        schema=["id", "item"])

    # Inner join: keep a row pair only when item, used as a regex, matches groceries.
    test_join = test1.join(test2, F.expr("groceries rlike item"), how="inner")

Result:

    test_join.show(truncate=False)

    +----+-------------------------------------------------------------------------------------------------+---+----------+
    |name|groceries                                                                                        |id |item      |
    +----+-------------------------------------------------------------------------------------------------+---+----------+
    |Mike|apple,greenbeans,redwine,the little prince 70th anniversary gift set (book/cd/downloadable audio)|001|redwine   |
    |Mike|apple,greenbeans,redwine,the little prince 70th anniversary gift set (book/cd/downloadable audio)|002|greenbeans|
    |Mike|apple,greenbeans,redwine,the little prince 70th anniversary gift set (book/cd/downloadable audio)|003|cd        |
    |kate|Whitewine,greenbeans,pineapple                                                                   |002|greenbeans|
    +----+-------------------------------------------------------------------------------------------------+---+----------+
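One thing to keep in mind with the rlike join is that each item is interpreted as a regular expression, not as literal text. The sketch below uses Python's re module as a stand-in for Spark's Java regex engine (an assumption that should hold for simple patterns like these) to show how a short item such as cd matches inside a longer entry, and how parentheses in an item change what the pattern matches. The helper name rlike_matches is hypothetical and is not part of PySpark.

```python
import re

def rlike_matches(text, pattern):
    """Mimic `text rlike pattern`: true if the pattern matches anywhere in text."""
    return re.search(pattern, text) is not None

groceries = ("apple,greenbeans,redwine,the little prince 70th anniversary "
             "gift set (book/cd/downloadable audio)")

# A short item matches as a substring of a longer entry.
print(rlike_matches(groceries, "cd"))              # True

# Parentheses form a regex group, so this item does not match its own literal text.
item = "gift set (book/cd/downloadable audio)"
print(rlike_matches(groceries, item))              # False

# Escaping the metacharacters restores the literal match.
print(rlike_matches(groceries, re.escape(item)))   # True
```

If the items may contain characters such as parentheses, dots, or plus signs, a literal comparison is probably the safer choice.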

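For your more complex dataset, the contains() function should work. It compares literal substrings, so regex metacharacters in an item, such as parentheses, are matched as-is: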


    import pyspark.sql.functions as F

    # `spark` is assumed to be the active SparkSession.
    test1 = spark.createDataFrame(
        [("Mike", "apple, oranges, red wine,green beans"),
         ("Kate", "Whitewine, green beans waterrr, pineapple, red wine"),
         ("Leah", "red wine, juice, rice, grapes, green beans"),
         ("Ben", "Water,Spaghetti, the little prince 70th anniversary gift set (book/cd/downloadable audio)")],
        schema=["name", "groceries"])

    test2 = spark.createDataFrame(
        [("001", "red wine"),
         ("002", "green beans waterrr"),
         ("003", "the little prince 70th anniversary gift set (book/cd/downloadable audio)")],
        schema=["id", "item"])

    # Inner join: keep a row pair only when groceries contains item as a literal substring.
    test_join = test1.join(test2, F.col('groceries').contains(F.col('item')), how='inner')

Result:

    +----+-----------------------------------------------------------------------------------------+---+------------------------------------------------------------------------+
    |name|groceries                                                                                |id |item                                                                    |
    +----+-----------------------------------------------------------------------------------------+---+------------------------------------------------------------------------+
    |Mike|apple, oranges, red wine,green beans                                                     |001|red wine                                                                |
    |Kate|Whitewine, green beans waterrr, pineapple, red wine                                      |001|red wine                                                                |
    |Kate|Whitewine, green beans waterrr, pineapple, red wine                                      |002|green beans waterrr                                                     |
    |Leah|red wine, juice, rice, grapes, green beans                                               |001|red wine                                                                |
    |Ben |Water,Spaghetti, the little prince 70th anniversary gift set (book/cd/downloadable audio)|003|the little prince 70th anniversary gift set (book/cd/downloadable audio)|
    +----+-----------------------------------------------------------------------------------------+---+------------------------------------------------------------------------+
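Note that contains() is a plain substring test, so an item such as wine would also match inside Whitewine. If only whole grocery entries should count, one option is to split the groceries string on commas and compare the trimmed entries exactly. The plain-Python sketch below only illustrates the difference between the two matching rules; the function names are hypothetical, the sample item wine is added for illustration, and this is not a Spark implementation.

```python
def substring_join(rows, items):
    """Pair each (name, groceries) row with every item found as a substring."""
    return [(name, item)
            for name, groceries in rows
            for item in items
            if item in groceries]

def whole_item_join(rows, items):
    """Pair rows with items that equal one comma-separated entry exactly."""
    pairs = []
    for name, groceries in rows:
        # Split on commas and trim whitespace so "red wine,green beans" works too.
        entries = {entry.strip() for entry in groceries.split(",")}
        pairs.extend((name, item) for item in items if item in entries)
    return pairs

rows = [("Mike", "apple, oranges, red wine,green beans"),
        ("Kate", "Whitewine, green beans waterrr, pineapple, red wine")]
items = ["red wine", "green beans", "wine"]

print(substring_join(rows, items))   # includes ("Kate", "wine") via "Whitewine"
print(whole_item_join(rows, items))  # only exact entries are paired
```

Under the substring rule, Kate is paired with green beans because of the entry green beans waterrr; under the whole-entry rule she is paired only with red wine.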



2023-02-22