
Applying UDFs on GroupedData in PySpark (with a functioning Python example)

Python

開(kāi)心每一天1111 2019-12-12 14:10:42

I have the following Python code that runs locally on a pandas DataFrame:

    df_result = pd.DataFrame(df
        .groupby('A')
        .apply(lambda x: myFunction(zip(x.B, x.C), x.name)))

I would like to run this in PySpark, but I am having trouble dealing with the pyspark.sql.group.GroupedData object.

I have tried the following:

    sparkDF
        .groupby('A')
        .agg(myFunction(zip('B', 'C'), 'A'))

which returns KeyError: 'A'. I presume this is because 'A' is no longer a column, and I cannot find the equivalent of x.name.

I then tried:

    sparkDF
        .groupby('A')
        .map(lambda row: Row(myFunction(zip('B', 'C'), 'A')))
        .toDF()

but I get the following error: AttributeError: 'GroupedData' object has no attribute 'map'

Any suggestions would be really appreciated!
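For reference, the pandas pattern described in the question can be reproduced with a small runnable sketch. The body of `my_function` below is a hypothetical stand-in (the question does not show `myFunction`), and the sample data is invented for illustration. It shows that `x.name` inside `apply` carries the value of the grouping column for the current group.

```python
import pandas as pd


def my_function(pairs, group_key):
    # Hypothetical stand-in for the question's myFunction:
    # sums B*C over the group's rows and tags the result with the group key.
    total = sum(int(b) * int(c) for b, c in pairs)
    return f"{group_key}={total}"


# Invented sample data with the column names used in the question.
df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 2, 3], "C": [10, 20, 30]})

# Inside apply, x is the sub-frame for one group and x.name is that group's key.
result = df.groupby("A")[["B", "C"]].apply(
    lambda x: my_function(zip(x.B, x.C), x.name)
)
print(result)
```

The result is a pandas Series indexed by the values of `A`, with one entry per group.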


3 Answers

慕森王

1777 experience points contributed, more than 3 likes received

I am going to extend the answer above.

You can implement the same logic as pandas.groupby().apply in PySpark using @pandas_udf. This is a vectorized approach and is faster than a plain row-at-a-time udf.

    import pandas as pd
    from pyspark.sql.functions import pandas_udf, PandasUDFType
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    # Assumes an active SparkSession named `spark`.
    df3 = spark.createDataFrame(
        [("a", 1, 0), ("a", -1, 42), ("b", 3, -1), ("b", 10, -2)],
        ("key", "value1", "value2")
    )

    # Schema of the DataFrame returned for each group.
    schema = StructType([
        StructField("key", StringType()),
        StructField("avg_value1", DoubleType()),
        StructField("avg_value2", DoubleType()),
        StructField("sum_avg", DoubleType()),
        StructField("sub_avg", DoubleType())
    ])

    @pandas_udf(schema, functionType=PandasUDFType.GROUPED_MAP)
    def g(df):
        # df is a pandas DataFrame holding all rows of one group.
        gr = df['key'].iloc[0]
        x = df.value1.mean()
        y = df.value2.mean()
        w = x + y
        z = x - y
        # One output row per group; the unnamed columns are matched
        # to the schema by position.
        return pd.DataFrame([[gr, x, y, w, z]])

    df3.groupby("key").apply(g).show()

You will get the result below:

    +---+----------+----------+-------+-------+
    |key|avg_value1|avg_value2|sum_avg|sub_avg|
    +---+----------+----------+-------+-------+
    |  b|       6.5|      -1.5|    5.0|    8.0|
    |  a|       0.0|      21.0|   21.0|  -21.0|
    +---+----------+----------+-------+-------+

So you can do further calculations across the other fields of the grouped data and add the results to the returned DataFrame as extra values in the row list.
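On Spark 3.0 and later, the same grouped-map logic can also be written with `applyInPandas`, whose function may take the group key as a first argument; that key plays the role of pandas' `x.name` from the question. The following is a minimal sketch under stated assumptions, not the answer author's code: `summarize`, `SCHEMA` and `run_on_spark` are hypothetical names, and `run_on_spark` assumes an existing SparkSession is passed in. Because `summarize` is plain pandas, it can be checked locally before handing it to Spark.

```python
import pandas as pd

# Output schema as a DDL string (assumed to mirror the StructType in the answer).
SCHEMA = (
    "key string, avg_value1 double, avg_value2 double, "
    "sum_avg double, sub_avg double"
)


def summarize(key, pdf):
    # key is a tuple of the grouping values; pdf holds the rows of that group.
    avg1 = pdf["value1"].mean()
    avg2 = pdf["value2"].mean()
    return pd.DataFrame(
        {
            "key": [key[0]],
            "avg_value1": [avg1],
            "avg_value2": [avg2],
            "sum_avg": [avg1 + avg2],
            "sub_avg": [avg1 - avg2],
        }
    )


def run_on_spark(spark):
    # Assumes PySpark 3.0+ and an existing SparkSession; not called below.
    sdf = spark.createDataFrame(
        [("a", 1, 0), ("a", -1, 42), ("b", 3, -1), ("b", 10, -2)],
        ("key", "value1", "value2"),
    )
    return sdf.groupby("key").applyInPandas(summarize, schema=SCHEMA)


# Local check with plain pandas, using the same rows as the answer.
local = pd.DataFrame(
    {
        "key": ["a", "a", "b", "b"],
        "value1": [1, -1, 3, 10],
        "value2": [0, 42, -1, -2],
    }
)
check = pd.concat(
    [summarize((k,), grp) for k, grp in local.groupby("key")],
    ignore_index=True,
)
print(check)
```

Calling `run_on_spark(spark).show()` on a cluster should give the same per-group values as the local check, though Spark does not guarantee the row order.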

2019-12-13
