
Apply vs transform on a group object

Python

冉冉說 2019-07-16 15:40:59

Consider the following DataFrame:

     A      B         C         D
0  foo    one  0.162003  0.087469
1  bar    one -1.156319 -1.526272
2  foo    two  0.833892 -1.666304
3  bar  three -2.026673 -0.322057
4  foo    two  0.411452 -0.954371
5  bar    two  0.765878 -0.095968
6  foo    one -0.654890  0.678091
7  foo  three -1.789842 -1.130922

The following commands work:

> df.groupby('A').apply(lambda x: (x['C'] - x['D']))

> df.groupby('A').apply(lambda x: (x['C'] - x['D']).mean())

But neither of the following works:

> df.groupby('A').transform(lambda x: (x['C'] - x['D']))

ValueError: could not broadcast input array from shape (5) into shape (5,3)

> df.groupby('A').transform(lambda x: (x['C'] - x['D']).mean())

TypeError: cannot concatenate a non-NDFrame object

Why? The example in the documentation seems to suggest that calling transform on a group allows row-wise processing:

# Note that the following suggests row-wise operation (x.mean is the column mean)

zscore = lambda x: (x - x.mean()) / x.std()

transformed = ts.groupby(key).transform(zscore)

In other words, I thought that transform is essentially a specific type of apply (one that does not aggregate). Where am I wrong?

For reference, here is the construction of the original DataFrame above:

import pandas as pd

from numpy.random import randn

df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'], 'B' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], 'C' : randn(8), 'D' : randn(8)})


3 Answers

慕村225694


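I felt similarly confused by the .transform operation versus .apply. I found a few answers that shed some light on the issue; this answer, for example, was very helpful.

My takeaway so far is that .transform works on (or deals with) Series (columns) in isolation from each other. This means that in your last two calls: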

df.groupby('A').transform(lambda x: (x['C'] - x['D']))

df.groupby('A').transform(lambda x: (x['C'] - x['D']).mean())

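you asked .transform to take values from two columns, and "it" does not actually "see" both of them at the same time (so to speak). transform looks at the DataFrame columns one by one and returns a Series (or group of Series) made of scalars that are repeated len(input_column) times.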

So this scalar, which .transform uses to build the Series, is the result of some reduction function applied to an input Series (and to only ONE Series/column at a time).
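To make the column-at-a-time behaviour concrete, here is a minimal sketch on a small made-up frame (the values are illustrative and are not the question's random data). Each numeric column is reduced within its group on its own, and the resulting scalar is repeated for every row of that group:

```python
import pandas as pd

# Hypothetical toy data, chosen so the group means are easy to check by hand.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "C": [1.0, 2.0, 3.0, 4.0],
    "D": [10.0, 20.0, 30.0, 40.0],
})

# "mean" is computed per group and per column; transform then repeats
# each group's scalar so the result has the same index as df.
group_means = df.groupby("A")[["C", "D"]].transform("mean")
print(group_means)
```

The result has one row per original row, which is what makes it safe to assign straight back into the frame.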

Consider this example (on your DataFrame):

zscore = lambda x: (x - x.mean()) / x.std() # Note that it does not reference anything outside of 'x' and for transform 'x' is one column.

df.groupby('A').transform(zscore)

will yield:

       C      D
0  0.989  0.128
1 -0.478  0.489
2  0.889 -0.589
3 -0.671 -1.150
4  0.034 -0.285
5  1.149  0.662
6 -1.404 -0.907
7 -0.509  1.653

This is exactly the same as if you used it on only one column at a time:

df.groupby('A')['C'].transform(zscore)

yielding:

0    0.989
1   -0.478
2    0.889
3   -0.671
4    0.034
5    1.149
6   -1.404
7   -0.509

Note that .apply in the last example (df.groupby('A')['C'].apply(zscore)) would work in exactly the same way, but it would fail if you tried using it on a DataFrame:

df.groupby('A').apply(zscore)

gives the error:

ValueError: operands could not be broadcast together with shapes (6,) (2,)

So where else is .transform useful? The simplest case is assigning the result of a reduction function back to the original DataFrame.

df['sum_C'] = df.groupby('A')['C'].transform(sum)

df.sort_values('A') # to clearly see the scalar ('sum') applies to the whole column of the group

yielding:

       A      B      C      D  sum_C
1    bar    one  1.998  0.593  3.973
3    bar  three  1.287 -0.639  3.973
5    bar    two  0.687 -1.027  3.973
4    foo    two  0.205  1.274  4.373
2    foo    two  0.128  0.924  4.373
6    foo    one  2.113 -0.516  4.373
7    foo  three  0.657 -1.179  4.373
0    foo    one  1.270  0.201  4.373

Trying the same with .apply would give NaNs in sum_C, because .apply returns a reduced Series, which it does not know how to broadcast back:

df.groupby('A')['C'].apply(sum)

giving:

A
bar    3.973
foo    4.373
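A small sketch of why the assignment ends up as NaN, using made-up values and hypothetical column names (sum_reduced, sum_broadcast, sum_mapped). The reduced Series is indexed by group label, so pandas cannot align it with the frame's row index; transform returns a Series on the original index instead. Mapping the group key through the reduced Series is one manual way to broadcast it:

```python
import pandas as pd

# Hypothetical toy data, not the answer's random values.
df = pd.DataFrame({"A": ["foo", "bar", "foo", "bar"],
                   "C": [1.0, 2.0, 3.0, 4.0]})

# Reduced result: one value per group, indexed by the labels "bar" and "foo".
reduced = df.groupby("A")["C"].apply(lambda s: s.sum())

# Broadcast result: one value per row, indexed like df (0..3).
broadcast = df.groupby("A")["C"].transform("sum")

df["sum_reduced"] = reduced        # labels do not match 0..3, so every row is NaN
df["sum_broadcast"] = broadcast    # aligns row by row

# Manual broadcast: look each row's group label up in the reduced Series.
df["sum_mapped"] = df["A"].map(reduced)
print(df)
```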

There are also cases where .transform is used to filter the data:

df[df.groupby(['B'])['D'].transform(sum) < -1]

     A      B      C      D
3  bar  three  1.287 -0.639
7  foo  three  0.657 -1.179
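The filtering idiom can be sketched on deterministic data as follows (the values are invented for illustration). The transform produces a per-row group total, which is then compared against the threshold to form a boolean mask. For comparison, the sketch also uses groupby's filter method, which expresses the same selection with a function over each group:

```python
import pandas as pd

# Hypothetical toy data with easy-to-check group sums:
# one -> 1.2, two -> -0.2, three -> -1.5
df = pd.DataFrame({
    "B": ["one", "one", "two", "three", "three"],
    "D": [0.5, 0.7, -0.2, -0.6, -0.9],
})

# Row-aligned boolean mask: True for every row whose group total is below -1.
mask = df.groupby("B")["D"].transform("sum") < -1
filtered = df[mask]

# Same selection, written as a predicate over each group.
filtered_alt = df.groupby("B").filter(lambda g: g["D"].sum() < -1)
print(filtered)
```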

I hope this adds a bit more clarity.

2019-07-16

吃雞游戲


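Two major differences between apply and transform

There are two major differences between the transform and apply groupby methods.

apply implicitly passes all the columns for each group as a DataFrame to the custom function, while transform passes each column for each group as a Series to the custom function.

The custom function passed to apply can return a scalar, a Series, or a DataFrame (or a numpy array or even a list). The custom function passed to transform must return a sequence (a one-dimensional Series, array, or list) the same length as the group.

So, transform works on just one Series at a time, and apply works on the entire DataFrame at once.

Inspecting the custom function

It can help quite a bit to inspect the input to the custom function you pass to apply or transform.

Examples

Let's create some sample data and inspect the groups so that you can see what I am talking about: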

df = pd.DataFrame({'State':['Texas', 'Texas', 'Florida', 'Florida'],
                   'a':[4,5,1,3], 'b':[6,10,3,11]})

df

Let's create a simple custom function that prints out the type of the implicitly passed object and then raises an error so that execution can be stopped.

def inspect(x):
    print(type(x))
    raise

Now let's pass this function to both the groupby apply and transform methods to see what object is passed to it:

df.groupby('State').apply(inspect)

RuntimeError

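As you can see, a DataFrame is passed into the inspect function. You might be wondering why the type, DataFrame, got printed out twice. pandas runs the first group twice. It does this to determine whether there is a fast way to complete the computation. This is a minor detail that you shouldn't worry about.

Now, let's do the same thing with transform: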

df.groupby('State').transform(inspect)

RuntimeError

It is passed a Series, a totally different pandas object.
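If you would rather not abort execution with an exception, a variation is to record the types and let the calls finish. This is a sketch, not the answer's original code: record_apply and record_transform are hypothetical helper names. Depending on the pandas version, transform may also call the function once with a whole group as a probe for a faster code path, so a DataFrame can show up next to Series in the second set.

```python
import pandas as pd

df = pd.DataFrame({"State": ["Texas", "Texas", "Florida", "Florida"],
                   "a": [4, 5, 1, 3], "b": [6, 10, 3, 11]})

seen_by_apply = []
seen_by_transform = []

def record_apply(group):
    # Remember what kind of object apply handed us.
    seen_by_apply.append(type(group).__name__)
    return len(group)      # any return value works for this demonstration

def record_transform(values):
    # Remember what kind of object transform handed us.
    seen_by_transform.append(type(values).__name__)
    return values          # returning the input unchanged satisfies transform

df.groupby("State")[["a", "b"]].apply(record_apply)
df.groupby("State")[["a", "b"]].transform(record_transform)

print(set(seen_by_apply))
print(set(seen_by_transform))
```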

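So, transform is only allowed to work with a single Series at a time. It is impossible for it to act on two columns at the same time. So, if we try to subtract column a from b inside our custom function, we get an error with transform. See below: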

def subtract_two(x):
    return x['a'] - x['b']

df.groupby('State').transform(subtract_two)

KeyError: ('a', 'occurred at index a')

We get a KeyError because pandas is attempting to find the Series index label a, which does not exist. You can complete this operation with apply, as it has the entire DataFrame:

df.groupby('State').apply(subtract_two)

State
Florida  2   -2
         3   -8
Texas    0   -2
         1   -5
dtype: int64

The output is a Series and is a little confusing because the original index is kept, but we have access to all columns.

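Displaying the passed pandas object

It can help even more to display the entire pandas object within the custom function, so you can see exactly what you are operating on. You can use print statements, but I like to use the display function from the IPython.display module so that DataFrames are rendered nicely as HTML in a Jupyter notebook: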

from IPython.display import display

def subtract_two(x):
    display(x)
    return x['a'] - x['b']


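Transform must return a one-dimensional sequence the same size as the group

The other difference is that transform must return a one-dimensional sequence the same size as the group. In this particular instance, each group has two rows, so transform must return a sequence of two rows. If it does not, an error is raised: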

def return_three(x):
    return np.array([1, 2, 3])

df.groupby('State').transform(return_three)

ValueError: transform must return a scalar value for each group

The error message is not really descriptive of the problem. You must return a sequence the same length as the group. So, a function like this would work:

def rand_group_len(x):
    return np.random.rand(len(x))

df.groupby('State').transform(rand_group_len)

          a         b
0  0.962070  0.151440
1  0.440956  0.782176
2  0.642218  0.483257
3  0.056047  0.238208

Returning a single scalar object also works for transform

If you return just a single scalar from your custom function, then transform will use it for each of the rows in the group:

def group_sum(x):
    return x.sum()

df.groupby('State').transform(group_sum)

   a   b
0  9  16
1  9  16
2  4  14
3  4  14
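As a sketch of how the scalar broadcast is typically put to use, the snippet below works on a single column and shows that a custom scalar-returning function and the named "sum" aggregation give the same row-aligned result. It then uses that result to compute each row's share of its group total (share is a hypothetical name introduced here):

```python
import pandas as pd

df = pd.DataFrame({"State": ["Texas", "Texas", "Florida", "Florida"],
                   "a": [4, 5, 1, 3], "b": [6, 10, 3, 11]})

by_state = df.groupby("State")["a"]

# The custom function returns one scalar per group; transform repeats it per row.
custom = by_state.transform(lambda s: s.sum())

# The same broadcast, using the built-in aggregation name.
named = by_state.transform("sum")

# Because the result is row-aligned, it can be combined with the original column.
share = df["a"] / named
print(share)
```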

2019-07-16

哆啦的時(shí)光機(jī)


I am going to use a very simple snippet to illustrate the difference:

test = pd.DataFrame({'id':[1,2,3,1,2,3,1,2,3], 'price':[1,2,3,2,3,1,3,1,2]})

grouping = test.groupby('id')['price']

The DataFrame looks like this:

   id  price
0   1      1
1   2      2
2   3      3
3   1      2
4   2      3
5   3      1
6   1      3
7   2      1
8   3      2

There are 3 customer IDs in this table. Each customer made three transactions and paid 1, 2, and 3 dollars.

Now, I want to find the minimum payment made by each customer. There are two ways of doing it:

Using apply:

grouping.min()

The return looks like this:

id
1    1
2    1
3    1
Name: price, dtype: int64

pandas.core.series.Series # return type

Int64Index([1, 2, 3], dtype='int64', name='id') # the returned Series' index

# length is 3

Using transform:

grouping.transform(min)

The return looks like this:

0    1
1    1
2    1
3    1
4    1
5    1
6    1
7    1
8    1
Name: price, dtype: int64

pandas.core.series.Series # return type

RangeIndex(start=0, stop=9, step=1) # the returned Series' index

# length is 9

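Both methods return a Series object, but the length of the first one is 3 and the length of the second one is 9.

If you want to answer "What is the minimum price paid by each customer?", then the apply method is the more suitable one to choose.

If you want to answer "What is the difference between the amount paid for each transaction and the minimum payment?", then you want to use transform, because: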

test['minimum'] = grouping.transform(min) # creates an extra column filled with the minimum payment

test.price - test.minimum # returns the difference for each row

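Apply does not work here simply because it returns a Series of size 3, whereas the original df has length 9, so you cannot easily integrate it back into the original df.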
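For completeness, the reduced size-3 result can still be brought back onto the nine rows with an explicit join on the key column; transform simply saves that step. A minimal sketch, assuming the same test frame as above (joined and via_transform are names introduced here for illustration):

```python
import pandas as pd

test = pd.DataFrame({"id": [1, 2, 3, 1, 2, 3, 1, 2, 3],
                     "price": [1, 2, 3, 2, 3, 1, 3, 1, 2]})

# Reduced result: one minimum per customer, indexed by id (length 3).
minimum = test.groupby("id")["price"].min().rename("minimum")

# Explicit route: look each row's id up in the reduced Series via a join.
joined = test.join(minimum, on="id")
joined["diff"] = joined["price"] - joined["minimum"]

# Shortcut: transform already returns one value per original row (length 9).
via_transform = test["price"] - test.groupby("id")["price"].transform("min")
print(joined)
```

Both routes give the same per-transaction differences; the join is mainly useful when the reduced table is needed for other purposes as well.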

2019-07-16
