
Why is looking up the reversed index so slow in pandas?

Python

慕運維8079593 2023-03-30 16:41:31

I have a pandas DataFrame that stores network data; it looks like this:

from_id, to_id, count
X, Y, 3
Z, Y, 4
Y, X, 2
...

I am trying to add a new column, inverse_count, that holds the count value of the row whose from_id and to_id are swapped relative to the current row.

I am taking the approach below. I thought it would be fast, but it is much slower than I expected, and I don't understand why.

def get_inverse_val(x):
    # Takes the inverse of the index for a given row
    # When passed to apply with axis = 1, the index becomes the name
    try:
        return df.loc[(x.name[1], x.name[0]), 'count']
    except KeyError:
        return 0

df = df.set_index(['from_id', 'to_id'])
df['inverse_count'] = df.apply(get_inverse_val, axis = 1)


2 Answers

HUX布斯

1876 experience points contributed, more than 6 upvotes received

Why not just do a simple merge for this?

import pandas as pd

df = pd.DataFrame({'from_id': ['X', 'Z', 'Y'], 'to_id': ['Y', 'Y', 'X'], 'count': [3, 4, 2]})

pd.merge(
    left=df,
    right=df,
    how='left',
    left_on=['from_id', 'to_id'],
    right_on=['to_id', 'from_id']
)

  from_id_x to_id_x  count_x from_id_y to_id_y  count_y
0         X       Y        3         Y       X      2.0
1         Z       Y        4       NaN     NaN      NaN
2         Y       X        2         X       Y      3.0

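Here we merge (from, to) against (to, from) to get the reverse-matched pairs; count_y is the inverse count. In general, you should avoid apply() because it is slow: it is not a vectorized operation, so it calls a Python function once per row.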
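To go from the merged frame to the inverse_count column the question asks for, something like the following might work. It is only a sketch: the `reverse` helper frame and the choice to fill missing pairs with 0 (mirroring the KeyError fallback in the question) are assumptions, and it presumes each (from_id, to_id) pair appears at most once, since duplicates would multiply rows in the merge.

```python
import pandas as pd

df = pd.DataFrame({
    "from_id": ["X", "Z", "Y"],
    "to_id": ["Y", "Y", "X"],
    "count": [3, 4, 2],
})

# Lookup table with the key columns swapped: the row (Y, X, 2)
# becomes (from_id=X, to_id=Y, inverse_count=2).
reverse = df.rename(columns={
    "from_id": "to_id",
    "to_id": "from_id",
    "count": "inverse_count",
})

# A left merge keeps every original row; pairs with no reverse edge get NaN,
# which is replaced by 0 here (an assumption matching the question's fallback).
result = df.merge(reverse, on=["from_id", "to_id"], how="left")
result["inverse_count"] = result["inverse_count"].fillna(0).astype(int)
print(result)
```

Renaming before merging avoids the _x/_y suffixes, so the result has exactly the columns from_id, to_id, count and inverse_count.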

2023-03-30

慕斯709654

1840 experience points contributed, more than 5 upvotes received

You can call .set_index twice to create two DataFrames with opposite index orders, then use assign to create your inverse_count column; pandas aligns the two on their index values.

df = (df.set_index(['from_id','to_id'])
        .assign(inverse_count=df.set_index(['to_id','from_id'])['count'])
        .reset_index())

  from_id to_id  count  inverse_count
0       X     Y      3            2.0
1       Z     Y      4            NaN
2       Y     X      2            3.0
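A related variant, sketched here rather than taken from the answer, does the same lookup with Series.reindex, which accepts a fill_value and so can return 0 for missing pairs as the question's function does. The names `counts` and `swapped` are illustrative, and the sketch assumes the (from_id, to_id) pairs are unique, because reindex raises on duplicate index labels.

```python
import pandas as pd

df = pd.DataFrame({
    "from_id": ["X", "Z", "Y"],
    "to_id": ["Y", "Y", "X"],
    "count": [3, 4, 2],
})

# Counts keyed by (from_id, to_id).
counts = df.set_index(["from_id", "to_id"])["count"]

# The keys to look up: each row's pair with the two levels swapped.
swapped = pd.MultiIndex.from_arrays([df["to_id"], df["from_id"]])

# reindex performs all lookups in one call; pairs that are absent
# receive fill_value instead of raising KeyError.
df["inverse_count"] = counts.reindex(swapped, fill_value=0).to_numpy()
print(df)
```

Converting to a NumPy array before assignment sidesteps index alignment, so the values land in row order.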

Since the question is about speed, let's look at performance on a larger dataset:

Setup:

import pandas as pd
import string
import itertools

df = pd.DataFrame(list(itertools.permutations(string.ascii_uppercase, 2)), columns=['from_id', 'to_id'])
df['count'] = df.index % 25 + 1
print(df)

    from_id to_id  count
0         A     B      1
1         A     C      2
2         A     D      3
3         A     E      4
4         A     F      5
..      ...   ...    ...
645       Z     U     21
646       Z     V     22
647       Z     W     23
648       Z     X     24
649       Z     Y     25

Set index:

%timeit (df.set_index(['from_id','to_id'])
           .assign(inverse_count=df.set_index(['to_id','from_id'])['count'])
           .reset_index())

6 ms ± 24.7 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Merge:

%timeit pd.merge(
          left=df,
          right=df,
          how='left',
          left_on=['from_id', 'to_id'],
          right_on=['to_id', 'from_id'])

1.73 ms ± 57.5 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So on this dataset the merge approach is the faster option, at roughly 1.7 ms versus 6 ms.
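The timings above do not include the original apply-based approach. A rough way to compare it against the merge outside IPython is sketched below with the standard timeit module. The function names `via_apply` and `via_merge` are made up for this illustration, the run count of 5 is arbitrary, and the numbers will vary by machine and pandas version.

```python
import itertools
import string
import timeit

import pandas as pd

# Same 650-row dataset as in the setup above.
pairs = list(itertools.permutations(string.ascii_uppercase, 2))
df = pd.DataFrame(pairs, columns=["from_id", "to_id"])
df["count"] = df.index % 25 + 1


def via_apply(frame):
    """Row-by-row lookup, as in the question."""
    indexed = frame.set_index(["from_id", "to_id"])

    def lookup(row):
        # row.name is the (from_id, to_id) tuple of the current row.
        try:
            return indexed.loc[(row.name[1], row.name[0]), "count"]
        except KeyError:
            return 0

    return indexed.apply(lookup, axis=1).to_numpy()


def via_merge(frame):
    """Single vectorized merge against a key-swapped copy."""
    reverse = frame.rename(columns={
        "from_id": "to_id",
        "to_id": "from_id",
        "count": "inverse_count",
    })
    merged = frame.merge(reverse, on=["from_id", "to_id"], how="left")
    return merged["inverse_count"].fillna(0).to_numpy()


# Check that both approaches agree before timing them.
apply_result = via_apply(df)
merge_result = via_merge(df)
same = bool((apply_result == merge_result).all())
print("results match:", same)

for name, func in [("apply", via_apply), ("merge", via_merge)]:
    seconds = timeit.timeit(lambda: func(df), number=5) / 5
    print(f"{name}: {seconds * 1e3:.2f} ms per call")
```

Checking that the two results match first guards against comparing the speed of approaches that compute different things.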

2023-03-30
