
Dask equivalent of pandas.DataFrame.update

Python

蕪湖不蕪 2023-03-08 15:45:24

I have a few functions that use the pandas.DataFrame.update method. I am trying to move to Dask for these datasets, but the Dask DataFrame API does not implement update. Is there another way to get the same result in Dask? Here is how I use update:

1. Forward fill data with the last known value

df.update(df.filter(like='/').mask(lambda x: x == 0).ffill(1))

Input

id  .. .. ..(some cols)  1/1/20  1/2/20  1/3/20  1/4/20  1/5/20  1/6/20  ....
1                        10      20      0       40      0       50
2                        10      30      30      0       0       50
..

Output

id  .. .. ..(some cols)  1/1/20  1/2/20  1/3/20  1/4/20  1/5/20  1/6/20  ....
1                        10      20      20      40      40      50
2                        10      30      30      30      30      50
..

2. Replace values in a dataframe with values from another dataframe, based on an id/index column

def replace_names(df1, df2, idxCol = 'id', srcCol = 'name', dstCol = 'name'):
    df1 = df1.set_index(idxCol)
    df1[dstCol].update(df2.set_index(idxCol)[srcCol])
    return df1.reset_index()

df_new = replace_names(df1, df2)

Input

df1

id   name    ...
123  city a
456  city b
789  city c
789  city c
456  city b
123  city a
...

df2

id   name    ...
123  City A
456  City B
789  City C
...

Output

id   name    ...
123  City A
456  City B
789  City C
789  City C
456  City B
123  City A
...


1 answer

墨色風(fēng)雨


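Question 2

There is a way to partially solve this. I am assuming that df2 is much smaller than df1 and actually fits in memory, so it can be read as a pandas DataFrame. If that is the case, the following function works whether df1 is a pandas or a dask DataFrame, but df2 should be a pandas one.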

import pandas as pd
import dask.dataframe as dd

def replace_names(df1,  # can be a pandas or dask dataframe
                  df2,  # this should be pandas
                  idxCol='id',
                  srcCol='name',
                  dstCol='name'):
    # Build an {id: new value} lookup from the small in-memory frame
    diz = df2[[idxCol, srcCol]].set_index(idxCol).to_dict()[srcCol]
    out = df1.copy()
    # Note: ids that are missing from df2 map to NaN here
    out[dstCol] = out[idxCol].map(diz)
    return out
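One gap in the function above is that ids present in df1 but absent from df2 come back as NaN, whereas pandas' update would leave the old value alone. A minimal sketch of a variant that keeps the existing name in that case follows. The function name `replace_names_keep_missing` and the extra id 999 are made up for illustration, and it is only exercised with pandas here. `map` and `where` also exist on dask Series, so it may carry over, but that is an assumption rather than something tested.

```python
import pandas as pd

def replace_names_keep_missing(df1, df2, idxCol="id", srcCol="name", dstCol="name"):
    # Assumption: df2 is a small pandas DataFrame with unique ids.
    mapping = df2.set_index(idxCol)[srcCol].to_dict()
    mapped = df1[idxCol].map(mapping)
    out = df1.copy()
    # Take the new value where one was found, otherwise keep the old one.
    out[dstCol] = mapped.where(mapped.notnull(), df1[dstCol])
    return out

# Hypothetical sample data: id 999 has no match in df2.
df1 = pd.DataFrame({"id": [123, 456, 999, 123],
                    "name": ["city a", "city b", "city z", "city a"]})
df2 = pd.DataFrame({"id": [123, 456], "name": ["City A", "City B"]})

result = replace_names_keep_missing(df1, df2)
print(result)
```

The row with id 999 keeps "city z", which is closer to what update does than a plain `map`.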

Question 1

For the first question, the following code works with both pandas and dask:

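df = pd.DataFrame({'a': {0: 1, 1: 2},
                   'b': {0: 3, 1: 4},
                   '1/1/20': {0: 10, 1: 10},
                   '1/2/20': {0: 20, 1: 30},
                   '1/3/20': {0: 0, 1: 30},
                   '1/4/20': {0: 40, 1: 0},
                   '1/5/20': {0: 0, 1: 0},
                   '1/6/20': {0: 50, 1: 50}})

# if you want to try with dask
# df = dd.from_pandas(df, npartitions=2)

# Only the date-like columns (those containing "/") are filled
cols = [col for col in df.columns if "/" in col]

# Treat 0 as missing, then forward fill along each row.
# axis is passed by keyword because recent pandas versions no longer accept it positionally.
df[cols] = df[cols].mask(lambda x: x==0).ffill(axis=1) #.astype(int)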

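If you want the output to be integers, uncomment .astype(int) at the end of the last line. This only works when no NaN is left, that is, when no row starts with a 0.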
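To make the mask/ffill step easier to check, here is a small pandas-only sketch that wraps it in a function and runs it on the two sample rows from the question. The helper name `ffill_zeros_rowwise` is hypothetical, and casting to float first is my own choice so that the output dtype does not depend on whether a row happens to contain zeros.

```python
import pandas as pd

def ffill_zeros_rowwise(pdf, like="/"):
    """Treat 0 as missing in the date-like columns and forward fill along each row."""
    cols = [c for c in pdf.columns if like in str(c)]
    out = pdf.copy()
    values = out[cols].astype("float64")
    out[cols] = values.mask(values == 0).ffill(axis=1)
    return out

# The two sample rows from the question.
df = pd.DataFrame({"id": [1, 2],
                   "1/1/20": [10, 10], "1/2/20": [20, 30], "1/3/20": [0, 30],
                   "1/4/20": [40, 0], "1/5/20": [0, 0], "1/6/20": [50, 50]})

filled = ffill_zeros_rowwise(df)
print(filled)
```

A zero in the first date column has nothing before it and stays NaN, which is why no integer cast is applied here. Because the fill runs along each row, it never needs data from another partition, so with a dask DataFrame `ddf` the same function could plausibly be applied as `ddf.map_partitions(ffill_zeros_rowwise)`; that call is not exercised in this sketch.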

Update for Question 2: if you want a dask-only solution, you can try the following.

Data

import numpy as np
import pandas as pd
import dask.dataframe as dd

df1 = pd.DataFrame({'id': {0: 123, 1: 456, 2: 789, 3: 789, 4: 456, 5: 123},
                    'name': {0: 'city a',
                             1: 'city b',
                             2: 'city c',
                             3: 'city c',
                             4: 'city b',
                             5: 'city a'}})

df2 = pd.DataFrame({'id': {0: 123, 1: 456, 2: 789},
                    'name': {0: 'City A', 1: 'City B', 2: 'City C'}})

df1 = dd.from_pandas(df1, npartitions=2)
df2 = dd.from_pandas(df2, npartitions=2)

Case 1

In this case, if an id exists in df1 but not in df2, the name from df1 is kept.

def replace_names_dask(df1, df2,
                       idxCol='id',
                       srcCol='name',
                       dstCol='name'):
    # Avoid a column name clash in the merge when source and destination share a name
    if srcCol == dstCol:
        df2 = df2.rename(columns={srcCol:f"{srcCol}_new"})
        srcCol = f"{srcCol}_new"

    def map_replace(x, srcCol, dstCol):
        # Work on a copy so the input partition is not mutated
        x = x.copy()
        # Use the new value where the merge found one, otherwise keep the old one
        x[dstCol] = np.where(x[srcCol].notnull(),
                             x[srcCol],
                             x[dstCol])
        return x

    df = dd.merge(df1, df2, on=idxCol, how="left")
    df = df.map_partitions(lambda x: map_replace(x, srcCol, dstCol))
    df = df.drop(srcCol, axis=1)
    return df

df = replace_names_dask(df1, df2)

Case 2

In this case, if an id exists in df1 but not in df2, then name in the output df will be NaN (as in a standard left join).

def replace_names_dask(df1, df2,
                       idxCol='id',
                       srcCol='name',
                       dstCol='name'):
    # Drop the old column and let the left join bring in the new one
    df1 = df1.drop(dstCol, axis=1)
    df2 = df2.rename(columns={srcCol: dstCol})
    df = dd.merge(df1, df2, on=idxCol, how="left")
    return df

df = replace_names_dask(df1, df2)
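The difference between the two cases only shows up when df1 contains an id that df2 lacks. The sketch below reproduces both with plain pandas (no dask needed) on a tiny frame. The id 999 and the helper names `case1_keep_old` and `case2_left_join` are invented for the illustration, and pandas `merge` stands in for `dd.merge`.

```python
import pandas as pd

def case1_keep_old(df1, df2, idxCol="id", col="name"):
    # Left join, then prefer the new value and fall back to the old one.
    new = df2.rename(columns={col: f"{col}_new"})
    merged = df1.merge(new[[idxCol, f"{col}_new"]], on=idxCol, how="left")
    merged[col] = merged[f"{col}_new"].fillna(merged[col])
    return merged.drop(columns=f"{col}_new")

def case2_left_join(df1, df2, idxCol="id", col="name"):
    # Drop the old column and take whatever the left join brings in.
    return df1.drop(columns=col).merge(df2[[idxCol, col]], on=idxCol, how="left")

# Hypothetical sample data: id 999 exists only in df1.
df1 = pd.DataFrame({"id": [123, 999], "name": ["city a", "city z"]})
df2 = pd.DataFrame({"id": [123], "name": ["City A"]})

kept = case1_keep_old(df1, df2)
joined = case2_left_join(df1, df2)
print(kept)
print(joined)
```

In the first result the unmatched row keeps "city z"; in the second it is NaN. The first is the behaviour closest to pandas' update, which skips missing values in the source.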

Answered 2023-03-08
