
MultiIndex DataFrame: How to create a new column based on the values in another column?

Python

一只萌萌小番薯 2022-04-27 13:08:56

I have an unbalanced Pandas MultiIndex DataFrame in which each row stores one firm-year observation. The sample period (variable year) runs from 2013 to 2017. The dataset includes a variable event, which is set to 1 if an event occurred in the given year.

Sample dataset:

#Create dataset
import pandas as pd
df = pd.DataFrame({'id' : [1,1,1,1,1,2,2,2,2,3,3,4,4,4,5,5,5,5],
                   'year' : [2013,2014,2015,2016,2017,2014,2015,2016,2017,
                             2016,2017,2013,2014,2015,2014,2015,2016,2017],
                   'event' : [1,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,1]})
df.set_index(['id', 'year'], inplace = True)
df.sort_index(inplace = True)

I want to create a new column status based on the existing column event, as follows: the first time an event occurs in column event, the value of status should change from 0 to 1 for that year and all subsequent years.

DataFrame with the expected variable status:

         event  status
id year
1  2013      1       1
   2014      0       1
   2015      0       1
   2016      0       1
   2017      0       1
2  2014      0       0
   2015      0       0
   2016      1       1
   2017      0       1
3  2016      1       1
   2017      0       1
4  2013      0       0
   2014      1       1
   2015      0       1
5  2014      0       0
   2015      0       0
   2016      0       0
   2017      1       1

So far I have not found any useful solution, so any suggestions would be greatly appreciated. Thanks!


3 Answers

長風秋雁


We can groupby the first level of your index (id) and flag every row equal to 1 with eq. Then use cumsum, which also converts True to 1 and False to 0:

df['status'] = df.groupby(level=0)['event'].transform(lambda x: x.eq(1).cumsum())

Output

         event  status
id year
1  2013      1       1
   2014      0       1
   2015      0       1
   2016      0       1
   2017      0       1
2  2014      0       0
   2015      0       0
   2016      1       1
   2017      0       1
3  2016      1       1
   2017      0       1
4  2013      0       0
   2014      1       1
   2015      0       1
5  2014      0       0
   2015      0       0
   2016      0       0
   2017      1       1
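One caveat, which only matters for data beyond the sample: cumsum counts events, so if a firm had a second event in a later year, status would become 2 instead of staying at 1. A minimal sketch that keeps the flag at 1 uses cummax within each id instead. The second event for firm 1 in 2016 below is hypothetical and exists only to show where the two differ:

```python
import pandas as pd

# Sample data from the question, except that firm 1 gets a HYPOTHETICAL
# second event in 2016 to show where cumsum and cummax differ.
df = pd.DataFrame({'id': [1, 1, 1, 1, 1, 2, 2, 2, 2],
                   'year': [2013, 2014, 2015, 2016, 2017, 2014, 2015, 2016, 2017],
                   'event': [1, 0, 0, 1, 0, 0, 0, 1, 0]})
df = df.set_index(['id', 'year']).sort_index()

# cumsum counts events, so firm 1 reaches 2 after its second event
df['status_cumsum'] = df['event'].eq(1).astype(int).groupby(level='id').cumsum()

# cummax only records whether an event has happened yet, so it stays at 1
df['status'] = df['event'].eq(1).astype(int).groupby(level='id').cummax()

print(df)
```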

2022-04-27

翻閱古今


The key is to use cumsum within a groupby:

import pandas as pd

df = pd.DataFrame({'id' : [1,1,1,1,1,2,2,2,2,3,3,4,4,4,5,5,5,5],
                   'year' : [2013,2014,2015,2016,2017,2014,2015,2016,2017,
                             2016,2017,2013,2014,2015,2014,2015,2016,2017],
                   'event' : [1,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,1]})

# flag events as 1/0, take a running total within each id, then build the MultiIndex
(df.assign(status = lambda x: x.event.eq(1).mul(1).groupby(x['id']).cumsum())
   .set_index(['id','year']))

Output

         event  status
id year
1  2013      1       1
   2014      0       1
   2015      0       1
   2016      0       1
   2017      0       1
2  2014      0       0
   2015      0       0
   2016      1       1
   2017      0       1
3  2016      1       1
   2017      0       1
4  2013      0       0
   2014      1       1
   2015      0       1
5  2014      0       0
   2015      0       0
   2016      0       0
   2017      1       1
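Because cumsum accumulates in the current row order, the assign approach assumes the rows of each id are already ordered by year, as they happen to be in the sample. If that might not hold, a minimal sketch is to sort before accumulating. The shuffled rows below are a hypothetical input, not data from the question:

```python
import pandas as pd

# HYPOTHETICAL input: firm 1's rows are out of chronological order
df = pd.DataFrame({'id': [1, 1, 1, 2, 2],
                   'year': [2015, 2013, 2014, 2015, 2014],
                   'event': [0, 1, 0, 1, 0]})

result = (df.sort_values(['id', 'year'])  # order each firm's years first
            .assign(status=lambda x: x.event.eq(1).mul(1)
                                      .groupby(x['id']).cumsum())
            .set_index(['id', 'year']))

print(result)
```

Without the sort_values step, firm 1's 2015 row would come first in the running total and wrongly get status 0.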

2022-04-27

呼喚遠方


A basic answer, explained step by step in the comments:

import pandas as pd

df = pd.DataFrame({'id' : [1,1,1,1,1,2,2,2,2,3,3,4,4,4,5,5,5,5],
                   'year' : [2013,2014,2015,2016,2017,2014,2015,2016,2017,
                             2016,2017,2013,2014,2015,2014,2015,2016,2017],
                   'event' : [1,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,1]})

# extract unique IDs in order of first appearance
# (unlike set(), unique() keeps the row order, so results line up with df)
ids = df["id"].unique()

# initialize a list to keep the results
list_event_years = []

# loop over the IDs
for firm_id in ids:
    # reset the flag for each ID
    event_happened = 0
    # loop over the rows belonging to the current ID
    for _, row in df[df["id"] == firm_id].iterrows():
        # once the event has happened, keep the flag at 1
        if row["event"] == 1:
            event_happened = 1
        # add the flag to the list of results
        list_event_years.append(event_happened)

# add the list of results as a DataFrame column
# (assumes the rows of each ID are contiguous, as in the sample data)
df["event-happened"] = list_event_years

### OUTPUT
>>> df
    id  year  event  event-happened
0    1  2013      1               1
1    1  2014      0               1
2    1  2015      0               1
3    1  2016      0               1
4    1  2017      0               1
5    2  2014      0               0
6    2  2015      0               0
7    2  2016      1               1
8    2  2017      0               1
9    3  2016      1               1
10   3  2017      0               1
11   4  2013      0               0
12   4  2014      1               1
13   4  2015      0               1
14   5  2014      0               0
15   5  2015      0               0
16   5  2016      0               0
17   5  2017      1               1

If you need them indexed as in your example, do the following:

df.set_index(['id', 'year'], inplace = True)
df.sort_index(inplace = True)

### OUTPUT
>>> df
         event  event-happened
id year
1  2013      1               1
   2014      0               1
   2015      0               1
   2016      0               1
   2017      0               1
2  2014      0               0
   2015      0               0
   2016      1               1
   2017      0               1
3  2016      1               1
   2017      0               1
4  2013      0               0
   2014      1               1
   2015      0               1
5  2014      0               0
   2015      0               0
   2016      0               0
   2017      1               1
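The loop encodes the rule from the question directly: once a firm's first event has occurred, that year and every later year are flagged. As an alternative technique (not the one used in this answer), the same rule can be written without a loop. Look up each firm's first event year and compare it with year. This is a sketch on the question's sample data. A firm with no event at all (there is none in the sample) would get a missing first year and therefore 0:

```python
import pandas as pd

df = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2,3,3,4,4,4,5,5,5,5],
                   'year': [2013,2014,2015,2016,2017,2014,2015,2016,2017,
                            2016,2017,2013,2014,2015,2014,2015,2016,2017],
                   'event': [1,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,1]})

# Year of the first event for each firm (firms without an event are absent)
first_event_year = df.loc[df['event'] == 1].groupby('id')['year'].min()

# Map each row to its firm's first event year; NaN if the firm never had one
df['first_year'] = df['id'].map(first_event_year)

# Flag the event year and every later year; comparisons with NaN are False -> 0
df['status'] = (df['year'] >= df['first_year']).astype(int)

df = df.drop(columns='first_year').set_index(['id', 'year']).sort_index()
print(df)
```

Unlike the loop, this does not depend on the rows of each ID being contiguous or in year order.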

2022-04-27
