Remove duplicates only within groups

Python

動(dòng)漫人物 2021-09-25 21:40:41

I only want to remove duplicates within specific subsets of a DataFrame. Under each "spec" row in column "A" I want to drop duplicates, but I want to keep duplicates across the DataFrame as a whole. Some rows under the first "spec" may be identical to rows under the second "spec"; I only want duplicates removed between one "spec" and the next.

This is the DataFrame:

df

A     B       C
spec  first   second
test  text1   text2
act   text12  text13
act   text14  text15
test  text32  text33
act   text34  text35
test  text85  text86
act   text87  text88
test  text1   text2
act   text12  text13
act   text14  text15
test  text85  text86
act   text87  text88
spec  third   fourth
test  text1   text2
act   text12  text13
act   text14  text15
test  text85  text86
act   text87  text88
test  text1   text2
act   text12  text13
act   text14  text15
test  text85  text86
act   text87  text88

This is what I want:

df

A     B       C
spec  first   second
test  text1   text2
act   text12  text13
act   text14  text15
test  text32  text33
act   text34  text35
test  text85  text86
act   text87  text88
spec  third   fourth
test  text1   text2
act   text12  text13
act   text14  text15
test  text85  text86
act   text87  text88

I could split the DataFrame into "small" DataFrames, drop the duplicates from each one in a for loop, and concatenate them at the end, but I would like to know whether there is another solution.

I also tried the following, and it worked:

import numpy as np

dfList = df.index[df["A"] == "spec"].tolist()
dfList = np.asarray(dfList)
for dfL in dfList:
    idx = np.where(dfList == dfL)
    if idx[0][0]!=(len(dfList)-1):
        df.loc[dfList[idx[0][0]]:dfList[idx[0][0]+1]-1] = df.loc[dfList[idx[0][0]]:dfList[idx[0][0]+1]-1].drop_duplicates()
    else:
        df.loc[dfList[idx[0][0]]:] = df.loc[dfList[idx[0][0]]:].drop_duplicates()

Edit: I had to add this at the end:

df.dropna(how='all', inplace=True)

But I am just wondering whether there is another solution.
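The split-and-concatenate idea mentioned in the question might look roughly like the sketch below. It rebuilds the sample frame so it runs on its own, and it assumes a default RangeIndex and that the first row is a "spec" row; the names `starts`, `bounds` and `pieces` are made up for illustration.

```python
import pandas as pd

# Rebuild the sample frame from the question.
a = [("test", "text1", "text2"), ("act", "text12", "text13"), ("act", "text14", "text15")]
b = [("test", "text32", "text33"), ("act", "text34", "text35")]
c = [("test", "text85", "text86"), ("act", "text87", "text88")]
rows = ([("spec", "first", "second")] + a + b + c + a + c
        + [("spec", "third", "fourth")] + a + c + a + c)
df = pd.DataFrame(rows, columns=["A", "B", "C"])

# Positions where each "spec" block starts, plus the end of the frame.
# Assumption: default RangeIndex, so index labels equal row positions.
starts = df.index[df["A"] == "spec"].tolist()
bounds = starts + [len(df)]

# Drop duplicates inside each block, then stitch the blocks back together.
pieces = [df.iloc[lo:hi].drop_duplicates() for lo, hi in zip(bounds[:-1], bounds[1:])]
result = pd.concat(pieces)
print(result)
```

Because each block is deduplicated separately, rows repeated under a different "spec" are left alone.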

3 Answers

侃侃無(wú)極

Contributed 2051 experience points, received more than 10 likes

This should work:

df2 = df.drop_duplicates(subset=['A', 'B','C'])
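One caveat worth checking: `drop_duplicates` called on the whole frame compares each row against every earlier row, regardless of which "spec" it sits under. A quick sketch on a frame rebuilt from the question's sample (the helper lists `a`, `b`, `c` are only there to keep the construction short) suggests that rows repeated under the second "spec" are removed as well, which is not what the question asks for:

```python
import pandas as pd

# Rebuild the sample frame from the question.
a = [("test", "text1", "text2"), ("act", "text12", "text13"), ("act", "text14", "text15")]
b = [("test", "text32", "text33"), ("act", "text34", "text35")]
c = [("test", "text85", "text86"), ("act", "text87", "text88")]
rows = ([("spec", "first", "second")] + a + b + c + a + c
        + [("spec", "third", "fourth")] + a + c + a + c)
df = pd.DataFrame(rows, columns=["A", "B", "C"])

# Frame-wide deduplication: group boundaries are ignored.
df2 = df.drop_duplicates(subset=["A", "B", "C"])

# The question's desired result has 14 rows; here only 9 survive,
# and the only row kept from the second group is its "spec" header.
print(len(df), len(df2))
print(df2.index.tolist())
```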

Answered 2021-09-25

湖上湖

Contributed 2003 experience points, received more than 2 likes

Use groupby + duplicated:

df[~df.groupby(df.A.eq('spec').cumsum()).apply(lambda x: x.duplicated()).values]

    A     B       C
0   spec  first   second
1   test  text1   text2
2   act   text12  text13
3   act   text14  text15
4   test  text32  text33
5   act   text34  text35
6   test  text85  text86
7   act   text87  text88
13  spec  third   fourth
14  test  text1   text2
15  act   text12  text13
16  act   text14  text15
17  test  text85  text86
18  act   text87  text88

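Details

We build the groups with cumsum: every "spec" row increments the running total, so all rows up to the next "spec" share one label. The group labels are: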

df.A.eq('spec').cumsum()

0     1
1     1
2     1
3     1
4     1
5     1
6     1
7     1
8     1
9     1
10    1
11    1
12    1
13    2
14    2
15    2
16    2
17    2
18    2
19    2
20    2
21    2
22    2
23    2
Name: A, dtype: int64

The grouping is then done on this Series, and duplicates are flagged within each group:

df.groupby(df.A.eq('spec').cumsum()).apply(lambda x: x.duplicated()).values

array([False, False, False, False, False, False, False, False,  True,
        True,  True,  True,  True, False, False, False, False, False,
       False,  True,  True,  True,  True,  True])

From there, all that remains is to keep the rows that correspond to False (that is, the rows that are not duplicates).
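The same mask can probably be obtained without `apply`, by attaching the cumsum label as a temporary column and calling `duplicated` once. This is a swapped-in variant, not the method used above: a minimal sketch on a frame rebuilt from the question's sample, where the column name `group_id` is made up for illustration.

```python
import pandas as pd

# Rebuild the sample frame from the question.
a = [("test", "text1", "text2"), ("act", "text12", "text13"), ("act", "text14", "text15")]
b = [("test", "text32", "text33"), ("act", "text34", "text35")]
c = [("test", "text85", "text86"), ("act", "text87", "text88")]
rows = ([("spec", "first", "second")] + a + b + c + a + c
        + [("spec", "third", "fourth")] + a + c + a + c)
df = pd.DataFrame(rows, columns=["A", "B", "C"])

# Group label: increases by one at every "spec" row.
group_id = df["A"].eq("spec").cumsum()

# With the label attached, two rows only count as duplicates
# if they match in A, B, C *and* belong to the same group.
mask = df.assign(group_id=group_id).duplicated()

result = df[~mask]
print(result)
```

Since `assign` returns a copy, the original `df` is left without the extra column.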

Answered 2021-09-25

狐的傳說(shuō)

Contributed 1804 experience points, received more than 3 likes

Another possible solution: keep a counter and build a new column from column A that holds the counter's value, incrementing the counter every time you encounter "spec" in the column.

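counter = 0

def counter_fun(val):
    # Without this declaration, counter += 1 raises UnboundLocalError.
    global counter
    # Every 'spec' row starts a new group.
    if val == 'spec':
        counter += 1
    return counter

df['new_col'] = df.A.apply(counter_fun)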

Then group on new_col and drop the duplicates.
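Put together, the counter idea might look like the sketch below. Two things here are assumptions rather than part of the answer: the counter lives in a closure (the hypothetical `make_counter` helper) instead of a module-level variable, and the final step uses a plain `drop_duplicates` instead of an explicit groupby, relying on `new_col` being part of every row so that only rows in the same group can match.

```python
import pandas as pd

# Rebuild the sample frame from the question.
a = [("test", "text1", "text2"), ("act", "text12", "text13"), ("act", "text14", "text15")]
b = [("test", "text32", "text33"), ("act", "text34", "text35")]
c = [("test", "text85", "text86"), ("act", "text87", "text88")]
rows = ([("spec", "first", "second")] + a + b + c + a + c
        + [("spec", "third", "fourth")] + a + c + a + c)
df = pd.DataFrame(rows, columns=["A", "B", "C"])

def make_counter():
    """Return a function that counts how many 'spec' values it has seen."""
    count = 0

    def counter_fun(val):
        nonlocal count
        if val == "spec":
            count += 1
        return count

    return counter_fun

# Label each row with the number of the group it belongs to.
df["new_col"] = df["A"].apply(make_counter())

# Rows are compared on A, B, C and new_col, so duplicates are only
# dropped inside a group; the helper column is removed afterwards.
result = df.drop_duplicates().drop(columns="new_col")
print(result)
```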

Answered 2021-09-25
