
Use Series.duplicated or DataFrame.duplicated on the column with keep='last', then invert the resulting mask and convert it to integers to map True/False to 1/0, or use numpy.where:
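import numpy as np
# the last occurrence of each Policy_id is not flagged as a duplicate, so it becomes 1
df['Last_dup1'] = (~df['Policy_id'].duplicated(keep='last')).astype(int)
df['Last_dup1'] = np.where(df['Policy_id'].duplicated(keep='last'), 0, 1)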
Or, with DataFrame.duplicated and subset:
df['Last_dup1'] = (~df.duplicated(subset=['Policy_id'], keep='last')).astype(int)
df['Last_dup1'] = np.where(df.duplicated(subset=['Policy_id'], keep='last'), 0, 1)
print(df)
   Id Policy_id  Start_Date  Last_dup  Last_dup1
0   0      b123  2019/02/24         0          0
1   1      b123  2019/03/24         0          0
2   2      b123  2019/04/24         1          1
3   3      c123  2018/09/01         0          0
4   4      c123  2018/10/01         1          1
5   5      d123  2017/02/24         0          0
6   6      d123  2017/03/24         1          1
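A self-contained sketch for trying the four variants above. The question's original construction code isn't shown, so the frame below is an assumption rebuilt from the printed output; the sketch checks that every variant reproduces the expected Last_dup column.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the printed output (assumed, not the asker's code)
df = pd.DataFrame({
    'Id': range(7),
    'Policy_id': ['b123', 'b123', 'b123', 'c123', 'c123', 'd123', 'd123'],
    'Start_Date': ['2019/02/24', '2019/03/24', '2019/04/24',
                   '2018/09/01', '2018/10/01', '2017/02/24', '2017/03/24'],
    'Last_dup': [0, 0, 1, 0, 1, 0, 1],
})

# duplicated(keep='last') is True for every row except the last occurrence
# of each Policy_id, so inverting the mask flags exactly that last row.
v1 = (~df['Policy_id'].duplicated(keep='last')).astype(int)
v2 = np.where(df['Policy_id'].duplicated(keep='last'), 0, 1)
v3 = (~df.duplicated(subset=['Policy_id'], keep='last')).astype(int)
v4 = np.where(df.duplicated(subset=['Policy_id'], keep='last'), 0, 1)

# All four variants should agree with the expected Last_dup column
for v in (v1, v2, v3, v4):
    assert np.asarray(v).tolist() == df['Last_dup'].tolist()

df['Last_dup1'] = v1
print(df)
```

Note that keep='last' refers to the current row order, not to Start_Date; if the rows were not already in date order, sorting first (for example with df.sort_values('Start_Date')) would be needed for "last" to mean "most recent".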

It can also be done as follows, without using Series.duplicated (this assumes the Id values are unique):
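dictionary = df[['Id', 'Policy_id']].set_index('Policy_id').to_dict()['Id']
# duplicate Policy_id keys overwrite earlier entries, so the dictionary
# values are the Id of the last row for each Policy_id
last_ids = set(dictionary.values())  # build the lookup once, not once per row
df['Last_dup'] = df['Id'].apply(lambda x: 1 if x in last_ids else 0)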
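A runnable sketch of the dictionary approach on the same seven rows (the frame is reconstructed from the output in the first answer and is an assumption). It keeps the answer's set_index(...).to_dict() step but swaps the per-row apply for Series.isin, which performs the membership test in a single vectorised call.

```python
import pandas as pd

# Sample data reconstructed from the first answer's output (assumed)
df = pd.DataFrame({
    'Id': range(7),
    'Policy_id': ['b123', 'b123', 'b123', 'c123', 'c123', 'd123', 'd123'],
})

# With a non-unique index, later rows overwrite earlier ones when the dict
# is built, leaving the Id of the last row for each Policy_id.
dictionary = df[['Id', 'Policy_id']].set_index('Policy_id').to_dict()['Id']
last_ids = set(dictionary.values())

# Flag rows whose Id is one of those "last" Ids
df['Last_dup'] = df['Id'].isin(last_ids).astype(int)
print(df)
```

Because the lookup is keyed on Id, this matches duplicated(keep='last') only when every row has a distinct Id; with repeated Ids, an earlier row sharing a "last" Id would also be flagged.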