1 Answer

For each column returned by Index.difference, use GroupBy.transform with GroupBy.last:
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%m/%d/%y')
for c in df.columns.difference(['project_id','timestamp']):
    df[c] = df.groupby(['project_id',c], sort=False)['timestamp'].transform('last')
print(df)
   project_id project_name     region      style     effect representative  \
0           1   2020-09-05 2020-04-06 2019-10-15 2020-09-05     2020-04-06
1           1   2020-09-05 2020-04-06 2019-10-15 2020-09-05     2020-04-06
2           1   2020-08-20 2020-04-06 2019-10-15 2019-10-15     2020-04-06
3           1   2019-10-15 2020-04-06 2019-10-15 2019-10-15     2020-04-06
4           1   2019-10-15 2019-10-15 2019-10-15 2019-10-15     2019-10-15
5           1   2019-10-15 2019-10-15 2019-10-15 2019-10-15     2019-10-15

   timestamp
0 2020-10-01
1 2020-09-05
2 2020-08-20
3 2020-04-06
4 2019-12-31
5 2019-10-15
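To make the loop above easier to follow, here is a minimal self-contained sketch on a small made-up DataFrame. The column values ("east", "a", and so on) are assumptions for illustration and are not the data from the question. Each attribute column is replaced by the timestamp of the last row in which that value occurs within the same project_id.

```python
import pandas as pd

# Hypothetical sample data, sorted from newest to oldest timestamp.
df = pd.DataFrame({
    "project_id": [1, 1, 1, 1],
    "region": ["east", "east", "west", "west"],
    "style": ["a", "b", "b", "b"],
    "timestamp": ["09/05/20", "08/20/20", "04/06/20", "10/15/19"],
})
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%m/%d/%y")

# Index.difference returns every column except the two listed ones.
for c in df.columns.difference(["project_id", "timestamp"]):
    # For each (project_id, value) group, take the timestamp of the last row
    # and broadcast it back to every row of that group.
    df[c] = df.groupby(["project_id", c], sort=False)["timestamp"].transform("last")

print(df)
```

Because the rows are ordered from newest to oldest, "last" here picks the earliest timestamp at which each value appears. With a different row order the result would differ, so sorting first may be needed.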
If you need the original format, add Series.dt.strftime:
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%m/%d/%y')
for c in df.columns.difference(['project_id','timestamp']):
    df[c] = (df.groupby(['project_id',c], sort=False)['timestamp'].transform('last')
               .dt.strftime('%m/%d/%y'))
print(df)
   project_id project_name    region     style    effect representative  \
0           1     09/05/20  04/06/20  10/15/19  09/05/20       04/06/20
1           1     09/05/20  04/06/20  10/15/19  09/05/20       04/06/20
2           1     08/20/20  04/06/20  10/15/19  10/15/19       04/06/20
3           1     10/15/19  04/06/20  10/15/19  10/15/19       04/06/20
4           1     10/15/19  10/15/19  10/15/19  10/15/19       10/15/19
5           1     10/15/19  10/15/19  10/15/19  10/15/19       10/15/19

   timestamp
0 2020-10-01
1 2020-09-05
2 2020-08-20
3 2020-04-06
4 2019-12-31
5 2019-10-15
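One thing worth noting about the formatted version: Series.dt.strftime returns strings, not datetimes, which is why the replaced columns print as 09/05/20 while timestamp still prints as 2020-10-01. A small sketch with two made-up values:

```python
import pandas as pd

# Two hypothetical timestamps parsed from the month/day/year format.
ts = pd.Series(pd.to_datetime(["09/05/20", "10/15/19"], format="%m/%d/%y"))

# strftime converts each datetime back to text in the requested format.
text = ts.dt.strftime("%m/%d/%y")

print(text.tolist())
```

The formatted column can no longer be compared or sorted as dates, so it is probably best to apply strftime only as a final display step.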
Edit: add fillna to fill missing values with the minimum timestamp:
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%m/%d/%y')
min1 = df['timestamp'].min()
for c in df.columns.difference(['project_id','timestamp']):
    df[c] = df.groupby(['project_id',c], sort=False)['timestamp'].transform('last').fillna(min1)
print(df)
   project_id project_name     region      style     effect representative  \
0           1   2020-09-05 2020-04-06 2019-10-15 2020-09-05     2020-04-06
1           1   2020-09-05 2020-04-06 2019-10-15 2020-09-05     2020-04-06
2           1   2020-08-20 2020-04-06 2019-10-15 2019-10-15     2020-04-06
3           1   2019-10-15 2020-04-06 2019-10-15 2019-10-15     2020-04-06
4           1   2019-10-15 2019-10-15 2019-10-15 2019-10-15     2019-10-15
5           1   2019-10-15 2019-10-15 2019-10-15 2019-10-15     2019-10-15

        lazy  timestamp
0 2020-09-05 2020-10-01
1 2020-09-05 2020-09-05
2 2019-10-15 2020-08-20
3 2019-10-15 2020-04-06
4 2019-10-15 2019-12-31
5 2019-10-15 2019-10-15
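The fillna step is presumably needed because a column such as lazy contains missing values. By default groupby drops rows whose key is NaN, so transform returns NaT for those rows. The sketch below uses made-up data (the "yes" values and the position of the NaN are assumptions) to show the gap and how fillna(min1) closes it.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the middle row has no value in the "lazy" column.
df = pd.DataFrame({
    "project_id": [1, 1, 1],
    "lazy": ["yes", np.nan, "yes"],
    "timestamp": pd.to_datetime(["09/05/20", "04/06/20", "10/15/19"],
                                format="%m/%d/%y"),
})
min1 = df["timestamp"].min()

# Rows with a NaN group key belong to no group, so they come back as NaT.
raw = df.groupby(["project_id", "lazy"], sort=False)["timestamp"].transform("last")

# Replace those gaps with the minimum timestamp of the whole column.
filled = raw.fillna(min1)

print(raw)
print(filled)
```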