2 Answers

Contributor with 1853 experience points · more than 6 upvotes
不是悸子
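import pandas as pd

result = []
for id_value in df['ID'].unique():
    # Rows for this ID in chronological order, with a fresh 0..n-1 index
    adf = df[df['ID'] == id_value].sort_values(by="Date").reset_index(drop=True)
    # Position of the last row where Start_flag == 1
    i = adf.where(adf['Start_flag'] == 1).last_valid_index()
    # Keep that row and everything after it
    result.append(adf.iloc[i:])
print(pd.concat(result).reset_index(drop=True))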
Output:
         Date   ID  Start_flag  end
0  2019-01-09  100           1    0
1  2019-01-10  100           0    0
2  2019-01-11  100           0    0
3  2019-01-12  100           0    0
4  2019-01-03  500           1    0
5  2019-01-04  500           0    0
6  2019-01-05  500           0    0
7  2019-01-06  500           0    0
8  2019-01-07  500           0    0
9  2019-01-08  500           0    0
10 2019-01-09  700           1    0
11 2019-01-10  700           0    0
12 2019-01-11  700           0    1
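To make the where(...).last_valid_index() step easier to follow, here is a minimal sketch on a single group. The small frame below is hypothetical (it is not the data from the question) and is assumed to be already sorted by Date with a 0..n-1 index.

```python
import pandas as pd

# Hypothetical single-ID group, already sorted by Date (not the original data).
adf = pd.DataFrame({
    "Date": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04"]),
    "ID": [100, 100, 100, 100],
    "Start_flag": [1, 0, 1, 0],
    "end": [0, 0, 0, 0],
})

# where() keeps the rows that satisfy the condition and blanks out the others (NaN/NaT).
masked = adf.where(adf["Start_flag"] == 1)

# last_valid_index() returns the label of the last row that still holds data,
# which here is the last row with Start_flag == 1.
last_start = masked.last_valid_index()

# Keep that row and everything after it.
tail = adf.iloc[last_start:]
print(last_start)
print(tail)
```

Because the index was reset to 0..n-1, the label returned by last_valid_index() can be used directly as a position in iloc.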
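Note: we can avoid the explicit loop by moving the logic into a function and calling it through groupby's apply. However, apply may run the function twice on the first group (pandas versions before 0.25 did this), so we must make sure the function has no side effects.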
Using groupby:
def fun(adf):
    # Work on a sorted copy so the caller's frame is not modified
    adf = adf.sort_values(by="Date").reset_index(drop=True)
    # Position of the last row where Start_flag == 1
    i = adf.where(adf['Start_flag'] == 1).last_valid_index()
    return adf.iloc[i:]

print(df.groupby('ID').apply(fun).reset_index(drop=True))
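For anyone who wants to try the groupby version end to end, here is a self-contained sketch. The sample frame is hypothetical (not the data from the question) and is deliberately unsorted. Selecting the columns explicitly after groupby is my own addition: as far as I know, it keeps the ID column inside each group on recent pandas versions, where apply otherwise warns about or drops the grouping column.

```python
import pandas as pd

# Hypothetical sample data (not the data from the question), deliberately unsorted.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-01-03", "2019-01-01", "2019-01-02",
                            "2019-01-04", "2019-01-02", "2019-01-01"]),
    "ID": [100, 100, 100, 100, 500, 500],
    "Start_flag": [1, 1, 0, 0, 0, 1],
    "end": [0, 0, 0, 0, 0, 0],
})

def fun(adf):
    # sort_values/reset_index return new objects, so the input frame is untouched
    # and it does not matter how many times apply calls this function.
    adf = adf.sort_values(by="Date").reset_index(drop=True)
    i = adf.where(adf["Start_flag"] == 1).last_valid_index()
    return adf.iloc[i:]

# Explicit column selection (including "ID") is an assumption made for
# compatibility with recent pandas versions; it is not part of the answer above.
cols = ["Date", "ID", "Start_flag", "end"]
out = df.groupby("ID")[cols].apply(fun).reset_index(drop=True)
print(out)
```

For ID 100 only the rows from the second start flag onward survive; for ID 500 the whole group is kept because its only start flag is on the earliest date.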

Contributor with 1845 experience points · more than 8 upvotes
The final corrected solution is:
from datetime import datetime

def validateData(adf):
    adf = adf.sort_values(by="Date").reset_index(drop=True)
    # Start at the last row with Start_flag == 1; if the ID has none,
    # fall back to its earliest row (Start_flag == 0 on the minimum Date)
    indx = adf.where(((adf['Start_flag'] == 0) & (adf['Date'] == adf['Date'].min())) | (adf['Start_flag'] == 1)).last_valid_index()
    return adf.iloc[indx:]

def filterData(df):
    start_time = datetime.now()
    print('Start_time=', start_time)
    RESULT_DF = df.groupby('ID').apply(validateData)
    # Report the elapsed time as a number of seconds
    print("--- %s seconds ---" % (datetime.now() - start_time).total_seconds())
    return RESULT_DF
To apply it to the data: RESULT_DF = filterData(df)
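The extra condition matters for an ID that has no row with Start_flag == 1. A minimal sketch of that case, using a hypothetical group (ID 900 is made up for illustration):

```python
import pandas as pd

# Hypothetical group with no start flag at all (not the original data).
adf = pd.DataFrame({
    "Date": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"]),
    "ID": [900, 900, 900],
    "Start_flag": [0, 0, 0],
    "end": [0, 0, 1],
})

# Condition from the first answer: no row qualifies, so the result is None
# and slicing with it would not give a usable start position.
plain = adf.where(adf["Start_flag"] == 1).last_valid_index()

# Corrected condition: also accept the earliest row when it is unflagged.
is_start = adf["Start_flag"] == 1
is_unflagged_first = (adf["Start_flag"] == 0) & (adf["Date"] == adf["Date"].min())
indx = adf.where(is_unflagged_first | is_start).last_valid_index()

# With no start flag, indx points at the earliest row, so the whole group is kept.
kept = adf.iloc[indx:]
print(plain, indx, len(kept))
```

When a group does contain a start flag, the flagged row always comes at or after the earliest row, so last_valid_index() still returns the last start position and the result matches the first answer.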