I have been cleaning up a sales database whose data was collected from several sources. The bill numbers are a mess, yet they are the only column that ties multiple order lines to the same bill, and because different systems were used over time, bill numbers are repeated. To fix this, I need to assign a new number to rows that share a bill number but have different dates. For example, if I have a bill with bill number 1 dated 2019 and another bill with the same bill number but dated 2018, I need to give it a different bill number.

A sample of the df:

        bill_no  item_ser  date                 item                  size   price
     0        1       111  2018-12-15 15:09:50  Rockla Salad          R      39.00
     1        1       111  2018-12-15 15:09:50  Rockla Salad          R      39.00
     2        1       112  2018-12-15 15:10:16  Tea                   R       8.00
     3        1       112  2018-12-15 15:10:16  Tea                   R       8.00
     4        1       309  2019-02-21 10:02:24  Eggs Toast            R      35.00
     5        1       309  2019-02-21 10:02:24  Eggs Toast            R      35.00
     6        1         1  2020-07-20 12:38:16  Nody's Sfilatino      R      99.75
     7        1         1  2020-07-20 12:38:16  Nody's Sfilatino      R      99.75
     8        1      2715  2020-05-06 01:13:41  Basilico Buffalo - R  R     110.00
     9        1      2715  2020-05-06 01:13:41  Basilico Buffalo - R  R     110.00
    10        1      2716  2020-05-06 01:13:41  Timmy's Merguez - R   R     130.00
    11        1      2716  2020-05-06 01:13:41  Timmy's Merguez - R   R     130.00
    12        1      2717  2020-05-06 01:13:41  Funghi - R            R     105.00
    13        1      2717  2020-05-06 01:13:41  Funghi - R            R     105.00
    14        1      2718  2020-05-06 01:13:41  Extra Cheese          R      20.00
    15        1      2718  2020-05-06 01:13:41  Extra Cheese          R      20.00
    16        1         8  2020-07-05 16:27:37  Margherita - R        R      65.00
    17        1         8  2020-07-05 16:27:37  Margherita - R        R      65.00
    18        1         9  2020-07-05 16:27:39  Extra Vegetables      R      10.00
    19        1         9  2020-07-05 16:27:39  Extra Vegetables      R      10.00

I tried a for loop, but with 150K rows it takes a lot of time.
1 Answer

肥皂起泡泡
Contributed 1829 experience points · received over 6 likes
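# Build a lookup of the unique (bill_no, date) pairs. reset_index() exposes the
# original row label of each pair's first occurrence, which is used as
# new_bill_no (unique, though not consecutive).
# Note: 'date' here is the full timestamp, so rows that share a bill_no but
# differ by even a second receive different new numbers.
df1 = df[['bill_no', 'date']].drop_duplicates().reset_index()
df1.rename({'index': 'new_bill_no'}, axis=1, inplace=True)
# Merge the lookup back so every original row carries its new_bill_no
df = df.merge(df1, on=['bill_no', 'date'], how='left')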
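
A possible variation, in case the full timestamp is too fine a key: in the sample, items on what looks like one bill are stamped a few seconds apart (15:09:50 and 15:10:16), so keying on the exact timestamp would split them. The sketch below assumes instead that a bill is identified by bill_no plus the calendar day, which is my reading of the question and not something the asker confirmed. The miniature DataFrame and the helper name bill_day are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature of the question's data (not the real database)
df = pd.DataFrame({
    "bill_no": [1, 1, 1, 1, 2],
    "item_ser": [111, 112, 309, 1, 5],
    "date": pd.to_datetime([
        "2018-12-15 15:09:50",
        "2018-12-15 15:10:16",
        "2019-02-21 10:02:24",
        "2020-07-20 12:38:16",
        "2018-12-15 15:20:00",
    ]),
})

# Assumption: rows with the same bill_no on the same calendar day are one bill.
# normalize() truncates each timestamp to midnight of its day.
bill_day = df["date"].dt.normalize().rename("bill_day")

# ngroup() labels every (bill_no, bill_day) group with an integer in one
# vectorized pass, so no Python-level loop over the rows is needed.
df["new_bill_no"] = df.groupby(["bill_no", bill_day], sort=False).ngroup() + 1

print(df)
```

If one bill can span midnight, or a bill number can be reused within the same day, the calendar-day key would need to be replaced with something else, such as a time gap threshold.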