4 Answers

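Contributor with 1880 experience points · 4+ upvotes
Use isin to check the date range against each user group, then agg with sum to count over the boolean mask returned for each group: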
import pandas as pd

df['dt'] = pd.to_datetime(df['dt'])  # skip this if the `dt` column is already datetime dtype
check_dates = pd.date_range('2015-12-31', '2016-01-10', freq='D')
# ~isin(...) marks the dates in the range that the user does NOT have; sum() counts them
s = df.groupby('user')['dt'].agg(lambda x: (~check_dates.isin(x)).sum())
Out[920]:
user
a 5
b 4
c 0
Name: dt, dtype: int64
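To try the isin approach end to end, here is a minimal sketch. The sample DataFrame is made up for illustration (the question's data is not shown in this thread); it is built so that users a, b and c have 6, 7 and 11 of the 11 days in the range.

```python
import pandas as pd

check_dates = pd.date_range('2015-12-31', '2016-01-10', freq='D')

# Hypothetical sample data, not the data from the original question
df = pd.DataFrame({
    'user': ['a'] * 6 + ['b'] * 7 + ['c'] * 11,
    'dt': list(check_dates[:6]) + list(check_dates[2:9]) + list(check_dates),
})

# For each user, count the dates in the range that do not appear in that user's rows
s = df.groupby('user')['dt'].agg(lambda x: (~check_dates.isin(x)).sum())
print(s)
```

With this sample data the counts come out as 5, 4 and 0 for a, b and c.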

Contributor with 1951 experience points · 3+ upvotes
### Convert your dates to datetime
df['dt'] = pd.to_datetime(df['dt'])
### Create the list of dates per user
user_days = df.groupby('user')['dt'].apply(list)
### Initialize the final dataframe
df_miss_dates = pd.DataFrame(user_days)
all_dates = pd.date_range('2015-12-31', '2016-01-10', freq='D')
### Find the number of missing dates per user
df_miss_dates['missing_days'] = df_miss_dates['dt'].apply(lambda x: len(set(all_dates) - set(x)))
df_miss_dates.drop(columns='dt', inplace=True)
print(df_miss_dates)
Output:
      missing_days
user
a                5
b                4
c                0
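The same set difference can also report which dates are missing rather than only how many. A small sketch, assuming a shorter made-up date range and sample rows so the result is easy to read:

```python
import pandas as pd

# Hypothetical 5-day range and sample rows, for illustration only
all_dates = pd.date_range('2016-01-01', '2016-01-05', freq='D')
df = pd.DataFrame({
    'user': ['a', 'a', 'b'],
    'dt': pd.to_datetime(['2016-01-01', '2016-01-03', '2016-01-02']),
})

# Keep the sorted set difference itself instead of taking its length
missing = df.groupby('user')['dt'].apply(lambda x: sorted(set(all_dates) - set(x)))
print(missing)
```

Taking len() of each list gives back the missing_days count used above.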

Contributor with 1831 experience points · 9+ upvotes
Define the following function:
import pandas as pd

def missingDates(grp: pd.Series, d1: pd.Timestamp, d2: pd.Timestamp):
    ndTotal = (d2 - d1).days + 1                      # days in the range, inclusive
    ndPresent = grp[grp.between(d1, d2)].index.size   # rows of this group inside the range
    return ndTotal - ndPresent
Then apply it to each group and convert the result to a DataFrame (as I understand from your post, you just need a DataFrame with 2 columns):
result = (df.groupby('user')['dt']
            .apply(missingDates, pd.to_datetime('2015-12-31'), pd.to_datetime('2016-01-10'))
            .rename('missing_days')
            .reset_index())
The result is:
  user  missing_days
0    a             5
1    b             4
2    c             0
My solution relies on the dates within each group being unique and on no date having a time part. If either condition does not hold, add date normalization and a call to the unique function.
A side note: rename the dt column to something else, because dt is the name of the pandas datetime accessor. Shadowing standard pandas names with column or variable names is bad practice.
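A minimal sketch of the rename, assuming 'date' as the new column name (a hypothetical choice; any name that is not a pandas attribute would do) and made-up sample rows:

```python
import pandas as pd

# Hypothetical sample rows, for illustration only
df = pd.DataFrame({
    'user': ['a', 'a', 'b'],
    'dt': pd.to_datetime(['2015-12-31', '2016-01-02', '2016-01-05']),
})

# Rename the column so that .dt unambiguously refers to the datetime accessor
df = df.rename(columns={'dt': 'date'})

# The accessor is now the only thing called .dt
weekdays = df['date'].dt.day_name()
print(weekdays)
```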

Contributor with 1833 experience points · 4+ upvotes
You can do it like this:
import pandas as pd
from datetime import date, timedelta

sdate = date(2015, 12, 31)  # start date
edate = date(2016, 1, 10)   # end date
delta = edate - sdate       # as timedelta

# Every day in the range as a 'YYYY-MM-DD' string
days = []
for i in range(delta.days + 1):
    day = sdate + timedelta(days=i)
    days.append(str(day))

# Note: the membership test below only matches if df['dt'] holds 'YYYY-MM-DD' strings
user = []
missing_days = []
for user_n in df.user.unique():
    user_days = df.loc[df.user == user_n, 'dt'].to_list()
    md = len([day for day in days if day not in user_days])
    user.append(user_n)
    missing_days.append(md)

new_df = pd.DataFrame({'user': user, 'missing_days': missing_days})
new_df
Output:
user  missing_days
a                5
b                4