2 Answers

Contributed 1829 experience points · received 9+ likes
groupby() with np.where()
Consider this sample:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'id':[1,2,3,4,5], 'tag': ['a','a','a','d','e']})
>>> df
   id tag
0   1   a
1   2   a
2   3   a
3   4   d
4   5   e
>>> df['counter'] = df.groupby(['tag'])['tag'].transform('count')
>>> df
   id tag  counter
0   1   a        3
1   2   a        3
2   3   a        3
3   4   d        1
4   5   e        1
>>> df['counter'] = np.where(df['counter'] > 2, 'Retain', 'Remove')
>>> df
   id tag counter
0   1   a  Retain
1   2   a  Retain
2   3   a  Retain
3   4   d  Remove
4   5   e  Remove
>>> df = df[df['counter'].isin(['Retain'])]
>>> df
   id tag counter
0   1   a  Retain
1   2   a  Retain
2   3   a  Retain
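
If the 'Retain'/'Remove' label is not needed afterwards, the per-group count can probably be used directly as a boolean mask. This is a minimal sketch on the same sample data, not a replacement for the steps above; the names `counts` and `result` are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5], 'tag': ['a', 'a', 'a', 'd', 'e']})

# Size of each row's tag group, aligned to the original index
counts = df.groupby('tag')['tag'].transform('count')

# Keep only the rows whose tag occurs more than twice
result = df[counts > 2]
print(result)
```

This skips the helper column entirely, so there is nothing to drop afterwards.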

Contributed 1895 experience points · received 3+ likes
Add a column that flags the values to keep, then filter on it:
# Make a boolean series as a mapping of values with more than 2 counts
more_than_2_values = df1.b.value_counts() > 2
# Add a new column that indicates which values should be kept
df1["more_than_2"] = df1["b"].map(more_than_2_values).fillna(False)
# Filter the data, drop the label column if desired
desired_result = df1[df1["more_than_2"]].drop(columns="more_than_2")
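
For anyone who wants to run this end to end, here is a hedged sketch of the same idea without the helper column. The answer does not show what `df1` contains, so the sample data below is hypothetical, and `isin` is swapped in for the `map` step.

```python
import pandas as pd

# Hypothetical sample data; only the column name "b" comes from the answer
df1 = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": ["x", "x", "x", "y", "z"]})

# Count how often each value of "b" occurs
value_counts = df1["b"].value_counts()

# Values that occur more than twice
frequent_values = value_counts[value_counts > 2].index

# Keep only the rows whose "b" value is one of the frequent values
desired_result = df1[df1["b"].isin(frequent_values)]
print(desired_result)
```

Because no label column is added, there is nothing to drop after filtering.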