I am working with a dataset whose first column contains sentiment or category labels. Because the dataset is imbalanced, I need to extract the same number of rows for each category. That is, if there are 10 categories, I only want to select 100 sample rows from each of them, which would give 1000 sample rows in total.

What I have tried:

    def append_new_rows(df, new_df, s):
        c = 0
        for index, row in df.iterrows():
            if s == row[0]:
                if c <= 100:
                    new_df.append(row)
                    c += 1
        return df_2

    for s in sorted(list(set(df.category))):
        new_df = append_new_rows(df, new_df, s)

Dataset

    ----------------------------
    | category | A | B | C | D |
    ----------------------------
    | happy    |...|...|...|...|
    | ...      |...|...|...|...|
    | sadness  |...|...|...|...|

Expected output

    ----------------------------
    | category | A | B | C | D |
    ----------------------------
    | happy    |...|...|...|...|  ... 100 samples of happy
    | ...      |...|...|...|...|
    | sadness  |...|...|...|...|  ... 100 samples of sadness
    ...
    ...
    1000 sample rows
1 Answer

Helenr
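You were almost there; you only need to do something like this:

    import pandas as pd

    def append_new_df(df, df_2, s, n):
        c = 1  # the counter starts at 1, so `c <= n` admits exactly n rows
        for index, row in df.iterrows():
            if s == row.iloc[0]:  # the first column holds the category label
                if c <= n:
                    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
                    df_2 = pd.concat([df_2, row.to_frame().T])
                    c += 1
        return df_2

Compared with your attempt, the appended result is assigned back to df_2 (appending returns a new DataFrame instead of modifying the existing one), the function returns that same df_2 rather than an undefined name, and the number of rows per category is passed in as n.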
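For comparison, the same balancing can probably be done without an explicit loop. The following is only a sketch, not part of the answer above: it assumes the label column is named `category` as in the question, and `balance_by_category` and the toy data are hypothetical names invented for illustration. It relies on `groupby(...).head(n)`, which keeps the first n rows of every group in their original order.

```python
import pandas as pd

def balance_by_category(df, label_col="category", n=100):
    """Hypothetical helper: keep at most the first n rows of each class."""
    # head(n) on a groupby returns the first n rows of every group,
    # preserving the original row order and index.
    return df.groupby(label_col, sort=False).head(n)

# Toy data, invented for illustration: 5 "happy" rows and 3 "sadness" rows.
df = pd.DataFrame({
    "category": ["happy"] * 5 + ["sadness"] * 3,
    "A": range(8),
})

balanced = balance_by_category(df, n=2)
print(balanced["category"].value_counts())
```

A category with fewer than n rows simply contributes all of its rows, so the result is only perfectly balanced when every category has at least n rows. If a random selection is preferred over the first n rows, `groupby(...).sample(n=...)` (pandas 1.1 or later) may be worth a look.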