2 Answers

Contributor with 1946 experience points, over 3 likes
You could build a list of indices and shuffle it first, then use the first 80 indices for the first CSV and the remaining 20 for the second:
from random import shuffle

# Indices 1..100, shuffled in place
indices = list(range(1, 101))
shuffle(indices)

# First 80 shuffled indices go to the training list
with open('C:\\train.csv', 'w') as outf:
    print('x:data,y:label', file=outf)
    for i in indices[:80]:
        print('./1/a_%s.csv, 1' % i, file=outf)

# Remaining 20 go to the test list
with open('C:\\test.csv', 'w') as outf:
    print('x:data,y:label', file=outf)
    for i in indices[80:]:
        print('./1/a_%s.csv, 1' % i, file=outf)
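
If the split needs to be repeatable between runs, the same shuffle-and-slice idea can be wrapped in a small function that takes a seed. This is only a sketch: the helper names split_indices and write_listing, the seed value, and the temporary output directory are assumptions for illustration, not part of the answer above. It also uses the csv module, so rows are written as "./1/a_5.csv,1" without the space after the comma.

```python
import csv
import random
import tempfile
from pathlib import Path


def split_indices(n_items, n_train, seed=None):
    """Shuffle 1..n_items and return (train, test) index lists."""
    rng = random.Random(seed)  # private generator; a fixed seed repeats the split
    indices = list(range(1, n_items + 1))
    rng.shuffle(indices)
    return indices[:n_train], indices[n_train:]


def write_listing(path, indices, label=1):
    """Write a header row, then one (file path, label) row per index."""
    with open(path, "w", newline="") as outf:
        writer = csv.writer(outf)
        writer.writerow(["x:data", "y:label"])
        for i in indices:
            writer.writerow(["./1/a_%s.csv" % i, label])


train, test = split_indices(100, 80, seed=0)

# Hypothetical output location; stands in for the C:\ paths used above
out_dir = Path(tempfile.mkdtemp())
write_listing(out_dir / "train.csv", train)
write_listing(out_dir / "test.csv", test)
```

Calling split_indices(100, 80, seed=0) again returns the same two lists, which helps when the train and test files have to be regenerated later.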

Contributor with 1829 experience points, over 6 likes
This is a common problem in machine learning. scikit-learn has several tools for handling it, for example train_test_split:
from sklearn.model_selection import train_test_split
indices = list(range(1, 101))
i_a, i_b = train_test_split(indices, train_size=0.8, test_size=0.2)
Now you can iterate over i_a (80 random indices) and i_b (20 random indices) just as in the original code:
with open('C:\\train.csv', 'w') as outf:
    print('x:data,y:label', file=outf)
    for i in i_a:
        print('./1/a_%s.csv, 1' % i, file=outf)
with open('C:\\test.csv', 'w') as outf:
    print('x:data,y:label', file=outf)
    for i in i_b:
        print('./1/a_%s.csv, 1' % i, file=outf)
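
By default train_test_split draws a different random split on every run. A minimal sketch of pinning the split with the random_state parameter follows; the value 42 is an arbitrary assumption, and any fixed integer would serve.

```python
from sklearn.model_selection import train_test_split

indices = list(range(1, 101))

# random_state fixes the shuffle, so repeated calls give the same split
i_a, i_b = train_test_split(
    indices, train_size=0.8, test_size=0.2, random_state=42
)

# A second call with the same random_state reproduces the split exactly
i_a2, i_b2 = train_test_split(
    indices, train_size=0.8, test_size=0.2, random_state=42
)
```

Here i_a and i_b are plain lists of 80 and 20 indices with no overlap, so the file-writing loops can use them unchanged.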