2 Answers

Answerer: 1803 contributions, 6+ upvotes
You can use RepeatedStratifiedKFold, which, as the name suggests, repeats stratified K-fold cross-validation n times. To repeat the process 10 times, set n_repeats=10; to get roughly a 9:1 ratio between the train and test sizes, set n_splits=10:
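```python
from sklearn.model_selection import RepeatedStratifiedKFold

# `a` is the array from the question: feature columns first, class label in the last column
X = a[:, :-1]
y = a[:, -1]

# 10 stratified folds (about 9:1 train/test), repeated 10 times with different shuffles
rskf = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=2)
for train_index, test_index in rskf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # share of class 1 in the training fold, then the number of train/test rows
    print(f'\nClass 1: {((y_train==1).sum()/len(y_train))*100:.0f}%')
    print(f'\nShape of train: {X_train.shape[0]}')
    print(f'Shape of test: {X_test.shape[0]}')
```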
Output:

```
Class 1: 73%

Shape of train: 33
Shape of test: 4

Class 1: 73%

Shape of train: 33
Shape of test: 4

Class 1: 73%

Shape of train: 33
Shape of test: 4

Class 1: 73%

Shape of train: 33
Shape of test: 4
...
```
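To try the approach above without the question's data, here is a minimal self-contained sketch. It assumes a hypothetical stand-in for `a`: 37 rows of random features with 27 rows labelled 1 and 10 labelled 0 (roughly the 73% class-1 share seen above). It collects the test-fold sizes and the class-1 share of each training fold instead of printing every split.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

# Hypothetical stand-in for the question's array `a`: 37 rows,
# 2 random feature columns, and a 0/1 label in the last column (27 ones, 10 zeros).
rng = np.random.default_rng(0)
labels = np.array([1] * 27 + [0] * 10)
a = np.column_stack([rng.normal(size=(37, 2)), labels])

X = a[:, :-1]
y = a[:, -1]

rskf = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=2)

test_sizes = []        # number of rows in each test fold
train_class1_pct = []  # percentage of class 1 in each training fold
for train_index, test_index in rskf.split(X, y):
    y_train = y[train_index]
    test_sizes.append(len(test_index))
    train_class1_pct.append(100 * (y_train == 1).mean())

# 10 folds x 10 repeats = 100 train/test splits in total
print("number of splits:", len(test_sizes))
print("distinct test sizes:", sorted(set(test_sizes)))
print(f"class 1 share in train: {min(train_class1_pct):.0f}%-{max(train_class1_pct):.0f}%")
```

Each test fold holds about a tenth of the rows (3 or 4 of 37 here), which is where the 9:1 ratio comes from, and stratification keeps the class-1 share of every training fold close to the overall share.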

Answerer: 1845 contributions, 8+ upvotes
A well-known way to split data into training and test sets is scikit-learn's train_test_split.
See the API documentation for model_selection.train_test_split.
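```python
from sklearn.model_selection import train_test_split

# hold out 10% of the rows as the test set; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.10, random_state=42)
```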
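You can vary the random_state argument (the seed) until the ratio between your classes is right. By default (without the stratify argument), train_test_split does not enforce the class ratio, but the split usually follows the population proportions.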
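A minimal sketch of both routes, assuming hypothetical random data with the same 27/10 class split as above. The loop searches seeds as just described (the 2-percentage-point tolerance is an arbitrary choice for illustration); the second call instead passes stratify=y, which makes train_test_split preserve the class proportions without any seed search.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 37 samples with random features,
# 27 labelled 1 and 10 labelled 0 (about 73% class 1).
rng = np.random.default_rng(0)
X = rng.normal(size=(37, 2))
y = np.array([1] * 27 + [0] * 10)
target = (y == 1).mean()  # class-1 share in the full dataset

# Route 1: try seeds until the training split's class-1 share is
# within 2 percentage points of the full dataset's share.
found_seed = None
for seed in range(100):
    _, _, y_tr, _ = train_test_split(X, y, test_size=0.10, random_state=seed)
    if abs((y_tr == 1).mean() - target) < 0.02:
        found_seed = seed
        break
print("first acceptable seed:", found_seed)

# Route 2: stratify on y so the class proportions are preserved directly.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=42, stratify=y
)
print(f"stratified: class 1 in train = {(y_train == 1).mean():.1%}, "
      f"train/test rows = {len(y_train)}/{len(y_test)}")
```

With test_size=0.10 on 37 rows, the test set gets 4 rows and the training set 33. The seed loop only matches the ratio approximately, while the stratified call gets it as close as the row counts allow.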