I am designing a multi-class classifier for 11 labels. I am using SMOTE to deal with the sampling problem, but I am running into the following error.

Error at SMOTE

    from imblearn.over_sampling import SMOTE
    sm = SMOTE(random_state=42)
    X_res, Y_res = sm.fit_sample(X_f, Y_f)

Error

    ~/.local/lib/python3.6/site-packages/sklearn/neighbors/base.py in kneighbors(self, X, n_neighbors, return_distance)
        414                 "Expected n_neighbors <= n_samples, "
        415                 " but n_samples = %d, n_neighbors = %d" %
    --> 416                 (train_size, n_neighbors)
        417             )
        418         n_samples, _ = X.shape

    ValueError: Expected n_neighbors <= n_samples, but n_samples = 1, n_neighbors = 6

Why does it say I have only 1 n_samples? When I tried the same code on a much smaller dataset of 100k rows (with only 4 labels), it ran fine.

Details about the input

Input parameter X_f:

    array([[1.43347000e+05, 1.00000000e+00, 2.03869492e+03, ...,
            1.00000000e+00, 1.00000000e+00, 1.35233019e+03],
           [5.09050000e+04, 0.00000000e+00, 0.00000000e+00, ...,
            5.09050000e+04, 0.00000000e+00, 5.09050000e+04],
           [1.43899000e+05, 2.00000000e+00, 2.11447368e+03, ...,
            1.00000000e+00, 2.00000000e+00, 1.39707767e+03],
           ...,
           [8.50000000e+01, 0.00000000e+00, 0.00000000e+00, ...,
            8.50000000e+01, 0.00000000e+00, 8.50000000e+01],
           [2.33000000e+02, 4.00000000e+00, 4.90000000e+01, ...,
            4.00000000e+00, 4.00000000e+00, 7.76666667e+01],
           [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,
            0.00000000e+00, 0.00000000e+00, 0.00000000e+00]])

Dimensions of the input parameters:

    print(X_f.shape, Y_f.shape)
    (2087620, 31) (2087620, 11)

What I have tried with other techniques from the imblearn package

Debugging the SMOTE fit_resample() method: I know SMOTE works by synthesizing minority samples using the Euclidean distance between the nearest neighbors of minority data points. So I printed the n_samples variable in the ../python3.6/site-packages/sklearn/neighbors/base.py file. It showed the number of samples dropping steadily from 5236 -> 103 -> 3, and then I got the error. I don't understand what is happening.

Using SVMSMOTE: the computation takes too long (more than 2 days) and the PC crashes.

Using RandomOverSampler: the model's accuracy is poor, at 45%.

Using a different sampling_strategy: only minority works.

The suggestions given here and here were also unsuccessful. Honestly, I could not understand them.

I received the same error when I reduced the dataset to 100k, 1k and 5k rows.

Despite all these attempts, I still don't really understand the problem. I am new to sampling. Can you help me solve this?
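One check that may help narrow this down, offered as a sketch rather than a confirmed diagnosis: the error reports n_neighbors = 6, which is consistent with SMOTE's default k_neighbors=5 plus the sample itself, so any class with fewer than 6 rows could plausibly trigger it. The snippet below counts rows per class from one-hot labels and lists the classes that are too small. The helper names and the toy labels are hypothetical stand-ins for Y_f, and it assumes each row of Y_f is one-hot (exactly one 1 per row), which the question does not state.

```python
from collections import Counter


def class_counts_from_one_hot(rows):
    """Count samples per class, taking the index of the largest entry in each row."""
    return Counter(max(range(len(row)), key=row.__getitem__) for row in rows)


def classes_too_small_for_smote(counts, k_neighbors=5):
    """Return {class: count} for classes with fewer than k_neighbors + 1 samples.

    Assumption: the neighbor search asks for k_neighbors + 1 points
    because each sample is counted as its own nearest neighbor.
    """
    needed = k_neighbors + 1
    return {label: n for label, n in counts.items() if n < needed}


def largest_usable_k(counts):
    """Largest k_neighbors every class could support (0 means no class-wide value works)."""
    return min(counts.values()) - 1


# Hypothetical toy labels standing in for Y_f: three classes with 8, 6 and 1 rows.
toy_labels = [[1, 0, 0]] * 8 + [[0, 1, 0]] * 6 + [[0, 0, 1]] * 1

counts = class_counts_from_one_hot(toy_labels)
too_small = classes_too_small_for_smote(counts)
print("rows per class:", dict(counts))
print("classes below the k_neighbors + 1 threshold:", too_small)
print("largest k_neighbors all classes support:", largest_usable_k(counts))
```

On the real data the same counts could be obtained with Y_f.argmax(axis=1) followed by a bincount. If some class turns out to have only a handful of rows, options to consider include passing a smaller k_neighbors to SMOTE, or handling that class separately before oversampling; which of these is appropriate depends on the data.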