I have converted a continuous dataset into a categorical one. Whenever a value in the continuous data is 0.0, I get NaN after the conversion. Here is my code:
import pandas as pd
import matplotlib as plt
df = pd.read_csv('NSL-KDD/KDDTrain+.txt',header=None)
data = df[33]
bins = [0.000,0.05,0.10,0.15,0.20,0.25,0.30,0.35,0.40,0.45,0.50,0.55,0.60,0.65,0.70,0.75,0.80,0.85,0.90,0.95,1.00]
category = pd.cut(data,bins)
category = category.to_frame()
print (category)
How can I convert these values so that I do not get NaN? I have attached two screenshots to show what the actual data looks like and what the converted data looks like. The first is the main dataset. The second is the result after using bins and pandas.cut(). How can "0.00" be binned consistently with the other values in the dataset?
1 Answer

白板的微信
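When using pd.cut, you can pass the parameter include_lowest=True. By default the intervals are open on the left and closed on the right, so a value exactly equal to the lowest bin edge falls outside every bin and becomes NaN. Setting include_lowest=True makes the first interval left-inclusive, so it will include the value 0 because your first interval starts at 0.
So in your case, you can adjust your code to: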
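import pandas as pd

# The file has no header row, so columns are addressed by integer position
df = pd.read_csv('NSL-KDD/KDDTrain+.txt', header=None)
data = df[33]
# 21 edges from 0.00 to 1.00 in steps of 0.05, giving 20 bins
bins = [0.00, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50,
        0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 1.00]
# include_lowest=True closes the first interval on the left, so 0.0 is binned
category = pd.cut(data, bins, include_lowest=True)
category = category.to_frame()
print(category)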
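To see the difference without the NSL-KDD file, here is a minimal sketch on a small made-up Series. The sample values are hypothetical stand-ins for column 33, not taken from the dataset. It compares the default call with the include_lowest=True call and reports which entries end up as NaN.

```python
import pandas as pd

# Hypothetical sample values standing in for column 33 of the dataset
data = pd.Series([0.0, 0.03, 0.05, 0.5, 1.0])

# Same edges as in the question: 0.00, 0.05, ..., 1.00
bins = [i / 20 for i in range(21)]

# Default behaviour: intervals are (left, right], so 0.0 matches no bin
default_cut = pd.cut(data, bins)

# include_lowest=True closes the first interval on the left
inclusive_cut = pd.cut(data, bins, include_lowest=True)

print(default_cut.isna().tolist())    # which values became NaN by default
print(inclusive_cut.isna().tolist())  # which values are NaN after the fix
print(inclusive_cut.iloc[0])          # the interval that 0.0 was assigned to
```

With include_lowest=True, pandas usually displays the first interval with a left edge slightly below 0 (for example (-0.001, 0.05]) so that it can keep the same "open on the left" notation as the other bins. The 0.0 values still land in that first bin.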