
Infinite while loop in Python when calculating the standard deviation with pandas

Python

繁華開滿天機 2023-06-27 14:14:52

We are trying to remove outliers, but we end up in an infinite loop.

For a school project, we (a friend and I) thought it would be a good idea to build a data-science-based tool. To do that, we started by cleaning the database (I won't import it here because it is too large (xlsx file, csv file)). We are now trying to remove outliers from the "duration_minutes" column using the "standard deviation * 3 + mean" rule.

This is the code we use to calculate the standard deviation and the mean:

    def calculateSD(database, column):
        column = database[[column]]
        SD = column.std(axis=None, skipna=None, level=None, ddof=1, numeric_only=None)
        return SD

    def calculateMean(database, column):
        column = database[[column]]
        mean = column.mean()
        return mean

We figured we would do the following:

    #Now we have to remove the outliers using the code from the SD.py and SDfunction.py files
    minutes = trainsData['duration_minutes'].tolist() #takes the column duration_minutes and puts it in a list
    SD = int(calculateSD(trainsData, 'duration_minutes')) #calculates the SD of the column
    mean = int(calculateMean(trainsData, 'duration_minutes'))
    SDhigh = mean+3*SD

The code above calculates the starting values. Then we start a while loop that removes the outliers. After removing the outliers, we recalculate the standard deviation, the mean, and SDhigh. This is the while loop:

    while np.any(i >= SDhigh for i in minutes): #used to be >=, it doesnt matter for the outcome
        trainsData = trainsData[trainsData['duration_minutes'] < SDhigh] #used to be >=, this caused an infinite loop so I changed it to <=. Then to <
        minutes = trainsData['duration_minutes'].tolist()
        SD = int(calculateSD(trainsData, 'duration_minutes')) #calculates the SD of the column
        mean = int(calculateMean(trainsData, 'duration_minutes'))
        SDhigh = mean+3*SD
        print(SDhigh) #to see how the values changed and to confirm it is an infinite loop

The output is as follows:

    611
    652
    428
    354
    322
    308
    300
    296
    296
    296
    296

It keeps printing 296. After hours of trying to solve this, we have come to the conclusion that we are not as smart as we hoped.


1 Answer

呼啦一陣風

Contributed 1802 experience points, received more than 6 likes

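You are making this harder than it needs to be. Calculating the standard deviation to remove outliers, then recalculating it, and so on, is overly complicated (and statistically unsound). You would be better off using percentiles instead of the standard deviation: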

    import numpy as np
    import pandas as pd

    # create data
    nums = np.random.normal(50, 8, 200)
    df = pd.DataFrame(nums, columns=['duration'])

    # set threshold based on percentiles
    threshold = df['duration'].quantile(.95) * 2

    # now only keep rows that are below the threshold
    df = df[df['duration'] < threshold]
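
A side note on why the loop in the question never stops, as far as can be told from the posted code: the condition `np.any(i >= SDhigh for i in minutes)` passes NumPy a generator rather than an array, and the result is truthy whatever the values are, so the `while` condition never becomes false even once nothing is left to remove. The sketch below shows this on made-up numbers and then applies the mean + 3 * SD rule in a single pass. The sample data and the names `trains_data`, `upper`, and `filtered` are assumptions for illustration only; `duration_minutes` is reused from the question.

```python
import numpy as np
import pandas as pd

# Made-up durations for illustration only; this is not the asker's data.
trains_data = pd.DataFrame(
    {"duration_minutes": [30, 32, 35, 31, 29, 33, 34, 30, 31, 32,
                          28, 36, 33, 30, 31, 29, 34, 32, 35, 900]}
)
col = trains_data["duration_minutes"]

# Pitfall: np.any() does not iterate over a generator, so the result is
# truthy even though no value here is >= 10_000.
always_truthy = bool(np.any(d >= 10_000 for d in col))

# The built-in any() and a vectorized comparison both evaluate the values.
builtin_any = any(d >= 10_000 for d in col)
vectorized_any = bool((col >= 10_000).any())

# Single pass of the mean + 3 * SD rule, with no loop and no int() truncation.
upper = col.mean() + 3 * col.std(ddof=1)
filtered = trains_data[col < upper]

print(always_truthy, builtin_any, vectorized_any)
print(len(trains_data), "->", len(filtered), "rows kept")
```

If the iterative version is still wanted, replacing `np.any(...)` with the built-in `any(...)` or with `(trainsData['duration_minutes'] >= SDhigh).any()` should let the condition become false once no value exceeds the threshold. The percentile approach above remains the simpler option.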

2023-06-27
