

Correcting muddled dates in a Pandas DataFrame


瀟湘沐 2022-12-27 09:55:49
I have a time-series DataFrame with a million rows in which some values in the Date column have their day and month swapped ("muddled"). How can I efficiently untangle them without breaking the dates that are already correct?

# this creates a dataframe with muddled dates
import pandas as pd
import numpy as np
from pandas import Timestamp

start = Timestamp(2013,1,1)
dates = pd.date_range(start, periods=942)[::-1]

muddler = {}
for d in dates:
    if d.day < 13:
        muddler[d] = Timestamp(d.year, d.day, d.month)
    else:
        muddler[d] = Timestamp(d.year, d.month, d.day)

df = pd.DataFrame()
df['Date'] = dates
df['Date'] = df['Date'].map(muddler)

# now what? (assuming I don't know how the dates are muddled)

2 Answers

小唯快跑啊


One option is to compute a linear fit to the timestamps and swap back the ones that deviate from the fit by more than a given threshold. Example:


import pandas as pd
import numpy as np

start = pd.Timestamp(2013, 1, 1)
dates = pd.date_range(start, periods=942)[::-1]

muddler = {}
for d in dates:
    if d.day < 13:
        muddler[d] = pd.Timestamp(d.year, d.day, d.month)
    else:
        muddler[d] = pd.Timestamp(d.year, d.month, d.day)

df = pd.DataFrame()
df['Date'] = dates
df['Date'] = df['Date'].map(muddler)

# convert the date column to POSIX timestamps (seconds since the epoch);
# np.float was removed in NumPy 1.24, so compute the seconds directly
df['ts'] = (df['Date'] - pd.Timestamp('1970-01-01')).dt.total_seconds()

# calculate a linear fit for the ts column
x = np.linspace(df['ts'].iloc[0], df['ts'].iloc[-1], df['ts'].size)
df['ts_linfit'] = np.polyval(np.polyfit(x, df['ts'], 1), x)

# set a threshold and derive a mask that flags rows where the difference
# between the fit and the timestamp is greater than the threshold
thresh = 1.2e6  # seconds (about 14 days); you might want to tweak this...
m = np.absolute(df['ts'] - df['ts_linfit']) > thresh

# create a new date column as a copy of the original
df['Date_filtered'] = df['Date'].copy()

# swap day and month for the rows caught by the mask
# (raises ValueError if a flagged date has day > 12)
df.loc[m, 'Date_filtered'] = df.loc[m, 'Date_filtered'].apply(
    lambda x: pd.Timestamp(x.year, x.day, x.month))

# also convert to POSIX timestamps
df['ts_filtered'] = (df['Date_filtered'] - pd.Timestamp('1970-01-01')).dt.total_seconds()

# plot both series (requires matplotlib)
ax = df['ts'].plot(label='original')
ax = df['ts_filtered'].plot(label='filtered')
ax.legend()

Resulting plot (original vs. filtered timestamps): http://img1.sycdn.imooc.com//63aa51390001443e06020437.jpg
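Because the muddled rows also take part in the least-squares fit above, they can pull the line away from the correct dates, which is what makes the threshold sensitive. One way to make the fit more robust, sketched here under stated assumptions, relies on a simple fact: a date whose day is greater than 12 cannot be a swapped date, since the swap would produce a month greater than 12. The hypothetical helper `unmuddle_by_linear_fit` fits the line on those rows only, then, for each ambiguous row (day 12 or less), keeps whichever of the date and its day/month swap lies closer to the line. It assumes the rows are sorted and roughly evenly spaced in time (as in the question's example), that the Series has a default RangeIndex, and that at least two rows have a day greater than 12; with many rows per day or large gaps, a linear fit on row position will not hold.

```python
import numpy as np
import pandas as pd

EPOCH = pd.Timestamp("1970-01-01")


def unmuddle_by_linear_fit(dates: pd.Series) -> pd.Series:
    """Hypothetical helper: undo day/month swaps using a linear trend.

    Assumes `dates` is sorted, roughly evenly spaced in time and has a
    default RangeIndex. Rows whose day is > 12 cannot have been swapped
    (the month would exceed 12), so the trend is fitted on those rows only.
    """
    secs = (dates - EPOCH).dt.total_seconds().to_numpy()
    pos = np.arange(len(dates))
    safe = (dates.dt.day > 12).to_numpy()

    # least-squares line of timestamp vs. row position, fitted on safe rows only
    coeffs = np.polyfit(pos[safe], secs[safe], 1)
    fit = np.polyval(coeffs, pos)

    result = dates.copy()
    ambiguous = ~safe
    if not ambiguous.any():
        return result

    # swap day and month on the ambiguous rows (always valid there: day <= 12)
    swapped = dates[ambiguous].apply(lambda t: pd.Timestamp(t.year, t.day, t.month))
    swapped_secs = (swapped - EPOCH).dt.total_seconds().to_numpy()

    # keep whichever version of each ambiguous date lies closer to the trend
    closer = np.abs(swapped_secs - fit[ambiguous]) < np.abs(secs[ambiguous] - fit[ambiguous])
    if closer.any():
        result.loc[swapped.index[closer]] = swapped[closer]
    return result
```

On the question's synthetic data, `unmuddle_by_linear_fit(df['Date'])` should recover the original `date_range` values, because every row with a day above 12 lies exactly on one line. Unlike the threshold version, no row with a day above 12 is ever passed to the swap, so it cannot raise.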

Answered 2022-12-27
翻翻過(guò)去那場(chǎng)雪


While trying to create a minimal reproducible example, I actually solved my problem, but I'm hoping there is a more efficient way to do it...


import pandas as pd
from pandas import Timestamp

# First, define a function that examines the dates
def disordered_muddle(date_series, future_first=True):
    """Check whether a series of dates is disordered or just muddled.

    Expects the dates in descending order, one per day (future_first is unused).
    Returns False if nothing is wrong, otherwise (disordered, muddle).
    """
    disordered = []
    muddle = []
    different_dates = pd.Series(date_series.unique())
    date = different_dates[0]
    for d in different_dates[1:]:
        # we expect the date's dayofyear to decrease by one
        if d.dayofyear != date.dayofyear - 1:
            # unless the year is changing
            if d.year != date.year - 1:
                try:
                    # check whether the day and month are muddled;
                    # if d.day > 12 this raises a ValueError (invalid month)
                    unmuddle = Timestamp(d.year, d.day, d.month)
                    if unmuddle.dayofyear == date.dayofyear - 1:
                        muddle.append(d)
                        d = unmuddle
                    elif unmuddle.year == date.year - 1:
                        muddle.append(d)
                        d = unmuddle
                    else:
                        disordered.append(d)
                except ValueError:
                    disordered.append(d)
        date = d
    if len(disordered) == 0 and len(muddle) == 0:
        return False
    return disordered, muddle


disorder, muddle = disordered_muddle(df['Date'])
muddle = set(muddle)  # set lookups are O(1); "d in list" scans the whole list

# Finally, unmuddle the dates
date_correction = {}
for d in df['Date']:
    if d in muddle:
        date_correction[d] = Timestamp(d.year, d.day, d.month)
    else:
        date_correction[d] = Timestamp(d.year, d.month, d.day)

df['CorrectedDate'] = df['Date'].map(date_correction)

# Re-running the check should now return False
disordered_muddle(df['CorrectedDate'])
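The loop above walks the unique dates one at a time in Python, which gets slow on a million rows. Its core assumption (exactly one row per day, newest first, so each date is one day before the previous one) can also be expressed with whole-column operations. The following is a hypothetical vectorized variant, not the author's code: it assumes the first row is correct, derives each row's expected date from its position, and keeps either the date or its day/month swap, whichever matches; rows matching neither become `NaT`, playing the role of the `disordered` list. `swap_day_month` and `unmuddle_consecutive` are illustrative names, and the day/month swap uses NumPy `datetime64` arithmetic instead of building one `Timestamp` per row.

```python
import numpy as np
import pandas as pd


def swap_day_month(dates: pd.Series) -> pd.Series:
    """Swap day and month element-wise; NaT where the day is > 12 (no valid swap)."""
    years = dates.dt.year.to_numpy()
    new_month = dates.dt.day.to_numpy()
    new_day = dates.dt.month.to_numpy()
    ok = new_month <= 12
    # year -> first day of the new month -> new day, via datetime64 arithmetic
    month_start = (years - 1970).astype("datetime64[Y]") + (
        np.where(ok, new_month, 1) - 1
    ).astype("timedelta64[M]")
    out = month_start.astype("datetime64[D]") + (new_day - 1).astype("timedelta64[D]")
    out = out.astype("datetime64[ns]")
    out[~ok] = np.datetime64("NaT")
    return pd.Series(out, index=dates.index).astype(dates.dtype)


def unmuddle_consecutive(dates: pd.Series) -> pd.Series:
    """Hypothetical vectorized variant of disordered_muddle + date_correction.

    Assumes one row per day, newest first, and a correct first row.
    Rows matching neither their expected date nor its swap become NaT.
    """
    offsets = pd.to_timedelta(np.arange(len(dates)), unit="D")
    expected = pd.Series(dates.iloc[0] - offsets, index=dates.index)
    swapped = swap_day_month(dates)
    return dates.where(dates == expected, swapped.where(swapped == expected))
```

With the question's `df`, `df['CorrectedDate'] = unmuddle_consecutive(df['Date'])` should reproduce the loop's result, and `df['CorrectedDate'].isna().sum()` counts the rows that a day/month swap cannot explain. If the data has several rows per day, the position-based `expected` series no longer applies and this sketch would need a different reference date per row.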


Answered 2022-12-27