

Find the minimum and maximum of a second DataFrame between two dates given in the first DataFrame


qq_笑_17 2023-02-15 16:48:52
I have these 2 dummy DataFrames:

import numpy as np
import pandas as pd

np.random.seed(12345)
df1 = pd.DataFrame({'name': ['A']*4 + ['B']*4,
                    'start_date': pd.to_datetime(['2000-03-15', '2000-06-12', '2000-09-01', '2001-01-17', '2000-03-19', '2000-06-14', '2000-09-14', '2001-01-22']),
                    'end_date': pd.to_datetime(['2000-06-12', '2000-09-01', '2001-01-17', '2001-03-19', '2000-06-14', '2000-09-14', '2001-01-22', '2001-02-01'])})
date = pd.date_range('2000-01-01', '2002-01-01')
name = ['A']*len(date) + ['B']*len(date)
date = date.append(date)
low = np.random.rand(len(date))
high = low + np.random.rand(len(date))
df2 = pd.DataFrame({'name': name, 'date': date, 'low': low, 'high': high})

For each row in df1 I am given a name, a start date, and an end date. For the rows of df2 that have the same name and whose date falls between the start date and the end date, I want to find the maximum of high and the minimum of low.

Below is my current solution.

df1 = df1.set_index('name')
df2 = df2.set_index(['name', 'date'])
df2 = df2.sort_index()
df1['max'] = -1.0
df1['min'] = -1.0
for name in df1.index.unique():
    df = df2.loc[name]
    tmphigh = []
    tmplow = []
    # Each tuple is (index, start_date, end_date, max, min)
    for (_, start_date, end_date, _, _) in df1.loc[name].itertuples(name=None):
        # Half-open window [start_date, end_date) on the sorted date index
        newdf = df.iloc[df.index.searchsorted(start_date): df.index.searchsorted(end_date)]
        tmphigh.append(newdf.high.max())
        tmplow.append(newdf.low.min())
    df1.loc[[name], ['max']] = tmphigh
    df1.loc[[name], ['min']] = tmplow

However, this still takes quite a long time when applied to more than a million rows. I would like to know whether there is a faster way to do this.

[Edit]: Thanks to Pramote Kuacharoen, I was able to adapt some of his code and get about a 6x speedup over my existing code. I split the work into a loop because I found that building df2[name] inside the apply function increased the computation time significantly. Computing it once per name, outside apply, should reduce the number of calls needed to extract all the values under a name in df2.

I would be happy if someone could suggest a better approach than mine, but this is good enough for me. Below is my current solution.

from tqdm import tqdm

# State assumed by this snippet (not shown in the original post):
# df1 is indexed by 'name'; df2 is indexed by 'date' and sorted by name and date.
df1a = df1.groupby('name')
df2a = df2.groupby('name')
mergedf = df1
mergedf['maximum'] = -1.0
mergedf['minimum'] = -1.0

def get_min_max(row):
    # df2x is the module-level group for the current name, set in the loop below
    dfx = df2x.iloc[df2x.index.searchsorted(row['start_date']): df2x.index.searchsorted(row['end_date'])]
    maximum = dfx['high'].max()
    minimum = dfx['low'].min()
    return pd.Series({'maximum': maximum, 'minimum': minimum})

for name, df in tqdm(df1a):
    df2x = df2a.get_group(name)
    mergedf.loc[[name], ['maximum', 'minimum']] = df.apply(get_min_max, axis=1)
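The per-row `searchsorted` calls above can also be issued once for a whole group and combined with `np.minimum.reduceat` / `np.maximum.reduceat`, which removes the Python-level loop over rows. The following is only a sketch under stated assumptions: `window_min_max` is a hypothetical helper name, the dates passed in are already sorted for a single name, and windows are half-open `[start, end)` as in the code above. Empty windows yield NaN.

```python
import numpy as np


def window_min_max(dates, low, high, starts, ends):
    """Hypothetical helper: min of `low` and max of `high` over [start, end).

    `dates` must be sorted ascending and belong to a single name.
    """
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    lo_idx = np.searchsorted(dates, starts, side="left")
    hi_idx = np.searchsorted(dates, ends, side="left")

    out_min = np.full(len(lo_idx), np.nan)
    out_max = np.full(len(lo_idx), np.nan)
    valid = hi_idx > lo_idx          # skip empty windows
    if not valid.any():
        return out_min, out_max

    # reduceat needs every index to be < len(array), so pad one sentinel
    # element; it is never part of a kept window because hi_idx <= len(dates).
    low_p = np.append(low, np.inf)
    high_p = np.append(high, -np.inf)

    # Interleave as [s0, e0, s1, e1, ...]; the even-positioned results are the
    # reductions over [s_k, e_k), the odd-positioned ones are discarded.
    pairs = np.column_stack([lo_idx[valid], hi_idx[valid]]).ravel()
    out_min[valid] = np.minimum.reduceat(low_p, pairs)[::2]
    out_max[valid] = np.maximum.reduceat(high_p, pairs)[::2]
    return out_min, out_max


# Small demo with daily dates for one name.
demo_dates = np.arange("2000-01-01", "2000-01-11", dtype="datetime64[D]")
demo_low = np.arange(10, dtype=float)
demo_high = demo_low + 1
demo_starts = np.array(["2000-01-03", "2000-01-08"], dtype="datetime64[D]")
demo_ends = np.array(["2000-01-06", "2000-01-11"], dtype="datetime64[D]")
demo_min, demo_max = window_min_max(demo_dates, demo_low, demo_high,
                                    demo_starts, demo_ends)
```

To apply it to the frames in the question, one would call the helper once per name with that name's sorted `date`, `low`, and `high` arrays and the matching `start_date` / `end_date` arrays from df1. Whether this beats the loop on a million rows has not been measured here.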

1 answer

慕雪6442864


import numpy as np
import pandas as pd

df1 = pd.DataFrame({'name': ['A']*4 + ['B']*4,
                    'start_date': pd.to_datetime(['2000-03-15', '2000-06-12', '2000-09-01', '2001-01-17', '2000-03-19', '2000-06-14', '2000-09-14', '2001-01-22']),
                    'end_date': pd.to_datetime(['2000-06-12', '2000-09-01', '2001-01-17', '2001-03-19', '2000-06-14', '2000-09-14', '2001-01-22', '2001-02-01'])})

date = pd.date_range('2000-01-01', '2002-01-01')
name = ['A']*len(date) + ['B']*len(date)
date = date.append(date)
low = np.random.rand(len(date))
high = low + np.random.rand(len(date))
df2 = pd.DataFrame({'name': name, 'date': date, 'low': low, 'high': high})

# Index by date so that .loc can slice by date range
df2 = df2.set_index('date')

def find_max(row):
    # Filter by name, then slice the date range (both endpoints included)
    return df2[df2['name'] == row['name']].loc[row['start_date']:row['end_date'], 'high'].max()

def find_min(row):
    return df2[df2['name'] == row['name']].loc[row['start_date']:row['end_date'], 'low'].min()

df1['maximum'] = df1.apply(find_max, axis=1)
df1['minimum'] = df1.apply(find_min, axis=1)
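One detail worth checking when comparing results: label slicing with `.loc[start:end]` includes the end date, while the `searchsorted`-based `iloc` slice used in the question stops just before it. A minimal sketch with made-up values (the series below is illustrative only, not data from the question):

```python
import pandas as pd

# Three consecutive days with made-up values.
s = pd.Series(
    [1.0, 5.0, 9.0],
    index=pd.to_datetime(["2000-01-01", "2000-01-02", "2000-01-03"]),
)
start = pd.Timestamp("2000-01-01")
end = pd.Timestamp("2000-01-03")

# Label-based slice: both endpoints are included.
inclusive_max = s.loc[start:end].max()

# Position-based slice from searchsorted: the end date is excluded.
half_open = s.iloc[s.index.searchsorted(start): s.index.searchsorted(end)]
half_open_max = half_open.max()
```

Here the two maxima differ because the largest value sits exactly on the end date. Since consecutive windows in df1 share a boundary date, the two conventions can give different answers on the real data as well.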

Try finding the minimum and maximum in a single pass. It may save some time.


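def find_min_max(row):
    # Filter and slice once, then take both statistics from the same window
    dfx = df2[df2['name'] == row['name']].loc[row['start_date']:row['end_date'], ['high', 'low']]
    maximum = dfx['high'].max()
    minimum = dfx['low'].min()
    return pd.Series({'maximum': maximum, 'minimum': minimum})

# Keep the merged result. If df1 already has 'maximum' and 'minimum' columns,
# merge will suffix the duplicates with _x and _y.
result = df1.merge(df1.apply(find_min_max, axis=1), left_index=True, right_index=True)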
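The function above still re-filters df2 by name for every row of df1. As the asker's edit observes, doing that split once per name is cheaper. A minimal sketch of that idea on tiny made-up data follows; `groups` and `window_stats` are hypothetical names, and the `.loc` slice keeps the inclusive end date used in this answer.

```python
import pandas as pd

# Tiny made-up frames with the same columns as in the question.
days = pd.date_range("2000-01-01", "2000-01-05")
df2 = pd.DataFrame({
    "name": ["A"] * 5 + ["B"] * 5,
    "date": days.append(days),
    "low": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0],
})
df2["high"] = df2["low"] + 10.0

df1 = pd.DataFrame({
    "name": ["A", "B"],
    "start_date": pd.to_datetime(["2000-01-02", "2000-01-01"]),
    "end_date": pd.to_datetime(["2000-01-04", "2000-01-02"]),
})

# Split df2 once: one date-indexed, date-sorted frame per name.
groups = {
    name: g.set_index("date").sort_index()
    for name, g in df2.groupby("name")
}


def window_stats(row):
    # Dictionary lookup instead of a boolean filter over all of df2.
    window = groups[row["name"]].loc[row["start_date"]:row["end_date"]]
    return pd.Series({"maximum": window["high"].max(),
                      "minimum": window["low"].min()})


result = df1.join(df1.apply(window_stats, axis=1))
```

This keeps the row-wise `apply`, so it only removes the repeated filtering cost, not the per-row Python overhead.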

Try this: multiprocessing with shared memory. Save it in a .py file and run it from the command line. It should be much faster. I set n_workers to 4; you can change it.


import numpy as np
import pandas as pd
from multiprocessing.shared_memory import SharedMemory
from concurrent.futures import ProcessPoolExecutor, as_completed


def find_min_max(name, data_info):
    # Attach to the two shared-memory blocks created in main()
    shm_name, shape, dtype = data_info[0]
    shm1 = SharedMemory(shm_name)
    np1 = np.recarray(shape=shape, dtype=dtype, buf=shm1.buf)

    shm_name, shape, dtype = data_info[1]
    shm2 = SharedMemory(shm_name)
    np2 = np.recarray(shape=shape, dtype=dtype, buf=shm2.buf)

    # Boolean indexing returns copies; data2 stays sorted by date
    data1 = np1[np1['name'] == name]
    data2 = np2[np2['name'] == name]

    for rec in data1:
        # Half-open window [start_date, end_date)
        idx1 = np.searchsorted(data2['date'], rec['start_date'])
        idx2 = np.searchsorted(data2['date'], rec['end_date'])
        data = data2[idx1:idx2]
        # data1 is a copy, so write through np1 using the row position
        np1[rec['index']]['maximum'] = data['high'].max()
        np1[rec['index']]['minimum'] = data['low'].min()


def main():
    np.random.seed(12345)

    df1 = pd.DataFrame({'name':  ['A']*4+['B']*4,
                        'start_date':   pd.to_datetime(['2000-03-15', '2000-06-12', '2000-09-01', '2001-01-17', '2000-03-19', '2000-06-14', '2000-09-14', '2001-01-22']),
                        'end_date': pd.to_datetime(['2000-06-12', '2000-09-01', '2001-01-17', '2001-03-19', '2000-06-14', '2000-09-14', '2001-01-22', '2001-02-01'])})

    date = pd.date_range('2000-01-01', '2002-01-01')
    name = ['A']*len(date)+['B']*len(date)
    date = date.append(date)
    low = np.random.rand(len(date))
    high = low+np.random.rand(len(date))
    df2 = pd.DataFrame({'name': name, 'date': date, 'low': low, 'high': high})

    # Reset the index after sorting so that the 'index' field written by
    # to_records() matches each row's position in the record array.
    df1 = df1.sort_values('name').reset_index(drop=True)
    df2 = df2.sort_values(['name', 'date'])
    df1['maximum'] = -1.0
    df1['minimum'] = -1.0

    # Fixed-width dtypes so the records can live in a flat shared buffer
    np1 = df1.to_records(column_dtypes={
        'name': '|S20', 'start_date': '<M8[ns]', 'end_date': '<M8[ns]'})
    np2 = df2.to_records(column_dtypes={
        'name': '|S20', 'date': '<M8[ns]', 'low': '<f8', 'high': '<f8'})

    # Names are stored as bytes ('|S20'), so encode them for comparison
    names = [str.encode(name) for name in df1['name'].unique()]
    del df1
    del df2

    shm1 = SharedMemory(name='d1', create=True, size=np1.nbytes)
    shm2 = SharedMemory(name='d2', create=True, size=np2.nbytes)

    # Copy both record arrays into shared memory
    shm1_np_array = np.recarray(
        shape=np1.shape, dtype=np1.dtype, buf=shm1.buf)
    np.copyto(shm1_np_array, np1)
    shm2_np_array = np.recarray(
        shape=np2.shape, dtype=np2.dtype, buf=shm2.buf)
    np.copyto(shm2_np_array, np2)

    # Everything a worker needs to re-attach to the shared arrays
    data_info = [
        (shm1.name, np1.shape, np1.dtype),
        (shm2.name, np2.shape, np2.dtype)
    ]

    del np1
    del np2

    # Set number of workers
    n_workers = 4

    # One task per name; wait for all of them to finish
    with ProcessPoolExecutor(n_workers) as exe:
        fs = [exe.submit(find_min_max, name, data_info)
              for name in names]
        for _ in as_completed(fs):
            pass

    print(shm1_np_array)

    shm1.close()
    shm2.close()
    shm1.unlink()
    shm2.unlink()


if __name__ == "__main__":
    main()
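The core of the script above is that a NumPy array can be laid over a `SharedMemory` buffer, so a second handle attached by name sees and modifies the same bytes without copying. A stripped-down sketch of just that mechanism follows, in a single process for brevity. `double_via_shared_memory` is a hypothetical name, and the second handle stands in for what a worker process would do.

```python
import numpy as np
from multiprocessing.shared_memory import SharedMemory


def double_via_shared_memory(values):
    """Hypothetical demo: write through one handle, read through another."""
    src = np.asarray(values, dtype=np.float64)  # must be non-empty
    shm = SharedMemory(create=True, size=src.nbytes)
    try:
        # Owner side: lay an array over the shared buffer and fill it.
        owner_view = np.ndarray(src.shape, dtype=src.dtype, buffer=shm.buf)
        owner_view[:] = src

        # "Worker" side: attach by name and modify in place.
        peer = SharedMemory(name=shm.name)
        try:
            peer_view = np.ndarray(src.shape, dtype=src.dtype, buffer=peer.buf)
            peer_view *= 2
            del peer_view  # drop the view before closing its buffer
        finally:
            peer.close()

        # The owner sees the change without any data being sent back.
        result = owner_view.copy()
        del owner_view
    finally:
        shm.close()
        shm.unlink()  # only the creator should unlink
    return result


doubled = double_via_shared_memory([1.0, 2.0, 3.0])
```

Views created over `shm.buf` have to be released before `close()` is called, which is why the sketch deletes them explicitly.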



Answered 2023-02-15