Groupby with 2 columns - "pandas.core.groupby.generic"

qq_花開花謝_0 2023-03-16 10:53:49
For my current project, I plan to group a Pandas DataFrame by stock_symbol as the first criterion and quarter as the second. In other threads I have seen that a construct such as group_data = df.groupby(['stock_symbol', 'quarter']) might be a possible solution. In my case, however, the only terminal output I get is <pandas.core.groupby.generic.DataFrameGroupBy object at 0x11fdcbf10>. Can anyone spot the error in my thinking? The relevant part of the code is shown below:

# Datetime conversion
df['date'] = pd.to_datetime(df['date'])
# Adding of 'Quarter' column
df['quarter'] = df['date'].dt.to_period('Q')
# Grouping both the Stock Symbol and the Quarter column
group_data = df.groupby(['stock_symbol', 'quarter'])
print(group_data)

The function to be called in this operation is shown below:

# Word frequency analysis
def get_top_n_bigram(corpus, n=None):
    vec = CountVectorizer(ngram_range=(2, 2), stop_words='english').fit(corpus)
    bag_of_words = vec.transform(corpus)
    sum_words = bag_of_words.sum(axis=0)
    words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]
    words_freq = sorted(words_freq, key=lambda x: x[1], reverse=True)
    return words_freq[:n]

1 Answer

慕斯王


Here is one way to achieve what you are after:


Custom function:


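from sklearn.feature_extraction.text import CountVectorizer

def get_top_n_bigram(row):
    # `row` is the sub-DataFrame for one (stock_symbol, quarter) group;
    # the text columns are concatenated element-wise into one Series of strings
    corpus = row['txt_main'] + row['txt_pro'] + row['txt_con'] + row['txt_adviceMgmt']
    n = 2  # the top n
    # Count bigrams only, ignoring English stop words
    vec = CountVectorizer(ngram_range=(2, 2), stop_words='english').fit(corpus)
    bag_of_words = vec.transform(corpus)
    sum_words = bag_of_words.sum(axis=0)
    words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]
    # Sort by frequency, most common first
    words_freq = sorted(words_freq, key=lambda x: x[1], reverse=True)
    return words_freq[:n]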

Then call groupby with apply, passing the function defined above:


df['date'] = pd.to_datetime(df['date'])
df['quarter'] = df['date'].dt.to_period('Q')
newdf = df.groupby(['stock_symbol', 'quarter']).apply(get_top_n_bigram).to_frame(name='frequencies')

print(newdf)

                                                  frequencies
stock_symbol quarter
AMG          2011Q3         [(smart driven, 2), (driven risk, 2)]
             2013Q1   [(asset management, 2), (smart working, 1)]
             2014Q1     [(audit firm, 3), (employment agency, 2)]
MMM          2017Q2               [(working 3m, 1), (3m time, 1)]
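
As for the output in the question: `<pandas.core.groupby.generic.DataFrameGroupBy object at ...>` is not an error. `groupby` returns a lazy grouping object, and nothing is computed until you aggregate, apply a function, or iterate over it. A minimal sketch of this, assuming a small made-up DataFrame (the values below are hypothetical, not the asker's data):

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame
df = pd.DataFrame({
    "stock_symbol": ["AMG", "AMG", "MMM"],
    "date": ["2011-07-15", "2011-08-01", "2017-04-10"],
    "txt_main": ["smart driven team", "driven risk culture", "working 3m time"],
})
df["date"] = pd.to_datetime(df["date"])
df["quarter"] = df["date"].dt.to_period("Q")

grouped = df.groupby(["stock_symbol", "quarter"])

# Printing the GroupBy object only shows its repr; no result is computed yet
print(grouped)

# Aggregating produces an ordinary, printable Series:
# the number of rows per (stock_symbol, quarter) pair
group_sizes = grouped.size()
print(group_sizes)

# Iterating yields each group key together with its sub-DataFrame
for (symbol, quarter), frame in grouped:
    print(symbol, quarter, len(frame))
```

The same holds for `apply`: the call in the answer above is what turns the grouping into a concrete result.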


Answered 2023-03-16