
Classifying and counting unique elements under various column conditions


一只斗牛犬 2023-10-31 16:13:11
Hello, I am using Python to classify some data:

Articles                                Filename
A New Marine Ascomycete from Brunei.    Invasive Species.csv
A new genus and four new species        Forestry.csv
A new genus and four new species        Invasive Species.csv

I want to know how many unique "Articles" each "Filename" has, that is, articles that appear in that file and in no other. So the output I want looks like this:

Filename                Count_Unique
Invasive Species.csv    1
Forestry.csv            0

I would also like to get this output:

Filename1       Filename2               Count_Common articles
Forestry.csv    Invasive Species.csv    1

I concatenated the datasets and ended up counting the elements present in each "Filename". Would anyone be willing to help? I have already tried unique(), drop_duplicates() and so on, but I cannot seem to get the output I want. Anyway, here are the last few lines of my code:

concatenated = pd.concat(data, ignore_index=True)
concatenated
concatenated.groupby(['Title', 'Filename']).count().reset_index()
res = {col: concatenated[col].value_counts() for col in concatenated.columns}
res['Filename']

1 Answer

胡說叔叔

Contributed 1804 experience points, received more than 8 likes

No magic here, just a few routine operations.
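
For anyone who wants to run the snippets in this answer, here is a minimal sketch of the input they expect. It assumes the concatenated data ends up in a DataFrame named df with the columns Articles and Filename, as in the table from the question (the question's own code calls it concatenated and groups by a column named Title, so adjust the names to match your data). The rows are simply the three sample rows from the question.

```python
import pandas as pd

# Sample rows copied from the question. In practice df would be the
# result of pd.concat over the individual CSV files.
df = pd.DataFrame(
    {
        "Articles": [
            "A New Marine Ascomycete from Brunei.",
            "A new genus and four new species",
            "A new genus and four new species",
        ],
        "Filename": [
            "Invasive Species.csv",
            "Forestry.csv",
            "Invasive Species.csv",
        ],
    }
)

print(df)
```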


(1) Count the "unique" articles in each file


Edit: added (quick and dirty) code to include filenames whose count is zero.


# drop exact duplicate rows so that no (article, filename) pair is counted twice
df = df.drop_duplicates()

# articles to exclude: the ones that appear in more than one file
dup_articles = df["Articles"].value_counts()
dup_articles = dup_articles[dup_articles > 1].index

# remove the shared articles and count what is left per file
mask_dup_articles = df["Articles"].isin(dup_articles)
df_unique = df[~mask_dup_articles]
df_unique["Filename"].value_counts()

# N.B. any filename not shown here has a count of 0.
#      Those are added back further down.

Out[68]: 
Invasive Species.csv    1
Name: Filename, dtype: int64


# unique article count, including files whose count is zero
df_unique_nonzero_count = df_unique["Filename"].value_counts().to_frame().reset_index()
df_unique_nonzero_count.columns = ["Filename", "count"]

df_all_filenames = pd.DataFrame(
    data={"Filename": df["Filename"].unique()}
)

# join: all filenames with the counted filenames
# (a left join keeps every filename in its original order)
df_unique_count = df_all_filenames.merge(df_unique_nonzero_count, on="Filename", how="left")

# postprocess: files without a match get NaN, which becomes 0
df_unique_count["count"] = df_unique_count["count"].fillna(0).astype(int)

# print
df_unique_count

Out[119]: 
               Filename  count
0  Invasive Species.csv      1
1          Forestry.csv      0
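
The same table, zeros included, can probably be produced in a single chain. The following is only a sketch of an alternative, not the method used above: it counts, for every row, how many distinct files contain that article, keeps the rows where that number is 1, and then uses reindex to put the zero-count files back. The column names Articles and Filename and the output name Count_Unique come from the question; the sample data is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

# Sample rows from the question (assumed column names).
df = pd.DataFrame(
    {
        "Articles": [
            "A New Marine Ascomycete from Brunei.",
            "A new genus and four new species",
            "A new genus and four new species",
        ],
        "Filename": [
            "Invasive Species.csv",
            "Forestry.csv",
            "Invasive Species.csv",
        ],
    }
).drop_duplicates()

# For every row: in how many distinct files does this article occur?
n_files = df.groupby("Articles")["Filename"].transform("nunique")

# Keep articles found in exactly one file, count them per file, then
# reindex over all filenames so files without such articles get 0.
counts = (
    df[n_files == 1]
    .groupby("Filename")
    .size()
    .reindex(df["Filename"].unique(), fill_value=0)
    .rename_axis("Filename")
    .reset_index(name="Count_Unique")
)

print(counts)
```

Using reindex with fill_value=0 avoids the separate merge and fillna steps, at the cost of being a little less explicit about where the zeros come from.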

(2) Count the articles shared between files


# pick out the records whose article appears in more than one file
df_dup = df[mask_dup_articles]

# self-merge on articles, then discard self-pairs and mirrored pairs
df_merge = df_dup.merge(df_dup, on=["Articles"], suffixes=("1", "2"))
# keep each pair once: Filename1 sorts alphabetically after Filename2
df_merge = df_merge[df_merge["Filename1"] > df_merge["Filename2"]]

# count the shared articles per pair of files
df_ans2 = df_merge.groupby(["Filename1", "Filename2"]).count()
df_ans2.reset_index(inplace=True)  # optional
df_ans2

Out[70]: 
              Filename1     Filename2  Articles
0  Invasive Species.csv  Forestry.csv         1
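
If there are many files, a co-occurrence matrix may be easier to read than the self-merge. This is a hedged sketch of that alternative, not the approach used above: pd.crosstab builds a 0/1 article-by-file table, and multiplying its transpose by itself gives, for each pair of files, the number of articles they share. The column names Articles and Filename are taken from the question, the names presence, common and result are made up for this example, and the sample data is rebuilt so the snippet runs on its own.

```python
from itertools import combinations

import pandas as pd

# Sample rows from the question (assumed column names).
df = pd.DataFrame(
    {
        "Articles": [
            "A New Marine Ascomycete from Brunei.",
            "A new genus and four new species",
            "A new genus and four new species",
        ],
        "Filename": [
            "Invasive Species.csv",
            "Forestry.csv",
            "Invasive Species.csv",
        ],
    }
).drop_duplicates()

# Rows are articles, columns are files; after drop_duplicates every
# cell is 1 if the file contains the article and 0 otherwise.
presence = pd.crosstab(df["Articles"], df["Filename"])

# Entry (a, b) is the number of articles present in both file a and file b.
common = presence.T.dot(presence)

# List each unordered pair of files once, including pairs that share nothing.
pairs = [
    (a, b, int(common.loc[a, b]))
    for a, b in combinations(common.columns, 2)
]
result = pd.DataFrame(pairs, columns=["Filename1", "Filename2", "Count_Common"])

print(result)
```

Unlike the merge above, this also reports pairs of files with zero articles in common, and the diagonal of common holds the number of distinct articles in each file.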


Answered 2023-10-31