

How to check for and extract years and percentages from a CSV file in Python


明月笑刀無情 2022-09-13 09:56:08
I have a CSV file, news.csv, that contains a lot of data. I want to check whether each row contains a year: 1 if it does, 0 otherwise. The same applies to percentages: 1 if the row contains a percentage, 0 otherwise. I also want to extract them. Below is my code so far. I get an error (ValueError: Wrong number of items passed 2, placement implies 1) when I try to extract the percentage.

news=pd.read_csv("news.csv")
news['year']= news['STORY'].str.extract(r'(?!\()\b(\d+){1}')
news["howmanyyear"] = news["STORY"].str.count(r'(?!\()\b(\d+){1}')
news["existyear"] = news["howmany"] != 0
news["existyear"] = news["existyear"].astype(int)
news['percentage']= news['STORY'].str.extract(r'(\s100|\s\d{1})(\.\d+)+%')
news.to_csv('news.csv')

The code that extracts the year seems to work, but it also extracts ordinary numbers, and it extracts only one of the years.

A sample of my CSV file:

ID  STORY
1   There are a total of 2,070 people died in 2001 due to the virus
2   20% of people in the village have diabetes in 2007
3   About 70 percent of them still believe the rumor
4   In 2003 and 2020, the pneumonia pandemic spread in the world

This is the output I want:

ID  STORY                                                            existyear  year    existpercentage  percentage
1   There are a total of 2,070 people died in 2001 due to the virus    1        2001      0              -
2   20% of people in the village have diabetes in 2007                 1        2007      1              20%
3   About 70 percent of them still believe the rumor                   0         -        1              70
4   In 2003 and 2020, the pneumonia pandemic spread in the world       1       2003,2020  0              -

1 answer

MYYA


Create a sample DataFrame:

import pandas as pd

c = [1, 2, 3, 4]
d = ["There are a total of 2,070 people died in 2001 due to the virus",
     "20% of people in the village have diabetes in 2007",
     "About 70 percent of them still believe the rumor",
     "In 2003 and 2020, the pneumonia pandemic spread in the world"]
f = ['2001', '2007', '-', '2003,2020']
g = ['-', '20%', '70', '-']

# Each list is one row here, so transpose to turn the lists into columns
df = pd.DataFrame([c, d, f, g]).T
df.rename(columns={0: 'ID', 1: 'STORY', 2: 'year', 3: 'percentage'}, inplace=True)

Output:

ID  STORY                                                           year    percentage
1   There are a total of 2,070 people died in 2001 due to the virus 2001    -
2   20% of people in the village have diabetes in 2007              2007    20%
3   About 70 percent of them still believe the rumor                -       70
4   In 2003 and 2020, the pneumonia pandemic spread in the world    2003,2020 -
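As a side note, the same sample frame can also be built from a dict of columns, which avoids the transpose and keeps ID as an integer column. This is only a sketch of an alternative construction; the column names are the ones used above.

```python
import pandas as pd

# Same sample data as above, keyed by column name instead of transposed lists
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "STORY": [
        "There are a total of 2,070 people died in 2001 due to the virus",
        "20% of people in the village have diabetes in 2007",
        "About 70 percent of them still believe the rumor",
        "In 2003 and 2020, the pneumonia pandemic spread in the world",
    ],
    "year": ["2001", "2007", "-", "2003,2020"],
    "percentage": ["-", "20%", "70", "-"],
})
```

With the transposed-list version every column ends up with object dtype; here ID stays numeric.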

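Code:

import re

def year_exits_or_not(row):
    # 1 if the value contains a four-digit number starting with 1, 2 or 3
    if re.match(r'.*([1-3][0-9]{3})', row):
        return 1
    else:
        return 0

def perc_or_not(row):
    # 1 if the value contains at least one digit
    if re.match(r'.*\d+', row):
        return 1
    else:
        return 0

# Both checks run on the already extracted 'year' and 'percentage' columns
df['existyear'] = df.year.apply(year_exits_or_not)
df['existpercentage'] = df.percentage.apply(perc_or_not)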

Output:

ID  STORY                                                            existyear  year    existpercentage  percentage
1   There are a total of 2,070 people died in 2001 due to the virus    1        2001      0              -
2   20% of people in the village have diabetes in 2007                 1        2007      1              20%
3   About 70 percent of them still believe the rumor                   0         -        1              70
4   In 2003 and 2020, the pneumonia pandemic spread in the world       1       2003,2020  0              -
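If the year and percentage columns do not exist yet, the two flags could also be computed straight from STORY with `Series.str.contains`. The following is a minimal sketch rather than part of the code above; `YEAR_RE` and `PCT_RE` are patterns assumed for illustration, and the year pattern assumes that only 1900 to 2099 count as years.

```python
import pandas as pd

stories = pd.Series([
    "There are a total of 2,070 people died in 2001 due to the virus",
    "20% of people in the village have diabetes in 2007",
    "About 70 percent of them still believe the rumor",
    "In 2003 and 2020, the pneumonia pandemic spread in the world",
])

# Assumed patterns (illustrative, not from the original post):
# a standalone four-digit number starting with 19 or 20
YEAR_RE = r"\b(?:19|20)\d{2}\b"
# a number followed by "%" or by the word "percent"
PCT_RE = r"\d+(?:\.\d+)?\s?(?:%|percent\b)"

# str.contains returns booleans; astype(int) turns them into 1/0 flags
existyear = stories.str.contains(YEAR_RE, regex=True).astype(int)
existpercentage = stories.str.contains(PCT_RE, regex=True).astype(int)
```

Because the year pattern needs four consecutive digits, "2,070" is not counted as a year, and "70 percent" is counted as a percentage even without a % sign.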

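Edit:

To extract the values directly from STORY:

# re.findall returns every match as a list; str(...)[1:-1] drops the surrounding brackets
df.year = df.STORY.apply(lambda row: str(re.findall(r'.*?([1-3][0-9]{3})', row))[1:-1])
df.percentage = df.STORY.apply(lambda row: str(re.findall(r"(\d+)(?:%| percent)", row))[1:-1])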

Output:

    ID  STORY                                                year          percentage
0   1   There are a total of 2,070 people died in 2001...   '2001'
1   2   20% of people in the village have diabetes in ...   '2007'         '20'
2   3   About 70 percent of them still believe the rumor                   '70'
3   4   In 2003 and 2020, the pneumonia pandemic sprea...   '2003', '2020'
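To get closer to the format asked for in the question ("2003,2020" without quotes, "-" when nothing is found), one option is `Series.str.findall` followed by a join. This is a sketch under assumptions: `join_or_dash` is a hypothetical helper, the patterns are illustrative, and the percentage comes back without the % sign. The last lines also show the likely cause of the ValueError in the question: the percentage pattern there has two capture groups, so `str.extract` returns two columns, which cannot be assigned to a single column.

```python
import pandas as pd

stories = pd.Series([
    "There are a total of 2,070 people died in 2001 due to the virus",
    "20% of people in the village have diabetes in 2007",
    "About 70 percent of them still believe the rumor",
    "In 2003 and 2020, the pneumonia pandemic spread in the world",
])

# Assumed patterns (illustrative): years 1900-2099, and a number
# followed by "%" or "percent"; only the number itself is captured
YEAR_RE = r"\b(?:19|20)\d{2}\b"
PCT_RE = r"(\d+(?:\.\d+)?)\s?(?:%|percent\b)"

def join_or_dash(matches):
    # Hypothetical helper: comma-join the matches, or "-" if there are none
    return ",".join(matches) if matches else "-"

year = stories.str.findall(YEAR_RE).apply(join_or_dash)
percentage = stories.str.findall(PCT_RE).apply(join_or_dash)

# The pattern from the question has two capture groups, so str.extract
# returns a DataFrame with two columns instead of a single Series
two_groups = stories.str.extract(r'(\s100|\s\d{1})(\.\d+)+%')
```

Keeping a single capture group (or none) in the pattern means each match is one string, so the result fits into one column.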


Answered 2022-09-13