How to filter strings if the first three sentences contain keywords

Python

qq_遁去的一_1 2022-11-29 17:17:09

I have a DataFrame named df with a column named article. The article column contains 600 strings, each representing a news article. I want to keep only the articles whose first four sentences contain the keyword "COVID-19" AND ("China" OR "Chinese"), but I couldn't find a way to do this myself. (Within each string, sentences are separated by \n. A sample article looks like this:)

\nChina may be past the worst of the COVID-19 pandemic, but they aren’t taking any chances.\nWorkers in Wuhan in service-related jobs would have to take a coronavirus test this week, the government announced, proving they had a clean bill of health before they could leave the city, Reuters reported.\nThe order will affect workers in security, nursing, education and other fields that come with high exposure to the general public, according to the edict, which came down from the country’s National Health Commission.\ .......

4 Answers

梵蒂岡之花

Contributed 1900 experience points, received over 5 likes

First, we define a function that returns a boolean depending on whether your keywords appear in a given string:

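def contains_covid_kwds(sentence):
    # Keyword spelled as in the question ("COVID-19", with the hyphen)
    kw1 = 'COVID-19'
    kw2 = 'China'
    kw3 = 'Chinese'
    # True when kw1 is present AND at least one of kw2 / kw3 is present
    return kw1 in sentence and (kw2 in sentence or kw3 in sentence)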

Then we create a boolean Series by applying this function (using Series.apply) to the articles in your df.article column.

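Note that we use a lambda function to truncate the string passed to contains_covid_kwds at the fifth occurrence of '\n', i.e. to your first four sentences. The replace call turns the first four '\n' into '#', so find returns the index of the fifth '\n'. If an article has fewer than five newlines, find returns -1 and the slice only drops the last character: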

series = df.article.apply(lambda s: contains_covid_kwds(s[:s.replace('\n', '#', 4).find('\n')]))

Then we pass the boolean Series to df.loc to select the rows for which the Series evaluates to True:

filtered_df = df.loc[series]
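As a minimal end-to-end sketch of the steps above, the following runs the same filter on a tiny DataFrame. The helper first_n_sentences and the sample articles are hypothetical and not part of the original answer. The helper splits on '\n' instead of slicing at an index, which is one way to handle articles with fewer than four sentences.

```python
import pandas as pd

def first_n_sentences(text, n=4):
    # Hypothetical helper: split on newlines, drop empty pieces
    # (e.g. from a leading '\n'), and keep at most n sentences.
    sentences = [s for s in text.split('\n') if s.strip()]
    return '\n'.join(sentences[:n])

def contains_covid_kwds(text):
    # "COVID-19" AND ("China" OR "Chinese"), as stated in the question
    return 'COVID-19' in text and ('China' in text or 'Chinese' in text)

# Hypothetical sample data, not taken from the question
df = pd.DataFrame({'article': [
    '\nChina reports new COVID-19 cases.\nSecond sentence.\nThird.\nFourth.\nFifth.',
    '\nLocal news.\nWeather.\nSports.\nTraffic.\nCOVID-19 in China.',
    'COVID-19 update.\nChinese officials respond.',
]})

# Boolean mask: True where the first four sentences satisfy the condition
mask = df['article'].apply(lambda s: contains_covid_kwds(first_n_sentences(s)))
filtered_df = df.loc[mask]
```

Here the second article is dropped because its keywords only appear in the fifth sentence.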

2022-11-29

手掌心

Contributed 1942 experience points, received over 3 likes

You can use the pandas apply method and do it the way I did.

import pandas as pd

string = "\nChina may be past the worst of the COVID-19 pandemic, but they aren’t taking any chances.\nWorkers in Wuhan in service-related jobs would have to take a coronavirus test this week, the government announced, proving they had a clean bill of health before they could leave the city, Reuters reported.\nThe order will affect workers in security, nursing, education and other fields that come with high exposure to the general public, according to the edict, which came down from the country’s National Health Commission."

df = pd.DataFrame({'article': [string]})

def findKeys(string):
    # Split the lowercased article into sentences on newlines
    string_list = string.strip().lower().split('\n')
    keywords = ['china', 'covid-19', 'wuhan']
    # Look only at the first four sentences (or all of them if there are fewer)
    for sentence in string_list[:4]:
        # Note: this matches when ANY keyword appears, which is looser than
        # "COVID-19" AND ("China" OR "Chinese")
        for key in keywords:
            if key in sentence:
                return True
    return False

Then call the pandas apply method on df:

df['Contains Keywords?'] = df['article'].apply(findKeys)
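The new boolean column can then be used as a mask to keep only the matching rows. The sketch below assumes a compact stand-in named find_keys (hypothetical, behaving like the function above) and two made-up articles.

```python
import pandas as pd

def find_keys(text, keywords=('china', 'covid-19', 'wuhan')):
    # Hypothetical stand-in for findKeys: True if any keyword appears
    # in any of the first four sentences (case-insensitive).
    sentences = text.strip().lower().split('\n')[:4]
    return any(key in sentence for sentence in sentences for key in keywords)

# Made-up sample data for illustration
df = pd.DataFrame({'article': [
    'Wuhan workers must take a test.\nMore details follow.',
    'Sports results.\nWeather report.',
]})

df['Contains Keywords?'] = df['article'].apply(find_keys)

# Boolean indexing keeps only the rows flagged True
kept = df[df['Contains Keywords?']]
```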

2022-11-29

萬千封印

Contributed 1891 experience points, received over 3 likes

First, I create a Series containing only the first four sentences of the original df['article'] column and convert it to lowercase, assuming the search should be case-insensitive.

articles = df['article'].apply(lambda x: "\n".join(x.split("\n", maxsplit=4)[:4])).str.lower()

Then use a simple boolean mask to keep only the rows where the keywords are found in the first four sentences.

df[(articles.str.contains("covid")) & (articles.str.contains("chinese") | articles.str.contains("china"))]
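A possible variant of the same idea, sketched here under the assumption that articles do not start with a blank line: the .str accessor can do the splitting and slicing, and case=False in str.contains removes the need to lowercase first. The sample data is made up for illustration.

```python
import pandas as pd

# Made-up sample data for illustration
df = pd.DataFrame({'article': [
    'China eases rules.\nCOVID-19 cases fall.\nThird.\nFourth.\nFifth.',
    'First.\nSecond.\nThird.\nFourth.\nCOVID-19 hits Chinese markets.',
    'covid-19 and chinese exports.',
]})

# Split each article into sentences, keep the first four, and rejoin them
head = df['article'].str.split('\n').str[:4].str.join('\n')

# Case-insensitive match: "covid-19" AND ("china" OR "chinese")
mask = (head.str.contains('covid-19', case=False)
        & head.str.contains('china|chinese', case=False))

filtered = df[mask]
```

The pattern 'china|chinese' is treated as a regular expression, so a single call covers both alternatives.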

2022-11-29

慕工程0101907

Contributed 1887 experience points, received over 5 likes

Here:

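found = []
s1 = "hello"
s2 = "good"
s3 = "great"
# article is assumed to be an iterable of strings, e.g. df['article']
for string in article:
    # Keep strings containing s1 AND (s2 OR s3)
    if s1 in string and (s2 in string or s3 in string):
        found.append(string)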
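The loop above checks whole strings against placeholder keywords. A rough adaptation to the question, assuming a plain list of article strings and a hypothetical function name filter_articles, would restrict the check to the first four sentences and use the question's keywords:

```python
def filter_articles(articles, n=4):
    # Hypothetical helper: keep articles whose first n sentences contain
    # "COVID-19" AND ("China" OR "Chinese").
    found = []
    for text in articles:
        # Sentences are separated by '\n'; keep only the first n
        head = '\n'.join(text.strip().split('\n')[:n])
        if 'COVID-19' in head and ('China' in head or 'Chinese' in head):
            found.append(text)
    return found

# Made-up sample data for illustration
articles = [
    'COVID-19 spreads.\nChina responds.\nThird.\nFourth.\nFifth.',
    'One.\nTwo.\nThree.\nFour.\nCOVID-19 in China.',
]
found = filter_articles(articles)
```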

2022-11-29
