
Replacing table contents with BeautifulSoup

Html5

繁花不似錦 2023-10-24 15:06:05

I want to use Beautiful Soup to parse an HTML document that also contains tabular data, because I am running some NLP on it. A table cell may hold only a number, or it may hold a lot of text. So before calling soup.get_text(), I want to change the contents of the table cells according to the following condition.

Condition: if a cell has more than two words (a number can be counted as one word), keep it; otherwise change the cell content to an empty string.

<code to change table data based on condition>
soup = BeautifulSoup(html)
text = soup.get_text()

Here is what I have tried:

tables = soup.find_all('table')
for table in tables:
    table_body = table.find('tbody')
    rows = table_body.find_all('tr')
    for row in rows:
        cols = row.find_all('td')
        for ele in cols:
            if len(ele.text.split(' ')) < 3:
                ele.text = ''

However, ele.text is read-only, so the assignment raises an AttributeError.

Here is a simple HTML structure with a table:

<!DOCTYPE html>
<html>
  <head>
    <title>HTML Tables</title>
  </head>
  <body>
    <table border = "1">
      <tr>
        <td>Row 1, Column 1, This should be kept because it has more than two tokens</td>
        <td>not kept</td>
      </tr>
      <tr>
        <td>Row 2, Column 1, should be kept</td>
        <td>Row 2, Column 2, should be kept</td>
      </tr>
    </table>
  </body>
</html>
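A quick check of why the attempt fails, assuming bs4 4.x with the built-in html.parser: Tag.text is a property without a setter, whereas Tag.string does have a setter that replaces the tag's contents. The one-cell table is made up for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<table><tr><td>not kept</td></tr></table>", "html.parser")
cell = soup.find("td")

# .text is a read-only property, so assigning to it raises AttributeError
try:
    cell.text = ""
    text_is_assignable = True
except AttributeError:
    text_is_assignable = False

# .string has a setter: it removes the cell's children and inserts the new string
cell.string = ""

print(text_is_assignable)  # False
print(cell)                # <td></td>
```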


1 Answer

慕絲7291255


Once you have found the element, use ele.string.replace_with("").

Based on your sample HTML:

from bs4 import BeautifulSoup

html = '''<html>
<head>
<title>HTML Tables</title>
</head>
<body>
<table border = "1">
<tr>
<td>Row 1, Column 1, This should be kept because it has more than two tokens</td>
<td>not kept</td>
</tr>
<tr>
<td>Row 2, Column 1, should be kept</td>
<td>Row 2, Column 2, should be kept</td>
</tr>
</table>
</body>
</html>'''

soup = BeautifulSoup(html, 'html.parser')

tables = soup.find_all('table')
for table in tables:
    # Search the table directly: html.parser does not insert an implicit <tbody>
    rows = table.find_all('tr')
    for row in rows:
        cols = row.find_all('td')
        for ele in cols:
            # Fewer than three space-separated tokens: replace the cell's string
            if len(ele.text.split(' ')) < 3:
                ele.string.replace_with("")

print(soup)

Output:

<html>
<head>
<title>HTML Tables</title>
</head>
<body>
<table border="1">
<tr>
<td>Row 1, Column 1, This should be kept because it has more than two tokens</td>
<td></td>
</tr>
<tr>
<td>Row 2, Column 1, should be kept</td>
<td>Row 2, Column 2, should be kept</td>
</tr>
</table>
</body>
</html>
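The replace_with approach assumes each cell holds exactly one string. For a cell that mixes text and tags, such as <td>12 <b>kg</b></td>, ele.string is None, so ele.string.replace_with("") raises an AttributeError. The sketch below, assuming bs4 4.x, assigns cell.string = "" instead (the .string setter clears the tag's children before inserting the new string), counts tokens with split() so that runs of spaces or newlines do not produce empty tokens, also checks <th> header cells, and then calls get_text() as the question intends. The function name blank_short_cells, its min_tokens parameter, and the sample table are hypothetical, chosen for illustration.

```python
from bs4 import BeautifulSoup


def blank_short_cells(soup, min_tokens=3):
    """Empty every <td>/<th> whose text has fewer than min_tokens tokens.

    Hypothetical helper: "tokens" are whitespace-separated chunks, so a
    number such as 42 counts as one word, as in the question.
    """
    for cell in soup.find_all(["td", "th"]):
        # get_text() also collects text inside nested tags such as <b>
        if len(cell.get_text().split()) < min_tokens:
            # Setting .string clears all children (nested tags included)
            # and works even where cell.string would be None
            cell.string = ""
    return soup


html = """<table border="1">
<tr><td>Row 1, Column 1, This should be kept because it has more than two tokens</td><td>not kept</td></tr>
<tr><td>42</td><td>12 <b>kg</b></td></tr>
</table>"""

soup = blank_short_cells(BeautifulSoup(html, "html.parser"))
text = soup.get_text()
print(text)
```

Here the first cell survives, while "not kept", "42" and "12 kg" are emptied to <td></td> before get_text() runs.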

2023-10-24
