2 Answers

User contributed 2021 experience points, received more than 8 upvotes
This works. It is not very elegant, but it is effective. I extended your sample HTML with a few more problematic nodes:
test = """
<!DOCTYPE html>
<html>
<head>
<title></title>
</head>
<body>
<table>
<tr>
<th>test</th>
<td>abc</td>
</tr>
<tr>
<th>test1</th>
<td>abc</td>
<td>abc</td>
</tr>
<tr>
<th>test2</th>
<td>abc</td>
</tr>
<tr>
<a>test3</a>
<td>abcd</td>
</tr>
<tr>
<td>test4</td>
<td>abcd</td>
</tr>
</table>
</body> """
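import lxml.html

doc = lxml.html.fromstring(test)
good_tags = ['th', 'td']
# Select every table row in the document.
targs = doc.xpath('//tr')
for targ in targs:
    # Collect all descendant elements of the current row.
    tr = targ.xpath('.//*')
    # Keep rows holding exactly two elements: one <th> and one <td>, in either order.
    if len(tr) == 2 and tr[0].tag != tr[1].tag and tr[0].tag in good_tags and tr[1].tag in good_tags:
        print(lxml.html.tostring(targ).decode())
Output: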
<tr>
<th>test</th>
<td>abc</td>
</tr>
<tr>
<th>test2</th>
<td>abc</td>
</tr>

User contributed 1946 experience points, received more than 4 upvotes
A single XPath expression:
//tr[count(./*)=2 and count(./th)=1 and count(./td)=1]
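
As a minimal sketch of how this expression could be applied with lxml (the library used in the first answer), the snippet below runs it against a small sample. The `SAMPLE` markup, the `ROW_XPATH` constant, and the `matching_rows` helper are hypothetical names introduced only for illustration; they are not part of the original answer.

```python
import lxml.html

# Hypothetical sample markup, modeled on the rows from the first answer.
SAMPLE = """
<table>
  <tr><th>test</th><td>abc</td></tr>
  <tr><th>test1</th><td>abc</td><td>abc</td></tr>
  <tr><a>test3</a><td>abcd</td></tr>
  <tr><td>test4</td><td>abcd</td></tr>
</table>
"""

# Rows with exactly two child elements: one <th> and one <td>.
ROW_XPATH = "//tr[count(./*)=2 and count(./th)=1 and count(./td)=1]"


def matching_rows(html):
    """Return the serialized <tr> elements selected by ROW_XPATH."""
    doc = lxml.html.fromstring(html)
    return [
        lxml.html.tostring(tr, encoding="unicode").strip()
        for tr in doc.xpath(ROW_XPATH)
    ]


if __name__ == "__main__":
    for row in matching_rows(SAMPLE):
        print(row)
```

Unlike the loop in the first answer, which inspects all descendants via `.//*`, this expression counts only the direct children of each row, so a nested element inside a cell does not disqualify the row.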