2 Answers

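Contributed 1878 experience points · Received 4+ likes
Here is a snippet that uses XPath to reach the deepest "valid" tag, then takes that tag's first child (the insert-start processing instruction) and reads its tail to get the actual text.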
import lxml.etree  # a bare "import lxml" does not load the etree submodule
xml=""" <claim id="CLM-00027" num="00027">
<claim-text> <?insert-start id="REI-00005" date="20191203" ?>27. The method according to claim 23 wherein the amorphous metal is selected from the group consisting of Zr based alloys, Ti based alloys, Al based alloys, Fe based alloys, La based alloys, Cu based alloys, Mg based alloys, Pt based alloys, and Pd based alloys. <?insert-end id="REI-00005" ?></claim-text>
</claim>"""
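root = lxml.etree.fromstring(xml)
# xpath() returns a list of the matching <claim-text> elements
e = root.xpath("/claim/claim-text")
# The first child is the insert-start processing instruction; the claim text is stored as its tail.
# Indexing replaces the deprecated getchildren().
res = e[0][0].tail
print(res)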
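Output:
27. The method according to claim 23 wherein the amorphous metal is selected from the group consisting of Zr based alloys, Ti based alloys, Al based alloys, Fe based alloys, La based alloys, Cu based alloys, Mg based alloys, Pt based alloys, and Pd based alloys.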
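For readers without lxml, the same tail idea can be sketched with the standard library. This is only an illustration under assumptions: it needs Python 3.8 or later (for `insert_pis`), and the claim text is shortened here for brevity. By default `xml.etree.ElementTree` discards processing instructions, so the parser has to be told to keep them before a tail can be read.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the same shape as the claim above (text abbreviated for illustration).
xml = (
    '<claim id="CLM-00027" num="00027">'
    '<claim-text><?insert-start id="REI-00005" date="20191203" ?>'
    '27. The method according to claim 23.'
    '<?insert-end id="REI-00005" ?></claim-text>'
    '</claim>'
)

# insert_pis=True (Python 3.8+) keeps processing instructions as child nodes.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_pis=True))
root = ET.fromstring(xml, parser=parser)

claim_text = root.find("claim-text")
first_child = claim_text[0]  # the insert-start processing instruction
tail_text = first_child.tail  # text that follows the instruction
print(tail_text)
```

The text lives in the tail because a processing instruction has no content of its own; whatever follows it, up to the next node, is attached to it as tail.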

Contributed 1872 experience points · Received 4+ likes
Access the specific child node by index.
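from xml.etree import ElementTree as ET
# Replace 'path_to_your.xml' with the path to the XML file
tree = ET.parse('path_to_your.xml')
root = tree.getroot()
# root[0] is <claim-text>; the default parser drops processing instructions,
# so the claim text ends up directly in its .text
print(root[0].text)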
Output:
27. The method according to claim 23 wherein the amorphous metal is selected from the group consisting of Zr based alloys, Ti based alloys, Al based alloys, Fe based alloys, La based alloys, Cu based alloys, Mg based alloys, Pt based alloys, and Pd based alloys.
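As a variation on the snippet above, here is a minimal sketch that parses the XML from a string instead of a file and looks the element up by name rather than by position. The shortened claim text and the variable names are assumptions for illustration only. It relies on the default `ElementTree` parser dropping processing instructions, which leaves the surrounding text merged into the element's text.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the same shape as the claim above (text abbreviated for illustration).
xml = (
    '<claim id="CLM-00027" num="00027">'
    '<claim-text><?insert-start id="REI-00005" date="20191203" ?>'
    '27. The method according to claim 23.'
    '<?insert-end id="REI-00005" ?></claim-text>'
    '</claim>'
)

root = ET.fromstring(xml)

# findtext() returns the text of the first matching child, or the default if none matches.
claim = root.findtext("claim-text", default="").strip()
print(claim)
```

Looking up by tag name avoids depending on `<claim-text>` being the first child of `<claim>`.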