2 Answers
Contributor with 1878 experience points · more than 4 upvotes
Here is a snippet that uses XPath to reach the deepest "valid" tag, then goes from there through getchildren() and .tail to get down to the actual text.
import lxml.etree  # a bare "import lxml" does not load the etree submodule
xml=""" <claim id="CLM-00027" num="00027">
<claim-text> <?insert-start id="REI-00005" date="20191203" ?>27. The method according to claim 23 wherein the amorphous metal is selected from the group consisting of Zr based alloys, Ti based alloys, Al based alloys, Fe based alloys, La based alloys, Cu based alloys, Mg based alloys, Pt based alloys, and Pd based alloys. <?insert-end id="REI-00005" ?></claim-text>
</claim>"""
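root = lxml.etree.fromstring(xml)
# XPath returns a list of matching elements; here it holds the single <claim-text>
e = root.xpath("/claim/claim-text")
# The first child is the <?insert-start ?> processing instruction;
# the text that follows it is stored in that node's .tail
res = e[0].getchildren()[0].tail
print(res)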
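Output:
27. The method according to claim 23 wherein the amorphous metal is selected from the group consisting of Zr based alloys, Ti based alloys, Al based alloys, Fe based alloys, La based alloys, Cu based alloys, Mg based alloys, Pt based alloys, and Pd based alloys.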
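The reason .tail is needed: the sentence comes after the insert-start processing instruction, so the parser attaches it to that node as its tail rather than to claim-text as its text. Below is a minimal sketch of the same idea using only the standard library. It assumes Python 3.8 or later (where TreeBuilder accepts insert_pis=True), and the shortened XML string is an illustrative stand-in, not the original document.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative version of the claim XML (assumption: not the full source).
xml = (
    '<claim id="CLM-00027" num="00027">'
    '<claim-text> <?insert-start id="REI-00005" date="20191203" ?>'
    '27. The method according to claim 23. '
    '<?insert-end id="REI-00005" ?></claim-text>'
    '</claim>'
)

# ElementTree drops processing instructions by default; insert_pis=True keeps them.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_pis=True))
root = ET.fromstring(xml, parser=parser)

claim_text = root.find("claim-text")
first_child = claim_text[0]            # the <?insert-start ...?> node
tail_text = first_child.tail.strip()   # text that follows the processing instruction
print(tail_text)
```

With the processing instructions kept in the tree, claim-text has two children (insert-start and insert-end), and the sentence lives in the tail of the first one.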
Contributor with 1872 experience points · more than 4 upvotes
Access the specific child node by index.
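from xml.etree import ElementTree as ET
tree = ET.parse('path_to_your.xml')
root = tree.getroot()
# root is <claim>; root[0] is its first child, <claim-text>.
# The default parser drops processing instructions, so .text holds the whole sentence.
print(root[0].text)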
Output:
27. The method according to claim 23 wherein the amorphous metal is selected from the group consisting of Zr based alloys, Ti based alloys, Al based alloys, Fe based alloys, La based alloys, Cu based alloys, Mg based alloys, Pt based alloys, and Pd based alloys.
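Indexing with root[0] depends on claim-text being the first child. A slightly more defensive sketch, still with the standard library, looks the element up by name and joins all of its text with itertext(). The shortened XML string is an illustrative stand-in for the file, and the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative version of the claim XML (assumption: not the full source).
xml = (
    '<claim id="CLM-00027" num="00027">'
    '<claim-text> <?insert-start id="REI-00005" date="20191203" ?>'
    '27. The method according to claim 23. '
    '<?insert-end id="REI-00005" ?></claim-text>'
    '</claim>'
)

root = ET.fromstring(xml)

# Look up the element by tag name instead of relying on its position.
claim_text = root.find("claim-text")

# itertext() yields the element's text and the text/tail of everything inside it;
# joining and stripping gives the sentence without surrounding whitespace.
full_text = "".join(claim_text.itertext()).strip()
print(full_text)
```

This works whether or not the parser keeps the processing instructions, because itertext() skips the instruction nodes themselves but still yields their tails.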
