2 Answers

User contributed 1851 experience points, received 3+ likes
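Create a new document that contains only a <p> tag. Iterate over the descendants of the <body> tag in the original document. If you encounter an <h5> tag, add that <h5> tag to the <p> tag and add the subsequent tags to it (the <h5>) as descendants. Tags from the original document are added to the new document as descendants of its <p> tag.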
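The steps above can be sketched roughly as follows. This is only one possible reading of the answer, written with the standard library's xml.etree.ElementTree rather than lxml. The SAMPLE input and the function name group_under_headings are assumptions made for illustration, since the original document is not shown in this thread. Storing the heading text in a title attribute is also a choice made here, not something the answer specifies.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical input; the real source document is not shown in the thread.
SAMPLE = """<html><body>
<h5>Fruits</h5>
<div>Apple</div>
<div>Pear</div>
<h5>Vegetables</h5>
<div>Carrot</div>
</body></html>"""


def group_under_headings(source_root):
    """Build a new <p> tree in which each <h5> wraps the elements that followed it."""
    new_root = ET.Element("p")
    current = None  # the <h5> that is currently collecting children
    for child in source_root.find("body"):
        if child.tag == "h5":
            # Start a new group; keep the heading text as a title attribute.
            current = ET.SubElement(new_root, "h5", title=(child.text or "").strip())
        elif current is not None:
            # Copy so the source tree is left untouched.
            current.append(copy.deepcopy(child))
        else:
            # Elements that appear before any <h5> go directly under <p>.
            new_root.append(copy.deepcopy(child))
    return new_root


result = group_under_headings(ET.fromstring(SAMPLE))
print(ET.tostring(result, encoding="unicode"))
```

Copying each element keeps the source tree intact, at the cost of some memory on large documents.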

User contributed 1789 experience points, received 10+ likes
Here is an XSLT solution using lxml. It offloads the processing to libxml. I have added comments to the transformation stylesheet:
from lxml import etree
xsl = etree.XML('''
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes" />
    <xsl:strip-space elements="*"/>
    <xsl:template match="/">
        <p>
            <xsl:apply-templates select="html/body"/>
        </p>
    </xsl:template>
    <!-- match body, but do not add content; this excludes /html/body elements -->
    <xsl:template match="body">
        <xsl:apply-templates />
    </xsl:template>
    <xsl:template match="h5">
        <!-- record the current h5 title -->
        <xsl:variable name="title" select="."/>
        <h5>
            <xsl:attribute name="title">
                <xsl:value-of select="$title" />
            </xsl:attribute>
            <xsl:for-each select="following-sibling::div[preceding-sibling::h5[1] = $title]">
                <!-- deep copy of each consecutive div following the current h5 element -->
                <xsl:copy-of select="." />
            </xsl:for-each>
        </h5>
    </xsl:template>
    <!-- match div, but do not output anything since we are copying it into the new h5 element -->
    <xsl:template match="div" />
</xsl:stylesheet>
''')
transform = etree.XSLT(xsl)
with open("doc.xml") as f:
    print(transform(etree.parse(f)), end='')
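If the stylesheet is stored in a file named doc.xsl, the same result can be obtained with the xsltproc command-line utility (it ships with libxslt, the companion library to libxml):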
xsltproc doc.xsl doc.xml
Result:
<?xml version="1.0"?>
<p>
  <h5 title="Fruits">
    <div>This is some <span attr="foo">Text</span>.</div>
    <div>Some <span>more</span> text.</div>
  </h5>
  <h5 title="Vegetables">
    <div>Yet another line <span attr="bar">of</span> text.</div>
    <div>This span will get <span attr="foo">removed</span> as well.</div>
    <div>Nested elements <span attr="foo">will <b>be</b> left</span> alone.</div>
    <div>Unless <span attr="foo">they <span attr="foo">also</span> match</span>.</div>
  </h5>
</p>
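As a quick sanity check on the shape of that result, a short sketch like the one below can read the output back and count the div elements grouped under each heading. It uses the standard library's xml.etree.ElementTree instead of lxml to stay dependency-free. The RESULT string is the output shown above with the XML declaration left out, and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The transformed output from above, pasted in (XML declaration omitted).
RESULT = """<p>
  <h5 title="Fruits">
    <div>This is some <span attr="foo">Text</span>.</div>
    <div>Some <span>more</span> text.</div>
  </h5>
  <h5 title="Vegetables">
    <div>Yet another line <span attr="bar">of</span> text.</div>
    <div>This span will get <span attr="foo">removed</span> as well.</div>
    <div>Nested elements <span attr="foo">will <b>be</b> left</span> alone.</div>
    <div>Unless <span attr="foo">they <span attr="foo">also</span> match</span>.</div>
  </h5>
</p>"""

root = ET.fromstring(RESULT)

# Map each heading title to the number of <div> children grouped under it.
summary = {h5.get("title"): len(h5.findall("div")) for h5 in root.findall("h5")}
print(summary)  # {'Fruits': 2, 'Vegetables': 4}
```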