1 Answer

If the table is structured as <td class="attrLabels"> ...Attribute... </td><td> ...Attribute value... </td>, you can do the following (txt is your HTML snippet):
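from pprint import pprint
from bs4 import BeautifulSoup

soup = BeautifulSoup(txt, 'html.parser')

out = {}
# Pair each label cell with the <td> that immediately follows it.
# The loop variable is named `value` so it does not shadow the input `txt`.
for attr, value in zip(soup.select('td.attrLabels'), soup.select('td.attrLabels + td')):
    # Keep only the part of the value text before the first ':'.
    out[attr.get_text(strip=True)] = value.get_text(strip=True).split(':')[0]

# pretty print to screen:
pprint(out)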
Prints:
{'Brand:': 'MyBrand',
 'Condition:': 'New',
 'MPN:': 'Does Not Apply',
 'UPC:': 'Does not apply'}
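If you would rather not rely on the two select() results lining up one-to-one, a variation is to walk the label cells and look up each one's next <td> sibling directly. The following is only a minimal sketch under assumptions: sample_html is made-up markup in the shape described above (the real page may differ), and parse_attributes is a hypothetical helper name, not part of BeautifulSoup. It also strips the trailing colon from each key.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration; the real page may differ.
sample_html = """
<table>
  <tr>
    <td class="attrLabels"> Condition: </td><td> New </td>
    <td class="attrLabels"> Brand: </td><td> MyBrand </td>
  </tr>
  <tr>
    <td class="attrLabels"> MPN: </td><td> Does Not Apply </td>
  </tr>
</table>
"""


def parse_attributes(html):
    """Map each td.attrLabels label to the text of its next <td> sibling."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for label in soup.select("td.attrLabels"):
        # Look up the value cell relative to this label, so a label
        # without a value cell cannot shift the remaining pairs.
        value = label.find_next_sibling("td")
        if value is None:
            continue
        # Drop the trailing ':' from the label text to get a clean key.
        key = label.get_text(strip=True).rstrip(":")
        result[key] = value.get_text(strip=True)
    return result


attributes = parse_attributes(sample_html)
print(attributes)
```

Note that find_next_sibling("td") returns the next <td> sibling even when it is not immediately adjacent, so this assumes every label cell is followed by its value cell, as in the structure above.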