Use BeautifulSoup, lxml, or a similar module instead of regex.
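To illustrate why a parser is the safer choice, here is a small sketch of how a regex can silently break. The pattern and the `reordered` string are assumptions made up for this illustration (they are not from the question); the point is only that a pattern tied to one attribute order stops matching as soon as the markup is written differently.

```python
import re

# Hypothetical pattern: assumes title always comes directly before class="dh"
pattern = re.compile(r'<div title="([^"]*)" class="dh">')

original = '<div class="ds"><div title="Today" class="dh">...<div title="Pazartesi" class="dh">26 Agu Pzt'
# Same markup with the two attributes swapped (an assumed variation)
reordered = '<div class="ds"><div class="dh" title="Today">...<div class="dh" title="Pazartesi">26 Agu Pzt'

matches_original = pattern.findall(original)    # finds both titles
matches_reordered = pattern.findall(reordered)  # finds nothing
print(matches_original)
print(matches_reordered)
```

An HTML parser reads attributes by name, so attribute order does not matter to it.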
BeautifulSoup:
from bs4 import BeautifulSoup
text = '<div class="ds"><div title="Today" class="dh">...<div title="Pazartesi" class="dh">26 Agu Pzt'
soup = BeautifulSoup(text, 'html.parser')
for item in soup.select('.ds div[title]'):
    print(item['title'])
# or as a list comprehension
titles = [item['title'] for item in soup.select('.ds div[title]')]
print(titles)
lxml:
import lxml.html
text = '<div class="ds"><div title="Today" class="dh">...<div title="Pazartesi" class="dh">26 Agu Pzt'
soup = lxml.html.fromstring(text)
# .cssselect() needs the separate cssselect package to be installed
for item in soup.cssselect('.ds div[title]'):
    print(item.attrib['title'])
# or as a list comprehension
titles = [item.attrib['title'] for item in soup.cssselect('.ds div[title]')]
print(titles)
pyquery:
import pyquery
text = '<div class="ds"><div title="Today" class="dh">...<div title="Pazartesi" class="dh">26 Agu Pzt'
soup = pyquery.PyQuery(text)
for item in soup('.ds div[title]'):
    print(item.attrib['title'])
# or as a list comprehension
titles = [item.attrib['title'] for item in soup('.ds div[title]')]
print(titles)
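parsel (used by Scrapy's Selectors):
import parsel
text = '<div class="ds"><div title="Today" class="dh">...<div title="Pazartesi" class="dh">26 Agu Pzt'
sel = parsel.Selector(text=text)
for item in sel.css('.ds div[title]'):
    print(item.attrib['title'])
# or as a list comprehension
titles = [item.attrib['title'] for item in sel.css('.ds div[title]')]
print(titles)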
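
If installing a third-party package is not an option, roughly the same result can be sketched with the standard library's html.parser. This is not one of the libraries above, and the `TitleCollector` class is a name made up for this example. It only approximates the `.ds div[title]` selector by counting open tags, so it assumes reasonably well-nested markup.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change the depth count
VOID_TAGS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
             'input', 'link', 'meta', 'source', 'track', 'wbr'}


class TitleCollector(HTMLParser):
    """Collect the title of every <div title=...> nested inside a class="ds" element."""

    def __init__(self):
        super().__init__()
        self.ds_depth = 0  # number of open tags inside a .ds element (0 = outside)
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        if self.ds_depth:
            # Already inside a .ds element: go one level deeper
            self.ds_depth += 1
            if tag == 'div' and 'title' in attrs:
                self.titles.append(attrs['title'])
        elif 'ds' in (attrs.get('class') or '').split():
            # Entering a .ds element
            self.ds_depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.ds_depth:
            self.ds_depth -= 1


text = '<div class="ds"><div title="Today" class="dh">...<div title="Pazartesi" class="dh">26 Agu Pzt'
collector = TitleCollector()
collector.feed(text)
titles = collector.titles
print(titles)
```

For anything beyond a quick script, the CSS selector support in the libraries above is much less error-prone than tracking depth by hand.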