HTMLParser.HTMLParseError: malformed start tag when scraping a page
When extracting an HTML page with BeautifulSoup, I get the error HTMLParser.HTMLParseError: malformed start tag. How can I fix this?
2016-10-11
2016-10-11
$ pip install beautifulsoup4
$ pip install html5lib
Python:
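# Python 2 (urllib2 and the print statement do not exist in Python 3)
from bs4 import BeautifulSoup
import urllib2

url = 'http://www.example.com'
page = urllib2.urlopen(url)
# html5lib parses the way browsers do, so it tolerates malformed tags
soup = BeautifulSoup(page.read(), 'html5lib')
# href=True skips <a> tags without an href, avoiding a KeyError below
links = soup.findAll('a', href=True)
for link in links:
    print link.string, link['href']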
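
On Python 3, a minimal sketch of the same idea might look like the following. It parses an inline HTML string instead of fetching a URL, so it can be tried offline. The names `extract_links` and `_LinkCollector` are hypothetical. The fallback to the standard library's `html.parser` when `beautifulsoup4` or `html5lib` is not installed is an assumption added for illustration, not part of the answer itself.

```python
from html.parser import HTMLParser

try:
    from bs4 import BeautifulSoup, FeatureNotFound
except ImportError:  # beautifulsoup4 is not installed
    BeautifulSoup = None


class _LinkCollector(HTMLParser):
    """Hypothetical stdlib fallback: collects (text, href) per <a href=...>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.links = []
        self._href = None
        self._text = []

    def _flush(self):
        # Store the link currently being read, if any, then reset state.
        if self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
        self._href, self._text = None, []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._flush()  # a new <a> implicitly ends an unclosed one
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self._flush()

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)


def extract_links(html):
    """Return (link text, href) pairs from possibly malformed HTML."""
    if BeautifulSoup is not None:
        # Prefer html5lib, as in the answer; otherwise use bs4's built-in parser.
        for parser in ("html5lib", "html.parser"):
            try:
                soup = BeautifulSoup(html, parser)
            except FeatureNotFound:  # this parser is not installed
                continue
            return [(a.get_text(strip=True), a["href"])
                    for a in soup.find_all("a", href=True)]
    # Assumption: with no bs4 at all, fall back to the standard library.
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    collector._flush()  # keep a link whose </a> never arrived
    return collector.links
```

Calling `extract_links('<p>See <a href="/a">First</a><a href=/b>Second')` should return both links even though the `<p>` and the last `<a>` are never closed. To parse a live page, the HTML string could come from `urllib.request.urlopen(url).read()`, the Python 3 counterpart of `urllib2.urlopen`.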