NameError: name 'html_doc' is not defined

Why do I get "NameError: name 'html_doc' is not defined"?

qq_夜航船_0

2016-02-29

Source: Python開發簡單爬蟲 (Developing a Simple Web Crawler in Python), section 6-4

1 answer

追電 (accepted answer, +3 points)
2016-02-29

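html_doc has to be defined before it is used. Python raises this NameError when a name is read before anything has been assigned to it. For example: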

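# encoding: utf-8

from bs4 import BeautifulSoup
import re

# Sample markup from the Beautiful Soup documentation; each <a> needs an href for the lookups below
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# html_doc is already a str, so no from_encoding argument is needed
soup = BeautifulSoup(html_doc, 'html.parser')
links = soup.find_all('a')

# Get all hyperlink elements
for link in links:
    print(link.name, link['href'], link.get_text())

# Get one specific hyperlink element by its exact href
link_node = soup.find('a', href='http://example.com/tillie')
print('One element:', link_node.name, link_node['href'], link_node.get_text())

# Fuzzy-match a hyperlink element with a regular expression on its href
link_node = soup.find('a', href=re.compile(r'sie'))
print('Regex match:', link_node.name, link_node['href'], link_node.get_text())

# Get the node whose class attribute has the given value
p_node = soup.find('p', class_='title')
print('Attribute match:', p_node.name, p_node.get_text())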
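
The error in the question can also be reproduced without Beautiful Soup. The sketch below is only an illustration: the shortened html_doc string and the names error_message and title are assumptions, not part of the course code. It reads html_doc before assigning it, catches the resulting NameError, and then shows that the same lookup works once the name is defined.

```python
# Reading a name that has not been assigned yet raises NameError.
error_message = None
try:
    html_doc.find("<title>")  # html_doc does not exist at this point
except NameError as exc:
    error_message = str(exc)
print("Before definition:", error_message)

# Assign the name first (a shortened, hypothetical document), then use it.
html_doc = "<html><head><title>The Dormouse's story</title></head></html>"
start = html_doc.find("<title>") + len("<title>")
end = html_doc.find("</title>")
title = html_doc[start:end]
print("After definition:", title)
```

In a script this usually means that the html_doc = """...""" assignment is missing, misspelled, or placed below the BeautifulSoup(html_doc, ...) call that needs it.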