NameError: name 'html_doc' is not defined. What is going on?
2016-02-29
From: Python开发简单爬虫 (Developing a Simple Crawler in Python), section 6-4
You need to define html_doc first, before it is passed to BeautifulSoup:
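# encoding: utf-8
from bs4 import BeautifulSoup
import re

# Sample HTML from the Beautiful Soup documentation. The href/class/id
# attributes on the <a> tags are assumed from that sample.
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

# html_doc must already be defined here, otherwise Python raises NameError.
soup = BeautifulSoup(html_doc, 'html.parser')

# Get all hyperlink elements
links = soup.find_all('a')
for link in links:
    print(link.name, link['href'], link.get_text())

# Get one specific hyperlink by its exact href
link_node = soup.find('a', href='http://example.com/tillie')
print('Single element:', link_node.name, link_node['href'], link_node.get_text())

# Fuzzy match on href with a regular expression
link_node = soup.find('a', href=re.compile(r'sie'))
print('Regex match:', link_node.name, link_node['href'], link_node.get_text())

# Get a node by a specific attribute (here, the CSS class)
p_node = soup.find('p', class_='title')
print('Attribute match:', p_node.name, p_node.get_text())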
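The question does not show the script that failed, so the following is only a minimal sketch of the likely cause, assuming html_doc was used before any string had been assigned to it. The short HTML string is an illustrative stand-in, and the example needs no third-party packages.

```python
# Step 1: use html_doc before anything has been assigned to it.
first_error = None
try:
    print(len(html_doc))  # the name does not exist yet
except NameError as exc:
    # Python cannot find the name, so it raises NameError.
    first_error = str(exc)

print("before definition:", first_error)

# Step 2: assign the string first, then use it. The lookup now succeeds.
html_doc = "<html><head><title>The Dormouse's story</title></head></html>"
print("after definition:", len(html_doc), "characters")
```

The same rule applies to the full script: the html_doc = """...""" assignment has to run before the BeautifulSoup(html_doc, ...) call, in the same file or interpreter session.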
qq_夜航船_0 (question asker)