name 'new_urls' is not defined
How do I fix this?
2016-12-01
From: Python開發簡單爬蟲 (Developing a Simple Crawler in Python) 7-5
# Parser
from bs4 import BeautifulSoup
import re
import urllib.parse

class HtmlParser(object):
    def _get_new_urls(self, page_url, soup):
        # /view/123.htm
        new_urls = set()
        links = soup.find_all('a', href=re.compile(r"/view/\d+\.htm"))
        for link in links:
            new_url = link['href']
            new_full_url = urllib.parse.urljoin(page_url, new_url)
            new_urls.add(new_full_url)
            #print(new_urls)
            return new_urls

    def _get_new_data(self, page_url, soup):
        res_data = {}
        # url
        res_data['url'] = page_url
        # <>
        title_node = soup.find('dd', class_="lemmaWgt-lemmaTitle-title").find("h1")
        res_data['title'] = title_node.get_text()
        # <>
        summary_node = soup.find('div', class_="lemma-summary")
        res_data['summary'] = summary_node.get_text()
        return res_data

    def parse(self, page_url, html_cont):
        if page_url is None or html_cont is None:
            return
        soup = BeautifulSoup(html_cont, 'html.parser', from_encoding='utf-8')
        new_urls = self._get_new_urls(page_url, soup)
        new_data = self._get_new_data(page_url, soup)
        return new_urls, new_data
Where is the error in this code? I can't find it.
It means 'new_urls' is not defined. Check whether you misspelled the name.
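A note on the posted code: as pasted, it assigns new_urls before using it in both _get_new_urls and parse. If the NameError still appears, a likely cause (this is a guess, since the traceback was not posted) is a spelling or indentation difference in the actual file, because Python names are case-sensitive and a name assigned in one function is not visible in another. Separately, `return new_urls` in the posted code sits inside the for loop, so the method returns after the first link and returns None when no link matches. Below is a minimal sketch of the link-collection step with the return moved after the loop. It is not the course's code: the function name collect_new_urls, the example.com URL, and the sample HTML are made up for illustration, and a regular expression stands in for BeautifulSoup so the sketch runs with the standard library alone.

```python
import re
from urllib.parse import urljoin

# Hypothetical stand-in for HtmlParser._get_new_urls. A regex over the raw
# HTML replaces BeautifulSoup here only to keep the sketch dependency-free.
LINK_RE = re.compile(r'href="(/view/\d+\.htm)"')


def collect_new_urls(page_url, html_text):
    # Assign the set before the loop so the name always exists,
    # even when the page contains no matching links.
    new_urls = set()
    for href in LINK_RE.findall(html_text):
        # Resolve the relative link against the page it was found on.
        new_urls.add(urljoin(page_url, href))
    # Return after the loop (not inside it) so every link is collected.
    return new_urls


# Made-up sample input: two matching links and one that should be ignored.
sample = (
    '<a href="/view/1.htm">a</a> '
    '<a href="/view/2.htm">b</a> '
    '<a href="/other">c</a>'
)
urls = collect_new_urls("http://example.com/view/100.htm", sample)
print(sorted(urls))
```

In the original class the same change would mean dedenting `return new_urls` by one level so that it lines up with the `for` statement.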