I can't see anything wrong; I typed it all out following the teacher's code (crying)
2017-09-17
From: Building a Simple Crawler in Python, section 7-2
Answers
# Import the URL manager, parser, downloader, and output modules
from SpiderBaike import url_manager, html_parser, html_downloader, html_output
# Define the crawler class
class Spider(object):
    def __init__(self):
        # Create instances of the URL manager and the other helper classes
        self.url_manager = url_manager.UrlManager()
        self.parser = html_parser.Parser()
        self.downloader = html_downloader.Downloader()
        self.output = html_output.Output()
    def crawl(self, root_url):
        count = 1
        self.url_manager.add(root_url)
        while self.url_manager.has_new():
            try:
                new_url = self.url_manager.get_url()
                print(count)
                print(new_url)
                html_cont = self.downloader.download(new_url)
                new_urls, data = self.parser.parse(new_url, html_cont)
                self.output.get_data(data)
                self.url_manager.add_list(new_urls)
                if count == 10:
                    break
                count = count + 1
            except:
                print('Craw failed')
if __name__ == '__main__':
    root_url = 'https://baike.baidu.com/item/Python'
    obj_spider = Spider()
    obj_spider.crawl(root_url)
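A side note on the loop above: the bare `except:` prints only 'Craw failed', which hides the real error and the line it came from. Below is a minimal sketch of reporting the exception instead. The names `crawl_all`, `fetch`, and `flaky_fetch` are hypothetical stand-ins for illustration, not part of the course code.

```python
import traceback


def crawl_all(urls, fetch):
    """Call fetch(url) for each URL and report failures instead of hiding them.

    `fetch` is a hypothetical stand-in for the download-and-parse steps.
    """
    failures = []
    for url in urls:
        try:
            fetch(url)
        except Exception as exc:  # unlike a bare except, this lets Ctrl+C through
            print('Crawl failed for %s: %r' % (url, exc))
            traceback.print_exc()  # full stack trace, including the failing line
            failures.append((url, exc))
    return failures


def flaky_fetch(url):
    # Hypothetical fetcher that fails on one URL, for demonstration only.
    if 'bad' in url:
        raise ValueError('cannot download ' + url)
    return '<html></html>'
```

Calling `crawl_all(['http://example.com/ok', 'http://example.com/bad'], flaky_fetch)` keeps crawling after the failure and returns the list of URLs that failed together with their exceptions.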
程序媛_ (asker)
# Import the URL manager, parser, downloader, and output modules
from SpiderBaike import url_manager, html_parser, html_downloader, html_output
# Define the crawler class
class Spider(object):
    def __int__(self):
        # Create instances of the URL manager and the other helper classes
        self.m_url_manager = url_manager.UrlManager()
        self.m_parser = html_parser.Parser()
        self.m_downloader = html_downloader.Downloader()
        self.m_output = html_output.Output()
    def crawl(self, root_url):
        print(root_url)
        # count = 1
        self.m_url_manager.add(root_url)
        # while self.m_url_manager.has_new():
        #     try:
        #         new_url = self.m_url_manager.get_url()
        #         print(count)
        #         print(new_url)
        #         html_cont = self.m_downloader.download(new_url)
        #         new_urls, data = self.m_parser.parse(new_url, html_cont)
        #         self.m_output.get_data(data)
        #         self.m_url_manager.add_list(new_urls)
        #         if count == 10:
        #             break
        #         count = count + 1
        #     except:
        #         print('Craw failed')
if __name__ == '__main__':
    root_url = 'https://baike.baidu.com/item/Python'
    obj_spider = Spider()
    obj_spider.crawl(root_url)
Mine throws the same error as yours. How do I fix it?
It says the error is on line 14. What is wrong there? (sad)
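A likely cause, judging from the posted code: the second listing spells the constructor `__int__` instead of `__init__`. Python only calls `__init__` when an object is created, so `self.m_url_manager` is never assigned. Counting the listing without blank lines, line 14 is the `self.m_url_manager.add(root_url)` call, which is where the missing attribute would first be used. The sketch below reproduces this under one assumption: a plain `set()` stands in for `url_manager.UrlManager()`, and the class names `BrokenSpider`, `FixedSpider`, and the helper `try_crawl` are made up for the demonstration.

```python
class BrokenSpider(object):
    def __int__(self):  # typo: Python never calls this when constructing
        self.m_url_manager = set()  # assumption: set() stands in for UrlManager()

    def crawl(self, root_url):
        self.m_url_manager.add(root_url)  # fails: attribute was never created


class FixedSpider(object):
    def __init__(self):  # correct spelling, so this runs on FixedSpider()
        self.m_url_manager = set()

    def crawl(self, root_url):
        self.m_url_manager.add(root_url)


def try_crawl(spider_cls, root_url):
    """Return 'ok' or the AttributeError text raised while crawling."""
    try:
        spider_cls().crawl(root_url)
        return 'ok'
    except AttributeError as exc:
        return 'AttributeError: %s' % exc


if __name__ == '__main__':
    url = 'https://baike.baidu.com/item/Python'
    print(try_crawl(BrokenSpider, url))
    print(try_crawl(FixedSpider, url))
```

If this matches the error message you see (an `AttributeError` mentioning `m_url_manager`), renaming `__int__` to `__init__` should be enough.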
慕桂英4333026
Maybe the urls variable in SpiderMain was never initialized?
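To illustrate what "initialized" means here, the following is a minimal sketch of a URL manager whose containers are created in `__init__`. The method names `add`, `add_list`, `has_new`, and `get_url` come from the code posted in this thread; the attribute names `new_urls` and `old_urls` and all of the method bodies are assumptions, and the course's actual implementation may differ.

```python
class UrlManager(object):
    """Hypothetical URL manager; the course's real implementation may differ."""

    def __init__(self):
        # Both sets must be created here, or add() fails with AttributeError.
        self.new_urls = set()  # URLs waiting to be crawled
        self.old_urls = set()  # URLs already handed out

    def add(self, url):
        # Ignore empty values and URLs that were already seen.
        if url and url not in self.new_urls and url not in self.old_urls:
            self.new_urls.add(url)

    def add_list(self, urls):
        for url in urls or []:
            self.add(url)

    def has_new(self):
        return len(self.new_urls) > 0

    def get_url(self):
        # Move one URL from the waiting set to the visited set.
        url = self.new_urls.pop()
        self.old_urls.add(url)
        return url
```

Keeping a separate visited set means a page that links back to an already crawled URL does not get queued again.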
Replies posted: 2017-09-24, 2017-09-24, 2017-09-18, 2017-09-17