As in the title: it seems to appear out of nowhere. It isn't in main, and it isn't a parameter that was passed in either.
2016-07-28
Source: Developing a Simple Web Crawler in Python, lesson 7-5
Answer:
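It originates from root_url in spider_main. craw() adds root_url to the URL manager, and each crawlable URL is then retrieved with urls.get_new_url() and passed along as new_url.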
```python
class SpiderMain(object):
    # __init__ (not shown in the post) sets up self.urls, self.downloader,
    # self.parser and self.outputer.

    def craw(self, root_url):
        count = 1
        # Seed the URL manager with the entry URL.
        self.urls.add_new_url(root_url)
        while self.urls.has_new_url():
            try:
                # Every URL that gets crawled, starting with root_url, comes from here.
                new_url = self.urls.get_new_url()
                print('craw %d:%s' % (count, new_url))
                html_cont = self.downloader.download(new_url)
                new_urls, new_data = self.parser.parser(new_url, html_cont)
                self.urls.add_new_urls(new_urls)
                self.outputer.collect_data(new_data)
                if count == 1000:  # stop after 1000 pages
                    break
                count = count + 1
            except Exception:
                print('craw failed')
        self.outputer.output_html()


if __name__ == "__main__":
    # root_url is defined here and passed into craw().
    root_url = "http://baike.baidu.com/view/21087.htm"
    obj_spider = SpiderMain()
    obj_spider.craw(root_url)
```
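The answer hinges on the URL manager: craw() puts root_url into it, and get_new_url() hands URLs back out. The post doesn't show that class, so here is a minimal sketch of one, assuming a set-based design. The class name UrlManager and its new_urls/old_urls attributes are hypothetical; only the four method names are taken from the craw() code in the post.

```python
class UrlManager(object):
    """Hypothetical URL manager; method names mirror the calls made in craw()."""

    def __init__(self):
        self.new_urls = set()  # URLs waiting to be crawled
        self.old_urls = set()  # URLs already handed out

    def add_new_url(self, url):
        # Ignore None and anything already queued or already crawled.
        if url is None:
            return
        if url not in self.new_urls and url not in self.old_urls:
            self.new_urls.add(url)

    def add_new_urls(self, urls):
        if not urls:
            return
        for url in urls:
            self.add_new_url(url)

    def has_new_url(self):
        return len(self.new_urls) != 0

    def get_new_url(self):
        # Take an arbitrary pending URL and remember it as crawled.
        new_url = self.new_urls.pop()
        self.old_urls.add(new_url)
        return new_url


if __name__ == "__main__":
    root_url = "http://baike.baidu.com/view/21087.htm"
    urls = UrlManager()
    urls.add_new_url(root_url)  # craw() seeds the manager with root_url
    first = urls.get_new_url()  # so the first URL handed out is root_url
    print(first == root_url)    # True
    urls.add_new_url(root_url)  # re-adding an already crawled URL is a no-op
    print(urls.has_new_url())   # False
```

Because set.pop() returns an arbitrary element, this sketch does not guarantee breadth-first crawl order; a queue such as collections.deque would be needed for that.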