When I click Run, the console says it has finished in under a second, but there is no error and no text file is generated... Please help 0.0, thanks!
2016-01-16
From: Python開發(fā)簡(jiǎn)單爬蟲, lesson 7-7
念墨 (2016-01-30):
I ran into the same problem as you; it is mainly an indentation issue. Right now mine stops after crawling just one entry.
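The indentation pitfall this answer points at can be shown with a toy sketch (hypothetical helper names, not the course code): in Python, indentation alone decides whether a statement belongs to the loop body, so a statement indented one level too far runs once per iteration instead of once at the end.

```python
# Toy illustration (hypothetical helpers, not the course code): indentation
# alone decides whether a statement is inside the while-loop body.
def crawl_all(urls):
    results = []
    while urls:
        results.append(urls.pop())  # inside the loop: runs for every URL
    return results                  # outside the loop: runs once at the end

def crawl_only_one(urls):
    results = []
    while urls:
        results.append(urls.pop())
        return results  # indented one level too far: now inside the loop,
                        # so the function returns after the first URL
```

The second version reproduces the "stops after one entry" symptom exactly, with no error reported.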
HHelloWWorld (2016-01-17):
new_urls, new_data = self.parser.parser(new_url, html_cont)
How should this line be understood? Does it assign to new_urls and new_data separately?
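That line is standard Python tuple unpacking: the parser returns two values as a tuple, and they are assigned to the two names in order. A minimal sketch with a stand-in parser (the return values here are made up for illustration, not the course's real parser):

```python
# Stand-in parser: parser() returns a 2-tuple (links found, data scraped).
# The return values are made up for illustration.
class HtmlParser(object):
    def parser(self, page_url, html_cont):
        new_urls = set([page_url + "/a", page_url + "/b"])  # links on the page
        new_data = {"url": page_url, "title": "demo"}       # scraped fields
        return new_urls, new_data                           # a 2-tuple

p = HtmlParser()
# Tuple unpacking: element 0 of the returned tuple is assigned to new_urls,
# element 1 to new_data. It is assignment, not copying.
new_urls, new_data = p.parser("http://example.com", "<html></html>")
```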
(2016-01-16) Try replacing
http://baike.baidu.com/link?url=sP2Dq8raiXUsDeUd8GbiC1C1HcvMO8I8dkoUi5UcIiDwFckEBG9G4KHTKVsCPWoPh1y4LDZZKtEeIA59EYISx_
with http://baike.baidu.com/view/4072022.htm
牛妖妖 (the asker, 2016-01-16):
Er, here is the code. I misspelled the English and typed "spirder", but that shouldn't cause any problem.
#conding:utf8
from spirder import manager, downloader, parser, output

class spirdermain(object):
    def __init__(self):
        self.manager = manager.UrlManager()
        self.downloader = downloader.HtmlDownloader()
        self.parser = parser.HtmlParser()
        self.output = output.HtmlOutput()

    def craw(self, root_url):
        count = 1
        self.manager.add_new_url(root_url)
        while self.manager.has_new_url():
            try:
                new_url = self.manager.get_new_url()
                print 'craw %d:%s' % (count, new_url)
                html_cont = self.downloader.downloader(new_url)
                new_urls, new_data = self.parser.parser(new_url, html_cont)
                self.manager.add_new_urls(new_urls)
                self.output.collect_data(new_data)
                if count == 1000:
                    break
                count = count + 1
            except:
                print 'craw failed'
        self.output.output_html()

if __name__ == "_main_":
    root_url = "http://baike.baidu.com/link?url=sP2Dq8raiXUsDeUd8GbiC1C1HcvMO8I8dkoUi5UcIiDwFckEBG9G4KHTKVsCPWoPh1y4LDZZKtEeIA59EYISx_"
    obj_spirder = spirdermain()
    obj_spirder.craw(root_url)
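One detail in the posted code is consistent with the reported symptom all by itself: the entry guard compares __name__ against "_main_" (one underscore per side). When Python runs a script directly, it sets __name__ to "__main__" (two underscores per side), so the comparison is never true, the guarded block is silently skipped, and the process exits immediately with no error and no output. A minimal sketch of the mismatch:

```python
# Python sets __name__ to "__main__" (double underscores on each side) when
# a file is run directly. The guard in the post compares against "_main_".
def guard_fires(module_name):
    # The same comparison as the post's last block.
    return module_name == "_main_"

# With the real value of __name__, the misspelled guard never fires,
# so nothing inside it (including craw) ever runs.
fires = guard_fires("__main__")
```

Correcting the guard to `if __name__ == "__main__":` makes the entry block run.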