With the try in place it would only crawl one entry; after I removed it, this error occurs. Experts, please help!!!!
craw 1 : http://baike.baidu.com/item/Python/407313
Traceback (most recent call last):
  File "D:/pythonlesson1/baike_spider/spider_main.py", line 33, in <module>
    obj_spider.craw(root_url)
  File "D:/pythonlesson1/baike_spider/spider_main.py", line 19, in craw
    html_cont = self.downloader.download(new_url)
  File "D:\pythonlesson1\baike_spider\html_downloader.py", line 9, in download
    response = urllib.request.urlopen(url)
  File "D:\anaconda3.7\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "D:\anaconda3.7\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "D:\anaconda3.7\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "D:\anaconda3.7\lib\urllib\request.py", line 563, in error
    result = self._call_chain(*args)
  File "D:\anaconda3.7\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "D:\anaconda3.7\lib\urllib\request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "D:\anaconda3.7\lib\urllib\request.py", line 525, in open
    response = self._open(req, data)
  File "D:\anaconda3.7\lib\urllib\request.py", line 548, in _open
    'unknown_open', req)
  File "D:\anaconda3.7\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "D:\anaconda3.7\lib\urllib\request.py", line 1387, in unknown_open
    raise URLError('unknown url type: %s' % type)
urllib.error.URLError: <urlopen error unknown url type: https>
2019-03-16
Or, in 'html_parser', change it to:
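(The code snippet for this answer is missing. As a hedged guess at the kind of parser-side change meant here: since the traceback shows urllib has no handler registered for https, one workaround is to normalize every joined URL back to http so urlopen never sees an https URL. The function name `normalize_url` is illustrative, not the course's original code:)

```python
from urllib.parse import urljoin

def normalize_url(page_url, href):
    """Join a (possibly relative) href against the page URL and
    force the http scheme, so urllib never needs an https handler."""
    full_url = urljoin(page_url, href)
    if full_url.startswith('https://'):
        full_url = 'http://' + full_url[len('https://'):]
    return full_url
```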
2019-03-15
Try adding this in 'html_downloader':
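(This answer's snippet is also missing. A hedged sketch of a more tolerant download method: it sends a browser-like User-Agent and returns None instead of raising when urlopen fails, which is how the course's downloader is expected to behave. All names here are illustrative assumptions, not the original code:)

```python
import urllib.request
import urllib.error

def download(url):
    """Fetch a page; return the decoded HTML, or None on any failure."""
    if url is None:
        return None
    req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    try:
        with urllib.request.urlopen(req, timeout=10) as response:
            if response.getcode() != 200:
                return None
            return response.read().decode('utf-8')
    except (urllib.error.URLError, UnicodeDecodeError):
        # Covers "unknown url type: https" as well as network errors.
        return None
```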
2019-02-21
https://github.com/DaddySheng/Python_craw_test1/blob/master/Python3_craw_code.py
The whole point of that try was to skip pages that cannot be crawled; once you remove it, of course things break.
The code at this repo can crawl the current version of the site, and it's Python 3.
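(To illustrate the point about try: a crawl loop that wraps the per-page work in try/except keeps going past pages that fail, instead of dying on the first bad URL. This is a minimal sketch of that pattern, assuming a downloader and parser passed in as plain functions; it is not the course's exact spider_main code:)

```python
def craw(root_url, downloader, parser, max_pages=10):
    """Crawl from root_url, skipping pages whose download or parse fails."""
    urls_to_visit = [root_url]
    visited = set()
    count = 0
    while urls_to_visit and count < max_pages:
        url = urls_to_visit.pop(0)
        if url in visited:
            continue
        try:
            html = downloader(url)
            new_urls = parser(html)   # may raise on pages that cannot be parsed
        except Exception as exc:      # skip anything that cannot be crawled
            print('craw failed :', url, exc)
            continue
        finally:
            visited.add(url)
        count += 1
        print('craw %d : %s' % (count, url))
        urls_to_visit.extend(new_urls)
    return count
```

Without the try/except, the first URL that raises (like the https one in the traceback above) aborts the whole loop.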