Problem crawling the following pages in a loop
2018-08-13 11:37:59 [scrapy.core.scraper] ERROR: Spider error processing <GET https://movie.douban.com/top250> (referer: None)
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/local/lib/python3.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 30, in process_spider_output
    for x in result:
  File "/usr/local/lib/python3.7/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/local/lib/python3.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/lib/python3.7/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/douban/douban/spiders/douban_spider.py", line 36, in parse
    next_link = response.xpath("//span[@class='next']/link/@href").extarct()
AttributeError: 'SelectorList' object has no attribute 'extarct'
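Teacher Dazhuang, I followed your tutorial and tested it myself. The spider fails to get the next-page URL, so the data I end up with contains only the first 25 entries.
I would appreciate your guidance on this.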
2018-09-05
Please post your code.
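
Going by the traceback alone, the immediate cause looks like a typo: `SelectorList` has `extract()`, not `extarct()`, so `parse` raises before it can request the next page, which would explain why only the first 25 entries come back. Below is a minimal sketch of the next-page step once the spelling is fixed. The helper name `next_page_url` is hypothetical, and the relative href format (`?start=25&filter=`) is an assumption about the page, not something shown in the post.

```python
from urllib.parse import urljoin


def next_page_url(current_url, hrefs):
    """Return the absolute URL of the next page, or None on the last page.

    `hrefs` is the list produced in the spider by (note the spelling):
        response.xpath("//span[@class='next']/link/@href").extract()
    """
    if not hrefs:
        # No "next" link was found, so there is nothing more to crawl.
        return None
    # The href is assumed to be relative (e.g. "?start=25&filter="),
    # so resolve it against the URL of the page being parsed.
    return urljoin(current_url, hrefs[0])


if __name__ == "__main__":
    print(next_page_url("https://movie.douban.com/top250", ["?start=25&filter="]))
    print(next_page_url("https://movie.douban.com/top250", []))
```

Inside `parse`, the result would typically be used as `yield scrapy.Request(url, callback=self.parse)` whenever it is not `None`, so the spider keeps following pages until the "next" link disappears.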