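A sample of my results is below: "元谷" is repeated 5 times, and the other entries are repeated 4 times each. (Later in the crawl I also started getting HTTP 429 responses that blocked me.) Addendum: after trying again, I found this has nothing to do with the multi-level crawl. Duplicates still appear with loupan_detail_parse removed. Also, loupan_item cannot be printed, and the console shows the output repeated twice. My code is as follows:
# -*- coding: utf-8 -*-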
import scrapy
from papa.items import PapaItem
import re
class PappSpider(scrapy.Spider):
    name = 'papp'
    allowed_domains = ['xa.fang.ke.com']
    start_urls = ['https://xa.fang.ke.com/loupan/nhs1/']
    count = 1
    page_end = 48
    def parse(self, response):
        # One wrapper div per listing on the page
        loupan_list = response.xpath("//div[@class='resblock-desc-wrapper']")
        for i_item in loupan_list:
            loupan_item = PapaItem()
            quwei = i_item.xpath(".//a[@class='resblock-location']/text()").extract()
            loupan_item['quwei'] = quwei[1].replace("\t", "").replace("\n", "")
            loupan_item['loupan_name'] = i_item.xpath(".//div[@class='resblock-name']/a/text()").extract()
            loupan_item['resblock_type'] = i_item.xpath(".//div[@class='resblock-name']/span[1]/text()").extract()
            loupan_item['loupan_type'] = i_item.xpath(".//div[@class='resblock-name']/span[2]/text()").extract()
            # NOTE: this overwrites the resblock_type value assigned two lines above
            loupan_item['resblock_type'] = i_item.xpath(".//div[@class='resblock-tag']/span/text()").extract()
            loupan_tag = i_item.xpath(".//div[@class='resblock-tag']/span/text()").extract()
            loupan_item['loupan_tag'] = "/".join(loupan_tag)
            loupan_item['jun_jia'] = i_item.xpath(".//div[@class='resblock-price']/div[@class='main-price']/span[@class='number']/text()").extract()
            xiangqing_url = i_item.xpath(".//div[@class='resblock-name']/a/@href").extract()
            # Detail page URL: listing href plus the "xiangqing" suffix
            xiang_url = 'https://' + self.allowed_domains[0] + xiangqing_url[0] + "xiangqing"
            yield scrapy.Request(xiang_url, meta={'item': loupan_item}, callback=self.loupan_detail_parse)
        # Move on to the next listing page until page_end is reached
        self.count = self.count + 1
        if self.count <= self.page_end:
            nextPage = self.start_urls[0][:-1] + 'pg%s' % self.count + '/'
            yield scrapy.Request(nextPage, callback=self.parse)
        else:
            return None
    def loupan_detail_parse(self, response):
        # Item built in parse() and passed along via request meta
        item = response.meta['item']
        xiangqing = response.xpath("//ul[@class='x-box']")
        kaifashang = xiangqing[0].xpath(".//li[7]/span[@class='label-val']/text()").extract()
        guihua = response.xpath("//ul[@class='x-box'][2]/li/span[@class='label-val']/text()").extract()
        # Keep only word characters and a few punctuation marks, dropping whitespace
        guihua = ["".join(re.findall(r'[".:\w+"]', i)) for i in guihua]
        peitao = xiangqing[2].xpath(".//span[@class='label-val']/text()").extract()
        peitao = ["".join(re.findall(r'[".:\w+"]', i)) for i in peitao]
        item['kaifashang'] = kaifashang
        item['jianzhu_type'] = guihua[0]
        item['lvhua'] = guihua[1]
        item['zhandi_area'] = guihua[2]
        item['rongji'] = guihua[3]
        item['jianzhu_area'] = guihua[4]
        item['wuye_type'] = guihua[5]
        item['hushu'] = guihua[6]
        item['chanquan_year'] = guihua[7]
        item['wuye'] = peitao[0]
        item['cheweibi'] = peitao[1]
        item['wuyefei'] = peitao[2]
        item['gongnuan'] = peitao[3]
        item['gongshui'] = peitao[4]
        item['gongdian'] = peitao[5]
        item['chewei'] = peitao[6]
        return item
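One way to narrow this down is to check the pagination logic in isolation. The sketch below needs no Scrapy: it rebuilds the listing URLs the same way parse() does (start_urls[0][:-1] + 'pg%s' % count + '/') and checks whether any URL would be requested twice. The helper names build_page_urls and find_duplicates are hypothetical and not part of the spider, and the sketch assumes each listing page is parsed exactly once, in order.

```python
from collections import Counter

START_URL = 'https://xa.fang.ke.com/loupan/nhs1/'  # same value as start_urls[0]
PAGE_END = 48                                      # same value as page_end


def build_page_urls(start_url, page_end):
    """Return the listing URLs in the order parse() would request them.

    Hypothetical helper: mirrors the counter logic in parse(), assuming
    one parse() call per listing page.
    """
    urls = [start_url]  # page 1 is the start URL itself
    count = 1
    while True:
        count += 1
        if count > page_end:
            break
        urls.append(start_url[:-1] + 'pg%s' % count + '/')
    return urls


def find_duplicates(values):
    """Return {value: occurrences} for every value seen more than once."""
    return {v: n for v, n in Counter(values).items() if n > 1}


if __name__ == '__main__':
    urls = build_page_urls(START_URL, PAGE_END)
    print(urls[1])                 # second listing page URL
    print(len(urls))               # number of listing pages requested
    print(find_duplicates(urls))   # empty dict if no URL repeats
```

If the URL list comes out free of duplicates, the repeated rows are unlikely to come from the URL construction itself. The same find_duplicates helper could then be run on the scraped loupan_name values, page by page, to see whether the repetition is already present in the responses.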
Question for the experts: why does my Scrapy code produce duplicate data?
无无法师
2019-12-23 18:08:25