
Scrapy not parsing items

Python

拉風的咖菲貓 2021-12-21 17:26:15

I'm trying to scrape a paginated web page, but the callback doesn't parse the items. Any help would be greatly appreciated. Here is the code:

```python
# -*- coding: utf-8 -*-
import scrapy
from ..items import EscrotsItem


class Escorts(scrapy.Spider):
    name = 'escorts'
    allowed_domains = ['www.escortsandbabes.com.au']
    start_urls = ['https://escortsandbabes.com.au/Directory/ACT/Canberra/2600/Any/All/']

    def parse_links(self, response):
        for i in response.css('.btn.btn-default.btn-block::attr(href)').extract()[2:]:
            yield scrapy.Request(url=response.urljoin(i), callback=self.parse)
        NextPage = response.css('.page.next-page::attr(href)').extract_first()
        if NextPage:
            yield scrapy.Request(
                url=response.urljoin(NextPage),
                callback=self.parse_links)

    def parse(self, response):
        for x in response.xpath('//div[@class="advertiser-profile"]'):
            item = EscrotsItem()
            item['Name'] = x.css('.advertiser-names--display-name::text').extract_first()
            item['Username'] = x.css('.advertiser-names--username::text').extract_first()
            item['Phone'] = x.css('.contact-number::text').extract_first()
            yield item
```


1 Answer

溫溫醬


您的代碼調(diào)用 urlsstart_urls并parse運行。由于沒有任何div.advertiser-profile元素，它確實應(yīng)該在沒有任何結(jié)果的情況下關(guān)閉。所以你的parse_links函數(shù)根本沒有被調(diào)用。

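Swap the two function names, so that parse handles the directory (listing) pages and parse_links handles the individual profile pages: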

```python
import scrapy


class Escorts(scrapy.Spider):
    name = 'escorts'
    # No "www." prefix, so requests to escortsandbabes.com.au are not filtered as offsite
    allowed_domains = ['escortsandbabes.com.au']
    start_urls = ['https://escortsandbabes.com.au/Directory/ACT/Canberra/2600/Any/All/']

    def parse(self, response):
        # Default callback for start_urls: this is a directory (listing) page.
        # Follow each profile link (skipping the first two matches) to parse_links.
        for i in response.css('.btn.btn-default.btn-block::attr(href)').getall()[2:]:
            yield scrapy.Request(response.urljoin(i), self.parse_links)

        # Next listing page: no callback is given, so Scrapy calls parse() again.
        next_page = response.css('.page.next-page::attr(href)').get()
        if next_page:
            yield scrapy.Request(response.urljoin(next_page))

    def parse_links(self, response):
        # Profile page: yield one item per advertiser-profile block.
        for x in response.xpath('//div[@class="advertiser-profile"]'):
            item = {}
            item['Name'] = x.css('.advertiser-names--display-name::text').get()
            item['Username'] = x.css('.advertiser-names--username::text').get()
            item['Phone'] = x.css('.contact-number::text').get()
            yield item
```
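The answer's code also changes allowed_domains from www.escortsandbabes.com.au to escortsandbabes.com.au. As I understand Scrapy's offsite filtering, a request passes only if its host equals an allowed domain or is a subdomain of one, so with the www. entry the follow-up requests to the bare host escortsandbabes.com.au would likely be dropped as offsite. The sketch below re-implements that host check in simplified form with the standard library; host_regex and is_allowed are hypothetical helper names, not Scrapy functions.

```python
import re
from urllib.parse import urlparse


def host_regex(allowed_domains):
    """Simplified offsite rule: host must equal an allowed domain
    or be a subdomain of it (hypothetical helper, not Scrapy's API)."""
    domains = "|".join(re.escape(d) for d in allowed_domains)
    return re.compile(rf"^(.*\.)?({domains})$")


def is_allowed(url, allowed_domains):
    host = urlparse(url).hostname or ""
    return bool(host_regex(allowed_domains).match(host))


link = "https://escortsandbabes.com.au/Directory/ACT/Canberra/2600/Any/All/?p=2"

# The bare host is not a subdomain of www.escortsandbabes.com.au.
print(is_allowed(link, ["www.escortsandbabes.com.au"]))  # False
# Listing the bare domain accepts both the bare host and any subdomain.
print(is_allowed(link, ["escortsandbabes.com.au"]))      # True
```

Listing the bare domain is the safer choice here, since it also covers www. and any other subdomain.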

Log from my scrapy shell session:

```
In [1]: fetch("https://escortsandbabes.com.au/Directory/ACT/Canberra/2600/Any/All/")
2019-03-29 15:22:56 [scrapy.core.engine] INFO: Spider opened
2019-03-29 15:23:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://escortsandbabes.com.au/Directory/ACT/Canberra/2600/Any/All/> (referer: None, latency: 2.48 s)

In [2]: response.css('.page.next-page::attr(href)').get()
Out[2]: u'/Directory/ACT/Canberra/2600/Any/All/?p=2'
```
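The href returned in Out[2] is relative, which is why both versions of the spider pass it through response.urljoin(). For an HTML response that method resolves the link against the page URL (honouring a <base> tag if the page has one); without a <base> tag the result should match plain urllib.parse.urljoin, as this small check shows:

```python
from urllib.parse import urljoin

page_url = "https://escortsandbabes.com.au/Directory/ACT/Canberra/2600/Any/All/"
next_href = "/Directory/ACT/Canberra/2600/Any/All/?p=2"  # value shown in Out[2]

# An absolute-path reference keeps the scheme and host, replaces the path and query.
next_url = urljoin(page_url, next_href)
print(next_url)  # https://escortsandbabes.com.au/Directory/ACT/Canberra/2600/Any/All/?p=2
```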

2021-12-21