
Invalid XPath in Scrapy (Python)

Python

慕工程0101907 2023-05-23 10:27:42

Hi, I'm trying to build a crawler with Scrapy. My spider code is:

```python
import scrapy
from shop.items import ShopItem


class ShopspiderSpider(scrapy.Spider):
    name = 'shopspider'
    allowed_domains = ['www.organics.com']
    start_urls = ['https://www.organics.com/product-tag/special-offers/']

    def parse(self, response):
        items = ShopItem()
        title = response.xpath('//*[@id="content"]/div[2]/div[1]/ul/li[1]/a/h3').extract()
        sale_price = response.xpath('//*[@id="content"]/div[2]/div[1]/ul/li[1]/a/span[2]/del/span').extract()
        product_original_price = response.xpath('//*[@id="content"]/div[2]/div[1]/ul/li[1]/a/span[2]/ins/span').extract()
        category = response.xpath('//*[@id="content"]/div[2]/div[1]/ul/li[1]/a/span[2]/ins/span').extract()
        items['product_name'] = ''.join(title).strip()
        items['product_sale_price'] = ''.join(sale_price).strip()
        items['product_original_price'] = ''.join(product_original_price).strip()
        items['product_category'] = ','.join(map(lambda x: x.strip(), category)).strip()
        yield items
```

But when I run the command

```
scrapy crawl shopspider -o info.csv
```

to look at the output, I only find information about the first product, not about all the products on the page. So I removed the numbers in square brackets from the XPaths (for example, the title XPath became `//*[@id="content"]/div/div/ul/li/a/h3`), but I still get the same result. The result is:

```
<span class="amount">£40.00</span>,<h3>Halo Skincare Organic Gift Set</h3>,"<span class=""amount"">£40.00</span>","<span class=""amount"">£58.00</span>"
```

Please help.


1 Answer

Smart貓小萌


If you remove the indices from the XPaths, they will match all the items on the page:

```python
response.xpath('//*[@id="content"]/div/div/ul/li/a/h3').extract()  # Returns 7 items
```

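However, note that this returns a list of strings holding the selected HTML elements, tags included. If you want the text inside the elements, you should append `/text()` to the XPath (which looks like what you are after).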

Also, the reason you only get one result is that you join all the matches into a single string when you assign them to the item:

```python
items['product_name'] = ''.join(title).strip()
```

Here `title` is a list of elements, and you are concatenating all of them into one string. The same logic applies to the other variables.
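The effect of the `join` can be reproduced without Scrapy at all. In this sketch the hypothetical `titles` list stands in for what `.extract()` returns once an index-free XPath ending in `/text()` matches several products, and plain dicts stand in for `ShopItem`:

```python
# Hypothetical stand-in for the list .extract() returns for several products
titles = ['Product A', 'Product B', 'Product C']

# What the question's parse() does: all matches are concatenated into one
# string and stored on a single item, so the CSV gets a single row
joined = ''.join(titles).strip()

# Keeping one record per match instead: one dict (one CSV row) per product
rows = [{'product_name': t.strip()} for t in titles]

print(joined)     # Product AProduct BProduct C
print(len(rows))  # 3
```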

If that is really what you want, you can ignore what follows, but I believe a better approach is to loop over the products and `yield` each one separately.

My suggestion:

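```python
def parse(self, response):
    # One selector per product <li> instead of one joined string per field
    products = response.xpath('//*[@id="content"]/div/div/ul/li')

    for product in products:
        items = ShopItem()
        # Relative XPaths: evaluated inside the current <li> only;
        # .get() returns the first match (or None if nothing matches)
        items['product_name'] = product.xpath('a/h3/text()').get()
        items['product_sale_price'] = product.xpath('a/span/del/span/text()').get()
        items['product_original_price'] = product.xpath('a/span/ins/span/text()').get()
        items['product_category'] = product.xpath('a/span/ins/span/text()').get()
        # One item per product, so each product becomes its own CSV row
        yield items
```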

Note that in your original code, your `category` variable uses the same XPath as `product_original_price`. I kept that logic in the code, but it is probably a mistake.

2023-05-23
