I want to remove the `[ ]` brackets that Scrapy adds around all of its output. To do that, you just add `[0]` at the end of the XPath statement, like this:

```python
'a[@class="question-hyperlink"]/text()').extract()[0]
```

This solves the `[]` problem in some cases, but in other cases Scrapy returns every second row of output as blank, so when it reaches the second row while using `[0]` I get the error:

```
IndexError: list index out of range
```

How can I prevent Scrapy from creating these blank rows? This seems to be a common problem, but everyone else hits it when exporting to CSV, whereas for me it happens in the Scrapy response, before exporting to CSV.

items.py:

```python
import scrapy
from scrapy.item import Item, Field

class QuestionItem(Item):
    title = Field()
    url = Field()

class PopularityItem(Item):
    votes = Field()
    answers = Field()
    views = Field()

class ModifiedItem(Item):
    lastModified = Field()
    modName = Field()
```

The spider that does NOT output every second row as blank, and therefore works with `[0]`:

```python
from scrapy import Spider
from scrapy.selector import Selector

from stack.items import QuestionItem

class QuestionSpider(Spider):
    name = "questions"
    allowed_domains = ["stackoverflow.com"]
    start_urls = [
        "http://stackoverflow.com/questions?pagesize=50&sort=newest",
    ]

    def parse(self, response):
        questions = Selector(response).xpath('//div[@class="summary"]/h3')
        for question in questions:
            item = QuestionItem()
            item['title'] = question.xpath(
                'a[@class="question-hyperlink"]/text()').extract()[0]
            item['url'] = question.xpath(
                'a[@class="question-hyperlink"]/@href').extract()[0]
            yield item
```

The spider that outputs every second row as blank:

```python
from scrapy import Spider
from scrapy.selector import Selector

from stack.items import PopularityItem

class PopularitySpider(Spider):
    name = "popularity"
    allowed_domains = ["stackoverflow.com"]
    start_urls = [
        "https://stackoverflow.com/",
    ]

    def parse(self, response):
        popularity = response.xpath('//div[contains(@class, "question-summary narrow")]/div')
        for poppart in popularity:
```
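Not an answer from the source, but a sketch of the usual workaround for the `IndexError`: instead of indexing with `[0]`, take the first element only when the list is non-empty and fall back to a default otherwise. This is what Scrapy's `SelectorList.extract_first(default=...)` (or `.get(default=...)` in newer versions) does for you; the plain-Python helper below just illustrates the same logic on an ordinary list, with the function name `first_or_default` being my own choice:

```python
def first_or_default(values, default=""):
    # Mimics SelectorList.extract_first(): return the first extracted
    # value if the XPath matched anything, otherwise a default, so an
    # empty match never raises IndexError.
    return values[0] if values else default

# A non-empty extract() result behaves like extract()[0]:
print(first_or_default(["How do I fix this?"]))  # How do I fix this?

# An empty result (the "blank row" case) yields the default instead of crashing:
print(repr(first_or_default([])))  # ''
```

In the spider itself the equivalent change would be `question.xpath('...').extract_first(default='')` in place of `.extract()[0]`, so rows whose XPath matches nothing produce an empty field rather than an exception; whether an empty field or skipping the item entirely is right depends on where those blank rows come from.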