Why isn't the last function executed?

Python

明月笑刀無情 2022-01-11 16:58:56

I am scraping the Dmoz website. I scraped the About page, but when I created another function named parse_editor and tried to scrape with it, it gave me no results.

    from ..items import DmoztutorialItem
    import scrapy


    class DmozSpiderSpider(scrapy.Spider):
        name = 'Dmoz'
        start_urls = ['http://dmoz-odp.org/']
        about_page = 'http://dmoz-odp.org/docs/en/about.html'
        editor = 'http://dmoz-odp.org/docs/en/help/become.html'

        def parse(self, response):
            # collect data on first page
            items = {
                'Navbar': response.css('#main-nav a::text').extract(),
                'Category_names': response.css('.top-cat a::text').extract(),
                'Subcategories': response.css('.sub-cat a::text').extract(),
                'About_page': self.about_page,
                'Become_an_editor': self.editor
            }
            # save and call request to another page
            yield response.follow(self.about_page, self.parse_about, self.editor, self.parse_editor, meta={'items': items})

        def parse_about(self, response):
            # do your stuff on second page
            items = response.meta['items']
            items['Headings'] = response.css('h2::text , #mainContent h1::text').extract()
            # add your logics
            items['Paragraphs'] = response.css('p::text').extract()
            items['3 Projects'] = response.css('li~ li+ li b a::text , li:nth-child(1) b a::text').extract()
            items['About Dmoz'] = response.css('.nav ul a::text , li:nth-child(2) b a::text').extract()
            items['Languages'] = response.css('.nav~ .nav a::text').extract()
            items['You can make a difference'] = response.css('dd::text , #about-contribute::text').extract()
            items['Further information'] = response.css('li::text , #about-more-info a::text').extract()
            yield items

        def parse_editor(self, response):
            # do your stuff on third page
            editor_items = response.meta['items']
            editor_items['Heading'] = response.css('#mainContent h1::text').extract()
            yield editor_items

1 answer

斯蒂芬大帝

You put everything into a single response.follow call, which is wrong. Each call takes one URL and one callback, so split it into two separate calls:

Incorrect:

yield response.follow(self.about_page, self.parse_about, self.editor, self.parse_editor, meta={'items': items})

Correct:

yield response.follow(self.about_page, self.parse_about, meta={'items': items})

yield response.follow(self.editor, self.parse_editor, meta={'items': items})
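To see why the single call cannot work, it helps to look at how the positional arguments bind. The sketch below does not import Scrapy. `follow_stand_in` is a hypothetical stand-in that only copies the leading parameter order of `response.follow` (`url`, `callback`, `method`, `headers`, `body`, `cookies`, `meta`), and the two callbacks are empty placeholders.

```python
import inspect

ABOUT = "http://dmoz-odp.org/docs/en/about.html"
EDITOR = "http://dmoz-odp.org/docs/en/help/become.html"


def follow_stand_in(url, callback=None, method="GET", headers=None,
                    body=None, cookies=None, meta=None):
    """Hypothetical stand-in: mirrors only the leading parameter order
    of response.follow; it does not build or send a request."""


def parse_about(response):
    """Placeholder for the spider's About-page callback."""


def parse_editor(response):
    """Placeholder for the spider's editor-page callback."""


# Bind the arguments exactly as the incorrect call passes them.
bound = inspect.signature(follow_stand_in).bind(
    ABOUT, parse_about, EDITOR, parse_editor, meta={"items": {}}
)
binding = dict(bound.arguments)

# Show which parameter each argument landed in.
for name, value in binding.items():
    print(f"{name} = {getattr(value, '__name__', value)}")
```

Under this binding the editor URL ends up as the HTTP `method` and `parse_editor` ends up as `headers`, so no second request with `parse_editor` as its callback is ever created.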

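Alternatively, you can chain the requests: call follow once in parse with parse_about as the callback, then call follow again inside parse_about with parse_editor as the callback, and yield the final item from parse_editor.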

反對(duì) 回復(fù) 2022-01-11
