```python
from scrapy import Spider
from scrapy.http import Request


class CourseSpider(Spider):
    name = 'course'
    allowed_domains = ['coursera.org']
    start_urls = ['https://coursera.org/about/partners']

    def parse(self, response):
        listings = response.xpath('//div[@class="rc-PartnerBox vertical-box"]')
        for listing in listings:
            title = listing.xpath('.//div[@class="partner-box-wrapper card-one-clicker flex-1"]/p').extract_first()
            relative_url = listing.xpath('.//a/@href').extract_first()
            absolute_url = response.urljoin(relative_url)
            yield Request(response.urljoin(relative_url), callback=self.parse_listing,
                          meta={'title': title, 'absolute_url': absolute_url})

    def parse_listing(self, response):
        titles = response.meta.get('title')
        absolute_url = response.meta.get('absolute_url')
        titles_course = response.xpath('//div[@class="name headline-1-text"]/text()').extract()
        url_link = response.xpath('//div[@class="rc-Course"]/a/@href').extract()
        abs_url = response.urljoin(url_link)
        yield {'title': title,
               'titles': title,
               'absolute_url': absolute_url,
               'titles_course': titles_course,
               'abs_url': abs_url}
```

However, when I run the script from cmd, I get errors saying I cannot mix str and non-str arguments, and I am confused about how to handle this. Any help would be appreciated. I tried adding the `extract()` function, since it was mentioned in some earlier Stack Overflow questions about list containers as a way to eliminate that error, but then my xpath does not produce the desired output.
1 Answer

LEATH
You are looking for `.extract_first()`, or its newer name `.get()`, because `.extract()` produces a list, and you cannot pass a list to `.urljoin()`.