I am trying to scrape Twitter. Go to search.twitter.com and enter Comorbidity in the search form. I can fetch the first page correctly, and when I scroll down to load more tweets I can see that the next page is obtained from the min_position parameter. But when I send a request for the next page, I do not get the correct content. Here is some of my code.

    import json
    import re

    import scrapy
    from scrapy import Request

    headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'}

    def start_requests(self):
        yield Request(url=self.start_urls[0], callback=self.parse_search_page)

    def parse_search_page(self, response):
        keyword = 'Comorbidity'
        search_url = self.search_url.format(keyword=keyword)
        yield Request(url=search_url, callback=self.parse_twitter_page, headers=self.headers)

    def parse_twitter_page(self, response):
        next_page = None
        if self.current_page == 0:
            # First page is HTML: read data-min-position from the markup
            posts = response.xpath('//li[@data-item-type="tweet"]').extract()
            min_position = re.search('data-min-position="(.*?)"', response.body)
            if min_position:
                min_position = min_position.group(1)
                next_page = self.next_page_url.format(position=min_position.replace('cm+', 'cm%2B').replace('==', '%3D%3D'))
            self.current_page = 1
        else:
            # Later pages are JSON: read min_position from the payload
            json_data = json.loads(response.body)
            min_position = json_data.get('min_position')
        if next_page:
            yield scrapy.http.Request(
                url=self.next_page_url,
                callback=self.parse_twitter_page,
                headers=self.headers,
            )

How do I get the correct min_position?
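For reference, here is a minimal sketch of how the min_position token could be extracted and turned into a next-page URL. It is not a confirmed fix. Three things in the posted code look worth checking: the final request is sent to the unformatted `self.next_page_url` template rather than to `next_page`, the JSON branch never assigns `next_page`, and under Python 3 `response.body` is bytes while the regex pattern is a str. The helper names and the example URL template below are hypothetical. The real template is whatever the spider stores in `self.next_page_url`, and it is assumed to contain a single `{position}` placeholder.

```python
import json
import re
from typing import Optional
from urllib.parse import quote

# Pattern taken from the posted code.
MIN_POSITION_RE = re.compile(r'data-min-position="(.*?)"')


def min_position_from_html(body: bytes) -> Optional[str]:
    """Read data-min-position from the first (HTML) page, or None if absent."""
    # Decode first: in Python 3 a str pattern cannot search bytes.
    match = MIN_POSITION_RE.search(body.decode("utf-8"))
    return match.group(1) if match else None


def min_position_from_json(body: bytes) -> Optional[str]:
    """Read min_position from a later (JSON) page, or None if absent."""
    return json.loads(body).get("min_position")


def build_next_page_url(template: str, position: str) -> str:
    """Fill the template, percent-encoding the whole token.

    quote(..., safe="") encodes '+' and '=' (and any other reserved
    character), so 'cm+' and '==' need no manual replacement.
    """
    return template.format(position=quote(position, safe=""))


if __name__ == "__main__":
    # Hypothetical template; stands in for self.next_page_url.
    template = "https://example.invalid/timeline?max_position={position}"
    token = min_position_from_html(b'<div data-min-position="cm+abc==">')
    print(build_next_page_url(template, token))
```

In the spider, both branches of `parse_twitter_page` would then pass their token through `build_next_page_url` and yield a request to the resulting URL instead of to the template itself.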
Cannot get min_position correctly when scraping Twitter
ibeautiful
2021-08-14 21:22:52