2 Answers

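Contributed 1813 experience points, received more than 2 upvotes
Scrapy would be a good choice for this task. Below is a very simple spider that collects the required information.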
import scrapy


class TestSpider(scrapy.Spider):
    name = 'test'
    start_urls = ['https://www.amazon.com/dp/B07Q6H83VY']

    def parse(self, response):
        # Each review on the product page sits in its own div.review block
        for row in response.css('div.review'):
            item = {}
            item['author'] = row.css('span.a-profile-name::text').extract_first()
            # The rating text starts with the number; keep only that first token
            rating = row.css('i.review-rating > span::text').extract_first().strip().split(' ')[0]
            # Accept a decimal comma as well as a decimal point
            item['rating'] = int(float(rating.strip().replace(',', '.')))
            item['title'] = row.css('span.review-title > span::text').extract_first()
            created_date = row.css('span.review-date::text').extract_first().strip()
            item['created_date'] = created_date
            # Collect every non-empty text fragment of the review body
            review_content = row.css('div.reviewText ::text').extract()
            review_content = [rc.strip() for rc in review_content if rc.strip()]
            item['content'] = ', '.join(review_content)
            yield item
Sample output:
{
"author": "Jhona Diaz",
"rating": 4,
"title": "Recomendable solo si eres fan ya que si está algo caro",
"created_date": "Reviewed in Mexico on November 23, 2019",
"content": "Buena calidad y pues muy completo"
},
{
"author": "MANUEL MENDOZA OLVERA",
"rating": 5,
"title": "Perfecto Estado",
"created_date": "Reviewed in Mexico on September 28, 2019",
"content": "excelente, la edición es de caja metálica y llegó intacta"
},
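The spider calls `.strip()` directly on the result of `extract_first()`, which raises an `AttributeError` when a selector matches nothing, and it keeps `created_date` as a raw string. A minimal sketch of two post-processing helpers that tolerate missing values is shown below. The names `parse_rating` and `parse_created_date` are hypothetical and not part of the original answer. The rating format ("4.0 out of 5 stars") is an assumption; the date format follows the sample output above, and parsing month names assumes an English locale.

```python
import re
from datetime import date, datetime
from typing import Optional, Tuple


def parse_rating(text: Optional[str]) -> Optional[int]:
    """Return the leading number of a rating string as an int, or None."""
    if not text:
        return None  # the selector matched nothing
    first_token = text.strip().split(' ')[0]
    try:
        # Accept a decimal comma ('4,0') as well as a decimal point ('4.0')
        return int(float(first_token.replace(',', '.')))
    except ValueError:
        return None


def parse_created_date(text: Optional[str]) -> Tuple[Optional[str], Optional[date]]:
    """Split 'Reviewed in <country> on <Month D, YYYY>' into (country, date)."""
    if not text:
        return None, None
    match = re.match(r'Reviewed in (?:the )?(.+?) on (.+)$', text.strip())
    if match is None:
        return None, None
    country, raw_date = match.groups()
    try:
        # %B matches the full month name in the current (assumed English) locale
        parsed = datetime.strptime(raw_date, '%B %d, %Y').date()
    except ValueError:
        parsed = None
    return country, parsed
```

Inside `parse`, these could replace the inline string handling, for example `item['rating'] = parse_rating(row.css('i.review-rating > span::text').extract_first())`.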

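Contributed 1963 experience points, received more than 6 upvotes
First, run pip install selenium.
Second, use the Python library dryscrape to scrape JavaScript-driven websites. Download it from this URL (the linked page is the PhantomJS download page): https://phantomjs.org/download.html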
from selenium import webdriver

# Path to the PhantomJS executable downloaded in step 2
driver = webdriver.PhantomJS(executable_path='C:\\Users\\nayef\\Desktop\\New folder\\phantomjs-2.1.1-windows\\bin\\phantomjs')
driver.get('https://www.amazon.com/dp/B07Q6H83VY')
# Look up the delivery message element by its id and print its text
p_element = driver.find_element_by_id('deliveryMessageMirId')
print(p_element.text)
# Result:
# Arrives: Friday, Aug 7 Details
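The snippet above relies on `webdriver.PhantomJS` and `find_element_by_id`, which current Selenium 4 releases no longer provide. A rough equivalent for Selenium 4 with headless Chrome might look like the sketch below. The function name `fetch_element_text` and the 10-second default timeout are hypothetical choices, not part of the original answer, and the sketch assumes Chrome is installed and that Selenium can locate a matching driver.

```python
def fetch_element_text(url: str, element_id: str, timeout: float = 10.0) -> str:
    """Open url in headless Chrome and return the text of the element with element_id."""
    # Imported here so the function can be defined without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument('--headless=new')  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until JavaScript has inserted the element into the page
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.ID, element_id))
        )
        return element.text
    finally:
        driver.quit()  # always close the browser, even on errors
```

Calling `fetch_element_text('https://www.amazon.com/dp/B07Q6H83VY', 'deliveryMessageMirId')` would correspond to the lookup in the answer. Whether that element id is still present on the page is not guaranteed.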