Your first problem is that the CSS class in the line name = item.find("div", attrs={"class": "p13n-sc-truncated"}) should be p13n-sc-truncate. Your second problem is that the class you use to find the items is too specific (to the first item). I found it more useful to search for list items with the class zg-item-immersion.
If you only want to list the first 10 items, you can add the [:10] slice to the main for loop. Putting it all together, we get:
import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent with the request.
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9"
}
url_amazon = (
    "https://www.amazon.co.uk/Best-Sellers-Electronics/zgbs/electronics"
)
response = requests.get(url_amazon, headers=headers)
soup = BeautifulSoup(response.content, "lxml")
print(soup.prettify())

# Page heading.
title = soup.find(
    "h1", class_="a-size-large a-spacing-medium zg-margin-left-15 a-text-bold"
).text
print(title)

# Collect the names of the first 10 list items.
titles = []
for item in soup.find_all("li", attrs={"class": "zg-item-immersion"})[:10]:
    name = item.find("div", attrs={"class": "p13n-sc-truncate"})
    if name is not None:
        titles.append(name.text.strip())
    else:
        titles.append("unknown title")

print(len(titles))
for i in titles:
    print(i)
I used name.text.strip() to remove the newlines and extra whitespace.
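To see what strip() changes, here is a small offline sketch. The sample_html string is made-up markup that only imitates the shape of one list item (it is not copied from Amazon), and it uses the built-in "html.parser" so it runs without lxml:

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating a single list item; not real Amazon HTML.
sample_html = """
<li class="zg-item-immersion">
  <div class="p13n-sc-truncate">
      Example Product Name
  </div>
</li>
"""

soup = BeautifulSoup(sample_html, "html.parser")
name = soup.find("div", attrs={"class": "p13n-sc-truncate"})

raw_text = name.text            # keeps the surrounding newlines and spaces
clean_text = raw_text.strip()   # leading/trailing whitespace removed

print(repr(raw_text))
print(repr(clean_text))
```

The repr() calls make the leading and trailing whitespace visible, so the difference between the two values is easy to spot.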
Note that this script is fragile, because Amazon can change the layout and class names at any time.
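One way to soften that fragility a little is to try several candidate class names before giving up. The following is only a sketch: extract_titles is a hypothetical helper, "renamed-title" is an invented class name standing in for whatever a future layout might use, and the sample HTML is made up so the example runs offline.

```python
from bs4 import BeautifulSoup


def extract_titles(html, item_class, name_classes, limit=10):
    """Return up to `limit` item names, trying each candidate class in order."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for item in soup.find_all("li", class_=item_class)[:limit]:
        name = None
        for candidate in name_classes:
            name = item.find("div", class_=candidate)
            if name is not None:
                break  # first matching class wins
        if name is not None:
            titles.append(name.get_text(strip=True))
        else:
            titles.append("unknown title")
    return titles


# Hypothetical markup; "renamed-title" is an invented class name.
sample_html = """
<ol>
  <li class="zg-item-immersion"><div class="p13n-sc-truncate"> First item </div></li>
  <li class="zg-item-immersion"><div class="renamed-title"> Second item </div></li>
  <li class="zg-item-immersion"><span>no title here</span></li>
</ol>
"""

titles = extract_titles(
    sample_html, "zg-item-immersion", ["p13n-sc-truncate", "renamed-title"]
)
print(titles)
```

This does not make the scraper robust against a full redesign, but it keeps the class names in one place, so adjusting them after a layout change is a one-line edit.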