
Unable to get div tags with Beautiful Soup in Python

Python

翻閱古今 2023-12-29 17:02:24

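I am trying to download all the Pokémon images available on the official website. I am doing this because I want high-quality images. Below is the code I wrote.

from bs4 import BeautifulSoup as bs4
import requests
request = requests.get('https://www.pokemon.com/us/pokedex/')
soup = bs4(request.text, 'html')
print(soup.findAll('div',{'class':'container pokedex'}))

The output is

[]

Am I doing something wrong? Also, is it legal to scrape from the official website? Is there any tag or anything else that indicates this? Thanks.

PS: I am new to BS and HTML.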


2 answers

嚕嚕噠

Contributed 1784 experience points, received more than 7 likes

The images are loaded dynamically, so you have to use selenium to scrape them. Here is the complete code to do that:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time
import requests

driver = webdriver.Chrome()
driver.get('https://www.pokemon.com/us/pokedex/')
time.sleep(4)  # crude wait for the dynamically loaded list to render

# Selenium 4 removed find_elements_by_*; use find_elements(By..., ...) instead
li_tags = driver.find_elements(By.CLASS_NAME, 'animating')[:-3]

for li_num, li in enumerate(li_tags, start=1):
    img_link = li.find_element(By.XPATH, './/img').get_attribute('src')
    name = li.find_element(By.XPATH, f'/html/body/div[4]/section[5]/ul/li[{li_num}]/div/h5').text
    r = requests.get(img_link)
    # save each image under the Pokémon's name
    with open(f"D:\\{name}.png", "wb") as f:
        f.write(r.content)

driver.close()

Output:

12 Pokémon images.
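The point about dynamic loading can be illustrated without a browser. The sketch below uses only the standard library to count matching div tags in two made-up HTML strings: one resembling what a plain HTTP request might return before any JavaScript runs, and one resembling the DOM after rendering. ClassCounter, count_tags, STATIC_HTML and RENDERED_HTML are hypothetical names, and the markup is invented for illustration; it is not the real markup of pokemon.com.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags of one kind whose class attribute has all wanted classes."""

    def __init__(self, tag, wanted):
        super().__init__()
        self.tag = tag
        self.wanted = set(wanted)
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        # The class attribute is a space-separated list; order does not matter.
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted <= classes:
            self.count += 1


def count_tags(html, tag, wanted):
    parser = ClassCounter(tag, wanted)
    parser.feed(html)
    return parser.count


# Invented example: the server sends an empty placeholder plus a script.
STATIC_HTML = """
<html><body>
  <div id="app"></div>
  <script>/* a browser would build the list here at runtime */</script>
</body></html>
"""

# Invented example: what the DOM might look like after the script has run.
RENDERED_HTML = """
<html><body>
  <div id="app"><div class="pokedex container"><ul></ul></div></div>
</body></html>
"""

static_hits = count_tags(STATIC_HTML, "div", ["container", "pokedex"])
rendered_hits = count_tags(RENDERED_HTML, "div", ["container", "pokedex"])
print(static_hits, rendered_hits)
```

If the real response behaves like the first string, any parser that only sees request.text will find nothing to match, which is consistent with the empty list in the question.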

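Also, I noticed there is a Load more button at the bottom of the page. Clicking it loads more images, and after clicking it we have to keep scrolling down to load the rest. If I remember correctly, there are 893 images on the site in total. To scrape all 893 images, you can use the following code: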

from selenium import webdriver
from selenium.webdriver.common.by import By
import time
import requests

driver = webdriver.Chrome()
driver.get('https://www.pokemon.com/us/pokedex/')
time.sleep(3)

# Click the "Load more" button through JavaScript
load_more = driver.find_element(By.XPATH, '//*[@id="loadMore"]')
driver.execute_script("arguments[0].click();", load_more)

# Scroll to the bottom repeatedly until the page height stops growing
scroll_js = "window.scrollTo(0, document.body.scrollHeight); return document.body.scrollHeight;"
len_of_page = driver.execute_script(scroll_js)
while True:
    last_count = len_of_page
    time.sleep(1.5)
    len_of_page = driver.execute_script(scroll_js)
    if last_count == len_of_page:
        break

# Selenium 4 removed find_elements_by_*; use find_elements(By..., ...) instead
li_tags = driver.find_elements(By.CLASS_NAME, 'animating')[:-3]

for li_num, li in enumerate(li_tags, start=1):
    img_link = li.find_element(By.XPATH, './/img').get_attribute('src')
    name = li.find_element(By.XPATH, f'/html/body/div[4]/section[5]/ul/li[{li_num}]/div/h5').text
    r = requests.get(img_link)
    with open(f"D:\\{name}.png", "wb") as f:
        f.write(r.content)

driver.close()
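The scroll-until-the-height-stops-changing loop can also be pulled out into a small helper that does not depend on Selenium, which makes it easy to try with a fake page. This is only a sketch: scroll_until_stable, its parameters, and the max_rounds safety cap are assumptions added for illustration and are not part of the answer's code.

```python
import time
from typing import Callable


def scroll_until_stable(
    scroll_and_measure: Callable[[], int],
    pause: float = 1.5,
    max_rounds: int = 200,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call scroll_and_measure until two consecutive heights are equal.

    Returns the final height. Raises TimeoutError if the height is still
    changing after max_rounds extra scrolls (a hypothetical safety cap).
    """
    last_height = scroll_and_measure()
    for _ in range(max_rounds):
        sleep(pause)  # give lazily loaded items time to arrive
        height = scroll_and_measure()
        if height == last_height:
            return height
        last_height = height
    raise TimeoutError("page height kept changing")


# Try it with a fake page whose height grows twice and then settles.
fake_heights = iter([1000, 2000, 3000, 3000])
final_height = scroll_until_stable(lambda: next(fake_heights), sleep=lambda s: None)
print(final_height)
```

With a real driver, the callable would be something like `lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight); return document.body.scrollHeight;")`. The cap guards against pages that never stop growing.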

Replied 2023-12-29

元芳怎么了

Contributed 1798 experience points, received more than 7 likes

This can be done more easily if you first check the Network tab in the browser's developer tools:

import time
import requests

endpoint = "https://www.pokemon.com/us/api/pokedex/kalos"

# contains all metadata
data = requests.get(endpoint).json()

# collect the keys needed to save each picture
items = [{"name": item["name"], "link": item["ThumbnailImage"]} for item in data]

# remove duplicates
unique_items = [dict(t) for t in {tuple(item.items()) for item in items}]
assert len(unique_items) == 893  # expected count per this thread; may change over time

for pokemon in unique_items:
    response = requests.get(pokemon["link"])
    time.sleep(1)  # pause between requests to go easy on the server
    with open(f"{pokemon['name']}.png", "wb") as f:
        f.write(response.content)
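One practical wrinkle when names are used directly as file names: some names contain characters that Windows does not allow in a file name (a colon, for example). The sketch below shows one possible way to clean names and to drop repeated records while keeping their order. safe_filename and unique_by_name are hypothetical helpers, and deduplicating on the name alone is an assumption that differs slightly from the set-of-tuples approach above.

```python
import re

# Characters Windows rejects in file names, plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(name: str, ext: str = ".png") -> str:
    """Replace characters that are invalid in Windows file names with '_'."""
    cleaned = _INVALID_CHARS.sub("_", name).strip()
    return (cleaned or "unnamed") + ext


def unique_by_name(items):
    """Keep the first record for each name, preserving the original order."""
    seen = {}
    for item in items:
        seen.setdefault(item["name"], item)
    return list(seen.values())


# Made-up records shaped like the {"name", "link"} dicts built above.
sample = [
    {"name": "Pikachu", "link": "https://example.com/025.png"},
    {"name": "Type: Null", "link": "https://example.com/772.png"},
    {"name": "Pikachu", "link": "https://example.com/025.png"},
]
file_names = [safe_filename(item["name"]) for item in unique_by_name(sample)]
print(file_names)
```

In the download loop, `open(safe_filename(pokemon["name"]), "wb")` would then replace the f-string path.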

Replied 2023-12-29
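Neither answer touches the question of whether scraping is allowed. One place sites commonly publish their crawler preferences is a robots.txt file, and Python's standard library can evaluate its rules. The sketch below parses a made-up rule set offline; EXAMPLE_RULES, build_parser and the "my-scraper" user agent are hypothetical and do not reflect the real robots.txt of pokemon.com. Note that robots.txt expresses the site's wishes for crawlers, not a legal verdict; the site's terms of use are the place to look for that.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_lines):
    """Build a parser from the lines of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser


# Invented rules for illustration only; NOT taken from any real site.
EXAMPLE_RULES = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = build_parser(EXAMPLE_RULES)
pokedex_allowed = parser.can_fetch("my-scraper", "https://example.com/us/pokedex/")
private_allowed = parser.can_fetch("my-scraper", "https://example.com/private/data")
print(pokedex_allowed, private_allowed)
```

To check a live site instead, `parser.set_url("https://www.pokemon.com/robots.txt")` followed by `parser.read()` would fetch and parse the real file.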
