Scraping the main image instead of the thumbnail

一只斗牛犬 2023-06-20 16:33:30
import re
import requests

# headers, all_tab_data and main_url are defined elsewhere in the asker's script.
root_tag = ["article", {"class": "sorted-article"}]
image_tag = ["img", {"": ""}, "src"]
session = requests.Session()
response = session.get("https://phys.org/earth-news/", headers=headers)
webContent = response.content
for div in all_tab_data:
    image_url = None
    div_img = str(div)
    # Look for the first image-like URL anywhere in the element's markup.
    match = re.search(r"(http(s?):)([/|.|\w|\s|-])*\.(?:jpg|gif|png|jpeg)", div_img)
    if match is not None:
        image_url = match.group(0)
    else:
        image_url = div.find(image_tag[0], image_tag[1]).get(image_tag[2])
    if image_url is not None:
        # Prefix site-relative paths (but not protocol-relative ones) with the site root.
        if image_url[0] == '/' and image_url[1] != '/':
            image_url = main_url + image_url

My image URL output is output_url, but the image's actual URL is actual_url. How can I scrape the main image?

2 Answers

吃雞游戲


Use BeautifulSoup to visit each news article page and read the image from there:


import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'}

with requests.Session() as session:
    # Send the browser-like User-Agent with every request made through this session.
    session.headers.update(headers)
    # The "lxml" parser requires the lxml package to be installed.
    soup = BeautifulSoup(session.get("https://phys.org/earth-news/").text, "lxml")
    # Collect the link to every article on the listing page.
    news_list = [news_div.get("href") for news_div in soup.select('.news-link')]
    for url in news_list:
        # Open the article page itself and look for its image container.
        soup = BeautifulSoup(session.get(url).text, "lxml")
        img = soup.select_one(".article-img")
        if img:
            print(url, img.select_one('img').get("src"))
        else:
            print(url, "This article doesn't contain an image")
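The src values read from a page are not always absolute, which is what the manual `image_url[0] == '/'` check in the question tries to handle. A minimal sketch of one alternative, assuming the standard library's `urljoin` is acceptable here: it resolves site-relative, protocol-relative, and already-absolute values against the page URL. The helper name `absolute_image_url` and the path `/images/example.jpg` are hypothetical, used only for illustration.

```python
from urllib.parse import urljoin


def absolute_image_url(page_url, src):
    """Resolve an <img> src value against the URL of the page it was found on."""
    if not src:
        return None
    # urljoin leaves absolute URLs alone and fills in the scheme/host otherwise.
    return urljoin(page_url, src.strip())


page = "https://phys.org/earth-news/"
# Hypothetical site-relative path.
print(absolute_image_url(page, "/images/example.jpg"))
# Protocol-relative URL: only the scheme is taken from the page URL.
print(absolute_image_url(page, "//scx1.b-cdn.net/csz/news/800/2020/biofuels.jpg"))
# Already absolute: returned unchanged.
print(absolute_image_url(page, "https://scx1.b-cdn.net/csz/news/800/2020/soil.jpg"))
```

Calling this on each `src` before printing or downloading avoids having to special-case the leading slashes by hand.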


Answered 2023-06-20
慕神8447489


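Use BeautifulSoup to extract the image links. On the listing page each thumbnail URL sits in the img tag's data-src attribute; replacing its /175u/ size segment with /800/ gives the larger version of the same image: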


import requests
from bs4 import BeautifulSoup

url = 'https://phys.org/earth-news/'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')

# Each listed article has a lazy-loaded thumbnail whose URL is stored in data-src.
for img in soup.select('.sorted-article img[data-src]'):
    print(img['data-src'].replace('/175u/', '/800/'))

Prints:


https://scx1.b-cdn.net/csz/news/800/2020/biofuels.jpg
https://scx1.b-cdn.net/csz/news/800/2020/waterscarcity.jpg
https://scx1.b-cdn.net/csz/news/800/2020/soilerosion.jpg
https://scx1.b-cdn.net/csz/news/800/2020/hydropowerdam.jpg
https://scx1.b-cdn.net/csz/news/800/2019/flood.jpg
https://scx1.b-cdn.net/csz/news/800/2018/1-emissions.jpg
https://scx1.b-cdn.net/csz/news/800/2020/globalforest.jpg
https://scx1.b-cdn.net/csz/news/800/2020/fleeingthecl.jpg
https://scx1.b-cdn.net/csz/news/800/2020/watersecurity.jpg
https://scx1.b-cdn.net/csz/news/800/2019/2-water.jpg
https://scx1.b-cdn.net/csz/news/800/2020/japaneseexpe.jpg
https://scx1.b-cdn.net/csz/news/800/2020/6-scientistsco.jpg
https://scx1.b-cdn.net/csz/news/800/2020/housescollap.jpg
https://scx1.b-cdn.net/csz/news/800/2020/soil.jpg
https://scx1.b-cdn.net/csz/news/800/2020/32-researcherst.jpg
https://scx1.b-cdn.net/csz/news/800/2020/2-nasatracking.jpg
https://scx1.b-cdn.net/csz/news/800/2020/thelargersec.jpg
https://scx1.b-cdn.net/csz/news/800/2020/4-nasasterrasa.jpg
https://scx1.b-cdn.net/csz/news/800/2020/howtorecycle.jpg
https://scx1.b-cdn.net/csz/news/800/2020/newtoolstrac.jpg
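The literal `.replace('/175u/', '/800/')` only works while every thumbnail uses exactly the `175u` size. As a hedged sketch, the same rewrite can be done with a regular expression that targets whatever segment sits between `/news/` and the four-digit year. That URL layout is inferred solely from the URLs printed above, and the helper name `upscale` and the input URL are assumptions for illustration.

```python
import re

# Matches the size segment between "/news/" and a four-digit year folder,
# e.g. the "175u" in ".../news/175u/2020/...". Layout inferred from the output above.
_SIZE_SEGMENT = re.compile(r"(?<=/news/)[^/]+(?=/\d{4}/)")


def upscale(thumbnail_url, size="800"):
    """Return the URL with its size segment swapped for `size`.

    URLs that do not follow the assumed layout are returned unchanged.
    """
    return _SIZE_SEGMENT.sub(size, thumbnail_url, count=1)


# Thumbnail URL reconstructed from the answer's replace() call (an assumption).
print(upscale("https://scx1.b-cdn.net/csz/news/175u/2020/biofuels.jpg"))
```

Whether a given size such as 800 actually exists on the server is not something the pattern can guarantee, so checking the response status before using the rewritten URL is a sensible precaution.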


Answered 2023-06-20