
Unexpected result when trying to get metadata with BeautifulSoup

Python

慕無忌1623718 2021-10-12 17:49:17

Okay, here is what I'm trying to do. I'm pretty new to Python and am only just getting the hang of it. With this little tool I'm trying to pull data from a page. In this case, I want the user to enter a URL and have the tool return

<meta content=" % Likes, % Comments - @% on Instagram: “post description []”" name="description" />

with each % replaced by the post's number of likes, comments, and so on. Here is my full code:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import requests
import re

url = "https://www.instagram.com/p/BsOGulcndj-/"
page2 = requests.get(url)
soup2 = BeautifulSoup(page2.content, 'html.parser')
result = soup2.findAll('content', attrs={'content': 'description'})
print(result)

But whenever I run it, I get []. What am I doing wrong?


2 Answers

ITMISS


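The first argument to findAll is the tag name, so it should be 'meta', not 'content'; the name="description" attribute goes in attrs. The correct way to match these tags is: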

result = soup2.findAll('meta', content=True, attrs={"name": "description"})

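However, html.parser does not parse the <meta> tags correctly. It doesn't realize they are self-closing, so it includes most of the rest of <head> in the result. I changed the parser to html5lib: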

soup2 = BeautifulSoup(page2.content, 'html5lib')

With that change, the search above returns:

[<meta content="46.3m Likes, 2.6m Comments - EGG GANG ?? (@world_record_egg) on Instagram: “Let’s set a world record together and get the most liked post on Instagram. Beating the current…”" name="description"/>]
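To get from the matched tag to the actual numbers, one option is to read the tag's content attribute and pull the counts out with a regular expression. The following is only a minimal sketch under some assumptions: SAMPLE_HTML is a hypothetical, shortened stand-in for the downloaded page (so it runs without a network request), get_description and parse_counts are illustrative helper names, and the regex assumes the "N Likes, N Comments" wording shown in the result above. The parser name is a parameter, so 'html5lib' can be passed instead once that package is installed.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the page returned by requests.get(url).content
SAMPLE_HTML = """
<html><head>
<meta charset="utf-8"/>
<meta content="46.3m Likes, 2.6m Comments - EGG GANG (@world_record_egg) on Instagram: “Let’s set a world record together…”" name="description"/>
<title>Sample</title>
</head><body></body></html>
"""


def get_description(html, parser="html.parser"):
    """Return the content of <meta name="description">, or None if it is absent."""
    soup = BeautifulSoup(html, parser)
    # Match on the tag name 'meta'; require name="description" and a content attribute.
    tag = soup.find("meta", attrs={"name": "description"}, content=True)
    return tag["content"] if tag else None


def parse_counts(description):
    """Return (likes, comments) as strings such as '46.3m', or None if the pattern is missing."""
    match = re.search(r"([\d.,]+[kmb]?) Likes, ([\d.,]+[kmb]?) Comments", description)
    return match.groups() if match else None


description = get_description(SAMPLE_HTML)
print(description)
print(parse_counts(description))
```

For a live page, replace SAMPLE_HTML with page2.content from the question's code. Both helpers return None rather than raising when nothing matches, which is worth checking for because the site may serve different markup to scripts than to browsers.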

2021-10-12