
How to extract a specific script element from HTML with Beautiful Soup

Python

喵喵時光機 2023-08-08 09:51:00

I'm using BS4 to extract information from a football statistics page. I started like this:

from bs4 import BeautifulSoup as bs
import requests

res = requests.get(url)
soup = bs(res.content, 'lxml')
scripts = soup.find_all('script')
scripts = [script for script in scripts]

This successfully returns all the script elements as a list. I need to extract one specific script element, namely the one that begins like this:

<script>
var teamsData = JSON.parse('\x7B\x2271\x22\x3A\x7B\x22id\x22\x3A\x2271\x22,\x22title\x22\x3A\x22Aston\x20Villa\x22,\x22history\x22\x3A\x5B\x5D\x7D,\x2272\x22\x3A\x7B\x22id\x22\x3A\x2272\x22...
</script>

I have tried various iterations of the following code, but the output always prints blank:

for script in scripts:
    if 'teamsData' in script.text:
        print(script)

I could always just use print(scripts[2]), but I'd like to know why my initial attempt failed.


2 answers

縹緲止盈


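Apparently, .text is always an empty string for script tags (at least in the Beautiful Soup version this was tested with). However, you can get the tag's contents from .children: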

from bs4 import BeautifulSoup
from io import StringIO

html = """
<script>
let a = "Hello";
</script>
"""

b = StringIO(html)
soup = BeautifulSoup(b, 'lxml')

for e in soup.find_all('script'):
    print(repr(e.text))               # may print '' for <script> tags
    print(repr(''.join(e.children)))  # the script source itself
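Applied to the original question, the same idea can filter the list of scripts by their actual source instead of by .text, whose handling of script contents may differ between Beautiful Soup releases. This is a minimal sketch: the two-script markup is made up to stand in for the real page, `script_source` is a hypothetical helper name, and html.parser is used only to avoid needing lxml.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the real page (assumption): two scripts,
# only one of which defines teamsData.
html = """
<script>var other = 1;</script>
<script>var teamsData = JSON.parse('{}');</script>
"""

soup = BeautifulSoup(html, "html.parser")


def script_source(tag):
    """Return the raw source of a <script> tag by joining its child strings.

    Reading .children avoids depending on what .text / .get_text()
    returns for script contents in the installed Beautiful Soup version.
    """
    return "".join(str(child) for child in tag.children)


# Keep only the scripts whose source mentions teamsData.
matches = [s for s in soup.find_all("script") if "teamsData" in script_source(s)]
print(len(matches))  # 1
print(script_source(matches[0]))  # var teamsData = JSON.parse('{}');
```

In the question's code, the loop body would become `if 'teamsData' in script_source(script):`.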

2023-08-08

楊__羊羊


You can access the <script> string with .string:

import re
import json
from bs4 import BeautifulSoup

# html_doc is an ordinary (non-raw) string literal, so Python itself
# decodes the \x.. escapes before Beautiful Soup sees the markup.
html_doc = '''<script>
var teamsData = JSON.parse('\x7B\x2271\x22\x3A\x7B\x22id\x22\x3A\x2271\x22,\x22title\x22\x3A\x22Aston\x20Villa\x22,\x22history\x22\x3A\x5B\x5D\x7D,\x2272\x22\x3A\x7B\x22id\x22\x3A\x2272\x22\x7D\x7D');
</script>'''

soup = BeautifulSoup(html_doc, 'html.parser')
script_string = soup.find('script').string
print(script_string)

Prints:

var teamsData = JSON.parse('{"71":{"id":"71","title":"Aston Villa","history":[]},"72":{"id":"72"}}');
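Because .string works on script tags, find() can also select the right script directly instead of looping: when a tag name is combined with the string= argument, Beautiful Soup matches tags whose .string matches the value. A minimal sketch, with made-up markup standing in for the real page:

```python
import re

from bs4 import BeautifulSoup

# Made-up markup (assumption, not the real page).
html = """
<script>var other = 1;</script>
<script>var teamsData = JSON.parse('{}');</script>
"""
soup = BeautifulSoup(html, "html.parser")

# Combining a tag name with string= makes find() test each <script> tag's
# .string, so this returns the first script whose text mentions teamsData.
tag = soup.find("script", string=re.compile("teamsData"))
print(tag.string)  # var teamsData = JSON.parse('{}');
```

Note that .string is None when a tag has more than one child, so this relies on each script holding a single text node, which is the usual case.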

To parse the JSON data, you can use the re and json modules. For example:

data = re.search(r"JSON\.parse\('(.*?)'\);", script_string).group(1)
data = json.loads(data)

for k, v in data.items():
    print(k, v)

Prints:

71 {'id': '71', 'title': 'Aston Villa', 'history': []}

72 {'id': '72'}
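One caveat: in the html_doc example in this answer, Python decodes the \x7B-style escapes while evaluating the string literal. If the script text comes from a live page, the backslashes are most likely still literal characters, and json.loads would reject them. A minimal sketch of decoding them first, assuming the payload only uses \xNN escapes as in the excerpt from the question (`decode_hex_escapes` is a hypothetical helper name). For an ASCII-only payload, `payload.encode().decode('unicode_escape')` is a shorter alternative.

```python
import json
import re

# A raw string keeps the backslashes literal, mimicking (by assumption) the
# script text read from a live page rather than from a Python literal.
script_string = r"var teamsData = JSON.parse('\x7B\x2271\x22\x3A\x7B\x22id\x22\x3A\x2271\x22\x7D\x7D');"

payload = re.search(r"JSON\.parse\('(.*?)'\);", script_string).group(1)


def decode_hex_escapes(text):
    """Replace JavaScript-style \\xNN escapes with the characters they encode."""
    return re.sub(r"\\x([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), text)


data = json.loads(decode_hex_escapes(payload))
print(data)  # {'71': {'id': '71'}}
```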

2023-08-08
