
How do I scrape data from a web page with dynamic HTML (Python)?

Python

GCT1015 2023-12-12 10:23:05

I'm trying to figure out how to scrape the data from the following URL: https://www.aap.org/en-us/advocacy-and-policy/aap-health-initiatives/nicuverification/Pages/NICUSearch.aspx

Here is the type of data: [image]

It looks like everything is populated from a database and loaded into the web page via JavaScript. I've done something similar in the past using selenium and PhantomJS, but I can't figure out how to get these data fields in Python. As expected, I can't use pd.read_html for this type of problem.

Is it possible to parse the results from:

    from selenium import webdriver

    url = "https://www.aap.org/en-us/advocacy-and-policy/aap-health-initiatives/nicuverification/Pages/NICUSearch.aspx"
    browser = webdriver.PhantomJS()
    browser.get(url)
    content = browser.page_source

Or maybe to access the actual underlying data? If not, what other approaches are there, short of copying and pasting for hours?

EDIT: Building on the answer below, from @thenullptr, I have been able to access the material, but only on page 1. How can I adapt this to go through all of the pages [recommendations on parsing properly welcome]? My end goal is to have this in a pandas DataFrame.

    import pandas as pd
    import requests
    from bs4 import BeautifulSoup

    r = requests.post(
        url='https://search.aap.org/nicu/',
        data={'SearchCriteria.Level': '1', 'X-Requested-With': 'XMLHttpRequest'},
    )  # key:value
    html = r.text

    # Parsing the HTML
    soup = BeautifulSoup(html.split("</script>")[-1].strip(), "html")
    div = soup.find("div", {"id": "main"})
    div = soup.findAll("div", {"class": "blue-border panel list-group"})

    def f(x):
        ignore_fields = ['Collapse all', 'Expand all']
        output = list(filter(bool, map(str.strip, x.text.split("\n"))))
        output = list(filter(lambda x: x not in ignore_fields, output))
        return output

    results = pd.Series(list(map(f, div))[0])


1 Answer

Smart貓小萌


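Following on from my last comment, the below should give you a good starting point. When looking through the XHR calls, you just want to see what data is being sent and received by each one to pinpoint the one you need. Below is the raw POST data sent to the API when doing a search. It looks like you need to fill in at least one of the search fields and always include the last one, X-Requested-With.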

    {
        "SearchCriteria.Name": "smith",
        "SearchCriteria.City": "",
        "SearchCriteria.State": "",
        "SearchCriteria.Zip": "",
        "SearchCriteria.Level": "",
        "SearchCriteria.LevelAssigner": "",
        "SearchCriteria.BedNumberRange": "",
        "X-Requested-With": "XMLHttpRequest"
    }
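The rule described above (at least one search field, plus the X-Requested-With entry) can be sketched as a small helper that builds the form data. This is only an illustration: `build_payload` and `SEARCH_FIELDS` are hypothetical names, the field names are copied from the raw POST data, and leaving out the empty fields is an assumption based on the example request in this answer, which sends only one criterion.

```python
# Field names copied from the raw POST data shown above.
SEARCH_FIELDS = (
    "Name", "City", "State", "Zip",
    "Level", "LevelAssigner", "BedNumberRange",
)


def build_payload(**criteria):
    """Hypothetical helper: build the form data for one search request."""
    unknown = set(criteria) - set(SEARCH_FIELDS)
    if unknown:
        raise ValueError(f"unknown search fields: {sorted(unknown)}")

    # Keep only the criteria that were actually filled in.
    filled = {k: str(v) for k, v in criteria.items() if str(v).strip()}
    if not filled:
        raise ValueError("at least one search criterion is required")

    payload = {f"SearchCriteria.{k}": v for k, v in filled.items()}
    # Assumption from the answer: this entry must always be present.
    payload["X-Requested-With"] = "XMLHttpRequest"
    return payload


print(build_payload(Name="smith"))
```

The resulting dictionary can be passed as the `data` argument of `requests.post`.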

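Here is a simple example of how to send a POST request with the requests library. The web page replies with the raw data, so you can parse it with BS (BeautifulSoup) or something similar to get the information you need.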

    import requests

    r = requests.post(
        'https://search.aap.org/nicu/',
        data={'SearchCriteria.Name': 'smith', 'X-Requested-With': 'XMLHttpRequest'},  # key:value
    )
    print(r.text)

Prints <strong class="col-md-8 white-text">JOHN PETER SMITH HOSPITAL</strong>...

https://requests.readthedocs.io/en/master/user/quickstart/
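As a rough sketch of the parsing step, the snippet below pulls the text out of `<strong>` elements like the one printed above. It uses the standard library's `html.parser` in place of BeautifulSoup so that it has no third-party dependencies. `StrongTextParser` and `extract_names` are hypothetical names, and treating the `white-text` class as the marker for a hospital name is an assumption based only on the single line of output shown here; the real response may need a different selector.

```python
from html.parser import HTMLParser


class StrongTextParser(HTMLParser):
    """Collect the text of each <strong> whose class list contains css_class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.names = []
        self._buffer = None  # None while outside a matching <strong>

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "strong" and self.css_class in classes:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "strong" and self._buffer is not None:
            self.names.append("".join(self._buffer).strip())
            self._buffer = None


def extract_names(html, css_class="white-text"):
    """Return the text of every matching <strong> element in html."""
    parser = StrongTextParser(css_class)
    parser.feed(html)
    parser.close()
    return parser.names


# Sample fragment modelled on the printed output above.
sample = '<div><strong class="col-md-8 white-text">JOHN PETER SMITH HOSPITAL</strong></div>'
print(extract_names(sample))  # ['JOHN PETER SMITH HOSPITAL']
```

In practice `extract_names(r.text)` would be called on the response from the POST request, and the resulting list could be handed to `pandas.Series` or `pandas.DataFrame`.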

2023-12-12
