2 Answers

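User contributed 1828 experience points, received over 6 likes
You don't need Selenium. What you can do (and you identified this correctly) is extract the HTML comments and then parse the tables out of them.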
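from io import StringIO

import requests
from bs4 import BeautifulSoup
from bs4 import Comment
import pandas as pd

url = 'https://www.pro-football-reference.com/teams/crd/2017_roster.htm'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Collect every HTML comment; the tables are shipped inside comments.
comments = soup.find_all(string=lambda text: isinstance(text, Comment))

tables = []
for each in comments:
    if 'table' in each:
        try:
            # Wrap in StringIO: newer pandas deprecates passing literal HTML strings.
            tables.append(pd.read_html(StringIO(each))[0])
        except ValueError as e:
            # Raised when the comment holds no parsable table.
            print(e)
            continue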
Output:
print(tables[0].head().to_string())
    No.           Player   Age  Pos   G   GS     Wt    Ht  College/Univ  BirthDate   Yrs   AV                            Drafted (tm/rnd/yr)      Salary
0  54.0  Bryson Albright  23.0  NaN   7  0.0  245.0   6-5    Miami (OH)  3/15/1994     1  0.0                                            NaN    $246,177
1  36.0    Budda Baker*+  21.0   ss  16  7.0  195.0  5-10    Washington  1/10/1996  Rook  9.0     Arizona Cardinals / 2nd / 36th pick / 2017    $465,000
2  64.0    Khalif Barnes  35.0  NaN   3  0.0  320.0   6-6    Washington  4/21/1982    12  0.0  Jacksonville Jaguars / 2nd / 52nd pick / 2005    $176,471
3  41.0   Antoine Bethea  33.0   db  15  6.0  206.0  5-11        Howard  7/27/1984    11  4.0   Indianapolis Colts / 6th / 207th pick / 2006  $2,000,000
4  28.0    Justin Bethel  27.0  rcb  16  6.0  200.0   6-0  Presbyterian  6/17/1990     5  3.0    Arizona Cardinals / 6th / 177th pick / 2012  $2,000,000
....
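
To try the comment-extraction idea without hitting the site, here is a minimal offline sketch. The HTML snippet, the `roster` id and the helper name `tables_from_comments` are made up for illustration; it uses only BeautifulSoup and returns plain lists of rows rather than DataFrames.

```python
from bs4 import BeautifulSoup, Comment

# Made-up page: the table is wrapped in an HTML comment (illustrative only).
SAMPLE_HTML = """
<html><body>
<div id="all_roster">
<!--
<table id="roster">
  <tr><th>No.</th><th>Player</th></tr>
  <tr><td>36</td><td>Budda Baker</td></tr>
</table>
-->
</div>
</body></html>
"""


def tables_from_comments(html):
    """Return every table found inside HTML comments as a list of rows."""
    soup = BeautifulSoup(html, "html.parser")
    comments = soup.find_all(string=lambda text: isinstance(text, Comment))
    tables = []
    for comment in comments:
        # Re-parse the comment body so its markup becomes real tags.
        inner = BeautifulSoup(comment, "html.parser")
        for table in inner.find_all("table"):
            rows = [
                [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
                for tr in table.find_all("tr")
            ]
            tables.append(rows)
    return tables


tables = tables_from_comments(SAMPLE_HTML)
print(tables[0])
```

Re-parsing each comment with BeautifulSoup, instead of testing for the substring 'table', skips comments that merely mention the word without containing a table.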

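User contributed 1744 experience points, received over 4 likes
The tag you are trying to scrape is generated dynamically by JavaScript. You are most likely using requests to fetch the HTML. Unfortunately, requests does not run JavaScript; it retrieves the HTML as raw text only. BeautifulSoup cannot find the tag because it was never generated in your scraper.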
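
One way to check which situation applies is to look at the raw HTML before reaching for a browser. The sketch below is a hypothetical diagnostic (the function name `locate_tag` and the sample snippets are assumptions, not part of either answer): it reports whether an element id is present as a real tag, hidden inside an HTML comment, or missing from the raw text entirely, which would suggest it is built by JavaScript at runtime.

```python
from bs4 import BeautifulSoup, Comment


def locate_tag(html, tag_id):
    """Classify where an element id lives: 'parsed', 'commented' or 'absent'."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.find(id=tag_id) is not None:
        return "parsed"  # ordinary tag, BeautifulSoup can see it directly
    for comment in soup.find_all(string=lambda t: isinstance(t, Comment)):
        # Re-parse the comment text to look for the tag inside it.
        if BeautifulSoup(comment, "html.parser").find(id=tag_id) is not None:
            return "commented"  # present in the raw text, but inside <!-- -->
    return "absent"  # not in the raw HTML at all


print(locate_tag('<table id="roster"></table>', "roster"))
print(locate_tag('<!-- <table id="roster"></table> -->', "roster"))
print(locate_tag('<div id="app"></div><script>render()</script>', "roster"))
```

A result of 'commented' means the tag can be recovered from the response text alone; only 'absent' points towards needing something that actually executes JavaScript.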