1 Answer

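Well, I shortened the path by using re to select all <a> tags whose href value starts with "topten". You can also do it a different way, for example with a CSS selector: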
for item in soup.select("a[href^=topten]"):
Then I get all the text inside each tag, strip it with strip=True, and pass a single space as the separator so that text from nested elements doesn't run together.
import re

import requests
from bs4 import BeautifulSoup


def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    # Match every <a> tag whose href starts with "topten".
    for item in soup.find_all("a", href=re.compile("^topten")):
        # Strip each text fragment and join the fragments with a space.
        text = item.get_text(strip=True, separator=" ")
        if text:  # skip links that have no visible text
            print(text)


main("http://edition.cnn.com/EVENTS/1996/year.in.review/main.html")
Output:
Israel elects Netanyahu
Crash of TWA Flight 800
Russia elects Yeltsin
U.S . elects Clinton
Hutu-Tutsi conflict in central Africa
Peace, elections in Bosnia
U.S . base bombed in Saudi Arabia
Centennial Olympic Games
Advances against AIDS
Unabomb suspect Ted Kaczynski arrested
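
To try the same idea without hitting the network, here is a minimal offline sketch. The SAMPLE_HTML markup and the helper names extract_titles and joined_without_separator are made up for illustration and are not taken from the CNN page; the sketch only assumes that the CSS selector a[href^=topten] and the re.compile("^topten") filter pick the same links on markup like this.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example (not copied from the CNN page).
SAMPLE_HTML = """
<a href="topten/one.html"><b>First</b><i>story</i></a>
<a href="topten/two.html">  Second story  </a>
<a href="other/three.html">Unrelated link</a>
<a href="topten/empty.html"> </a>
"""


def extract_titles(html):
    """Return the non-empty link texts whose href starts with 'topten'."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    # CSS attribute selector: href begins with "topten".
    for tag in soup.select("a[href^=topten]"):
        # strip=True trims each text fragment; separator=" " joins
        # fragments from nested tags with a space.
        text = tag.get_text(strip=True, separator=" ")
        if text:
            titles.append(text)
    return titles


def joined_without_separator(html):
    """Text of the first <a> tag with strip=True but no separator."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("a").get_text(strip=True)


def regex_match_count(html):
    """Number of <a> tags matched by the regex-based filter."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all("a", href=re.compile("^topten")))


if __name__ == "__main__":
    print(extract_titles(SAMPLE_HTML))
    print(joined_without_separator(SAMPLE_HTML))
```

Leaving out the separator glues the text of the nested <b> and <i> tags together, which is why the answer passes separator=" ".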