
How do I get the URL from an href that is itself a hyperlink?

Python

炎炎設計 2021-11-09 17:00:11

I am using Python and lxml to try to scrape this HTML page. The problem I am running into is getting the URL out of the hyperlink text "Chapter02a". (Note that I cannot seem to use link formatting here.)

<li><a href="[Chapter02A](https://www.math.wisc.edu/~mstemper2/Math/Pinter/Chapter02A)">Examples of Operations</a></li>

I tried

//ol[@id="ProbList"]/li/a/@href

but that only gives me the text "Chapter02a". I also tried

//ol[@id="ProbList"]/li/a

which returns an lxml.html.HtmlElement object, and none of the attributes I found in the documentation do what I want.

from lxml import html
import requests
chapter_req = requests.get('https://www.math.wisc.edu/~mstemper2/Math/Pinter/Chapter02')
chapter_html = html.fromstring(chapter_req.content)
sections = chapter_html.xpath('//ol[@id="ProbList"]/li/a/@href')
print(sections[0])

I expect sections to be a list of URLs for the subsections.


2 answers

慕田峪7331174

Contributed 1828 experience points, received over 13 likes

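What you are getting back is correct, because Chapter02a is a "relative" link to the next section. The full URL is not listed because that is not how it is stored in the HTML.

To get the full URLs, you can use: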

url_base = 'https://www.math.wisc.edu/~mstemper2/Math/Pinter/'

sections = chapter_html.xpath('//ol[@id="ProbList"]/li/a/@href')

section_urls = [url_base + s for s in sections]
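Plain string concatenation works here because every href on this page is a bare file name. As a slightly more general sketch, the standard library's `urllib.parse.urljoin` resolves a relative link against the URL of the page it came from, the way a browser would. The `sections` list below is a hard-coded stand-in for the XPath result, and `Chapter02B` is a hypothetical second entry used only for illustration.

```python
from urllib.parse import urljoin

# URL of the page that was scraped (taken from the question).
page_url = 'https://www.math.wisc.edu/~mstemper2/Math/Pinter/Chapter02'

# Stand-in for the XPath result; 'Chapter02B' is a hypothetical entry.
sections = ['Chapter02A', 'Chapter02B']

# urljoin replaces the last path segment of page_url with each relative link.
section_urls = [urljoin(page_url, s) for s in sections]

for url in section_urls:
    print(url)
```

Unlike concatenation, `urljoin` also copes with hrefs that are already absolute or that start with `/` or `../`.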

Answered 2021-11-09

搖曳的薔薇

Contributed 1793 experience points, received over 6 likes

You can also do the concatenation directly at the XPath level to rebuild the URL from the relative link:

from lxml import html

import requests

chapter_req = requests.get('https://www.math.wisc.edu/~mstemper2/Math/Pinter/Chapter02')

chapter_html = html.fromstring(chapter_req.content)

sections = chapter_html.xpath('concat("https://www.math.wisc.edu/~mstemper2/Math/Pinter/",//ol[@id="ProbList"]/li/a/@href)')

print(sections)

Output:

https://www.math.wisc.edu/~mstemper2/Math/Pinter/Chapter02A
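Note that the output is a single string rather than a list: in XPath 1.0, `concat()` converts a node-set argument to the string value of its first node, so only the first link is rebuilt. If the goal is a list of every section URL, one possible approach is lxml's `make_links_absolute`, sketched below on an inline HTML sample instead of the live page. The sample markup, and in particular the second list item, is a hypothetical stand-in for the real page.

```python
from lxml import html

# Inline stand-in for the real page; the second <li> is hypothetical.
SAMPLE = '''
<html><body>
<ol id="ProbList">
  <li><a href="Chapter02A">Examples of Operations</a></li>
  <li><a href="Chapter02B">Hypothetical second section</a></li>
</ol>
</body></html>
'''
PAGE_URL = 'https://www.math.wisc.edu/~mstemper2/Math/Pinter/Chapter02'
XPATH = '//ol[@id="ProbList"]/li/a/@href'

doc = html.fromstring(SAMPLE)

# concat() only uses the first node of the node-set, so this is one string.
first_only = doc.xpath(
    'concat("https://www.math.wisc.edu/~mstemper2/Math/Pinter/", ' + XPATH + ')'
)

# Rewrite every link in the document in place, relative to the page URL.
doc.make_links_absolute(PAGE_URL)
all_urls = doc.xpath(XPATH)

print(first_only)
print(all_urls)
```

With the real page, `doc` would be built from `chapter_req.content` as in the answer above, and `PAGE_URL` would be the URL passed to `requests.get`.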

Answered 2021-11-09
