URLs that contain Chinese characters all fail to download. Any ideas?
2019-01-12
Source: Python開發簡單爬蟲 (Developing a Simple Crawler in Python), 7-7
import urllib.request
from urllib.parse import quote
import string

class HtmlDownloader(object):
    def download(self, url):
        if url is None:
            return None
        # Percent-encode non-ASCII characters; printable ASCII is left as-is
        s = quote(url, safe=string.printable)
        response = urllib.request.urlopen(s)
        if response.getcode() != 200:
            return None
        return response.read()

urllib's quote function (urllib.parse.quote in Python 3) solves the problem of passing Chinese parameters in a URL in Python.
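To see what the quote call in the answer above does on its own, here is a minimal sketch. The helper name encode_url is hypothetical (it is not part of the course code), and the sample URL is the Baidu Baike address that appears in this thread. Note that string.printable also contains whitespace, so this choice of safe characters would leave a literal space unencoded.

```python
import string
from urllib.parse import quote


def encode_url(url):
    """Percent-encode non-ASCII characters; printable ASCII stays as-is.

    Hypothetical helper for illustration only.
    """
    # Characters in `safe` are never encoded, so ':', '/', '?', '&' and an
    # existing '%' survive; only the Chinese characters are rewritten.
    return quote(url, safe=string.printable)


url = "https://baike.baidu.com/item/阿姆斯特丹/2259975"
encoded = encode_url(url)
print(encoded)
```

Because '%' is among the safe characters, passing an already encoded URL through the helper should leave it unchanged rather than double-encode it.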
def _get_new_urls(self, page_url, soup):
    new_urls = set()
    # <a target="_blank" href="/item/%E9%98%BF%E5%A7%86%E6%96%AF%E7%89%B9%E4%B8%B9/2259975" data-lemmaid="2259975">阿姆斯特丹</a>
    # https://baike.baidu.com/item/阿姆斯特丹/2259975
    links = soup.find_all('a', href=re.compile(r"/item/"))
    for link in links:
        new_url = '/item/' + link.get_text()
        new_full_url = urlparse.urljoin(page_url, new_url)
        new_urls.add(new_full_url)
    return new_urls
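I wrote it the same way. Is there something wrong with it?
BSH
For this step, be sure to build the URL with a function; URLs that contain Chinese characters run into encoding problems.
趙崇輝 (asker)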
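The advice to build the URL with a function could look roughly like the sketch below. It assumes Python 3, where urljoin lives in urllib.parse (the urlparse module used in the snippet above is the Python 2 name). The helper name build_full_url and the page_url value are hypothetical, chosen only for illustration. Passing the link's href attribute, rather than its text, may also help: in the sample anchor quoted above, the href carries the numeric id 2259975, while the link text does not.

```python
import string
from urllib.parse import quote, urljoin


def build_full_url(page_url, href):
    """Resolve href against page_url, then percent-encode non-ASCII characters.

    Hypothetical helper for illustration only.
    """
    full_url = urljoin(page_url, href)
    # '%' is in string.printable, so an href that is already encoded
    # is not encoded a second time.
    return quote(full_url, safe=string.printable)


# Illustrative page URL (an assumption, not taken from the thread)
page_url = "https://baike.baidu.com/item/Python/407313"

from_raw = build_full_url(page_url, "/item/阿姆斯特丹/2259975")
from_encoded = build_full_url(
    page_url, "/item/%E9%98%BF%E5%A7%86%E6%96%AF%E7%89%B9%E4%B8%B9/2259975"
)
print(from_raw)
print(from_encoded)
```

Both calls should yield the same ASCII-only URL, which can then be handed to urllib.request.urlopen.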
Reply dates, in the order the replies appear above: 2019-03-26, 2019-01-14, 2019-01-13.