For the same page and almost identical code, everything parses and runs correctly under Python 3 on Windows 8. After porting the code to Ubuntu with Python 2.7, however, BeautifulSoup can no longer parse the fetched page: find_all('table') returns an empty result. Here is part of the problematic code (it runs):

```python
#coding:utf-8
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
import urllib2
from bs4 import BeautifulSoup

# URL-encoded form fields for the search request.
postdata = "T1=&T2=1&T3=&T4=&T5=&APPDate=&T7=&T8=&T9=&PRDate=&T11=&SQDate=&JDDate=&T14=&T15=&T16=&T17=&SDDate=&T19=&T20=&T21=&D1=%B8%B4%C9%F3&D2=jdr&D3=%C9%FD%D0%F2&C1=fm&C2=&C3=&page=70"
postdata = postdata.encode('utf-8')
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6',
    'Referer': 'http://app.sipo-reexam.gov.cn/reexam_out/searchdoc/searchfs.jsp'
}
req = urllib2.Request(
    url="http://app.sipo-reexam.gov.cn/reexam_out/searchdoc/searchfs.jsp",
    headers=headers,
    data=postdata
)
fp = urllib2.urlopen(req)
# Decode the response body as GBK, then re-encode it as UTF-8.
mybytes = fp.read().decode('gbk').encode('utf-8')
soup = BeautifulSoup(mybytes, from_encoding="utf-8")
print soup.original_encoding
print soup.prettify()
```

Any pointers would be appreciated.
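One way to narrow this down might be to check the decoding step separately from BeautifulSoup. The sketch below is written for Python 3 using only the standard library (under Python 2.7 the module is named `HTMLParser` instead of `html.parser`). `TagCounter`, `count_tables`, and the `sample` page are hypothetical names and data made up for illustration, and the GBK encoding is an assumption taken from the `decode('gbk')` call in the code above. It decodes the raw bytes once and counts `<table>` start tags in the resulting text.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count start tags by name (hypothetical helper, not part of bs4)."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tables(raw_bytes, encoding="gbk"):
    """Decode raw response bytes once and count <table> start tags."""
    # errors="replace" keeps going if a few bytes are not valid in `encoding`.
    text = raw_bytes.decode(encoding, errors="replace")
    counter = TagCounter()
    counter.feed(text)
    counter.close()
    return counter.counts.get("table", 0)


# Stand-in for fp.read(): a small GBK-encoded page that declares gb2312.
# This is made-up sample data, not the real response from the site.
sample = (
    u'<html><head><meta http-equiv="Content-Type" '
    u'content="text/html; charset=gb2312"></head>'
    u'<body><TABLE><tr><td>\u590d\u5ba1</td></tr></TABLE></body></html>'
).encode("gbk")

table_count = count_tables(sample)
print(table_count)
```

If the count is non-zero for the real response, the tables are present in the decoded text. In that case the decoded text itself, rather than re-encoded bytes, could be passed to BeautifulSoup so that it does not have to guess the encoding.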
Encoding problem when parsing Chinese web pages with BeautifulSoup
慕田峪4524236
2019-03-30 11:36:27