-
A Python 3 library for reading PDF files: pdfminer3k
-
urllib sends the request to the web server, BeautifulSoup parses the returned result, and re handles the regular-expression matching.
-
from urllib.request import urlopen
from bs4 import BeautifulSoup as bs
import re

# Download the page and decode the response bytes as UTF-8
resp = urlopen("https://en.wikipedia.org/wiki/Main_Page").read().decode("utf-8")
soup = bs(resp, "html.parser")
# Keep only the <a> tags whose href attribute starts with /wiki/
listUrls = soup.findAll("a", href=re.compile("^/wiki/"))
for url in listUrls:
    print(url["href"])
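The link-filtering step can also be tried offline. The sketch below is not the BeautifulSoup code from the note: it swaps in the standard library's `html.parser` so it runs without a network connection or third-party packages. `LinkCollector`, `wiki_links`, and `SAMPLE_HTML` are made-up names for illustration. It also shows that a misspelled attribute name such as `herf` simply matches nothing.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags that match a regex (hypothetical helper)."""

    def __init__(self, pattern):
        super().__init__()
        self.pattern = re.compile(pattern)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Tags without an href (or with a misspelled attribute) are skipped
        if href and self.pattern.search(href):
            self.links.append(href)


def wiki_links(html):
    """Return every href in the HTML that starts with /wiki/."""
    collector = LinkCollector(r"^/wiki/")
    collector.feed(html)
    return collector.links


# Made-up sample markup standing in for a downloaded page
SAMPLE_HTML = """
<a href="/wiki/Python">Python</a>
<a href="https://example.org/">external</a>
<a herf="/wiki/Typo">misspelled attribute</a>
<a href="/wiki/Web_scraping">Web scraping</a>
"""

print(wiki_links(SAMPLE_HTML))
```

Only the two correctly spelled `/wiki/` links are collected; the external link and the `herf` tag are ignored.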
-
1. print(soup.find(id="link2").string)  # get the content of the tag whose id is "link2"
2. for link in soup.findAll("a"):
       print(link.string)  # print the content of every <a> tag, using a for loop
-
Web crawler
-
from urllib.request import urlopen  # import urlopen
from urllib.request import Request  # import Request
from urllib import parse            # import the parse module

req = Request("http://www.thsrc.com.tw/tw/TimeTable/SearchResult")
# Form fields to send in the POST body
postDate = parse.urlencode([
    ("StartStation", "2f940836-cedc-41ef-8e28-c2336ac8fe68"),
    ("EndStation", "977abb69-413a-4ccf-a109-0272c24fd490"),
    ("SearchDate", "2016/08/31"),
    ("SearchTime", "21:30"),
    ("SearchWay", "DepartureInMandarin")
])
req.add_header("Origin", "http://www.thsrc.com.tw")
req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:48.0) Gecko/20100101 Firefox/48.0")
# Passing data makes urlopen send a POST request; the body must be bytes
resp = urlopen(req, data=postDate.encode("utf-8"))
print(resp.read().decode("utf-8"))
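To see what this request looks like before anything is sent, the body and method can be inspected offline. This is a minimal sketch using only two of the form fields. The names `form` and `body` are illustrative, and the data is passed to the `Request` constructor here, whereas the note passes it to `urlopen`.

```python
from urllib import parse
from urllib.request import Request

# Two of the form fields from the note, enough to show the encoding
form = [("SearchDate", "2016/08/31"), ("SearchTime", "21:30")]

# urlencode percent-encodes "/" and ":"; the POST body must be bytes
body = parse.urlencode(form).encode("utf-8")

req = Request("http://www.thsrc.com.tw/tw/TimeTable/SearchResult", data=body)
req.add_header("Origin", "http://www.thsrc.com.tw")

print(body)              # b'SearchDate=2016%2F08%2F31&SearchTime=21%3A30'
print(req.get_method())  # POST, because a data payload is attached
```

Nothing is sent to the server here; the request only goes out once `req` is passed to `urlopen`.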
-
When typing the code, watch out for letter case and for half-width versus full-width symbols; all of them affect the result.
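A small sketch of both pitfalls, assuming Python 3. The helper names `compiles` and `raises_name_error` are made up for this example. Full-width parentheses (as typed with a Chinese input method) make the line a syntax error, and a wrong capital letter turns `print` into an unknown name.

```python
import unicodedata


def compiles(source):
    """Return True if the source text is syntactically valid Python."""
    try:
        compile(source, "<note>", "exec")
        return True
    except SyntaxError:
        return False


def raises_name_error(source):
    """Return True if running the source fails with NameError."""
    try:
        exec(source, {})
        return False
    except NameError:
        return True


half = 'print("ok")'
full = 'print（"ok"）'  # full-width parentheses U+FF08 and U+FF09

print(compiles(half))   # True
print(compiles(full))   # False: full-width parentheses are not valid syntax
# NFKC normalization maps the full-width parentheses back to ASCII
print(unicodedata.normalize("NFKC", full) == half)  # True
# Python names are case-sensitive: Print is not print
print(raises_name_error('Print("ok")'))  # True
```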
-
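1. When installing Python, be sure to add it to the PATH; otherwise many errors will show up later.
2. BeautifulSoup has to be installed from the system command line, after exiting the Python interpreter.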
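One way to check both points from a script is sketched below. `is_installed` and `pip_install_command` are hypothetical helpers. The only outside facts assumed are that the package is published as `beautifulsoup4` and imported as `bs4`. The printed command is meant to be run in the system shell, not at the Python `>>>` prompt.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if the module can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


def pip_install_command(package):
    """Build a pip command tied to this interpreter, so PATH mix-ups are avoided."""
    return [sys.executable, "-m", "pip", "install", package]


# BeautifulSoup is installed as "beautifulsoup4" but imported as "bs4"
print(is_installed("bs4"))
print(" ".join(pip_install_command("beautifulsoup4")))
```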
-
Garbled text (encoding) problem
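The note gives no details, so the following only illustrates one common cause: decoding the response bytes with the wrong codec. It is a sketch under that assumption; the sample text and the helper `decode_body` are made up for illustration.

```python
# Sample text standing in for a page body (made up for this example)
text = "高鐵時刻表"
raw = text.encode("utf-8")      # the bytes a UTF-8 server would send

wrong = raw.decode("latin-1")   # wrong codec: no exception, but garbled output
right = raw.decode("utf-8")     # matching codec: original text restored


def decode_body(raw_bytes, charset=None):
    """Decode with the declared charset, falling back to UTF-8.

    With urllib, the declared charset can usually be read from
    resp.headers.get_content_charset(); it may be None.
    """
    return raw_bytes.decode(charset or "utf-8", errors="replace")


print(right == text)   # True
print(wrong == text)   # False
print(decode_body(text.encode("big5"), "big5") == text)  # True
```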
-
# Doesn't run yet; will come back to it later
# -*- coding:utf-8 -*-
# Python 2 code (urllib2)
import urllib2
import urllib

# (1) Request: set up the connection
url = 'http://www.thsrc.com.tw/tw/TimeTable/SearchResult'
headers = {
    'Host': 'www.thsrc.com.tw',
    'Origin': 'http://www.thsrc.com.tw',
    'Connection': 'keep-alive',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'
}
req = urllib2.Request(url=url, headers=headers)

# (2) Data to send
data = {
    'StartStation': '2f940836-cedc-41ef-8e28-c2336ac8fe68',
    'EndStation': 'e6e26e66-7dc1-458f-b2f3-71ce65fdc95f',
    'SearchDate': '2016/08/31',
    'SearchTime': '13:00',
    'SearchWay': 'DepartureInMandarin',
    'RestTime': '',
    'EarlyOrLater': ''
}
post_data = urllib.urlencode(data)
resp = urllib2.urlopen(req, data=post_data)
print(resp.read().decode('utf-8'))