-
#!/usr/bin/env python
# encoding: utf-8
from urllib.request import urlopen

req = urlopen("https://en.wikipedia.org/robots.txt")
print(req.read().decode('utf-8'))
-
#!/usr/bin/env python
# encoding: utf-8
import pymysql

connection = pymysql.connect(host='localhost',
                             user='root',
                             password='',
                             db='wiki',
                             charset='utf8')
try:
    with connection.cursor() as cursor:
        sql = "select `urlname`, `urlhref` from `urls` where `id` is not null"
        count = cursor.execute(sql)
        print(count)
        # result = cursor.fetchall()
        # print(result)
        result = cursor.fetchmany(size=5)
        print(result)
finally:
    connection.close()
-
#!/usr/bin/env python
# encoding: utf-8
# import the packages
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
import pymysql

resp = urlopen("https://en.wikipedia.org/wiki/Main_Page").read().decode("utf-8")
soup = BeautifulSoup(resp, "html.parser")
listUrls = soup.find_all("a", href=re.compile("^/wiki/"))
# print(listUrls)
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='',
                             db='wiki',
                             charset='utf8')
print(connection)
try:
    with connection.cursor() as cursor:
        for url in listUrls:
            # skip links that point at images
            if not re.search(r"\.(jpg|jpeg)$", url['href']):
                sql = "insert into `urls`(`urlname`, `urlhref`) values (%s, %s)"
                cursor.execute(sql, (url.get_text(), "https://en.wikipedia.org" + url["href"]))
                connection.commit()
finally:
    connection.close()
-
urllib
-
Python 3: fixing garbled characters (mojibake)
-
-
Import the modules
1. Read the page content
2. Parse the fetched content
3. Run a second extraction pass over the parsed data
4. Print the results
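The four steps above can be sketched without network access. As an assumption for illustration, a local HTML snippet stands in for the page that step 1 would fetch, and the stdlib html.parser is used here instead of BeautifulSoup so the sketch has no third-party dependency:

```python
from html.parser import HTMLParser

# Assumption: this snippet stands in for the page step 1 would fetch.
HTML = '<a href="/wiki/Python">Python</a> <a href="/w/index.php">skip</a>'

class WikiLinkParser(HTMLParser):
    """Steps 2-3: collect hrefs of <a> tags that start with /wiki/."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if href.startswith("/wiki/"):
                self.links.append(href)

parser = WikiLinkParser()
parser.feed(HTML)      # step 2: parse the (already "fetched") content
print(parser.links)    # step 4: print the extracted links
```

The real course code uses BeautifulSoup's find_all with a regex for the same filtering step.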
-
Reading an online PDF
-
Fetching Wikipedia entries
-
Worth a look: scraping the data
-
Sending POST requests with urllib. Some sites require an Origin header and a User-Agent to show the request is not from a bot; otherwise they return an error.
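A minimal sketch of building such a request; the httpbin.org endpoint and the header values are placeholder assumptions, and the request is only constructed here, not actually sent:

```python
import urllib.parse
from urllib.request import Request

# Form data; supplying `data` makes urllib issue a POST instead of a GET.
data = urllib.parse.urlencode({"q": "python"}).encode("utf-8")

req = Request(
    "https://httpbin.org/post",          # placeholder endpoint
    data=data,
    headers={
        "User-Agent": "Mozilla/5.0",     # identify as a browser
        "Origin": "https://httpbin.org", # some sites check this header
    },
)
print(req.get_method())  # POST, because data was supplied
```

Calling urlopen(req) would then send the request with those headers attached.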
-
Using urllib to simulate a real browser
-
Command to check whether Python was installed successfully
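The quickest check is from the command line (shown for a Unix-style shell; on Windows the command is typically `python --version`):

```shell
# Print the interpreter version; if this prints "Python 3.x.y",
# the installation succeeded.
python3 --version
```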
-
Using decode("utf-8") prevents garbled characters
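A quick illustration of why the decode matters: urlopen().read() returns raw bytes, and decoding them with the page's charset recovers the text. The sample string below is just an assumed example standing in for network data:

```python
# Assumption: these bytes stand in for what urlopen().read() returns.
raw = "維基百科".encode("utf-8")

print(type(raw))            # bytes, not str -- printing it shows b'...' escapes

text = raw.decode("utf-8")  # decode with the page's charset
print(text)                 # readable text again
```

Decoding with the wrong charset (e.g. latin-1 for a UTF-8 page) is what produces the garbled output.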
-
https://en.wikipedia.org/robots.txt