The script runs successfully, but the database contains only one row. Why?
2017-04-08
Source: Python Meets Data Collection (python遇見數據采集), 4-1
# I just tested my own code and it works without any problems.
# First, copy the code below into your environment exactly as it is and run it.
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
import pymysql.cursors

resp = urlopen("https://en.wikipedia.org/wiki/Main_Page").read().decode("utf-8")
soup = BeautifulSoup(resp, "html.parser")
listUrls = soup.findAll("a", href=re.compile(r"^/wiki/"))
for url in listUrls:
    if not re.search(r"\.(jpg|JPG)$", url["href"]):
        print(url.get_text(), "<---->", "https://en.wikipedia.org" + url["href"])
        # The database code has to be indented here. Otherwise the url used below is
        # just the last value from the for loop, which is why the printed output looks
        # fine but the database insert goes wrong.
        # That said, connecting to the database outside the loop would probably be
        # better; opening a new connection on every iteration is too expensive.
        connection = pymysql.connect(
            host='localhost',
            user='root',
            password='lqmysql',
            db='wikiurl',
            charset='utf8mb4'
        )
        try:
            with connection.cursor() as cursor:
                sql = "insert into `urls`(`urlname`,`urlhref`) values(%s,%s)"
                cursor.execute(sql, (url.get_text(), "https://en.wikipedia.org" + url["href"]))
                connection.commit()
        finally:
            connection.close()
# Check your result. There should be no problem; this is what I get.
#
#
# Next, copy the code below and test it.
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
import pymysql.cursors

resp = urlopen("https://en.wikipedia.org/wiki/Main_Page").read().decode("utf-8")
soup = BeautifulSoup(resp, "html.parser")
listUrls = soup.findAll("a", href=re.compile(r"^/wiki/"))
for url in listUrls:
    if not re.search(r"\.(jpg|JPG)$", url["href"]):
        print(url.get_text(), "<---->", "https://en.wikipedia.org" + url["href"])
# Deliberately not indented: everything below runs once, after the loop has finished.
connection = pymysql.connect(
    host='localhost',
    user='root',
    password='lqmysql',
    db='wikiurl',
    charset='utf8mb4'
)
try:
    with connection.cursor() as cursor:
        sql = "insert into `urls`(`urlname`,`urlhref`) values(%s,%s)"
        cursor.execute(sql, (url.get_text(), "https://en.wikipedia.org" + url["href"]))
        connection.commit()
finally:
    connection.close()
# This time only one row should be inserted.
# After every run, remember to refresh your database GUI tool.
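The difference between the two versions comes down to where the insert sits relative to the for loop. The following is a minimal sketch of that effect with no database or network involved. The `links` list and the `rows_inside` / `rows_outside` names are made up for illustration and stand in for the scraped `<a>` tags and the inserted rows.

```python
# Hypothetical stand-in for the (text, href) pairs scraped from the page.
links = [
    ("Main Page", "/wiki/Main_Page"),
    ("Python", "/wiki/Python_(programming_language)"),
    ("Disclaimers", "/wiki/Wikipedia:General_disclaimer"),
]

rows_inside = []   # stands in for inserts executed inside the loop
rows_outside = []  # stands in for an insert executed after the loop

for name, href in links:
    # Indented under the loop: runs once per link.
    rows_inside.append((name, "https://en.wikipedia.org" + href))

# Not indented: runs once, after the loop has finished, so `name` and
# `href` still hold the values from the last iteration.
rows_outside.append((name, "https://en.wikipedia.org" + href))

print(len(rows_inside))  # 3
print(rows_outside)      # only the last link
```

Python keeps the loop variable alive after the loop ends, so the unindented version raises no error. It simply inserts the last link once.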
慕粉3745191 (asker)
追逐奔跑
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
import pymysql.cursors

resp = urlopen("https://en.wikipedia.org/wiki/Main_Page").read().decode("utf-8")
soup = BeautifulSoup(resp, "html.parser")
listUrls = soup.findAll("a", href=re.compile(r"^/wiki/"))
for url in listUrls:
    if not re.search(r"\.(jpg|JPG)$", url["href"]):
        print(url.get_text(), "<---->", "https://en.wikipedia.org" + url["href"])
        # The database code has to be indented here. Otherwise the url used below is
        # just the last value from the for loop, which is why the printed output looks
        # fine but the database insert goes wrong.
        # That said, connecting to the database outside the loop would probably be
        # better; opening a new connection on every iteration is too expensive.
        connection = pymysql.connect(
            host='localhost',
            user='root',
            password='lqmysql',
            db='wikiurl',
            charset='utf8mb4'
        )
        try:
            with connection.cursor() as cursor:
                sql = "insert into `urls`(`urlname`,`urlhref`) values(%s,%s)"
                cursor.execute(sql, (url.get_text(), "https://en.wikipedia.org" + url["href"]))
                connection.commit()
        finally:
            connection.close()
慕粉3745191 (asker) replying to 追逐奔跑
追逐奔跑 replying to 慕粉3745191 (asker)
2017-04-09
2017-07-12
Sure enough, it was a formatting problem. A cautionary tale...
2017-07-12
Oh my, I ran into this problem too... so annoying.
2017-04-09
Solved...
2017-04-09
See the screenshot above.
2017-04-09
2017-04-09
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
import pymysql.cursors

# Fetch the Wikipedia main page and parse it.
resp = urlopen("https://en.wikipedia.org/wiki/Main_Page").read().decode("utf-8")
soup = BeautifulSoup(resp, "html.parser")
# Collect every <a> tag whose href starts with /wiki/.
listUrls = soup.findAll("a", href=re.compile(r"^/wiki/"))
for url in listUrls:
    # Skip links that point to JPG images.
    if not re.search(r"\.(jpg|JPG)$", url["href"]):
        print(url.get_text(), "<---->", "https://en.wikipedia.org" + url["href"])
# Everything below is outside the for loop, so it runs only once, with url
# still holding the last link. That is why only one row ends up in the table.
connection = pymysql.connect(
    host='localhost',
    user='root',
    password='lqmysql',
    db='wikiurl',
    charset='utf8mb4'
)
try:
    with connection.cursor() as cursor:
        sql = "insert into `urls`(`urlname`,`urlhref`) values(%s,%s)"
        cursor.execute(sql, (url.get_text(), "https://en.wikipedia.org" + url["href"]))
        connection.commit()
finally:
    connection.close()
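The answer suggests keeping the insert inside the loop while connecting to the database only once. Here is a minimal sketch of what that structure might look like. To keep it runnable without a MySQL server, it assumes Python's built-in `sqlite3` module with an in-memory database in place of PyMySQL and the `wikiurl` database, and a hard-coded `links` list in place of the scraped `<a>` tags. Both are stand-ins for illustration, not the course's code.

```python
import sqlite3

# Hypothetical stand-in for the (text, href) pairs scraped from the page.
links = [
    ("Main Page", "/wiki/Main_Page"),
    ("Python", "/wiki/Python_(programming_language)"),
    ("Disclaimers", "/wiki/Wikipedia:General_disclaimer"),
]

# Assumption: an in-memory SQLite database instead of the MySQL `wikiurl` database.
connection = sqlite3.connect(":memory:")  # connect once, before the loop
try:
    connection.execute("CREATE TABLE urls (urlname TEXT, urlhref TEXT)")
    for name, href in links:
        # The insert is indented under the loop, so it runs once per link.
        # sqlite3 uses ? placeholders; PyMySQL uses %s instead.
        connection.execute(
            "INSERT INTO urls (urlname, urlhref) VALUES (?, ?)",
            (name, "https://en.wikipedia.org" + href),
        )
    connection.commit()  # commit once, after all the inserts
    # Count the rows to confirm that every link was stored.
    row_count = connection.execute("SELECT COUNT(*) FROM urls").fetchone()[0]
finally:
    connection.close()

print(row_count)  # 3
```

Counting the rows from the script itself also sidesteps the stale view that a database GUI can show until it is refreshed.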
2017-04-09
Uh, buddy, the whole thing is under 100 lines of code, so you could just paste all of it. This part on its own should be fine.
2017-04-08
Did the database operation turn into an update? It clearly isn't doing an insert, right?