How do I use bs4 in Python?
2016-05-09
Source: the course "Developing a Simple Web Crawler in Python", lesson 7-5
from bs4 import BeautifulSoup
Installation: on Windows, run this in CMD:
pip install beautifulsoup4
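If you have more than one Python installed, the install may go to a different interpreter than the one that runs your crawler. The check below can confirm that it landed in the right place. It is a minimal sketch that assumes Python 3.8+ (for `importlib.metadata`). `bs4_status` is a hypothetical helper name. Running `python -m pip install beautifulsoup4` installs into the same `python` you invoke it with.

```python
import importlib.util
import sys
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+


def bs4_status():
    """Return (path of the running interpreter, installed beautifulsoup4 version or None)."""
    if importlib.util.find_spec("bs4") is None:
        # bs4 cannot be imported by this interpreter
        return sys.executable, None
    try:
        # The PyPI distribution is named "beautifulsoup4"; the import name is "bs4"
        return sys.executable, version("beautifulsoup4")
    except PackageNotFoundError:
        # Importable, but without distribution metadata (e.g. copied in by hand)
        return sys.executable, "unknown"


if __name__ == "__main__":
    exe, ver = bs4_status()
    if ver is None:
        print("bs4 is not installed for", exe)
        print("Try: python -m pip install beautifulsoup4")
    else:
        print("beautifulsoup4", ver, "is installed for", exe)
```

Run it with the same `python` command you use for the crawler.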
Screenshot: http://images2015.cnblogs.com/blog/740516/201609/740516-20160912231014211-2052713170.png
# coding: utf8
__author__ = 'xray'

import re

from bs4 import BeautifulSoup

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2, as in the original answer


class HtmlParser(object):

    def get_new_urls(self, page_url, soup):
        # Collect entry links of the form /view/<digits>.html as absolute URLs
        new_urls = set()
        links = soup.find_all('a', href=re.compile(r'/view/\d+\.html'))
        for link in links:
            new_url = link['href']
            new_full_url = urljoin(page_url, new_url)  # resolve relative hrefs
            new_urls.add(new_full_url)
        return new_urls

    def get_new_data(self, page_url, soup):
        res_data = {}
        # url
        res_data['url'] = page_url
        # title: <dd class="lemmaWgt-lemmaTitle-title"><h1>...</h1></dd>
        # (class names are case-sensitive)
        title_dd = soup.find('dd', class_='lemmaWgt-lemmaTitle-title')
        title_node = title_dd.find('h1') if title_dd is not None else None
        res_data['title'] = title_node.get_text() if title_node is not None else ''
        # summary: <div class="lemma-summary">...</div>
        summary_node = soup.find('div', class_='lemma-summary')
        res_data['summary'] = summary_node.get_text() if summary_node is not None else ''
        return res_data

    def parse(self, page_url, html_cont):
        if page_url is None or html_cont is None:
            return
        # html_cont is the raw bytes from the downloader, hence from_encoding
        soup = BeautifulSoup(html_cont, 'html.parser', from_encoding='utf-8')
        new_urls = self.get_new_urls(page_url, soup)
        new_data = self.get_new_data(page_url, soup)
        return new_urls, new_data
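The sketch below runs the same bs4 calls as `get_new_urls` and the summary lookup in `get_new_data`, so you can see them work without a network connection. It uses an inline HTML snippet. The page URL, the `example.com` host and the markup are made up for illustration. It assumes Python 3 (`urllib.parse.urljoin`) and that bs4 is installed.

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical page URL and markup, for illustration only
PAGE_URL = "https://baike.example.com/view/1.html"
HTML = b"""
<html><body>
  <div class="lemma-summary">A programming language.</div>
  <a href="/view/21087.html">relative entry link</a>
  <a href="https://baike.example.com/view/42.html">absolute entry link</a>
  <a href="/help.html">not an entry link</a>
</body></html>
"""

# Bytes input plus from_encoding, as in HtmlParser.parse
soup = BeautifulSoup(HTML, "html.parser", from_encoding="utf-8")

# Same filter as get_new_urls: hrefs matching /view/<digits>.html
links = soup.find_all("a", href=re.compile(r"/view/\d+\.html"))
new_urls = {urljoin(PAGE_URL, a["href"]) for a in links}

# Same lookup as the summary part of get_new_data
summary = soup.find("div", class_="lemma-summary").get_text()

print(sorted(new_urls))
print(summary)
```

bs4 applies a compiled regex to attribute values with `search`, so the absolute `/view/42.html` link is also kept, while `/help.html` is skipped. `urljoin` leaves absolute URLs unchanged and resolves relative ones against `PAGE_URL`.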
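First, make sure bs4 is installed for the Python version you are actually running.
Then you can import whatever you need from bs4, for example the BeautifulSoup class: from bs4 import BeautifulSoup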
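Once `from bs4 import BeautifulSoup` succeeds, `BeautifulSoup` is a class. You pass it markup and a parser name, then navigate the resulting tree. The minimal sketch below uses a made-up HTML snippet and Python's built-in `html.parser`.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration
html_doc = """
<html><head><title>Demo page</title></head>
<body>
  <p class="intro">Hello, <b>bs4</b>!</p>
  <a href="/a.html">A</a>
  <a href="/b.html">B</a>
</body></html>
"""

# 'html.parser' ships with Python, so no extra parser such as lxml is required
soup = BeautifulSoup(html_doc, "html.parser")

title = soup.title.string                          # text of the <title> tag
intro = soup.find("p", class_="intro").get_text()  # all text inside <p>, tags stripped
hrefs = [a["href"] for a in soup.find_all("a")]    # every link target, in document order

print(title)
print(intro)
print(hrefs)
```

`find` returns the first match (or `None`), and `find_all` returns a list of every match. The keyword is `class_` with a trailing underscore because `class` is reserved in Python.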
Copyright © 2025 imooc.com All Rights Reserved | Beijing ICP Filing No. 12003892-11 | Beijing Public Network Security Filing No. 11010802030151
2016-09-13
2016-09-13
2016-05-09