All the code from this course, Python 3.6.3

from bs4 import BeautifulSoup
import re

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
soup = BeautifulSoup(html_doc, "html.parser", from_encoding="utf-8")

print("Get all links")
links = soup.find_all('a')
for link in links:
    print(link.name, link['href'], link.get_text())

print("Get the Lacie link")
link_node = soup.find('a', href='http://example.com/lacie')
print(link_node.name, link_node['href'], link_node.get_text())

print("Regex match")
link_node = soup.find('a', href=re.compile(r"ill"))
print(link_node.name, link_node['href'], link_node.get_text())

print("Get the text of the <p> paragraph")
P_node = soup.find('p', class_="title")
print(P_node.name, P_node.get_text())
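
The regex lookup above is easier to follow with the matching step isolated. This is a minimal sketch using only the standard library, assuming the three href values from the standard Beautiful Soup sample document (the list `hrefs` is illustrative, not part of the course code). Beautiful Soup tests a compiled pattern with its `search()` method, so a match anywhere in the attribute value counts.

```python
import re

# Assumed href values, taken from the standard Beautiful Soup sample document.
hrefs = [
    "http://example.com/elsie",
    "http://example.com/lacie",
    "http://example.com/tillie",
]

pattern = re.compile(r"ill")

# search() looks for the pattern anywhere in the string, not only at the start,
# which mirrors how a compiled pattern is applied to an attribute value.
matches = [h for h in hrefs if pattern.search(h)]
print(matches)
```

Only the Tillie URL contains "ill", which is why `soup.find('a', href=re.compile(r"ill"))` returns that link.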

qq_戲時丶_0

2018-11-27

From: Developing a Simple Crawler in Python, 6-4

2 answers

龍宸軒
2018-12-24

Python version 3.7.2

from bs4 import BeautifulSoup
import re

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
The Dormouse's story
Once upon a time there were three little sisters; and their names were
<a class="sister" id="link1">Elsie</a>,
<a class="sister" id="link2">Lacie</a> and
<a class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.
...
"""

soup = BeautifulSoup(
    html_doc,
    'html.parser',
    from_encoding='utf8'
)

link = soup.find('a', class_='sister')
print(link.name, link.get_text())
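
A note on `from_encoding`: it describes how to decode raw bytes, so when the markup is already a `str`, as here, Beautiful Soup ignores the argument and emits a warning. The sketch below contrasts the two cases; the variable names are illustrative and the one-line HTML is a cut-down stand-in for the sample document.

```python
from bs4 import BeautifulSoup

html_text = "<p class='title'><b>The Dormouse's story</b></p>"

# str input: the text is already decoded, so from_encoding is left out.
soup_from_str = BeautifulSoup(html_text, "html.parser")

# bytes input: from_encoding tells Beautiful Soup which encoding to try
# when decoding the bytes.
html_bytes = html_text.encode("utf-8")
soup_from_bytes = BeautifulSoup(html_bytes, "html.parser", from_encoding="utf-8")

print(soup_from_str.b.get_text())
print(soup_from_bytes.b.get_text())
```

Both calls parse to the same tree; the argument is only worth passing when the document arrives as bytes, for example from a downloaded response body.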

weixin_慕沐5442772
2018-12-24

import re
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""
# soup = BeautifulSoup(html_doc, 'html.parser', from_encoding='utf-8')
soup = BeautifulSoup(html_doc, 'html.parser')
print('Get all links')
links = soup.find_all('a')
for link_node in links:
    print(link_node.name, link_node['href'], link_node.get_text())
print('Get the Lacie link')
link_node = soup.find('a', href='http://example.com/lacie')
print(link_node.name, link_node['href'], link_node.get_text())
print('Regular expression match')
link_node = soup.find('a', href=re.compile(r'till'))
print(link_node.name, link_node['href'], link_node.get_text())
print('Get the <p> paragraph')
p_node = soup.find('p', class_='title')  # class is a Python keyword, so the class attribute is written as class_
print(p_node.name, p_node.get_text())
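
The `class_` comment above can be checked against the dictionary form of the same lookup. This is a minimal sketch on a cut-down copy of the sample HTML; the names `by_keyword` and `by_attrs` are illustrative.

```python
from bs4 import BeautifulSoup

html_doc = """
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">...</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Keyword form: the trailing underscore avoids the reserved word `class`.
by_keyword = soup.find("p", class_="title")

# Dictionary form: attribute names are plain strings, so no underscore is needed.
by_attrs = soup.find("p", attrs={"class": "title"})

print(by_keyword.get_text())
print(by_keyword is by_attrs)
```

The `attrs` dictionary is also the way to filter on attribute names that are not valid Python identifiers, such as `data-*` attributes.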