Hello experts: I'm a beginner who has recently started learning web scraping, and I'm using Ctrip's flight search as my first target. Along the way I found that iterating over a tag's children does not return the content of all the child tags (see screenshot). The parent tag is <div class="flight-list">, and all four child tags look like <div class>...</div>, but only the first two actually yield data. Why does this happen, and how can I fix it? Many thanks!

Code:

from selenium import webdriver
from bs4 import BeautifulSoup
import time

brower = webdriver.PhantomJS(executable_path='D:/phantomjs-2.1.1-windows/phantomjs-2.1.1-windows/bin/phantomjs')
try:
    # get() returns None, so its result is not assigned
    brower.get("https://flights.ctrip.com/itinerary/roundtrip/KHH-WUH?date=2018-11-21,2018-11-21&portingToken=570b1bdc855c4eaba0654eb83e9923f7")
    time.sleep(20)
    pageSource = brower.page_source  # rendered HTML of the loaded page
    bsObj = BeautifulSoup(pageSource, "html.parser")  # parse with BeautifulSoup, naming the parser explicitly
    for child in bsObj.find("div", {"class": "flight-list"}).children:
        print(child)
finally:
    brower.close()
1 answer

MyFray
The formatting above was too messy, so I'm retyping the code here:
for i in range(20):
    js = "var q=document.documentElement.scrollTop={}"
    js = js.format((i+1) * 400)
    self.driver.execute_script(js)
    time.sleep(0.1)
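
The loop above scrolls the page down in 400-pixel steps. The thread does not say why this helps; a likely reading (an assumption, not confirmed here) is that the flight list only renders further rows once they scroll into view, so page_source contains just the first few until the browser has scrolled. Below is a minimal sketch of the same idea as standalone functions. The names build_scroll_scripts and scroll_page are made up for illustration, and the driver argument only needs an execute_script method, as a Selenium WebDriver has.

```python
import time


def build_scroll_scripts(steps=20, step_px=400):
    """Return one JavaScript snippet per step, each moving scrollTop further down."""
    template = "document.documentElement.scrollTop={}"
    return [template.format((i + 1) * step_px) for i in range(steps)]


def scroll_page(driver, steps=20, step_px=400, pause=0.1):
    """Run the scroll snippets on any object exposing execute_script()."""
    for js in build_scroll_scripts(steps, step_px):
        driver.execute_script(js)
        time.sleep(pause)  # short pause so newly visible rows can load
```

In the question's script this would be called as scroll_page(brower) after brower.get(...) and before reading brower.page_source. The step count, step size, and pause are the values from the answer and may need tuning for other pages.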