
How do I iterate over everything in a BeautifulSoup ResultSet?

Python

大話西游666 2019-02-18 00:38:36

Target page: https://www.w3cschool.cn/code...

This fetches the HTML:

    def getHtml(url):
        re = requests.get(url)
        return re.text

    index = getHtml(url)
    index

This is the function that parses the HTML:

    def parseHtml(html):
        soup = BeautifulSoup(index,'html.parser')
        #soup
        lessonList= soup.find('div',class_='codecamplist-catalog').find_all('a')
        return lessonList

    lessonList = parseHtml(index)
    lessonList

The resulting lessonList is a bs4.element.ResultSet:

    [<a title="Say Hello to HTML Element"> 開始學習HTML標簽</a>,
     <a title="Headline with the h2 Element"> HTML 學習h2標簽</a>,
     <a title="Inform with the Paragraph Element"> HTML 學習p標簽</a>,
     <a title="Uncomment HTML"> 刪除HTML的注釋</a>]

How do I parse data in this format? The goal is to save the links and titles as CSV. Working with the corresponding Tag data I can only find the first item, and calling find_all raises an error.

    def getLesson(lessonList):
        for i in lessonList:
            lesson={}
            try:
                lesson['title'] = i.find('a')['href'].lstrip('//')
                lesson['name']= i.find('a')['title']
            except:
                print('error')
            return lesson

    getLesson(lessonList)
    # When the line above is lessonList= soup.find_all('div',class_='codecamplist-catalog')
    # .find_all('a'), why is only one record output?

Result:

    {'name': 'Say Hello to HTML Element', 'title': 'www.w3cschool.cn/codecamp/say-hello-to-html-element.html'}
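For reference, here is one possible way to walk the whole ResultSet and write it out as CSV. It is only a sketch under a few assumptions: the inline `SAMPLE_HTML` is a made-up stand-in for the real page (whose markup may differ), and the names `get_lessons`, `write_csv` and the `name`/`link` columns are hypothetical. Three facts about bs4 are relevant to the symptoms described above. A ResultSet is a list subclass, so it has no `find_all` of its own and calling it raises an AttributeError. `find` returns only the first match, which is why iterating over the `div` and calling `i.find('a')` yields a single link. And when the ResultSet already holds `a` tags, each item is itself the tag to read, so `i.find('a')` looks for a nested link and returns None. Any `return` placed inside the `for` loop would also stop after the first item.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup may differ.
# The second link is a placeholder, not a real lesson URL.
SAMPLE_HTML = """
<div class="codecamplist-catalog">
  <a href="//www.w3cschool.cn/codecamp/say-hello-to-html-element.html" title="Say Hello to HTML Element">Lesson 1</a>
  <a href="//example.com/placeholder-lesson.html" title="Headline with the h2 Element">Lesson 2</a>
</div>
"""


def get_lessons(html):
    """Return one dict per <a> tag inside the catalog div."""
    soup = BeautifulSoup(html, "html.parser")
    catalog = soup.find("div", class_="codecamplist-catalog")
    if catalog is None:
        return []

    lessons = []
    # find_all returns a ResultSet; each item is already an <a> Tag.
    for a in catalog.find_all("a"):
        href = a.get("href", "")
        lessons.append({
            "name": a.get("title", ""),
            # Drop only the protocol-relative "//" prefix.
            "link": href[2:] if href.startswith("//") else href,
        })
    # Return after the loop so every tag is collected.
    return lessons


def write_csv(lessons, fileobj):
    """Write the lesson dicts as CSV rows with a header line."""
    writer = csv.DictWriter(fileobj, fieldnames=["name", "link"])
    writer.writeheader()
    writer.writerows(lessons)


lessons = get_lessons(SAMPLE_HTML)
buffer = io.StringIO()
write_csv(lessons, buffer)
print(buffer.getvalue())
```

To save to disk instead of an in-memory buffer, pass a file opened with `open("lessons.csv", "w", newline="", encoding="utf-8")` to `write_csv`; the file name here is just an example.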
