I am crawling the product titles and prices in the Phones -> Phones & Communications -> Apple Phone category of Amazon China. Its URL is https://www.amazon.cn/s/ref=s... My Python code is as follows:

import requests
from bs4 import BeautifulSoup
import re    # used for regex-based search conditions in the HTML
import time  # used for naming the output file

# get the total number of result pages
def get_total_page_number():
    user_agent = 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36'
    headers = {'User-Agent': user_agent}  # define the request headers
    # URL used to look up the page count
    url = 'https://www.amazon.cn/s/ref=sa_menu_digita_l3_siphone?ie=UTF8&page=1&rh=n%3A665002051%2Cp_89%3AApple%2Cn%3A664978051'
    res = requests.get(url, headers=headers)  # send the request
    html = res.text
    html_soup = BeautifulSoup(html, "lxml")  # build the soup object for parsing the HTML
    page_number_span = html_soup.find('h2', id='s-result-count')  # find the h2 tag with id="s-result-count"
    page_number_code = page_number_span.text  # read that tag's text
    number_list = re.findall(r'(\w*[0-9]+)\w', page_number_code)  # use a regex to pull the 3 numbers out of the text
    total_page_number = (int(number_list[-1]) / int(number_list[-2]) + 1)  # work out the total page count
    return int(total_page_number)  # return the page count

# parse a single page
def parse_single_page(i):
    url_part1 = 'https://www.amazon.cn/s/ref=sa_menu_digita_l3_siphone?ie=UTF8&page=%d' % i  # dynamic first half of the URL
    url_part2 = '&rh=n%3A665002051%2Cp_89%3AApple%2Cn%3A664978051'  # static second half of the URL
    url = url_part1 + url_part2  # join into the full URL
    print('parse url: %s' % url)  # print the URL
    user_agent = 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36'
    res = requests.get(url, headers=headers)  # send the request
    html = res.text
    html_soup = BeautifulSoup(html, "lxml")  # build the soup object for parsing the HTML
    tag_list = html_soup.find_all('li', id=re.compile('^result.*'))  # find li tags whose id starts with "result"; returns a list
    # go through every tag in the list (one tag per product)
    for tag_info in tag_list:
        # parse the price
        print(tag_info)
        price_code = tag_info.find('span', class_="a-size-base a-color-price s-price a-text-bold")
        # if the price tag is not empty, take the price text
        if price_code != None:
            # parse the product title
            title_code = tag_info.find('h2')  # find the title tag
            title = title_code.text  # take the title text
            write_data(title, price)  # write to the file after each item is parsed

# write the data to a file
def write_data(title, price):
    file_data = time.strftime('%Y-%m-%d', time.localtime(time.time()))  # today's date, used as the file name
    fn = open('%s.txt' % file_data, 'a+')  # open the file object in append mode
    content = title + '\t' + price + '\n'  # title and price separated by a tab, newline at the end
    fn.write(content)  # write to the file
    fn.close()

# parse all pages and write to the file
def main():
    total_page_number = get_total_page_number()  # get the total page count
    for i in range(1, int(total_page_number) + 1):
        parse_single_page(i)

main()

The error it reports is:

AttributeError                            Traceback (most recent call last)
 in ()
     51     parse_single_page(i)
     52
---> 53 main()

 in main()
     47 # parse all pages and write to the file
     48 def main():
---> 49     total_page_number = get_total_page_number()  # get the total page count
     50     for i in range(1, int(total_page_number) + 1):
     51         parse_single_page(i)

 in get_total_page_number()
      9     html_soup = BeautifulSoup(html, "lxml")  # build the soup object for parsing the HTML
     10     page_number_span = html_soup.find('h2', id='s-result-count')  # find the h2 tag with id="s-result-count"
---> 11     page_number_code = page_number_span.text  # read that tag's text
     12     number_list = re.findall(r'(\w*[0-9]+)\w', page_number_code)  # use a regex to pull the 3 numbers out of the text
     13     total_page_number = (int(number_list[-1]) / int(number_list[-2]) + 1)  # work out the total page count

AttributeError: 'NoneType' object has no attribute 'text'

I have already fixed some of the problems, but I searched online and still cannot solve this one, so I am asking for help here, thank you! I commented almost every line, which I hope makes the code easier to read. Many thanks!
Following a sample program, I wrote a crawler for Amazon pages, but it has quite a few errors that I really don't understand. Please help!
翻翻過去那場雪
2019-05-21 09:28:39
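A minimal debugging sketch for the AttributeError above, not a drop-in fix: BeautifulSoup's find() returns None when the tag it looks for is not in the HTML, so calling .text on the result raises exactly this error. The snippet only adds a None check and prints part of the response so you can see what amazon.cn actually returned. The URL, the 's-result-count' id, and the User-Agent string are copied from the question; whether Amazon still serves that markup (rather than a robot-check/captcha page) is an assumption, and the extra Accept-Language header is an untested guess, not a guaranteed workaround.

import requests
from bs4 import BeautifulSoup

HEADERS = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 '
                   '(KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36'),
    'Accept-Language': 'zh-CN,zh;q=0.9',  # assumption: may help look like a normal browser request
}

url = ('https://www.amazon.cn/s/ref=sa_menu_digita_l3_siphone'
       '?ie=UTF8&page=1&rh=n%3A665002051%2Cp_89%3AApple%2Cn%3A664978051')
res = requests.get(url, headers=HEADERS)
print(res.status_code)  # a 200 status does not guarantee the expected result page

soup = BeautifulSoup(res.text, 'lxml')
counter = soup.find('h2', id='s-result-count')

if counter is None:
    # find() returned None: the tag is missing, so .text would raise AttributeError.
    # Print a snippet of the response to see whether Amazon sent a captcha/robot-check
    # page or simply uses different markup now.
    print(res.text[:500])
else:
    print(counter.text)  # the text the page-count regex is supposed to parse

If the printed snippet turns out to be a robot check, the parsing code is not the real problem; the request side is (headers, request rate, or an official product data API). Separately, in the posted parse_single_page the names headers and price appear to be used before they are defined, so those lines would be the next errors once the first one is resolved.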