3 Answers

Contributed 1828 experience points · received more than 3 upvotes
Grab the <li>...</li> elements inside <div class="pagination "><ul>.
Exclude the last one by its class, <li class="no-page">.
Parse each "href" and build your next URL destinations.
Scrape each new URL destination.
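
The steps above can be sketched roughly as follows. This is only an illustration, not the answerer's code: it uses Python's standard-library HTMLParser instead of BeautifulSoup so that it runs without extra packages, and the names PaginationLinkParser, pagination_urls, sample_html and the example.com address are all made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PaginationLinkParser(HTMLParser):
    """Collect hrefs of <a> tags inside <li> items of div.pagination,
    skipping any <li> whose class list contains "no-page"."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self._div_depth = 0    # > 0 while inside div.pagination
        self._in_item = False  # True while inside a wanted <li>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div":
            # Enter the pagination div, or track a div nested inside it.
            if self._div_depth or "pagination" in classes:
                self._div_depth += 1
        elif self._div_depth and tag == "li":
            self._in_item = "no-page" not in classes
        elif self._in_item and tag == "a" and attrs.get("href"):
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self._div_depth:
            self._div_depth -= 1
            if self._div_depth == 0:
                self._in_item = False
        elif tag == "li":
            self._in_item = False


def pagination_urls(html, base_url):
    """Return absolute URLs for every kept pagination link."""
    parser = PaginationLinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.hrefs]


# Hypothetical markup shaped like the structure described in the answer.
sample_html = """
<div class="pagination "><ul>
<li><a href="/list?page=1">1</a></li>
<li><a href="/list?page=2">2</a></li>
<li class="no-page"><a href="/list?page=2">Next</a></li>
</ul></div>
"""
urls = pagination_urls(sample_html, "http://example.com/")
```

Each URL in urls could then be fetched and scraped in turn. With BeautifulSoup, a CSS selector over the same structure would likely do the same job in fewer lines.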

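Contributed 1821 experience points · received more than 6 upvotes
I just want to thank everyone who took the time to answer my question. I found an answer, or at least one that worked for me, and decided to share it in case it helps someone else.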
import requests
from bs4 import BeautifulSoup

url = 'insert url'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
# look for the pagination class
page = soup.find(class_='pagination')
# create a list to hold the text of every pagination link
href = []
# look for all 'li' tags, as the users above suggested
links = page.find_all('li')
for link in links:
    anchor = link.find('a', href=True)
    if anchor:  # skip 'li' items that contain no link
        href.append(anchor.text)
# href now holds every page number plus the word Next,
# for instance ['1', '2', '3', ..., '44', 'Next'].
# The last page number is href[-2]; convert it to an int for the for loop.
# range() stops one short of its end value, so add 1 to include the last
# page. Page numbers start at 1, so the range starts at 1 as well.
for i in range(1, int(href[-2]) + 1):
    # append the loop counter (not a constant) to build each page URL
    new_url = url + str(i)
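
The last-page arithmetic described in the comments above can be isolated into two small helpers. This is a hedged variation, not the answerer's code: rather than relying on the last page always sitting at position -2, it takes the largest numeric label, which also copes with a missing Next entry. The names last_page_number and page_urls, and the example.com address, are invented for the sketch.

```python
def last_page_number(labels):
    """Return the largest numeric label, ignoring entries such as 'Next'."""
    numbers = [int(label) for label in labels if label.strip().isdecimal()]
    if not numbers:
        raise ValueError("no numeric page labels found")
    return max(numbers)


def page_urls(base_url, last_page):
    """Build one URL per page, from page 1 through last_page inclusive."""
    # range() excludes its end value, hence last_page + 1.
    return [base_url + str(page) for page in range(1, last_page + 1)]


# Labels shaped like the list described in the answer.
labels = ['1', '2', '3', '...', '44', 'Next']
last = last_page_number(labels)
first_three = page_urls('http://example.com/?page=', 3)
```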

Contributed 1853 experience points · received more than 6 upvotes
The "skeleton" of the loop could look like this:
import requests
from bs4 import BeautifulSoup

url = 'http://url/?page={page}'
page = 1
while True:
    soup = BeautifulSoup(requests.get(url.format(page=page)).content, 'html.parser')
    # ...
    # do we have a next page?
    next_page = soup.select_one('.next-page')
    # no, so break from the loop
    if not next_page:
        break
    page += 1
You can use an infinite while True: loop and break out of it only when there is no next page, that is, when the last page contains no tag with class="next-page".
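
One possible way to make that loop easier to test is to pass the download and the "is there a next page?" check in as functions. The sketch below is an assumption-laden illustration, not part of the answer: crawl_pages, its max_pages safety cap, and the fake_site dictionary are all hypothetical, and the fake site stands in for real HTTP requests.

```python
def crawl_pages(fetch, has_next, max_pages=1000):
    """Fetch page 1, 2, 3, ... until has_next(content) is false.

    fetch(page) returns the content of one page; has_next(content) reports
    whether another page follows. max_pages is a safety cap so that a site
    which always advertises a next page cannot keep the loop running forever.
    """
    results = []
    page = 1
    while page <= max_pages:
        content = fetch(page)
        results.append(content)
        if not has_next(content):
            break
        page += 1
    return results


# A fake three-page site: only pages 1 and 2 carry a next-page marker.
fake_site = {
    1: '<a class="next-page">Next</a> page one',
    2: '<a class="next-page">Next</a> page two',
    3: 'page three',
}
pages = crawl_pages(
    fetch=fake_site.__getitem__,
    has_next=lambda html: 'class="next-page"' in html,
)
```

In a real scraper, fetch would wrap requests.get and has_next would check soup.select_one('.next-page'), as in the skeleton above.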