Web scraping domains from Google results

Python

肥皂起泡泡 2021-12-17 14:45:22

I am trying to get a list of domains from the first 100 results. For example, for abc.com/xxxx/dddd the domain should be abc.com.

I am using the following code:

    import time
    from bs4 import BeautifulSoup
    import requests

    search = input("What do you want to ask: ")
    search = search.replace(" ", "+")
    link = "https://www.google.com/search?q=" + search
    print(link)
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
    source = requests.get(link, headers=headers).text
    soup = BeautifulSoup(source, "html.parser")

However, I don't know how to select only the domains, and I don't know how to ask for 100 results. When I print soup.text I only get:

    'te - Pesquisa Google(function(){window.google={kEI:\'jsCaXM3AHM6g5OUP4eyT2A0\',kEXPI:\'31\',authuser:0,kscs:\'c9c918f0_jsCaXM3AHM6g5OUP4eyT2A0\',kGL:\'BR\'};google.sn=\'web\';google.kHL=\'pt-BR\';})();(function(){google.lc=[];google.li=0;google.getEI=function(a){for(var b;a&&(!a.getAttribute||!(b=a.getAttribute("eid")));)a=a.parentNode;return b||google.kEI};google.getLEI=function(a){for(var b=null;a&&(!a.getAttribute||!(b=a.getAttribute("leid")));)a=a.parentNode;return b};google.https=function(){return"https:"==window.location.protocol};google.ml=function(){return null};google.time=function()

1 Answer

qq_笑_17

Contributed 1818 experience points, received more than 7 upvotes

Getting 100 results

You have to scrape page by page until you have collected 100 results. Suppose the keyword to scrape is beautiful+girls; the URL for page 2 then looks like this: https://www.google.com/search?q=beautiful+girls&start=10
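The paging scheme above can be sketched as a small URL builder. This is only an illustration: the function name `result_page_urls` and its parameters are made up for this example, and it assumes each page holds 10 results, which is what `start=10` for page 2 suggests.

```python
from urllib.parse import urlencode

def result_page_urls(query, total=100, per_page=10):
    """Build one search URL per result page (hypothetical helper).

    Assumes `per_page` results per page, so the pages begin at
    start=0, 10, 20, ... until `total` results are covered.
    """
    base = "https://www.google.com/search"
    urls = []
    for start in range(0, total, per_page):
        # urlencode turns spaces into '+', as in the example URL above
        urls.append(base + "?" + urlencode({"q": query, "start": start}))
    return urls

urls = result_page_urls("beautiful girls")
print(len(urls))   # number of pages to request
print(urls[1])     # URL of page 2
```

Each URL can then be fetched with the same `requests.get(link, headers=headers)` call used in the question, ideally with a pause between requests.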

Getting only the domain

First, get all the divs with the class "srg" (after viewing the page source, I saw that all the result links are inside them):

srg_divs = soup.findAll("div", {"class": "srg"})
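To try this selection step without a network request or third-party packages, here is a rough sketch that swaps BeautifulSoup for the standard library's `html.parser`. The class `SrgLinkParser` and the sample HTML are invented for illustration, and it assumes the page really wraps its results in `div` elements with the class `srg`.

```python
from html.parser import HTMLParser

class SrgLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside <div class="srg">."""

    def __init__(self):
        super().__init__()
        self.div_stack = []  # one bool per open <div>: is it an "srg" div?
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            self.div_stack.append("srg" in classes)
        elif tag == "a" and any(self.div_stack) and attrs.get("href"):
            # we are somewhere below an "srg" div and the link has an href
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()

# Made-up markup standing in for a result page
sample = """
<div class="nav"><a href="/preferences">Settings</a></div>
<div class="srg">
  <div class="g"><a href="https://abc.com/xxxx/dddd">ABC</a></div>
  <div class="g"><a href="https://example.org/page">Example</a><a>no href</a></div>
</div>
"""

parser = SrgLinkParser()
parser.feed(sample)
print(parser.hrefs)
```

The link in the `nav` div and the anchor without an `href` are skipped, which mirrors what `find_all('a', href=True)` inside the `srg` divs would keep.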

Then find all the a tags inside those divs and reduce each link to its domain:

    from urllib.parse import urlparse

    out = ''
    for div in srg_divs:
        # every link inside this result container
        links = div.find_all('a', href=True)
        for a in links:
            # url to domain
            parsed_uri = urlparse(a['href'])
            domain = '{uri.netloc}'.format(uri=parsed_uri)
            # exclude googleusercontent.com and links without a host
            if 'googleusercontent' in domain or domain == '':
                continue
            out += domain + '\n'
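One detail worth knowing about the URL-to-domain step: `urlparse` only fills in `netloc` when the URL has a scheme (or starts with `//`), so a bare string such as abc.com/xxxx/dddd from the question comes back with an empty `netloc`. A minimal sketch of a helper that copes with this and also drops duplicates follows; the names `to_domain` and `unique_domains` are hypothetical, not part of any library.

```python
from urllib.parse import urlparse

def to_domain(url):
    """Return the lowercased host of a URL, or '' if it has none."""
    url = url.strip()
    parsed = urlparse(url)
    if parsed.netloc:
        return (parsed.hostname or "").lower()
    # Relative links and schemes without a host (mailto:, javascript:)
    if parsed.scheme or url.startswith(("/", "#", "?")):
        return ""
    # Scheme-less input such as "abc.com/xxxx/dddd": retry with "//" prefix
    return (urlparse("//" + url).hostname or "").lower()

def unique_domains(urls, limit=100):
    """Keep the first `limit` distinct domains, in order of appearance."""
    found = []
    for url in urls:
        domain = to_domain(url)
        # same exclusion as in the loop above
        if not domain or "googleusercontent" in domain or domain in found:
            continue
        found.append(domain)
        if len(found) >= limit:
            break
    return found

links = [
    "https://abc.com/a",
    "abc.com/xxxx/dddd",
    "https://webcache.googleusercontent.com/x",
    "/search?q=1",
    "http://Example.org/",
]
print(unique_domains(links))
```

Using a list rather than a string like `out` makes it easy to stop once 100 distinct domains have been collected across pages.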

Answered 2021-12-17
