1 Answer

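(Even though I don't understand the images I'm getting, because I can't find them on the website; it seems the crawler doesn't start from the website's start page.)
Yes, you are right. Your code does not download images from the start page, because the only thing it takes from the start page is the set of anchor (a) elements. It then calls processElement() on each anchor element found there: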
response, err := http.Get(currWebsite)
if err != nil {
	log.Fatalln("error on searching website:", err)
}
defer response.Body.Close()
document, err := goquery.NewDocumentFromReader(response.Body)
if err != nil {
	log.Fatalln("Error loading HTTP response body. ", err)
}
document.Find("a").Each(processElement) // Here: only anchors are processed, the start page's images are never looked at
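To download all images from the start page as well, define another function, processUrl(), that does the work of fetching the img elements and downloading the images. processElement() then only needs to read the href attribute and call processUrl() on that link: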
func processElement(index int, element *goquery.Selection) {
	href, exists := element.Attr("href")
	// Only follow absolute links; pass the link straight to processUrl.
	if exists && strings.HasPrefix(href, "http") {
		processUrl(href)
	}
}
func processUrl(crawlWebsite string) {
	response, err := http.Get(crawlWebsite)
	if err != nil {
		log.Fatalln("error on current website:", err)
	}
	defer response.Body.Close()
	document, err := goquery.NewDocumentFromReader(response.Body)
	if err != nil {
		log.Fatalln("Error loading HTTP response body.", err)
	}
	// Download every image on this page that has an absolute src.
	document.Find("img").Each(func(index int, element *goquery.Selection) {
		imgSrc, exists := element.Attr("src")
		if exists && strings.HasPrefix(imgSrc, "http") {
			fileName := fmt.Sprintf("./images/img%d.jpg", imageCount)
			fmt.Println("[+]", imgSrc)
			DownloadFile(fileName, imgSrc)
			imageCount++
		}
	})
}
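One thing to be aware of: the strings.HasPrefix(imgSrc, "http") check above skips every image whose src is relative (for example /static/logo.png), which many sites use. If you want those images too, one option is to resolve the src against the page URL with net/url before downloading. This is only a rough sketch; resolveImageURL and the example.com URLs are hypothetical names for illustration and are not part of your code.

```go
package main

import (
	"fmt"
	"log"
	"net/url"
)

// resolveImageURL turns an img src (absolute or relative) into an absolute
// URL, using the URL of the page it was found on as the base.
// Hypothetical helper, not part of the original crawler.
func resolveImageURL(pageURL, src string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(src)
	if err != nil {
		return "", err
	}
	// ResolveReference leaves absolute URLs untouched and resolves
	// relative ones against the base.
	return base.ResolveReference(ref).String(), nil
}

func main() {
	page := "https://example.com/blog/post.html" // assumed page URL
	srcs := []string{"/static/logo.png", "pics/cat.jpg", "https://cdn.example.com/a.png"}
	for _, src := range srcs {
		abs, err := resolveImageURL(page, src)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(src, "->", abs)
	}
}
```

Inside processUrl() you could then call resolveImageURL(crawlWebsite, imgSrc) and download the result instead of filtering on the "http" prefix.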
Now just crawl the images from the start page before processing all the links:
func main() {
	...
	document, err := goquery.NewDocumentFromReader(response.Body)
	if err != nil {
		log.Fatalln("Error loading HTTP response body. ", err)
	}
	// First crawl images from start page url
	processUrl(currWebsite)
	document.Find("a").Each(processElement)
}