撒科打諢
2023-11-02 20:01:11
The getURLS() function builds an array of URLs to scrape, starting from the original URL. getSubURLs() then loops over that array and scrapes the links on each of those pages. Right now the code prints everything to the console correctly, but I don't know how to wait for the data to resolve so that I can push all the collected data into a single array. When I try to return sites and then push it into an array, only the last value gets pushed. I believe this is a case for Promise.all(map), but I can't figure out how to write it without getting errors. Ideally, the finished scrape could be called from another function. Please take a look if you can.

const cheerio = require('cheerio');
const axios = require('axios');

let URL = 'https://toscrape.com';

const getURLS = async () => {
  try {
    const res = await axios.get(URL);
    const data = res.data;
    const $ = cheerio.load(data);
    const urlQueue = [];
    $("a[href^='http']").each((i, elem) => {
      const link = $(elem).attr('href');
      if (urlQueue.indexOf(link) === -1) {
        urlQueue.push(link);
      }
    });
    return urlQueue;
  } catch (err) {
    console.log(`Error fetching and parsing data: `, err);
  }
};

const getSubURLs = async () => {
  let urls = await getURLS();
  try {
    //loop through each url in array
    for (const url of urls) {
      //fetch all html from the current url
      const res = await axios.get(url);
      const data = res.data;
      const $ = cheerio.load(data);
      //create object and push that url into that object
      let sites = {};
      sites.url = url;
      let links = [];
      //scrape all links and save in links array
      $("a[href^='/']").each((i, elem) => {
        const link = $(elem).attr('href');
        if (links.indexOf(link) === -1) {
          links.push(link);
        }
        //save scraped data in object
        sites.links = links;
      });
      // returns list of {url:'url', links:[link1,link2,link3]}
      console.log(sites);
    }
  } catch (err) {
    console.log(`Error fetching and parsing data: `, err);
  }
};
1 answer

Cats萌萌
I don't think this is really a Promise problem.
You need to collect your sites objects into an array that is initialized outside the loop. Then, when getSubURLs() resolves, it will resolve to that array:
const getSubURLs = async () => {
  let urls = await getURLS();
  let siteList = [];
  try {
    for (const url of urls) {
      // ... fetch the page and build `sites` exactly as in your loop ...
      siteList.push(sites);
    }
  } catch (err) {
    console.log(`Error fetching and parsing data: `, err);
  }
  return siteList; // array of {url, links} objects
};

getSubURLs().then(console.log);
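The loop above waits for each request to finish before starting the next one. If the pages should be fetched in parallel, the Promise.all(map) shape mentioned in the question also works: `map` with an `async` callback returns an array of promises, and `Promise.all` resolves to an array of their results in the same order as the input. Below is a minimal sketch under two assumptions. First, `scrapePage(url)` is a hypothetical stand-in for the `axios.get` + cheerio code: it resolves to a fixed list of links after a random delay, so the sketch runs without network access. Second, this `getSubURLs` takes the URL list as a parameter instead of calling getURLS() itself.

```javascript
// Hypothetical stand-in for axios.get + cheerio parsing: resolves to the
// relative links "found" on `url` after a random 0-49 ms delay.
const scrapePage = (url) =>
  new Promise((resolve) => {
    const delay = Math.floor(Math.random() * 50);
    setTimeout(() => resolve(['/about', '/contact']), delay);
  });

// Scrape every URL in parallel and collect one {url, links} object per URL.
// Unlike the question's version, the URL list is passed in as a parameter.
const getSubURLs = async (urls) => {
  // map() starts every request immediately and returns an array of promises;
  // Promise.all waits for all of them and keeps the results in input order,
  // even though the individual requests may finish in any order.
  const siteList = await Promise.all(
    urls.map(async (url) => {
      const links = await scrapePage(url);
      return { url, links };
    })
  );
  return siteList;
};

// The finished scrape can be awaited from any other async function.
const main = async () => {
  const urls = ['https://books.toscrape.com', 'https://quotes.toscrape.com'];
  const siteList = await getSubURLs(urls);
  console.log(siteList);
  return siteList;
};

main();
```

Note that `Promise.all` rejects as soon as any one of the promises rejects, so a single failed page discards the whole result; `Promise.allSettled` can be used instead if the remaining pages should be kept. Sending every request at once may also be heavier than some sites tolerate, in which case the sequential loop is the gentler option.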