I've written a script in Python using `asyncio` in association with the `aiohttp` library to parse the names out of the pop-up box that opens when the contact-information button of each agency in a table on this website is clicked. The webpage spreads its tabular content across 513 pages.

I ran into `too many file descriptors in select()` when I tried with `asyncio.get_event_loop()`, but then I came across this thread where a suggestion was made to use `asyncio.ProactorEventLoop()` to avoid that error, so I switched to the latter. However, even after complying with the suggestion, the script collects names from only a few pages before raising the following error:

```
raise client_error(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host www.tursab.org.tr:443 ssl:None [The semaphore timeout period has expired]
```

How can I fix this problem? This is my attempt so far:

```python
import asyncio
import aiohttp
from bs4 import BeautifulSoup

links = ["https://www.tursab.org.tr/en/travel-agencies/search-travel-agency?sayfa={}".format(page) for page in range(1, 514)]
lead_link = "https://www.tursab.org.tr/en/displayAcenta?AID={}"

async def get_links(url):
    async with asyncio.Semaphore(10):
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                text = await response.text()
                result = await process_docs(text)
            return result

async def process_docs(html):
    coros = []
    soup = BeautifulSoup(html, "lxml")
    # Grab the data-id attribute of every row in the agencies table
    items = [itemnum.get("data-id") for itemnum in soup.select("#acentaTbl tr[data-id]")]
    for item in items:
        coros.append(fetch_again(lead_link.format(item)))
    await asyncio.gather(*coros)

async def fetch_again(link):
    async with asyncio.Semaphore(10):
        async with aiohttp.ClientSession() as session:
            async with session.get(link) as response:
                text = await response.text()
                sauce = BeautifulSoup(text, "lxml")
                try:
                    name = sauce.select_one("p > b").text
                except Exception:
                    name = ""
                print(name)

if __name__ == '__main__':
    # Driver, as described above: ProactorEventLoop instead of the default
    # selector loop, to dodge the select() file-descriptor limit
    loop = asyncio.ProactorEventLoop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(asyncio.gather(*(get_links(link) for link in links)))
```

In short, all the `process_docs()` function does is collect the `data-id` numbers from each page and reuse them within the `https://www.tursab.org.tr/en/displayAcenta?AID={}` link to collect the names from the pop-up boxes. One such id is `8757`, and one such qualified link is `https://www.tursab.org.tr/en/displayAcenta?AID=8757`. By the way, if I change the highest number used in the `links` variable to around 20 or 30, it goes through smoothly.
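One thing I suspect is that both `get_links()` and `fetch_again()` create a brand-new `asyncio.Semaphore(10)` and a brand-new `ClientSession` on every call, so the semaphore never actually throttles anything and every request opens its own connection pool. Below is a minimal sketch of the restructuring I have in mind, with one session and one semaphore shared by every request; the helpers `fetch()`, `fetch_name()` and `main()` are names I made up for the sketch, not part of any library:

```python
import asyncio
import aiohttp
from bs4 import BeautifulSoup

links = ["https://www.tursab.org.tr/en/travel-agencies/search-travel-agency?sayfa={}".format(page) for page in range(1, 514)]
lead_link = "https://www.tursab.org.tr/en/displayAcenta?AID={}"

async def fetch(session, sem, url):
    # Every request passes through the one shared semaphore,
    # so at most 10 requests are ever in flight at once
    async with sem:
        async with session.get(url) as response:
            return await response.text()

async def get_links(session, sem, url):
    html = await fetch(session, sem, url)
    soup = BeautifulSoup(html, "lxml")
    ids = [tr.get("data-id") for tr in soup.select("#acentaTbl tr[data-id]")]
    await asyncio.gather(*(fetch_name(session, sem, lead_link.format(i)) for i in ids))

async def fetch_name(session, sem, link):
    sauce = BeautifulSoup(await fetch(session, sem, link), "lxml")
    name = sauce.select_one("p > b")
    print(name.text if name else "")

async def main():
    sem = asyncio.Semaphore(10)  # created once, shared by every task
    async with aiohttp.ClientSession() as session:  # one session, one pool
        await asyncio.gather(*(get_links(session, sem, link) for link in links))

if __name__ == "__main__":
    asyncio.run(main())  # Python 3.7+
```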
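If this restructuring is sound, the concurrency cap becomes global rather than per-coroutine, and the single session lets aiohttp reuse keep-alive connections instead of opening a fresh socket for every request; capping the pool at the connector level with `aiohttp.TCPConnector(limit=10)` should achieve a similar effect.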