I've written a script in Python using `asyncio` in association with the `aiohttp` library to parse the names out of the pop-up box that opens when the contact-information button of each agency in a table on this website is clicked. The webpage spreads its tabular content across 513 pages.

I ran into `too many file descriptors in select()` when I tried with `asyncio.get_event_loop()`, but then I came across this thread where a suggestion was made to use `asyncio.ProactorEventLoop()` to avoid that error, so I switched to the latter. However, even after complying with the suggestion, the script collects names from only a few pages before raising the following error:

```
raise client_error(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host www.tursab.org.tr:443 ssl:None [The semaphore timeout period has expired]
```

How can I fix this problem? This is my attempt so far:

```python
import asyncio
import aiohttp
from bs4 import BeautifulSoup

links = ["https://www.tursab.org.tr/en/travel-agencies/search-travel-agency?sayfa={}".format(page) for page in range(1, 514)]
lead_link = "https://www.tursab.org.tr/en/displayAcenta?AID={}"

async def get_links(url):
    async with asyncio.Semaphore(10):
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                text = await response.text()
                result = await process_docs(text)
            return result

async def process_docs(html):
    coros = []
    soup = BeautifulSoup(html, "lxml")
    # Grab the data-id attribute of every row in the agencies table
    items = [itemnum.get("data-id") for itemnum in soup.select("#acentaTbl tr[data-id]")]
    for item in items:
        coros.append(fetch_again(lead_link.format(item)))
    await asyncio.gather(*coros)

async def fetch_again(link):
    async with asyncio.Semaphore(10):
        async with aiohttp.ClientSession() as session:
            async with session.get(link) as response:
                text = await response.text()
                sauce = BeautifulSoup(text, "lxml")
                try:
                    name = sauce.select_one("p > b").text
                except Exception:
                    name = ""
                print(name)

if __name__ == '__main__':
    # Driver, as described above: ProactorEventLoop instead of the default
    # selector loop, to dodge the select() file-descriptor limit
    loop = asyncio.ProactorEventLoop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(asyncio.gather(*(get_links(link) for link in links)))
```

In short, all the `process_docs()` function does is collect the `data-id` numbers from each page and reuse them within the `https://www.tursab.org.tr/en/displayAcenta?AID={}` link to collect the names from the pop-up boxes. One such id is `8757`, and one such qualified link is `https://www.tursab.org.tr/en/displayAcenta?AID=8757`. By the way, if I change the highest number used in the `links` variable to around 20 or 30, it goes through smoothly.
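One thing I suspect is that both `get_links()` and `fetch_again()` create a brand-new `asyncio.Semaphore(10)` and a brand-new `ClientSession` on every call, so the semaphore never actually throttles anything and every request opens its own connection pool. Below is a minimal sketch of the restructuring I have in mind, with one session and one semaphore shared by every request; the helpers `fetch()`, `fetch_name()` and `main()` are names I made up for the sketch, not part of any library:

```python
import asyncio
import aiohttp
from bs4 import BeautifulSoup

links = ["https://www.tursab.org.tr/en/travel-agencies/search-travel-agency?sayfa={}".format(page) for page in range(1, 514)]
lead_link = "https://www.tursab.org.tr/en/displayAcenta?AID={}"

async def fetch(session, sem, url):
    # Every request passes through the one shared semaphore,
    # so at most 10 requests are ever in flight at once
    async with sem:
        async with session.get(url) as response:
            return await response.text()

async def get_links(session, sem, url):
    html = await fetch(session, sem, url)
    soup = BeautifulSoup(html, "lxml")
    ids = [tr.get("data-id") for tr in soup.select("#acentaTbl tr[data-id]")]
    await asyncio.gather(*(fetch_name(session, sem, lead_link.format(i)) for i in ids))

async def fetch_name(session, sem, link):
    sauce = BeautifulSoup(await fetch(session, sem, link), "lxml")
    name = sauce.select_one("p > b")
    print(name.text if name else "")

async def main():
    sem = asyncio.Semaphore(10)  # created once, shared by every task
    async with aiohttp.ClientSession() as session:  # one session, one pool
        await asyncio.gather(*(get_links(session, sem, link) for link in links))

if __name__ == "__main__":
    asyncio.run(main())  # Python 3.7+
```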
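If this restructuring is sound, the concurrency cap becomes global rather than per-coroutine, and the single session lets aiohttp reuse keep-alive connections instead of opening a fresh socket for every request; capping the pool at the connector level with `aiohttp.TCPConnector(limit=10)` should achieve a similar effect.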