
Scrapy： twisted.internet.error.ReactorNotRestar

Python

哆啦的時光機 2022-08-11 17:40:20

I am trying to run my Scrapy spider from a script. I am using CrawlerProcess and I have only one spider to run. I have been stuck on this error for a while. I have tried changing many settings, but every time I run the spider I get twisted.internet.error.ReactorNotRestartable. From what I have found while searching, this error should only occur when process.start() is called more than once, but I am not doing that. Here is my code:

import scrapy
from scrapy.utils.log import configure_logging
from scrapyprefect.items import ScrapyprefectItem
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings


class SpiderSpider(scrapy.Spider):
    name = 'spider'
    start_urls = ['http://www.nigeria-law.org/A.A.%20Macaulay%20v.%20NAL%20Merchant%20Bank%20Ltd..htm']

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def parse(self, response):
        item = ScrapyprefectItem()
        ...
        yield item


process = CrawlerProcess(settings=get_project_settings())
process.crawl('spider')
process.start()

The error:

Traceback (most recent call last):
  File "/Users/pluggle/Documents/Upwork/scrapyprefect/scrapyprefect/spiders/spider.py", line 59, in <module>
    process.start()
  File "/Users/pluggle/Documents/Upwork/scrapyprefect/venv/lib/python3.7/site-packages/scrapy/crawler.py", line 309, in start
    reactor.run(installSignalHandlers=False)  # blocking call
  File "/Users/pluggle/Documents/Upwork/scrapyprefect/venv/lib/python3.7/site-packages/twisted/internet/base.py", line 1282, in run
    self.startRunning(installSignalHandlers=installSignalHandlers)
  File "/Users/pluggle/Documents/Upwork/scrapyprefect/venv/lib/python3.7/site-packages/twisted/internet/base.py", line 1262, in startRunning
    ReactorBase.startRunning(self)
  File "/Users/pluggle/Documents/Upwork/scrapyprefect/venv/lib/python3.7/site-packages/twisted/internet/base.py", line 765, in startRunning
    raise error.ReactorNotRestartable()
twisted.internet.error.ReactorNotRestartable


2 Answers

慕姐4208626

1852 contribution points, 7+ upvotes

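Reynaldo,

Thanks a lot, you saved my project!

You pushed me toward the idea that this probably happens because the snippet that starts the process lives in the same file as the spider definition. As a result, it is executed every time Scrapy imports your spider definition. I am not an expert in Scrapy, but it probably imports the module several times internally, and that is how we run into this error.

Your suggestion solved the problem completely!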
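The import behavior described above can be reproduced without Scrapy. The following is only a sketch under assumptions: the file name demo_spider.py is hypothetical, the print() call stands in for process.start(), and the importlib call stands in for whatever Scrapy does internally when it imports the spider module by name. It runs the same file as a script twice, once with the "start" line at module level and once with it behind the __name__ check.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in for spider.py. The print() plays the role of
# process.start(); the import_module() call plays the role of a loader
# importing the spider module by name while the script is running.
# The guard around import_module() only stops the simulated loader from
# running again on the second import; the "start" line itself is unguarded.
UNGUARDED = textwrap.dedent("""
    import importlib
    print("start() called in", __name__)
    if __name__ == "__main__":
        importlib.import_module("demo_spider")
""")

# Same file, but the "start" line now sits behind the __name__ check.
GUARDED = textwrap.dedent("""
    import importlib
    if __name__ == "__main__":
        print("start() called in", __name__)
        importlib.import_module("demo_spider")
""")


def run_as_script(source):
    """Write source to demo_spider.py, run it as a script, return stdout lines."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "demo_spider.py"
        path.write_text(source)
        result = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True, text=True, check=True, cwd=tmp,
        )
    return result.stdout.splitlines()


unguarded_lines = run_as_script(UNGUARDED)
guarded_lines = run_as_script(GUARDED)
print(unguarded_lines)  # the module-level line runs once per import name
print(guarded_lines)    # the guarded line runs only for the script itself
```

In the unguarded variant the module-level line executes twice, once as __main__ and once as demo_spider, which is the pattern that would call process.start() a second time in the real project.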

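Another approach is to separate the spider class definition from the script that runs it. This is probably the approach Scrapy assumes, which would explain why its documentation on running spiders from a script does not even mention the __name__ check.

So here is what I did:

In the project root I have a sites folder, and in it a site_spec.py file. This is just a utility file containing some target-site-specific information (URL structure, etc.). I mention it here only to show how you can import various utility modules into your spider class definition;
In the project root I have a spiders folder with the class definition in my_spider.py. In that file I import site_spec.py with the directive: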

from sites import site_spec

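It is worth mentioning that the script that runs the spider (the one you presented) has been removed from the class definition in my_spider.py. Also note that I import site_spec.py using a path relative to the run.py file (see below), not relative to the class definition file where the directive is issued, as one might expect (Python relative imports, I guess).

Finally, in the project root I have a run.py file that runs Scrapy from a script: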

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

from spiders.my_spider import MySpider  # this is our friend in the spiders subfolder

# Run that thing!
process = CrawlerProcess(get_project_settings())
process.crawl(MySpider)
process.start()  # the script will block here until the crawling is finished

With this setup I was finally able to get rid of twisted.internet.error.ReactorNotRestartable.
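The import detail noted above, where from sites import site_spec inside spiders/my_spider.py resolves against the folder holding run.py, can be checked in isolation. This is a minimal sketch, assuming a plain Python 3 interpreter with default path settings: it rebuilds the folder layout in a temporary directory with placeholder file contents (no Scrapy involved; BASE_URL and the example.com address are hypothetical) and runs run.py as a script.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical miniature of the layout described in the answer. The file
# contents are placeholders; MySpider here is a plain class, not a Scrapy spider.
FILES = {
    "sites/site_spec.py": "BASE_URL = 'http://example.com'\n",
    "spiders/my_spider.py": (
        "from sites import site_spec  # found via sys.path, not via this folder\n"
        "class MySpider:\n"
        "    start_url = site_spec.BASE_URL\n"
    ),
    "run.py": (
        "from spiders.my_spider import MySpider\n"
        "print(MySpider.start_url)\n"
    ),
}


def run_layout():
    """Create the layout in a temp dir, run run.py, and return its output."""
    with tempfile.TemporaryDirectory() as root:
        for rel_path, content in FILES.items():
            target = Path(root) / rel_path
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(content)
        # When a file is run as a script, Python puts that file's directory
        # at the front of sys.path, so both "sites" and "spiders" are
        # importable as top-level packages from any module in the run.
        result = subprocess.run(
            [sys.executable, str(Path(root) / "run.py")],
            capture_output=True, text=True, check=True,
        )
    return result.stdout.strip()


output = run_layout()
print(output)
```

The import in my_spider.py succeeds because it is an absolute import looked up on sys.path, whose first entry is the directory of the script being run, rather than a relative import anchored at the spiders folder.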

Answered 2022-08-11

蝴蝶刀刀

1801 contribution points, 8+ upvotes

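OK, I solved it. I think that in the pipeline, when the scraper enters open_spider, it runs spider.py again and calls process.start() a second time.

To fix this, I added the following to the spider so that process.start() is only executed when you run the spider file directly: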

if __name__ == '__main__':
    process = CrawlerProcess(settings=get_project_settings())
    process.crawl('spider')
    process.start()

Answered 2022-08-11
