
Apache Nutch skipping URLs and truncating

Java

紅顏莎娜 2023-05-24 15:42:03

In my nutch-site.xml I added the following to stop truncation; however, the fetch step then fails with the error below. I want Nutch to stop truncating and return the results I need, and I assumed a value of -1 would do that. I am using version 2.2.1. Any ideas?

<property>
  <name>http.content.limit</name>
  <value>-1</value>
  <description>The length limit for downloaded content using the http protocol, in bytes. If this value is nonnegative (>=0), content longer than it will be truncated; otherwise, no truncation at all. Do not confuse this setting with the file.content.limit setting.
  </description>
</property>

Exception in thread "main" java.lang.RuntimeException: job failed: name=fetch, job_local1185573074_0001
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:55)
    at org.apache.nutch.fetcher.FetcherJob.run(FetcherJob.java:194)
    at org.apache.nutch.fetcher.FetcherJob.fetch(FetcherJob.java:219)
    at org.apache.nutch.fetcher.FetcherJob.run(FetcherJob.java:301)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.fetcher.FetcherJob.main(FetcherJob.java:307)


1 Answer

FFIVE


I solved this by removing the http.content.limit section from nutch-site.xml, adding parser.skip.truncated, and setting it to false.

<property>
  <name>parser.skip.truncated</name>
  <value>false</value>
  <description>Boolean value for whether we should skip parsing for truncated documents. By default this
  property is activated due to extremely high levels of CPU which parsing can sometimes take.
  </description>
</property>
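
For context, here is a minimal sketch of how the whole nutch-site.xml might look after this change. It assumes the file holds no other overrides. The `<configuration>` root element is the usual Hadoop-style wrapper; the property itself comes from the answer above. With http.content.limit no longer overridden, Nutch should fall back to the value in nutch-default.xml, so long pages may still be truncated, but they would now be parsed instead of skipped.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumption: other site-specific overrides (e.g. http.agent.name) are omitted here. -->
  <!-- http.content.limit is intentionally not overridden, as described in the answer. -->
  <property>
    <name>parser.skip.truncated</name>
    <value>false</value>
  </property>
</configuration>
```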

2023-05-24
