
How can I annotate multiple Stanford CoreNLP CoreDocuments more efficiently?

Java

回首憶惘然 2022-10-26 17:06:54

I am annotating a large number of strings as CoreDocuments with Stanford CoreNLP. The StanfordCoreNLP pipeline has a built-in feature for multithreaded annotation to speed this up, but as far as I can tell, CoreDocument objects cannot use it in the version I am running, stanford-corenlp-full-2018-10-05. Since I could not get the pipeline to annotate a collection of CoreDocuments, I tried to optimize the individual annotations by running them inside multithreaded methods. I have no problems with the multithreaded environment itself. I get all the results back as expected; the only drawback is the time it takes. I tried about 7 different implementations, and these were the 3 fastest:

    // ForkJoinPool is initialized in the main method in my application
    private static ForkJoinPool executor = new ForkJoinPool(Runtime.getRuntime().availableProcessors(), ForkJoinPool.defaultForkJoinWorkerThreadFactory, null, false);

    public static ConcurrentMap<String, CoreDocument> getMultipleCoreDocumentsWay1(Collection<String> str) {
        ConcurrentMap<String, CoreDocument> pipelineCoreDocumentAnnotations = new MapMaker().concurrencyLevel(2).makeMap();
        str.parallelStream().forEach((str1) -> {
            CoreDocument coreDocument = new CoreDocument(str1);
            pipeline.annotate(coreDocument);
            pipelineCoreDocumentAnnotations.put(str1, coreDocument);
            System.out.println("pipelineCoreDocumentAnnotations size1: " + pipelineCoreDocumentAnnotations.size() + "\nstr size: " + str.size() + "\n");
        });
        return pipelineCoreDocumentAnnotations;
    }

Time parallel1: 336562 ms. Time parallel4: 391556 ms. Time parallel7: 491639 ms.

Honestly, the best outcome would be for the pipeline itself to annotate multiple documents somehow, but as long as I don't know how to do that, I hope someone can explain how to optimize the individual CoreDocument annotations. PS: Merging all the strings into a single CoreDocument for annotation is not what I want either, because I need the individual CoreDocuments for comparison afterwards.
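The method shown in the question never uses the ForkJoinPool it declares, because parallelStream() called this way runs on the common fork-join pool. One way to control the thread count while still getting a separate result per input string is a fixed-size ExecutorService. The following is only a minimal sketch, assuming the per-string work is thread-safe. The annotate method is a hypothetical stand-in for creating a CoreDocument and calling pipeline.annotate on it, not a CoreNLP API, and the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BoundedAnnotationSketch {

    // Hypothetical stand-in for: new CoreDocument(text) + pipeline.annotate(doc).
    static String annotate(String text) {
        return text.toUpperCase();
    }

    // Submits one task per distinct string to a pool of the given size and
    // returns one result per input string, in first-seen order.
    public static Map<String, String> annotateAll(Collection<String> texts, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<String>> pending = new LinkedHashMap<>();
            for (String text : texts) {
                if (!pending.containsKey(text)) {
                    pending.put(text, pool.submit(() -> annotate(text)));
                }
            }
            Map<String, String> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> entry : pending.entrySet()) {
                // get() blocks until that task has finished.
                results.put(entry.getKey(), entry.getValue().get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(annotateAll(Arrays.asList("first text", "second text"), 4));
    }
}
```

Collecting the Future objects first and calling get() afterwards lets all tasks run concurrently, and the results are written from a single thread, so no concurrent map is needed. Whether this is any faster than the timings above depends on how much of the annotation work can actually run in parallel.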

1 Answer

瀟湘沐

I haven't timed this, but you could try this sample code (add your test strings to the list of strings). It should work on 4 documents at the same time:

package edu.stanford.nlp.examples;

import edu.stanford.nlp.pipeline.*;

import java.util.*;
import java.util.function.*;
import java.util.stream.*;

public class MultiThreadStringExample {

    // Collects finished annotations as the pipeline hands them back.
    public static class AnnotationCollector<T> implements Consumer<T> {

        // Synchronized because accept() may be called from several threads.
        List<T> annotations = Collections.synchronizedList(new ArrayList<T>());

        public void accept(T ann) {
            annotations.add(ann);
        }
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,depparse");
        // Let the pipeline annotate up to 4 documents concurrently.
        props.setProperty("threads", "4");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        AnnotationCollector<Annotation> annCollector = new AnnotationCollector<Annotation>();
        // Add your test strings here.
        List<String> exampleStrings = new ArrayList<String>();
        for (String exampleString : exampleStrings) {
            // The callback receives each Annotation once it is done.
            pipeline.annotate(new Annotation(exampleString), annCollector);
        }
        // Crude wait for the outstanding annotations to finish.
        Thread.sleep(10000);
        // Wrap each finished Annotation in its own CoreDocument.
        List<CoreDocument> coreDocs =
                annCollector.annotations.stream().map(ann -> new CoreDocument(ann)).collect(Collectors.toList());
        for (CoreDocument coreDoc : coreDocs) {
            System.out.println(coreDoc.tokens());
        }
    }
}
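The fixed Thread.sleep(10000) in the sample can wait too long for a small batch or not long enough for a large one. One alternative is to count the callbacks with a CountDownLatch and block until every document has reported back. This is a minimal sketch under stated assumptions: annotateAsync is a hypothetical stand-in that mimics a callback-style annotate call by doing its work on a separate thread. It is not the CoreNLP API, and the class name is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.function.Consumer;

public class LatchCollectorSketch {

    // Hypothetical stand-in for a callback-style annotate call: does the
    // work on a new thread, then hands the result to the callback.
    static void annotateAsync(String text, Consumer<String> callback) {
        new Thread(() -> callback.accept(text.trim())).start();
    }

    // Starts one asynchronous annotation per text and returns once every
    // callback has fired. The order of the results is not guaranteed.
    public static List<String> annotateAll(List<String> texts) {
        List<String> results = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch remaining = new CountDownLatch(texts.size());
        for (String text : texts) {
            annotateAsync(text, result -> {
                results.add(result);
                remaining.countDown();
            });
        }
        try {
            // Blocks until countDown() has been called once per text.
            remaining.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(annotateAll(Arrays.asList(" one ", " two ")).size());
    }
}
```

With this pattern the main thread resumes as soon as the last callback arrives. If a callback might never fire, the await overload that takes a timeout would be a safer choice.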

Answered 2022-10-26
