Issue when using corenlp for processing large corpus #1169

HiXiaochen · 2021-07-12T03:09:23Z

I divided a large English corpus into several subsets and ran multiple CoreNLP commands simultaneously, but the following error always occurs after a period of time:
"""
Exception in thread "main" java.lang.RuntimeException: Error making document
at edu.stanford.nlp.coref.CorefSystem.annotate(CorefSystem.java:55)
at edu.stanford.nlp.pipeline.CorefAnnotator.annotate(CorefAnnotator.java:160)
at edu.stanford.nlp.pipeline.AnnotationPipeline.annotate(AnnotationPipeline.java:76)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(StanfordCoreNLP.java:641)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.annotate(StanfordCoreNLP.java:651)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.processFiles(StanfordCoreNLP.java:1249)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.processFiles(StanfordCoreNLP.java:1083)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.run(StanfordCoreNLP.java:1366)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.main(StanfordCoreNLP.java:1418)
Caused by: java.lang.IllegalArgumentException
at edu.stanford.nlp.semgraph.SemanticGraph.parentPairs(SemanticGraph.java:730)
at edu.stanford.nlp.semgraph.semgrex.GraphRelation$DEPENDENT$1.advance(GraphRelation.java:325)
at edu.stanford.nlp.semgraph.semgrex.GraphRelation$SearchNodeIterator.initialize(GraphRelation.java:1103)
at edu.stanford.nlp.semgraph.semgrex.GraphRelation$SearchNodeIterator.<init>(GraphRelation.java:1084)
at edu.stanford.nlp.semgraph.semgrex.GraphRelation$DEPENDENT$1.<init>(GraphRelation.java:310)
at edu.stanford.nlp.semgraph.semgrex.GraphRelation$DEPENDENT.searchNodeIterator(GraphRelation.java:310)
at edu.stanford.nlp.semgraph.semgrex.NodePattern$NodeMatcher.resetChildIter(NodePattern.java:337)
at edu.stanford.nlp.semgraph.semgrex.NodePattern$NodeMatcher.<init>(NodePattern.java:332)
at edu.stanford.nlp.semgraph.semgrex.NodePattern.matcher(NodePattern.java:293)
at edu.stanford.nlp.semgraph.semgrex.CoordinationPattern$CoordinationMatcher.<init>(CoordinationPattern.java:146)
at edu.stanford.nlp.semgraph.semgrex.CoordinationPattern.matcher(CoordinationPattern.java:120)
at edu.stanford.nlp.semgraph.semgrex.NodePattern$NodeMatcher.resetChild(NodePattern.java:356)
at edu.stanford.nlp.semgraph.semgrex.NodePattern$NodeMatcher.goToNextNodeMatch(NodePattern.java:455)
at edu.stanford.nlp.semgraph.semgrex.NodePattern$NodeMatcher.matches(NodePattern.java:572)
at edu.stanford.nlp.semgraph.semgrex.SemgrexMatcher.find(SemgrexMatcher.java:193)
at edu.stanford.nlp.coref.data.Mention.findDependentVerb(Mention.java:1099)
at edu.stanford.nlp.coref.data.Mention.setDiscourse(Mention.java:318)
at edu.stanford.nlp.coref.data.Mention.process(Mention.java:235)
at edu.stanford.nlp.coref.data.Mention.process(Mention.java:241)
at edu.stanford.nlp.coref.data.DocumentPreprocessor.fillMentionInfo(DocumentPreprocessor.java:341)
at edu.stanford.nlp.coref.data.DocumentPreprocessor.initializeMentions(DocumentPreprocessor.java:169)
at edu.stanford.nlp.coref.data.DocumentPreprocessor.preprocess(DocumentPreprocessor.java:62)
at edu.stanford.nlp.coref.data.DocumentMaker.makeDocument(DocumentMaker.java:92)
at edu.stanford.nlp.coref.data.DocumentMaker.makeDocument(DocumentMaker.java:64)
at edu.stanford.nlp.coref.CorefSystem.annotate(CorefSystem.java:53)
... 8 more
"""
Is this due to memory constraints?
My parameter setting is:
java -mx64g -cp "$DATA/corenlp/stanford-corenlp-4.1.0/*" edu.stanford.nlp.pipeline.StanfordCoreNLP $*
and my command is:
sh ./corenlp.sh -fileList $DATA/${SPLIT}_path.txt \
  -outputDirectory $DATA/output -outputFormat json \
  -annotators tokenize,ssplit,pos,lemma,ner,depparse,parse,coref
Also, what should I set the -mx parameter to?

AngledLuffa · 2021-07-12T04:13:03Z

I think that is likely to be enough memory, unless the documents are truly huge. Can you send us a document that causes the problem so we can reproduce it?
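
One way to find a document that triggers the exception is to run the pipeline on one file at a time and record which inputs make the command fail. The sketch below is only an illustration and is not part of CoreNLP: `find_failing_docs` is a hypothetical helper name, and it assumes that the wrapped command takes the input path as its last argument and exits with a non-zero status on failure (an uncaught exception in `main` normally makes the JVM do so). Starting a new JVM per file reloads the models every time, so this is slow and is only meant for narrowing down a subset that is already known to fail.

```shell
#!/bin/sh
# Hypothetical helper (not part of CoreNLP): print every path listed in
# LIST_FILE for which COMMAND exits with a non-zero status.
# Usage: find_failing_docs LIST_FILE COMMAND [ARGS...]
find_failing_docs() {
  list_file=$1
  shift
  # Read one path per line; also handle a last line without a trailing newline.
  while IFS= read -r doc || [ -n "$doc" ]; do
    # Skip empty lines in the list.
    [ -n "$doc" ] || continue
    # Run the command with the path appended as the final argument.
    # stdin is redirected so the command cannot consume the rest of the list.
    if ! "$@" "$doc" </dev/null >/dev/null 2>&1; then
      printf '%s\n' "$doc"
    fi
  done < "$list_file"
}
```

With the wrapper script from the report, a call might look like `find_failing_docs "$DATA/${SPLIT}_path.txt" sh ./corenlp.sh -annotators tokenize,ssplit,pos,lemma,ner,depparse,parse,coref -outputFormat json -outputDirectory "$DATA/output" -file`, assuming the single-file `-file` option is used in place of `-fileList`.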

HiXiaochen · 2021-07-12T05:14:22Z

Thanks for your reply!
I have sent the document, along with the output from my CoreNLP run, to [email protected].

AngledLuffa · 2022-08-13T07:27:43Z

https://nlp.stanford.edu/software/stanford-corenlp-4.5.0b.zip might have a fix for this issue?

AngledLuffa · 2022-10-21T22:23:25Z

#1296 seems fixed, and this should be the same issue

HiXiaochen closed this as completed Jul 12, 2021

HiXiaochen reopened this Jul 12, 2021

AngledLuffa mentioned this issue Aug 13, 2022

Exception thrown for operation attempted on unknown vertex #1296

Closed

AngledLuffa closed this as completed Oct 21, 2022
