A program to recognize self-acknowledged limitation sentences in biomedical articles
The repository contains the source code for the system described in the article Automatic recognition of self-acknowledged limitations in clinical research literature. The best performing rule-based system is presented (gov.nih.nlm.limitations.RuleBasedLimitationSentenceRecognizer), as well as the rule-based baseline (gov.nih.nlm.limitations.RuleBasedLimitationSentenceRecognizerBaseline).
To replicate the results, run gov.nih.nlm.limitations.RuleBasedLimitationSentenceRecognizer with three arguments:
- DATA/XML: directory that contains the parsed XML of the test set
- DATA/limitation_sentences_final.txt: gold annotations
- Output file name (after the run, this file should match DATA/rule_based_test.out.txt)
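The last step above, checking that the output file matches DATA/rule_based_test.out.txt, can be done with any diff tool. As a minimal sketch in Java, the helper below compares two files line by line and reports the first line that differs. The class name OutputComparer and its method are hypothetical and not part of this repository, and the files are assumed to be UTF-8 text.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper for illustration only; it is not shipped with the repository.
public class OutputComparer {

    // Returns the 1-based number of the first differing line, or -1 if the inputs are identical.
    public static int firstDifference(List<String> actual, List<String> expected) {
        int shared = Math.min(actual.size(), expected.size());
        for (int i = 0; i < shared; i++) {
            if (!actual.get(i).equals(expected.get(i))) {
                return i + 1;
            }
        }
        // If one input is a prefix of the other, the difference starts right after the shared part.
        return actual.size() == expected.size() ? -1 : shared + 1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java OutputComparer <output file> <reference file>");
            System.exit(2);
        }
        // Assumption: both files are UTF-8 encoded text.
        List<String> actual = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> expected = Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8);

        int line = firstDifference(actual, expected);
        if (line == -1) {
            System.out.println("Files match.");
        } else {
            System.out.println("Files differ at line " + line + ".");
            System.exit(1);
        }
    }
}
```

After compiling, it would be invoked with the output file name given as the third argument above, followed by DATA/rule_based_test.out.txt. A non-zero exit code signals a mismatch, which makes the check usable from a script.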
The parsed XML is generated from PubMed Central XML using gov.nih.nlm.limitations.CorpusParser.
The Stanford CoreNLP model jar file (stanford-corenlp-3.3.1-models.jar), which is needed to process raw text for lexical and syntactic information, is not included with the distribution due to its size. It can be downloaded from https://stanfordnlp.github.io/CoreNLP/ and copied to the lib directory.
- Halil Kilicoglu: [email protected]