The HELP dataset is an automatically created natural language inference (NLI) dataset that combines lexical and logical inferences, focusing on monotonicity (i.e., phrase replacement-based reasoning). HELP (Ver. 1.0) contains 36K inference pairs covering upward monotone, downward monotone, non-monotone, conjunction, and disjunction cases.
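To make the idea of phrase replacement-based reasoning concrete, here is a toy sketch of how the direction of a replacement interacts with the polarity of its context. It is only an illustration of monotonicity reasoning, not the code used to build HELP; the function name, argument values, and label strings are assumptions made for this example.

```python
# Toy illustration of monotonicity reasoning; not the HELP generation code.
# All names and label strings below are hypothetical.

def monotonicity_label(polarity: str, replacement: str) -> str:
    """Return an illustrative NLI label for a phrase replacement.

    polarity:    "upward" or "downward" (the context the replaced phrase sits in)
    replacement: "more_general" (dog -> animal) or "more_specific" (animal -> dog)
    """
    if polarity not in {"upward", "downward"}:
        raise ValueError(f"unknown polarity: {polarity!r}")
    if replacement not in {"more_general", "more_specific"}:
        raise ValueError(f"unknown replacement: {replacement!r}")
    # Upward contexts license generalization; downward contexts license specialization.
    entailed = (polarity == "upward" and replacement == "more_general") or (
        polarity == "downward" and replacement == "more_specific"
    )
    return "entailment" if entailed else "neutral"


examples = [
    ("Some dogs ran", "Some animals ran", "upward", "more_general"),
    ("No animals ran", "No dogs ran", "downward", "more_specific"),
    ("Some animals ran", "Some dogs ran", "upward", "more_specific"),
]
for premise, hypothesis, polarity, replacement in examples:
    label = monotonicity_label(polarity, replacement)
    print(f"{premise} => {hypothesis}: {label}")
```

The first two pairs are entailments, while the third is not: knowing that some animals ran does not tell us that some dogs did.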
Dataset file: output_en/pmb_train_v1.0.tsv
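A minimal sketch for inspecting the dataset file, assuming it sits at output_en/pmb_train_v1.0.tsv relative to the repository root, is UTF-8 encoded, and has a header row. The helper name read_tsv is hypothetical, and no column names are assumed; the script simply prints whatever header it finds.

```python
import csv
from pathlib import Path


def read_tsv(path):
    """Return (header, rows) from a tab-separated file.

    Assumption: the first row is a header. Quotes are kept literally
    (QUOTE_NONE) so that quotation marks inside sentences are not
    interpreted as field delimiters.
    """
    with Path(path).open(newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE))
    if not rows:
        return [], []
    return rows[0], rows[1:]


if __name__ == "__main__":
    tsv_path = Path("output_en/pmb_train_v1.0.tsv")  # assumed location
    if tsv_path.exists():
        header, rows = read_tsv(tsv_path)
        print("columns:", header)
        print("number of rows:", len(rows))
    else:
        print(f"{tsv_path} not found; run this from the repository root")
```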
If you would like to replicate the HELP dataset, try the following procedure:
git clone https://github.com/verypluming/HELP.git
cd HELP
pyenv virtualenv 3.4.6 help
pyenv activate help
pip install -r requirements.txt
python -c "import nltk; nltk.download('wordnet')"
Installing C&C parser and Parallel Meaning Bank (PMB)
Please download C&C, set it up, and create a file data/parser_location.txt
containing the path to the C&C parser.
Then download PMB version 2.1.0 and put it in the data/ directory.
echo "candc:/path/to/candc-1.00/" > data/parser_location.txt
python scripts/create_dataset_PMB.py
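If the script above cannot find the parser, it may help to check that data/parser_location.txt is well-formed. The sketch below assumes the file holds one name:path entry per line, as in the echo command above; the helper read_parser_locations is hypothetical and is not part of the HELP scripts.

```python
from pathlib import Path


def read_parser_locations(path="data/parser_location.txt"):
    """Parse lines of the form 'name:/some/path' into a dict.

    Assumption: one entry per line, split at the first colon.
    """
    locations = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, sep, location = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'name:path', got {line!r}")
        locations[name.strip()] = location.strip()
    return locations


if __name__ == "__main__":
    config = Path("data/parser_location.txt")
    if config.exists():
        for name, location in read_parser_locations(config).items():
            status = "found" if Path(location).is_dir() else "missing"
            print(f"{name}: {location} ({status})")
    else:
        print(f"{config} does not exist yet")
```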
If you use this dataset in any published research, please cite the following:
- Hitomi Yanaka, Koji Mineshima, Daisuke Bekki, Kentaro Inui, Satoshi Sekine, Lasha Abzianidze, and Johan Bos. HELP: A Dataset for Identifying Shortcomings of Neural Models in Monotonicity Reasoning. Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM2019), Minneapolis, USA, 2019.
@InProceedings{yanaka-EtAl:2019:starsem,
author = {Yanaka, Hitomi and Mineshima, Koji and Bekki, Daisuke and Inui, Kentaro and Sekine, Satoshi and Abzianidze, Lasha and Bos, Johan},
title = {HELP: A Dataset for Identifying Shortcomings of Neural Models in Monotonicity Reasoning},
booktitle = {Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM2019)},
year = {2019},
}
For questions and usage issues, please contact [email protected].
This work was conducted in collaboration with RIKEN, Ochanomizu University, and the University of Groningen. We thank the Parallel Meaning Bank (PMB).