This repository provides a universal semantic tagger which can be easily trained on the Parallel Meaning Bank.
A recent version of Python 3 with the packages listed in requirements.txt is required.
$ ./run.sh --train [--model MODEL_FILE]
$ ./run.sh --predict --input INPUT_CONLL_FILE --output OUTPUT_CONLL_FILE [--model MODEL_FILE]
$ ./run.sh --train --predict --input INPUT_CONLL_FILE --output OUTPUT_CONLL_FILE [--model MODEL_FILE]
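For concreteness, here is a hedged sketch of what a tokenized INPUT_CONLL_FILE could look like, assuming a plain one-token-per-line layout with an empty line between sentences; this layout is a guess for illustration, so check the PMB release data for the exact columns the tagger expects:

```
Sam
loves
ice~cream
.

He
eats
it
daily
.
```

The output file produced by --predict would then carry the predicted semantic tag for each token.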
You can edit config.sh for fine-grained control over the features and model architecture that are used.
Note that when the --model option is not provided, trained models are stored and loaded in the
directory defined in config.sh.
It is advisable to run a tokenizer such as Elephant on your additional data (if any).
Furthermore, if you have the means to identify multiword expressions, you can represent each of them as a single token using white spaces, tildes or hyphens (as in ice cream, ice~cream or ice-cream).
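As an illustration of the tilde convention above, the following small Python sketch joins known multiword expressions into single tokens. It is not part of the repository; the helper name and the MWE inventory are made up for the example.

```python
def merge_mwes(tokens, mwes, joiner="~"):
    """Greedily replace known multiword expressions in a token list
    with a single joined token, trying longer expressions first."""
    out = []
    i = 0
    # Longest match first, so "ice cream cone" beats "ice cream".
    by_len = sorted(mwes, key=len, reverse=True)
    while i < len(tokens):
        for mwe in by_len:
            n = len(mwe)
            if tokens[i:i + n] == list(mwe):
                out.append(joiner.join(mwe))
                i += n
                break
        else:
            out.append(tokens[i])
            i += 1
    return out

print(merge_mwes(["I", "like", "ice", "cream"], [("ice", "cream")]))
# ['I', 'like', 'ice~cream']
```

Passing joiner="-" instead produces the hyphenated variant (ice-cream); the greedy longest-match-first scan is one simple policy, not the only possible one.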
- L. Abzianidze and J. Bos. Towards Universal Semantic Tagging. In Proceedings of the 12th International Conference on Computational Semantics (IWCS) - Short Papers. Association for Computational Linguistics, 2017.
- J. Bjerva, B. Plank and J. Bos. Semantic Tagging with Deep Residual Networks. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 3531-3541. Association for Computational Linguistics, 2016.