This repository provides a universal semantic tagger which can be easily trained on the Parallel Meaning Bank.
It was developed as part of the master's thesis Universal semantic tagging methods and their applications, submitted at both Saarland University and the University of Groningen. The results reported there can be reproduced by running the script job.sh with the appropriate configuration options.
A recent version of Python 3 with the packages listed in requirements.txt is expected.
$ ./run.sh --train [--model MODEL_FILE]
$ ./run.sh --predict --input INPUT_CONLL_FILE --output OUTPUT_CONLL_FILE [--model MODEL_FILE]
$ ./run.sh --train --predict --input INPUT_CONLL_FILE --output OUTPUT_CONLL_FILE [--model MODEL_FILE]
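For reference, the following is a minimal sketch of reading a CoNLL-style input file such as INPUT_CONLL_FILE, assuming a tab-separated token-per-line layout with blank lines between sentences. This layout is an assumption for illustration; inspect your Parallel Meaning Bank export for the exact column structure the tagger expects.

```python
# Hypothetical reader for a CoNLL-style file: one token per line,
# columns separated by tabs, sentences separated by blank lines.
# The actual column layout used by the tagger may differ.
def read_conll(path):
    """Return a list of sentences, each a list of column tuples."""
    sentences, current = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                # Blank line: end of the current sentence.
                if current:
                    sentences.append(current)
                    current = []
            else:
                current.append(tuple(line.split("\t")))
    if current:
        sentences.append(current)
    return sentences
```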
You can edit config.sh for fine-grained control over the features and model architecture employed. It is recommended to adjust config.sh so that the models suit your system, especially when no GPU is available for computation.
Note that when the --model option is not provided, trained models are stored in and loaded from the directory defined in config.sh.
It is advisable to run a tokenizer such as Elephant on your additional data (if any).
Furthermore, if you have the means to identify multiword expressions, you can represent each of them as a single token by joining its parts with white spaces, tildes, or hyphens (as in ice cream, ice~cream, or ice-cream).
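As a small illustration of that preprocessing step, the sketch below collapses known multiword expressions into single tokens joined by a tilde. The MWE list and function name here are made up for the example; plug in whatever MWE inventory you have.

```python
# Illustrative MWE list; replace with your own inventory.
MWES = {("ice", "cream"), ("New", "York")}

def merge_mwes(tokens, mwes=MWES, sep="~"):
    """Greedily merge adjacent token pairs that form a known MWE,
    joining them with `sep` (e.g. "ice", "cream" -> "ice~cream")."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in mwes:
            out.append(tokens[i] + sep + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out
```

The same function with sep="-" would produce the hyphenated variant (ice-cream) instead.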
- L. Abzianidze and J. Bos. Towards Universal Semantic Tagging. In Proceedings of the 12th International Conference on Computational Semantics (IWCS) - Short Papers. Association for Computational Linguistics, 2017.
- J. Bjerva, B. Plank and J. Bos. Semantic Tagging with Deep Residual Networks. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 3531–3541. Association for Computational Linguistics, 2016.