This package contains extensions for Dutch lemmatization.
Pipeline components:
gigant_lemmatizer
: a pipe that uses the GiGaNT-Molex lexicon for lemmatization.
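As a rough illustration of lexicon-based lemmatization, the Python sketch below looks up each (word form, part-of-speech) pair in a dictionary and falls back to the lowercased form when the pair is unknown. The lexicon layout ({POS: {form: lemma}}), the function names lookup_lemma and lemmatize, and the toy entries are hypothetical. This is not the gigant_lemmatizer implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon layout: {POS tag: {lowercased word form: lemma}}.
# The JSON written by `nl-lemmatizer-util convert` may be structured differently.
Lexicon = Dict[str, Dict[str, str]]


def lookup_lemma(lexicon: Lexicon, form: str, pos: str) -> str:
    """Look up the lemma of a (form, POS) pair; back off to the lowercased form."""
    lowered = form.lower()
    return lexicon.get(pos, {}).get(lowered, lowered)


def lemmatize(lexicon: Lexicon, tagged: List[Tuple[str, str]]) -> List[str]:
    """Lemmatize a sequence of (form, POS) pairs, e.g. tagger output."""
    return [lookup_lemma(lexicon, form, pos) for form, pos in tagged]


# Toy entries for illustration only.
lexicon: Lexicon = {
    "NOUN": {"katten": "kat", "tuinen": "tuin"},
    "VERB": {"liepen": "lopen"},
}

tagged = [("De", "DET"), ("katten", "NOUN"), ("liepen", "VERB"),
          ("door", "ADP"), ("tuinen", "NOUN")]
print(lemmatize(lexicon, tagged))
# ['de', 'kat', 'lopen', 'door', 'tuin']
```

Keying the lexicon on the part-of-speech tag lets one word form map to different lemmas depending on how it is tagged; whether the gigant_lemmatizer pipe uses tags in exactly this way is an assumption.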
CLI commands:
nl-lemmatizer-util convert
: convert a GiGaNT-Molex TSV file to a JSON lexicon for the gigant_lemmatizer pipe.
nl-lemmatizer-util extend-model
: add the gigant_lemmatizer pipe to an existing pipeline.
Install this package and obtain the GiGaNT-Molex dataset from the Instituut voor de Nederlandse Taal (Dutch Language Institute).
First, convert the tab-separated file from the dataset to a JSON lexicon:
nl-lemmatizer-util convert molex_22_02_2022.tsv/molex_22_02_2022.tsv gigant-molex.json
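Conceptually, the conversion step turns tabular lexicon rows into a lookup structure that can be serialized as JSON. The sketch below assumes a hypothetical TSV with a header row and the columns wordform, pos and lemma; the real GiGaNT-Molex file has its own column layout, and the JSON format expected by gigant_lemmatizer may differ, so use the convert command for actual data. The function name convert_tsv is also hypothetical.

```python
import csv
import io
import json
from typing import Dict, TextIO


def convert_tsv(tsv: TextIO) -> Dict[str, Dict[str, str]]:
    """Build {pos: {word_form: lemma}} from a TSV with hypothetical columns
    wordform, pos and lemma (not the real GiGaNT-Molex layout)."""
    lexicon: Dict[str, Dict[str, str]] = {}
    # QUOTE_NONE: treat quote characters in word forms as literal text.
    reader = csv.DictReader(tsv, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        table = lexicon.setdefault(row["pos"], {})
        # Keep the first lemma seen for an ambiguous form; a real converter
        # would need an explicit policy for such cases.
        table.setdefault(row["wordform"].lower(), row["lemma"])
    return lexicon


sample = "wordform\tpos\tlemma\nkatten\tNOUN\tkat\nliepen\tVERB\tlopen\n"
print(json.dumps(convert_tsv(io.StringIO(sample)), sort_keys=True))
# {"NOUN": {"katten": "kat"}, "VERB": {"liepen": "lopen"}}
```

For a real file, open it with open(path, encoding="utf-8", newline="") before passing it to convert_tsv, and write the result with json.dump.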
Then add the gigant_lemmatizer pipe, using the converted gigant-molex.json lexicon, to an existing pipeline:
nl-lemmatizer-util extend-pipeline nl_core_news_lg gigant-molex.json nl_core_news_gigant
This package was developed by Daniël de Kok (Biaffine) and Jeroen van de Nieuwenhof (Tolkie).