aajanki/finnish-pos-accuracy

Benchmarking Finnish POS taggers and lemmatizers

This repository contains an evaluation of the accuracy of open-source Finnish part-of-speech taggers and lemmatization algorithms.

Tested algorithms

Test datasets

Setup

Install dependencies:

  • Python 3.9
  • libvoikko with Finnish morphology data files
  • clang (or other C++ compiler)
  • Dependencies needed to compile FinnPos and cg3
  • Java 11

Set up the git submodules, create a Python 3.9 virtual environment, and download the test data and models by running the commands below. The Python version must be 3.9 because the Turku parser is incompatible with more recent Python versions.

git submodule init
git submodule update

python3.9 -m venv venv
source venv/bin/activate
pip install wheel
pip install -r requirements.txt

# Compile FinnPos
(cd models/FinnPos/src && make -j 4)

# Compile cg3 in models/cg3
# See https://visl.sdu.dk/cg3/chunked/installation.html

# Compile Raudikko
(cd models/raudikko && ./gradlew shadowJar)

./download_data.sh
./download_models.sh

Run

./run.sh

The numerical results are saved in results/evaluation.csv, the POS and lemma errors made by each model in results/errorcases, and the plots in results/images.
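As a minimal sketch of inspecting the numerical results after a run, the Python snippet below loads results/evaluation.csv with the standard csv module. The column layout of the file is not described here, so the snippet only reports whatever header names it finds. The load_results helper is a hypothetical name used for illustration, not part of this repository.

```python
import csv
from pathlib import Path


def load_results(path="results/evaluation.csv"):
    """Read the evaluation CSV into a list of dicts, one dict per row.

    No column names are assumed: keys come from the file's header row.
    """
    with Path(path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    results_file = Path("results/evaluation.csv")
    # The file only exists after ./run.sh has finished.
    if results_file.exists():
        rows = load_results(results_file)
        print(f"{len(rows)} rows")
        if rows:
            print("columns:", ", ".join(rows[0].keys()))
```

All values are returned as strings; convert the numeric columns explicitly once you have checked the header.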

Results

Lemmatization

Lemmatization speed

Execution duration as a function of the F1 score on the concatenated data. Larger values are better on both axes. Note that the Y-axis uses a logarithmic scale.

The execution duration is measured as a batched evaluation (a batch contains all sentences from one dataset) on a 4-core CPU. Some methods can also run on a GPU, which would most likely improve their performance, but I haven't tested that.
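The idea of timing one batch per dataset can be sketched in Python as follows. This is an illustration only and not the measurement code used in this repository: measure_batch and the process_batch callable are hypothetical names, and the stand-in "model" simply lowercases tokens.

```python
import time


def measure_batch(process_batch, sentences):
    """Run process_batch once on the whole batch and time the call.

    process_batch is any callable that takes a list of sentences
    (hypothetical interface). Returns (output, elapsed_seconds).
    """
    start = time.perf_counter()
    output = process_batch(sentences)
    elapsed = time.perf_counter() - start
    return output, elapsed


# Stand-in for a real lemmatizer: lowercases every token.
def toy_model(sentences):
    return [[token.lower() for token in sentence] for sentence in sentences]


batch = [["Minä", "olen", "täällä"], ["Hän", "tulee", "huomenna"]]
output, seconds = measure_batch(toy_model, batch)
n_tokens = sum(len(sentence) for sentence in batch)
print(f"{n_tokens} tokens in {seconds:.6f} s")
```

Timing the whole batch in one call includes any per-call setup the model performs, so results from a single run should be read as rough estimates.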

Lemmatization error rates

Lemmatization F1 scores for the benchmarked algorithms on the test datasets.

Part-of-speech tagging

Part-of-speech speed

Execution duration as a function of the POS F1 score on the concatenated data.

Note that FinnPos and Voikko do not distinguish between auxiliary and main verbs. Their performance therefore suffers by 4-5% in this evaluation, because they mislabel all AUX tags as VERB.
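The effect of the missing AUX/VERB distinction can be illustrated with a toy example. The Python sketch below uses plain token-level accuracy rather than the F1 score reported here, and the sentence, tags, and the accuracy helper are made up for illustration; it is not the evaluation code of this repository.

```python
def accuracy(gold, predicted, merge_aux=False):
    """Share of tokens whose predicted tag matches the gold tag.

    With merge_aux=True, AUX is mapped to VERB on both sides before
    comparing, so an AUX/VERB confusion no longer counts as an error.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("empty tag sequence")

    def normalize(tag):
        return "VERB" if merge_aux and tag == "AUX" else tag

    correct = sum(normalize(g) == normalize(p) for g, p in zip(gold, predicted))
    return correct / len(gold)


# Toy sentence: "Minä olen syönyt omenan ."
gold = ["PRON", "AUX", "VERB", "NOUN", "PUNCT"]
# A tagger without an AUX class labels the auxiliary "olen" as VERB.
pred = ["PRON", "VERB", "VERB", "NOUN", "PUNCT"]

print(accuracy(gold, pred))                  # 0.8
print(accuracy(gold, pred, merge_aux=True))  # 1.0
```

The gap between the two numbers in a real evaluation depends on how frequent auxiliaries are in the test data.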

Part-of-speech error rates

Part-of-speech F1 scores for the benchmarked algorithms.

Simplemma does not include a POS tagging feature.
