Skip to content

Train a model and tag old french texts via a notebook using RNNTagger & Pie

Notifications You must be signed in to change notification settings

CristinaGHolgado/old-french-with-pie-rnntagger

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Lemmatizing medieval French with Pie & RNNTagger

This repository contains a couple of notebooks created to easily generate a langague model for Pie and RNNTagger for tagging and lemmatizing medieval French with no prior local installation and fastened training. Any training dataset can be used if the language-specific parameters are properly configured for each tool's parameters file. Also, it includes the possibility of tagging files from our Drive with the generated models.

Short description of the task

We've trained a model for medieval French with RNNTagger and Pie in order to tag a number of texts with the Cattex09 morphosyntactic labels. Two different corpora are used for training :

  • BFMGOLDLEM corpus, fully annotated in parts of speech (UD and Cattex POS tags) including a number of morphological labels. The lemmas in this corpus were previously standarized in a previous work. This corpus consists of 431,144 tokens distributed in 20 texts (36.1MB).
  • BFMGOLD corpus, where only a small number of texts include all lemmas. It contains 1,187,061 tokens distributed in 42 texts (75.4MB).

A complete description can be found here [French]

About

Train a model and tag old french texts via a notebook using RNNTagger & Pie

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published