Skip to content

chartes/deucalion-model-af

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Deucalion Model Ancien Français ENC

DOI

Model files

Because models are large files, only Markdown is kept in this repository. Find the latest model in the Release tab

Cite As

@software{clerice_thibault_2019_3237455,
  author       = {Camps, Jean-Baptiste and
                  Clérice, Thibault and
                  Pinche, Ariane and
                  Ing, Lucence and
                  Duval, Frédéric and
                  Kanaoka, Naomi},
  title        = {Deucalion, Modèle Ancien Francais (0.3.0)},
  month        = dec,
  year         = 2020,
  publisher    = {Zenodo},
  version      = {v0.3.0},
  doi          = {10.5281/zenodo.4320487},
  url          = {https://doi.org/10.5281/zenodo.4320487}
}

Credits

This model can be used with Pie-Extended and tested on Deucalion.

This mosdel is trained on Pie by Enrique Manjavacas (@emanjavacas) and Mike Kestemont (@mikekestemont) DOI.

Information about the model

Scores

More details:

lemma

accuracy precision recall support
all 0.9747 0.7284 0.726 73507
known-tokens 0.9831 0.8969 0.8979 71726
unknown-tokens 0.6367 0.4327 0.4326 1781
ambiguous-tokens 0.9777 0.7723 0.7752 44987
unknown-targets 0.0751 0.0408 0.0408 253

POS

accuracy precision recall support
all 0.9739 0.8129 0.787 73507
known-tokens 0.9775 0.8325 0.8098 71726
unknown-tokens 0.8293 0.528 0.5079 1781
ambiguous-tokens 0.9713 0.8126 0.8007 49942

NOMB

accuracy precision recall support
all 0.966 0.9589 0.9576 28824
known-tokens 0.9709 0.9635 0.9627 27536
unknown-tokens 0.8626 0.8505 0.7935 1288
ambiguous-tokens 0.9538 0.9473 0.9481 14032

DEGRE

accuracy precision recall support
all 0.9842 0.8266 0.7396 28824
known-tokens 0.9865 0.8475 0.7554 27536
unknown-tokens 0.9356 0.5693 0.4783 1288
ambiguous-tokens 0.9524 0.8443 0.7695 5863

GENRE

accuracy precision recall support
all 0.9608 0.9276 0.8738 28824
known-tokens 0.9671 0.9332 0.8815 27536
unknown-tokens 0.8245 0.6139 0.6159 1288
ambiguous-tokens 0.9428 0.9156 0.863 13608

CAS

accuracy precision recall support
all 0.9457 0.9057 0.9052 28824
known-tokens 0.9514 0.9105 0.9106 27536
unknown-tokens 0.8238 0.6096 0.605 1288
ambiguous-tokens 0.9301 0.9036 0.9054 16647

TEMPS

accuracy precision recall support
all 0.9854 0.945 0.9408 28824
known-tokens 0.9886 0.9584 0.9585 27536
unknown-tokens 0.9169 0.8372 0.8109 1288
ambiguous-tokens 0.9644 0.9331 0.9527 4638

PERS

accuracy precision recall support
all 0.9872 0.8732 0.7848 28824
known-tokens 0.9897 0.8782 0.7906 27536
unknown-tokens 0.9325 0.8832 0.8721 1288
ambiguous-tokens 0.9756 0.8709 0.7839 8648

MODE

accuracy precision recall support
all 0.9897 0.9086 0.8815 28824
known-tokens 0.9932 0.9284 0.9066 27536
unknown-tokens 0.9146 0.7783 0.7381 1288
ambiguous-tokens 0.9722 0.8739 0.9003 4603

Corpora

The model was trained on the following corpora :

  • Geste: un corpus de chansons de geste, dir. Jean-Baptiste Camps, avec la collab. d'Elena Albarran, Alice Cochet & Lucence Ing, Paris, 2016-…, http:https://github.com/Jean-Baptiste-Camps/Geste.
  • Édition nativement numérique du recueil hagiographique "Li Seint Confessor" de Wauchier de Denain d'après le manuscrit 412 de la Bibliothèque nationale de France, éd. Ariane Pinche, Lyon, en cours (only Dialogues and La vie de Saint Martin are used right now. Data are closed source until publication of the PhD thesis)
  • Les Institutes de Justinien en français, éd. F. Olivier-Martin (1935), éd. revue par F. Duval, lemmatisée par F. Duval et L. Ing. Paris, 2018.
  • Chrétien de Troyes: Cligès, Erec, Lancelot, Perceval, Yvain -- Manuscrit P (BnF Fr. 794), éd. P. Kunstmann (2009), annotation revue par J.B. Camps et L. Ing (2017).

Annotations

The annotations are made according to the following reference lists:

  • lemma: Adolf Tobler et Erhard Friedrich Lommatzsch, Altfranzösisches Wörterbuch: édition électronique, éd. Peter Blumenthal et Achim Stein, Stuttgart, F. Steiner, 2002.
  • POS and morph: Sophie Prévost, Céline Guillot, Alexei Lavrentiev et Serge Heiden, Jeu d’étiquettes morphosyntaxiques CATTEX2009-max, Lyon, 2013, http:https://bfm.ens-lyon.fr/IMG/pdf/Cattex2009_2.0.pdf.

More information on the annotation practice can be found in the wiki of the Geste corpus: https://github.com/Jean-Baptiste-Camps/Geste/wiki.

Sample

Sample from annotation:

<<<<<<< HEAD

form	lemma	POS	morph
G'	je	PROper	PERS.=1|NOMB.=s|GENRE=m|CAS=n
irai	aler	VERcjg	MODE=ind|TEMPS=fut|PERS.=1|NOMB.=s
sor	sor2	PRE	MORPH=empty
eus	il	PROper	PERS.=3|NOMB.=p|GENRE=m|CAS=i
por	por2	PRE	MORPH=empty
lor	lor2	DETpos	PERS.=3|NOMB.=p|GENRE=f|CAS=r
terres	terre	NOMcom	NOMB.=p|GENRE=f|CAS=r
saisir	saisir	VERinf	MORPH=empty