Old French Collective Corpus of the École des chartes

OF3C - Old French Collective Corpus of the École des chartes

Cite this corpus

  • Camps, Jean-Baptiste, Clérice, Thibault, Duval, Frédéric, Kanaoka, Naomi & Pinche, Ariane (2021). Corpus and Models for Lemmatisation and POS-tagging of Old French, arXiv preprint arXiv:2109.11442, https://arxiv.org/abs/2109.11442.

Sources

  • [Chrestien]: Kunstmann, Pierre (ed.), Chrétien de Troyes: Cligès, Erec, Lancelot, Perceval, Yvain – manuscrit P (BnF fr. 794), 2009, http://www.atilf.fr/dect.
  • [Code]: Duval and Pastore, in progress.
  • [DocLing]: Gleßgen, Martin Dietrich (dir.), et al., Les plus anciens documents linguistiques de la France, 2016, http://www.rose.uzh.ch/docling/, 3rd edition.
  • [Geste]: Camps, Jean-Baptiste (dir.), Geste: un corpus de chansons de geste, 2016-… (v02), École nationale des chartes, Paris, 2019, http://doi.org/10.5281/zenodo.2630574, public-domain texts, developments under CC-BY-SA.
  • [Lancelot]: Ing, Lucence, Disparitions lexicales en diachronie: traitements automatiques sur le Lancelot en prose, doctoral thesis in preparation, supervised by F. Duval, co-supervised by J.B. Camps, École nationale des chartes, Université PSL, Paris.
  • [WauchierSConf]: Pinche, Ariane, Édition nativement numérique du recueil hagiographique ‘Li Seint Confessor’ de Wauchier de Denain d’après le manuscrit fr. 412 de la Bibliothèque nationale de France, doctoral thesis supervised by C. Pierreville and B. Bureau, Université de Lyon, Lyon, 2021.

The [Varia] texts are short excerpts drawn from the work of students at the École des chartes. They were annotated in 2020 as part of the assessment for the course "Initiation à la philologie romane: introduction au moyen français", taught by Lucence Ing and Jean-Baptiste Camps (thematic dossier on the plague and medicine, during the first lockdown of the COVID-19 pandemic in 2020).

Texts from:

  • Chroniques de Froissart, after Paris ms. fr. 2663, 168v-169r (Online Froissart: P63, SHF 1-318)
  • Chroniques de Froissart, after London Arundel 67 (vol. 1), 360r-360v (Online Froissart: L67, SHF 1-330)
  • Great Surgery by Guy de Chauliac, from the edition by Edouard Nicaise (1890), p. 167 ff.
  • Poésies de Gilles li Muisis, published for the first time from the manuscript of Lord Ashburnham by Baron Kervyn de Lettenhove, Louvain, 1882, https://archive.org/details/posiesdegilles01lemuuoft/page/78/mode/2up

Statistics (2023-04-26)

Token, Lemma and POS counts

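| Category | Different | Total | Values with 1 occurrence only |
|----------|-----------|-------|-------------------------------|
| Forms | 47,661 | 1,183,960 | 23,851 |
| Lemma | 11,295 | 1,183,960 | 3,852 |
| POS | 66 | 1,183,960 | 6 |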
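
As an illustration of how the three columns above relate to one another, here is a minimal sketch that derives them from a flat list of annotation values. The function name `summarize` and the toy tag list are hypothetical and are not part of the corpus tooling; only Python's standard `collections.Counter` is used.

```python
from collections import Counter

def summarize(values):
    """Return the number of distinct values, the total number of
    occurrences, and the number of values seen exactly once."""
    counts = Counter(values)
    return {
        "different": len(counts),
        "total": sum(counts.values()),
        "hapax": sum(1 for n in counts.values() if n == 1),
    }

# Hypothetical toy sample, NOT taken from the corpus files.
pos_tags = ["NOMcom", "VERcjg", "NOMcom", "PRE", "DETdef", "NOMcom", "PRE"]
print(summarize(pos_tags))
```

Applied to the full column of forms, lemmas or POS tags, the same three numbers would correspond to "Different", "Total" and "Values with 1 occurrence only".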

Morphology counts

"Non-x values" counts the tokens to which the category actually applies. A value of x marks a category that does not apply: a verb, for example, is annotated DEGRE=x, because verbs cannot take a degree.

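| Category | Different | Total | Non-x values |
|----------|-----------|-------|--------------|
| Mode | 6 | 478,657 | 60,740 |
| Temps | 5 | 478,657 | 57,367 |
| Personne | 5 | 478,657 | 106,566 |
| Nombre | 3 | 478,657 | 290,326 |
| Genre | 4 | 478,657 | 226,996 |
| Cas | 4 | 478,657 | 229,586 |
| Degre | 5 | 478,657 | 42,949 |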
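
The relation between "Total" and "Non-x values" can be checked with a short sketch. The per-value counts below are the ones reported for Mode in this README; the dictionary layout and the helper name `non_x_total` are assumptions made for illustration only.

```python
# Per-value counts for Mode, copied from the Mode table of this README.
mode_counts = {
    "x": 417_917,
    "ind": 51_951,
    "sub": 5_416,
    "imp": 2_061,
    "con": 1_311,
    "cond": 1,
}

def non_x_total(value_counts):
    """Sum the counts of every value except the placeholder 'x'."""
    return sum(n for value, n in value_counts.items() if value != "x")

total = sum(mode_counts.values())
non_x = non_x_total(mode_counts)
print(total, non_x)  # 478657 60740
```

The same subtraction (total minus the x count) should give the "Non-x values" figure for each of the other categories.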

POS

Value Count
NOMcom 160,410
VERcjg 156,630
PROper 96,533
PRE 91,586
PONfbl 79,784
ADVgen 79,578
CONcoo 66,658
DETdef 57,655
PONfrt 42,489
CONsub 40,120
VERppe 35,647
ADJqua 31,675
VERinf 28,218
NOMpro 27,872
ADVneg 25,947
PROrel 25,542
DETpos 22,367
PROadv 15,003
PRE.DETdef 14,836
PROdem 14,327
PROind 11,661
DETind 10,985
PONpga 7,707
DETndf 7,076
DETdem 6,057
PONpdr 4,842
DETcar 3,229
VERppa 2,784
ADJind 2,575
PROimp 2,036
PROcar 1,855
ADJcar 1,277
ADJpos 1,049
PROint 1,014
PONpxx 1,012
ADVneg.PROper 952
PROpos 669
ADJord 636
ADVsub 592
INJ 549
ADVint 506
DETrel 448
PROord 327
PROper.PROper 311
ADVgen.PROper 271
DETint 225
PRE.PROdem 151
DETcom 52
PRE.PROper 47
PROrel.PROper 46
RED 34
ETR 33
CONsub.PROper 18
ADVgen.CONsub 16
PRE.DETcom 12
DETord 8
ADJqua.NOMcom 7
PRE.PROrel 4
ADVing 2
ADVneg.PROadv 2
PROint.PROper 1
CONsubs 1
ADVgen.PROadv 1
NomPro 1
PRE.DETrel 1
CONsub.DETdef 1

Mode

Value Count
MODE=x 417,917
MODE=ind 51,951
MODE=sub 5,416
MODE=imp 2,061
MODE=con 1,311
MODE=cond 1

Temps

Value Count
TEMPS=x 421,290
TEMPS=pst 29,150
TEMPS=psp 14,882
TEMPS=ipf 9,012
TEMPS=fut 4,323

Personne

Value Count
PERS.=x 372,091
PERS.=3 76,497
PERS.=1 18,377
PERS.=2 11,455
PERS.=0 237

Nombre

Value Count
NOMB.=s 218,952
NOMB.=x 188,331
NOMB.=p 71,374

Genre

Value Count
GENRE=x 251,661
GENRE=m 155,955
GENRE=f 63,962
GENRE=n 7,079

Cas

Value Count
CAS=x 249,071
CAS=r 145,693
CAS=n 75,652
CAS=i 8,241

Degre

Value Count
DEGRE=x 435,708
DEGRE=- 24,947
DEGRE=p 16,622
DEGRE=c 910
DEGRE=s 470