Neural Language Models for Nineteenth-Century English

Hosseini, Kasra; Beelen, Kaspar; Colavizza, Giovanni; Ardanuy, Mariona Coll


arXiv:2105.11321 (cs)

[Submitted on 24 May 2021]


Abstract: We present four types of neural language models trained on a large historical dataset of English-language books published between 1760 and 1900, comprising ~5.1 billion tokens. The architectures include static models (word2vec and fastText) and contextualized models (BERT and Flair). For each architecture, we trained one model instance on the whole dataset. Additionally, we trained separate instances on text published before 1850 for the two static models, and four instances on different time slices for BERT. Our models have already been used in various downstream tasks, where they consistently improved performance. In this paper, we describe how the models were created and outline their reuse potential.
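The time-slicing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Book` record, the `published_before` helper, and the toy titles are hypothetical names introduced here for illustration. Only the 1850 cutoff and the 1760-1900 range come from the abstract. The four BERT time slices are not specified there, so they are not reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    """Hypothetical record for one book in the corpus (illustrative only)."""
    title: str
    year: int
    text: str


def published_before(books, cutoff_year):
    """Return the books published strictly before cutoff_year."""
    return [book for book in books if book.year < cutoff_year]


# Toy stand-in for the 1760-1900 collection; titles and texts are placeholders.
books = [
    Book("A", 1790, "text of book A"),
    Book("B", 1849, "text of book B"),
    Book("C", 1850, "text of book C"),
    Book("D", 1899, "text of book D"),
]

# One training set per model instance: the whole dataset, plus a pre-1850 subset
# of the kind described for the two static models.
whole_dataset = list(books)
before_1850 = published_before(books, 1850)

early_titles = [book.title for book in before_1850]
```

Each resulting list would then be tokenized and passed to the chosen training routine. That step is omitted here because the abstract does not describe it.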

Comments:	5 pages, 1 figure
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2105.11321 [cs.CL]
	(or arXiv:2105.11321v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2105.11321

Submission history

From: Kasra Hosseini
[v1] Mon, 24 May 2021 14:57:34 UTC (128 KB)
