BERTu: A BERT-based language model for the Maltese language 🇲🇹

This repository contains code and information relevant to the paper Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and BERT Models for Maltese.

The pre-trained language models can be accessed through the Hugging Face Hub as MLRS/BERTu or MLRS/mBERTu. For details on how pre-training was done, see the pretrain directory.
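
As a minimal sketch, assuming the standard Hugging Face transformers API (the Maltese example sentence and the masked-token decoding are illustrative, not taken from the paper):

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load the monolingual model; swap in "MLRS/mBERTu" for the multilingual variant.
tokenizer = AutoTokenizer.from_pretrained("MLRS/BERTu")
model = AutoModelForMaskedLM.from_pretrained("MLRS/BERTu")

# Hypothetical Maltese sentence with one masked token.
text = f"Malta hija gżira {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Decode the top prediction at the masked position.
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
print(tokenizer.decode(logits[0, mask_index].argmax(dim=-1)))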

The models were trained on Korpus Malti v4.0, which can be accessed through the Hugging Face Hub using MLRS/korpus_malti.
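
As a sketch, assuming the Hugging Face datasets library and the dataset's default configuration, the corpus can be loaded like this:

from datasets import load_dataset

# Download Korpus Malti from the Hugging Face Hub.
korpus_malti = load_dataset("MLRS/korpus_malti")
print(korpus_malti)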

Citation

Cite this work as follows:

@inproceedings{BERTu,
    title = "Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and {BERT} Models for {M}altese",
    author = "Micallef, Kurt  and
              Gatt, Albert  and
              Tanti, Marc  and
              van der Plas, Lonneke  and
              Borg, Claudia",
    booktitle = "Proceedings of the Third Workshop on Deep Learning for Low-Resource Natural Language Processing",
    month = jul,
    year = "2022",
    address = "Hybrid",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.deeplo-1.10",
    doi = "10.18653/v1/2022.deeplo-1.10",
    pages = "90--101",
}
