This repository contains code and information relevant to the paper *Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and BERT Models for Maltese*.
The pre-trained language models can be accessed through the Hugging Face Hub using `MLRS/BERTu` or `MLRS/mBERTu`.
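As a minimal sketch (not part of this repository's code), the checkpoints can be loaded with the Hugging Face `transformers` library; the example below uses `MLRS/BERTu` for masked-language-model inference, and `MLRS/mBERTu` can be substituted in the same way. The example sentence is only illustrative.

```python
# Minimal sketch, assuming the `transformers` library is installed.
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("MLRS/BERTu")
model = AutoModelForMaskedLM.from_pretrained("MLRS/BERTu")

# Tokenise an (illustrative) Maltese sentence and run a forward pass.
inputs = tokenizer("Malta hija gżira fil-Mediterran.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # (batch_size, sequence_length, vocab_size)
```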
For details on how pre-training was done, see the `pretrain` directory.
The models were trained on Korpus Malti v4.0, which can be accessed through the Hugging Face Hub using `MLRS/korpus_malti`.
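As a hedged sketch (not taken from this repository), the corpus can be loaded with the Hugging Face `datasets` library; whether a specific configuration name or authentication is required depends on the Hub repository settings.

```python
# Minimal sketch, assuming the `datasets` library is installed and the
# default configuration of MLRS/korpus_malti is sufficient.
import itertools
from datasets import load_dataset

# Stream the corpus to avoid downloading everything up front.
corpus = load_dataset("MLRS/korpus_malti", split="train", streaming=True)

# Inspect a few examples; field names depend on the dataset configuration.
for example in itertools.islice(corpus, 3):
    print(example)
```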
- For details on how fine-tuning was done, see the `finetune` directory.
- To consume fine-tuned models for evaluation/prediction, refer to the `evaluate` directory; a hedged usage sketch follows this list.
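As a purely illustrative sketch, a fine-tuned checkpoint can be consumed with a `transformers` pipeline; the checkpoint path and task name below are hypothetical placeholders for whatever the fine-tuning scripts produce, not identifiers defined by this repository.

```python
# Hypothetical sketch: "path/to/finetuned-checkpoint" is a placeholder, and
# "token-classification" should match the task the model was fine-tuned for.
from transformers import pipeline

tagger = pipeline("token-classification", model="path/to/finetuned-checkpoint")
print(tagger("Il-Belt Valletta hija l-kapitali ta' Malta."))
```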
Cite this work as follows:
@inproceedings{BERTu,
    title     = "Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and {BERT} Models for {M}altese",
    author    = "Micallef, Kurt  and
                 Gatt, Albert  and
                 Tanti, Marc  and
                 van der Plas, Lonneke  and
                 Borg, Claudia",
    booktitle = "Proceedings of the Third Workshop on Deep Learning for Low-Resource Natural Language Processing",
    month     = jul,
    year      = "2022",
    address   = "Hybrid",
    publisher = "Association for Computational Linguistics",
    url       = "https://aclanthology.org/2022.deeplo-1.10",
    doi       = "10.18653/v1/2022.deeplo-1.10",
    pages     = "90--101",
}