L3Cube's IndicNLP project is an effort to improve NLP resources for Indic languages. We have created monolingual BERT models for 10 Indic languages and have also released monolingual and multilingual (cross-lingual) Sentence BERT models. These models provide state-of-the-art results on downstream tasks.
More details about these models can be found in the papers cited below.
Model | Link |
---|---|
Marathi BERT | model |
Hindi BERT | model |
Dev BERT (Hindi + Marathi) | model |
Kannada BERT | model |
Telugu BERT | model |
Malayalam BERT | model |
Tamil BERT | model |
Gujarati BERT | model |
Oriya BERT | model |
Bengali BERT | model |
Punjabi BERT | model |
Assamese BERT | model |
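
The models in the table can typically be loaded with the Hugging Face `transformers` library. The following is a minimal sketch, not an official usage guide: the Hub IDs in `MODEL_IDS` are assumptions and should be checked against the links in the table, and the helper names `resolve_model_id`, `load_model`, and `embed` are hypothetical. Mean pooling is used here only as one simple way to turn token outputs into a sentence vector.

```python
from typing import Dict

# Assumed Hugging Face Hub IDs for two rows of the table.
# Verify them against the model links before relying on them.
MODEL_IDS: Dict[str, str] = {
    "Marathi BERT": "l3cube-pune/marathi-bert-v2",
    "Hindi BERT": "l3cube-pune/hindi-bert-v2",
}


def resolve_model_id(name: str) -> str:
    """Map a table name such as 'Marathi BERT' to its (assumed) Hub ID."""
    try:
        return MODEL_IDS[name]
    except KeyError:
        known = ", ".join(sorted(MODEL_IDS))
        raise KeyError(f"Unknown model {name!r}; known models: {known}")


def load_model(name: str):
    """Download (on first use) and return the tokenizer and encoder."""
    # Imported lazily so the lookup helper works without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    model_id = resolve_model_id(name)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()
    return tokenizer, model


def embed(text: str, tokenizer, model):
    """Return one mean-pooled vector for `text`, ignoring padding tokens."""
    import torch

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Weight each token by the attention mask, then average over tokens.
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    summed = (outputs.last_hidden_state * mask).sum(dim=1)
    return summed / mask.sum(dim=1)
```

Calling `load_model("Marathi BERT")` followed by `embed(sentence, tokenizer, model)` would then give a single vector per sentence. For the Sentence BERT models, the pooling configuration shipped with each model should be preferred over this generic mean pooling.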
All the resources are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. The datasets are released to the community for research purposes only and the group is not responsible for any misuse of these datasets.
@article{joshi2022l3cube_hind,
title={L3Cube-HindBERT and DevBERT: Pre-Trained BERT Transformer models for Devanagari based Hindi and Marathi Languages},
author={Joshi, Raviraj},
journal={arXiv preprint arXiv:2211.11418},
year={2022}
}
@article{deode2023l3cube,
title={L3Cube-IndicSBERT: A simple approach for learning cross-lingual sentence representations using multilingual BERT},
author={Deode, Samruddhi and Gadre, Janhavi and Kajale, Aditi and Joshi, Ananya and Joshi, Raviraj},
journal={arXiv preprint arXiv:2304.11434},
year={2023}
}
Joshi, Raviraj. "L3Cube-HindBERT and DevBERT: Pre-Trained BERT Transformer models for Devanagari based Hindi and Marathi Languages." arXiv preprint arXiv:2211.11418 (2022).
Deode, Samruddhi, et al. "L3Cube-IndicSBERT: A simple approach for learning cross-lingual sentence representations using multilingual BERT." arXiv preprint arXiv:2304.11434 (2023).
Mirashi, Aishwarya, et al. "L3Cube-IndicNews: News-based Short Text and Long Document Classification Datasets in Indic Languages." arXiv preprint arXiv:2401.02254 (2024).
This project is led by Raviraj Joshi under L3Cube Labs, Pune. For any queries, contact [email protected].