
The first large-scale benchmark for the Indonesian language. We provide multiple downstream tasks, pre-trained models, and starter code. (AACL-IJCNLP 2020)


kevinmel2000/indonlu

 
 


IndoNLU

IndoNLU is a collection of Natural Language Understanding (NLU) resources for Bahasa Indonesia.

12 Downstream Tasks

  • You can check [Link]
  • We provide train, validation, and test sets. The test set labels are masked (no true labels are included). We are currently preparing a platform for automatic evaluation using Codalab. Please stay tuned!

Examples

  • A guide to loading an IndoBERT model and fine-tuning it on Sequence Classification and Sequence Tagging tasks.
  • You can check [Link]

Indo4B

  • 23GB Indo4B Pretraining Dataset [Link]

IndoBERT models

Leaderboard

Submission Format

Please check [Link]. Each task has its own format. Every submission file must start with the index column (the id of each test sample, following the order of the masked test set).

To submit, first rename your prediction file to 'pred.txt', then zip the file.
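The packaging step can be sketched as follows. This is a minimal illustration, not the official submission script: the function name `write_submission`, the comma delimiter, the `index` and `label` column names, the zero-based index, and the archive name `submission.zip` are all assumptions made for the example. Only the 'pred.txt' file name and the leading index column come from the instructions above, so check the task-specific format before submitting.

```python
import csv
import zipfile
from pathlib import Path


def write_submission(predictions, out_dir=".", label_column="label"):
    """Write predictions to pred.txt (index column first) and zip it.

    `predictions` is a list of predicted labels in the same order as
    the masked test set. The column names, the comma delimiter, and the
    zero-based index are assumptions; each task defines its own format.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    pred_path = out_dir / "pred.txt"

    # The index column always comes first, followed by the prediction.
    with pred_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["index", label_column])
        for idx, label in enumerate(predictions):
            writer.writerow([idx, label])

    # Store the file at the archive root under the name pred.txt.
    zip_path = out_dir / "submission.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(pred_path, arcname="pred.txt")
    return zip_path
```

For example, `write_submission(["positive", "negative"], out_dir="out")` would produce `out/pred.txt` and `out/submission.zip`. Tasks with more than one output column per sample (such as sequence tagging) would need a different row layout.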

Paper

IndoNLU has been accepted at AACL-IJCNLP 2020, and the paper is available at https://arxiv.org/abs/2009.05387. If you use any component of IndoNLU for research purposes, please cite the following paper:

@inproceedings{wilie2020indonlu,
  title={IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding},
  author={Bryan Wilie and Karissa Vincentio and Genta Indra Winata and Samuel Cahyawijaya and X. Li and Zhi Yuan Lim and S. Soleman and R. Mahendra and Pascale Fung and Syafri Bahar and A. Purwarianti},
  booktitle={Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing},
  year={2020}
}
