IndoNLP

We are researchers working to raise the baseline for Indonesian NLP. We collaborate to release new data resources and benchmarks.

Pinned

  1. indonlu Public

    The first-ever large-scale natural language processing benchmark for the Indonesian language. We provide multiple downstream tasks, pre-trained IndoBERT models, and starter code! (AACL-IJCNLP 2020)

    Jupyter Notebook, 525 stars, 185 forks

  2. nusa-crowd Public

    A collaborative project to collect datasets in Indonesian languages.

    Jupyter Notebook, 262 stars, 61 forks

  3. nusax Public

    High-quality parallel resource on sentiment analysis for 10 low-resource Indonesian languages, English, and Indonesian (Outstanding Paper at EACL 2023)

    Jupyter Notebook, 84 stars, 8 forks

  4. indonlg Public

    The first-ever large-scale natural language generation benchmark for Indonesian, Sundanese, and Javanese. We provide multiple downstream tasks, pre-trained IndoGPT and IndoBART models, and starter code!…

    Python, 68 stars, 11 forks

Repositories

Showing 10 of 10 repositories
  • .github Public

    Landing page

    1 star, 0 forks, 0 issues, 0 pull requests, updated Jul 26, 2024
  • cendol Public

    Indonesian T0 | Instruction-tuning for low-resource and extremely low-resource Austronesian languages

    Jupyter Notebook, 9 stars, Apache-2.0, 1 fork, 0 issues, 1 pull request, updated Jun 24, 2024
  • nusa-crowd Public

    A collaborative project to collect datasets in Indonesian languages.

    Jupyter Notebook, 262 stars, Apache-2.0, 61 forks, 35 issues (5 need help), 2 pull requests, updated Jun 2, 2024
  • nusa-writes Public

    NusaWrites is an in-depth analysis of corpus collection strategies and a comprehensive language modeling benchmark for underrepresented and extremely low-resource Indonesian local languages.

    Jupyter Notebook, 25 stars, Apache-2.0, 1 fork, 0 issues, 0 pull requests, updated Feb 26, 2024
  • nusa-catalogue Public

    Dataset Catalogue Homepage for Indonesian Languages

    JavaScript, 6 stars, Apache-2.0, 6 forks, 1 issue, 0 pull requests, updated Feb 19, 2024
  • nusax Public

    High-quality parallel resource on sentiment analysis for 10 low-resource Indonesian languages, English, and Indonesian (Outstanding Paper at EACL 2023)

    Jupyter Notebook, 84 stars, Apache-2.0, 8 forks, 0 issues, 0 pull requests, updated May 8, 2023
  • nusacrowd-asr Public

    NusaCrowd ASR Experiment

    Jupyter Notebook, 2 stars, Apache-2.0, 0 forks, 0 issues, 0 pull requests, updated Jan 5, 2023
  • indonlu Public

    The first-ever large-scale natural language processing benchmark for the Indonesian language. We provide multiple downstream tasks, pre-trained IndoBERT models, and starter code! (AACL-IJCNLP 2020)

    Jupyter Notebook, 525 stars, Apache-2.0, 185 forks, 3 issues, 1 pull request, updated Dec 3, 2022
  • indonlg Public

    The first-ever large-scale natural language generation benchmark for Indonesian, Sundanese, and Javanese. We provide multiple downstream tasks, pre-trained IndoGPT and IndoBART models, and starter code! (EMNLP 2021)

    Python, 68 stars, Apache-2.0, 11 forks, 1 issue, 0 pull requests, updated Dec 3, 2022
  • indonlp.github.io Public
    SCSS, 1 star, Apache-2.0, 2 forks, 0 issues, 0 pull requests, updated Jun 12, 2022