- MBZUAI, IndoNLP
- Abu Dhabi
- fajrikoto.com
- @FajriKoto
Stars
A collaborative project to collect datasets in Southeast Asian (SEA) languages, from SEA regions, or about SEA cultures.
NusaWrites is an in-depth analysis of corpus collection strategies and a comprehensive language modeling benchmark for underrepresented and extremely low-resource Indonesian local languages.
CMMLU: Measuring massive multitask language understanding in Chinese
A Multilingual Replicable Instruction-Following Model
Discourse Probing of Pretrained Language Models. In Proceedings of NAACL 2021.
A framework for assessing and improving classification fairness.
High-quality parallel resource on sentiment analysis for 10 low-resource Indonesian languages, English, and Indonesian (Outstanding Paper at EACL 2023)
Evaluating the Efficacy of Summarization Evaluation across Languages. In Findings of ACL 2021.
KM-BART: Knowledge Enhanced Multimodal BART for Visual Commonsense Generation
IndoBERTweet is the first large-scale pretrained model for Indonesian Twitter. Published at EMNLP 2021 (main conference).
Complete web scraping of TED.com for metadata, transcripts, audio, video, and images using parallel programming.
Classification of Twitter users' personalities based on their tweets, using the Big Five model.
The Dataset for Hate Speech Detection in Indonesian (Bahasa Indonesia)
BERTweet: A pre-trained language model for English Tweets (EMNLP-2020)
The first large-scale summarization corpus for the Indonesian language. Published at AACL 2020.
IndoLEM is a comprehensive Indonesian NLU benchmark covering three pillars of NLP tasks: morpho-syntax, semantics, and discourse. Presented at COLING 2020.