- Capital One AI Foundations
- New York
- https://gentawinata.com
- @gentaiscool
Stars
ANSI color formatting for output in the terminal
Mexican NLP Summer School 2024 tutorial on knowledge distillation and parameter-efficient fine-tuning
A Python implementation of global optimization with Gaussian processes.
A library to calculate similarity scores between two collections of text sequences encoded using transformer models for bitext mining, dense retrieval, retrieval-based classification, and retrieval…
A library of translation-based text similarity measures
MTEB: Massive Text Embedding Benchmark
Implementation of ProxyLM, a scalable and efficient framework for predicting LM performance on NLP tasks using proxy models
Generate synthetic labeled data for extremely low-resource languages using bilingual lexicons.
MINERS ⛏️: The semantic retrieval benchmark for evaluating multilingual language models.
A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
IndoToD: A Multi-Domain Indonesian Benchmark For End-to-End Task-Oriented Dialogue Systems
Indonesian T0 | Instruction-tuning for low-resource and extremely low-resource Austronesian languages
This repository is dedicated to the development of code-mixed language resources.
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
Open Instruction Generalist is an assistant trained on a massive set of synthetic instructions to perform millions of tasks
A comprehensive machine learning repository containing 30+ notebooks on different concepts, algorithms and techniques.
NusaWrites is an in-depth analysis of corpora collection strategy and a comprehensive language modeling benchmark for underrepresented and extremely low-resource Indonesian local languages.
Code for DeepCubeA, a Deep Reinforcement Learning algorithm that can learn to solve the Rubik's cube.
Can LLMs generate code-mixed sentences through zero-shot prompting?
Our open source implementation of MiniLMv2 (https://aclanthology.org/2021.findings-acl.188)
GlobalBench: A Benchmark for Global Progress in Language Technology
The unified platform for data-related resources.
Word-level language identification for Bangla-English code-mixed social media data, using a BiLSTM with subword embeddings.