OLMoE: Open Mixture-of-Experts Language Models
[NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval
🧬 RegMix: Data Mixture as Regression for Language Model Pre-training
AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark
BigCodeBench: Benchmarking Code Generation Towards AGI
Home of StarCoder: fine-tuning & inference!
Language models scale reliably with over-training and on downstream tasks
A Survey on Data Selection for Language Models
Generative Representational Instruction Tuning
A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).
Evaluation of BLOOM on the HumanEval benchmark
A Scandinavian Benchmark for sentence embeddings
Astraios: Parameter-Efficient Instruction Tuning Code Language Models
A framework for few-shot evaluation of language models.
Modeling, training, eval, and inference code for OLMo
Data and tools for generating and inspecting OLMo pre-training data.
A framework for the evaluation of autoregressive code generation language models.
Retrieval and Retrieval-augmented LLMs
🐙 OctoPack: Instruction Tuning Code Large Language Models
Scaling Data-Constrained Language Models
BLOOM+1: Adapting BLOOM model to support a new unseen language
Crosslingual Generalization through Multitask Finetuning