Stars
Simple retrieval from LLMs at various context lengths to measure accuracy
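The retrieval test described above (often called "needle in a haystack") can be sketched in a few lines: embed a known fact at some depth inside a long filler context and check whether the model recovers it. This is an illustrative harness, not the repository's actual code; `build_haystack` and `retrieval_accuracy` are hypothetical names, and `model` stands in for any prompt-to-text callable.

```python
def build_haystack(filler: str, needle: str, depth: float, length: int) -> str:
    """Place `needle` at fractional `depth` inside ~`length` chars of filler text."""
    haystack = (filler * (length // len(filler) + 1))[:length]
    pos = int(depth * length)
    return haystack[:pos] + " " + needle + " " + haystack[pos:]

def retrieval_accuracy(model, needle: str, answer: str, depths, length: int,
                       filler: str = "The sky is blue. ") -> float:
    """Fraction of insertion depths at which `model` surfaces `answer`."""
    hits = 0
    for d in depths:
        prompt = build_haystack(filler, needle, d, length)
        if answer.lower() in model(prompt).lower():
            hits += 1
    return hits / len(depths)
```

In practice `model` would wrap an LLM API call and the check would be sweep both depth and total context length, plotting accuracy over the grid.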
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
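The core of the BPE training loop mentioned above fits in a short function: repeatedly find the most frequent adjacent pair of token ids and merge it into a new id. This is a toy byte-level sketch in the spirit of that repository, not its actual implementation.

```python
from collections import Counter

def train_bpe(text: str, num_merges: int):
    """Learn BPE merges over the UTF-8 bytes of `text` (toy version)."""
    ids = list(text.encode("utf-8"))
    merges = {}          # (a, b) -> new token id
    next_id = 256        # byte values occupy 0..255
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges[pair] = next_id
        # replace every (non-overlapping) occurrence of the pair with the new id
        out, i = [], 0
        while i < len(ids):
            if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
                out.append(next_id)
                i += 2
            else:
                out.append(ids[i])
                i += 1
        ids = out
        next_id += 1
    return merges, ids
```

Encoding new text then replays the learned merges in order; decoding walks the merge table backwards to recover bytes.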
A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).
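Of the losses listed above, DPO is the simplest to write down: it scores a preference pair by the policy's log-probability gain over a frozen reference model on the chosen versus rejected completion. The sketch below is the per-pair objective from the DPO paper in plain Python, not that library's API; the function name and arguments are illustrative, and real implementations operate on batched tensors.

```python
import math

def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair.

    Arguments are summed log-probabilities of the chosen / rejected
    completions under the policy (pi_*) and frozen reference (ref_*).
    """
    # implicit reward margin: beta * difference of policy/reference log-ratios
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # loss = -log sigmoid(logits), computed stably for either sign
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

KTO, ORPO, and the other HALOs swap in different link functions around the same log-ratio quantities, which is what makes a shared extensible interface natural.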
Find all the fundamental UXI guidelines and pattern-based web components to build brand-driven, consistent, and intuitive designs for digital Porsche products.
A heterogeneous benchmark for information retrieval. Easy to use: evaluate your models across 15+ diverse IR datasets.
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
[EMNLP 2022] Training Language Models with Memory Augmentation https://arxiv.org/abs/2205.12674
Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
commaVQ is a dataset of compressed driving video
A Data Streaming Library for Efficient Neural Network Training
A high-throughput and memory-efficient inference and serving engine for LLMs
Write scalable load tests in plain Python 🚗💨
Implementation of Flash Attention in Jax
The official implementation of “Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training”
A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.
Tevatron - A flexible toolkit for neural retrieval research and development.
The RedPajama-Data repository contains code for preparing large datasets for training large language models.