- Amazon
- New York
- https://www.cs.cmu.edu/~negrinho/
Starred repositories
Model components of the Llama Stack APIs
[NeurIPS D&B 2024] Generative AI for Math: MathPile
Collaborative book Machine Learning Systems
Efficient Triton Kernels for LLM Training
An interactive framework to visualize and analyze your AutoML process in real-time.
Open source hardware and software platform to build a small-scale self-driving car.
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main)
An automated pipeline for evaluating LLMs for role-playing.
Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance. Accepted to ACL 2024.
Official inference repo for FLUX.1 models
Utilities intended for use with Llama models.
A Comprehensive Toolkit for High-Quality PDF Content Extraction
A high-throughput and memory-efficient inference and serving engine for LLMs
Explorations into some recent techniques surrounding speculative decoding
A simple byte pair encoding (BPE) mechanism for tokenization, written purely in C.
Qwen2.5 is the large language model series developed by the Qwen team, Alibaba Cloud.
A self-organizing file system with Llama 3
The easiest way to use Agentic RAG in any enterprise
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
A repo that lists papers related to LLM-based agents
TimesFM (Time Series Foundation Model) is a pretrained time-series foundation model developed by Google Research for time-series forecasting.
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Neural Networks: Zero to Hero
Granite Code Models: A Family of Open Foundation Models for Code Intelligence
Reconquer the canvas: beautiful TikZ figures without clunky TikZ code