  • Tsinghua University
  • Beijing, China

FlashInfer: Kernel Library for LLM Serving

Cuda 764 64 Updated Jul 6, 2024

Official inference library for Mistral models

Jupyter Notebook 9,146 801 Updated Jun 22, 2024
Python 1,116 154 Updated May 28, 2024
Python 127 5 Updated Jun 25, 2024

An extremely fast Python linter and code formatter, written in Rust.

Rust 28,766 933 Updated Jul 6, 2024

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Python 1,430 88 Updated Jun 21, 2024

A library to analyze PyTorch traces.

Python 249 33 Updated Jul 5, 2024

Super-Efficient RLHF Training of LLMs with Parameter Reallocation

Python 24 1 Updated Jul 4, 2024

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Python 23,951 4,929 Updated Jul 7, 2024

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Python 5,352 484 Updated Jul 3, 2024

Taming Transformers for High-Resolution Image Synthesis

Jupyter Notebook 5,547 1,107 Updated Apr 25, 2024
Jupyter Notebook 121 13 Updated Feb 2, 2024

Official Jax Implementation of MaskGIT

Jupyter Notebook 402 47 Updated Nov 18, 2022

Pandora: Towards General World Model with Natural Language Actions and Video States

Python 428 26 Updated May 27, 2024

Profiling and inspecting memory in PyTorch

Python 995 35 Updated Sep 12, 2023

LLM training in simple, raw C/CUDA

Cuda 21,431 2,327 Updated Jul 4, 2024
Python 68 9 Updated Dec 18, 2023

Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.

Python 253 49 Updated Mar 31, 2023

An application-focused API for memory management on NUMA & GPU architectures

C++ 306 52 Updated Jul 1, 2024

Grok open release

Python 49,149 8,310 Updated May 29, 2024

Task-based datasets, preprocessing, and evaluation for sequence models.

Python 544 58 Updated Jul 6, 2024

Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5).

Cuda 154 12 Updated May 28, 2024

A retargetable MLIR-based machine learning compiler and runtime toolkit.

C++ 2,481 553 Updated Jul 6, 2024

A fast implementation of T5/UL2 in PyTorch using Flash Attention

Python 53 6 Updated Jun 26, 2024

Building a quick conversation-based search demo with Lepton AI.

TypeScript 7,513 964 Updated Jun 22, 2024

Open standard for machine learning interoperability

Python 17,228 3,624 Updated Jul 5, 2024

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 7,402 800 Updated Jul 5, 2024

StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation

Python 9,251 657 Updated Jun 16, 2024

Mamba SSM architecture

Python 11,528 943 Updated Jul 3, 2024