Stars
An experiment in using Tangent to autodiff Triton
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
uwsampl / gpt-fast
Forked from pytorch-labs/gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
Python package for rematerialization-aware gradient checkpointing
A max pool filter implemented in CUDA using shared memory
A demo of Rust and the axum web framework with Tokio, Tower, Hyper, and Serde
Solutions to introductory distributed computing exercises
A low-latency & high-throughput serving engine for LLMs
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
Flash Attention in ~100 lines of CUDA (forward pass only)
Modeling, training, eval, and inference code for OLMo
Optimizing SGEMM kernels on NVIDIA GPUs to close-to-cuBLAS performance.
Triton-based implementation of Sparse Mixture of Experts.
An attempt at achieving the theoretical best memory bandwidth of my machine.
High performance Transformer implementation in C++.
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Reference implementation of Megalodon 7B model
A throughput-oriented high-performance serving framework for LLMs
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).
Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support.
Helps you write algorithms in PyTorch that adapt to the available (CUDA) memory
SGLang is a fast serving framework for large language models and vision language models.
Transformers with Arbitrarily Large Context
Finetune Llama 3.2, Mistral, Phi, Qwen & Gemma LLMs 2-5x faster with 80% less memory
Edit anything in images, powered by segment-anything, ControlNet, StableDiffusion, etc. (ACM MM)
TJ-Solergibert / Megatron-LM
Forked from NVIDIA/Megatron-LM
Debugging Megatron: 3D parallelism, models, training, and more!