
📚Modern CUDA Learn Notes with PyTorch: Tensor/CUDA Cores, 📖150+ CUDA Kernels with PyTorch bindings, 📖HGEMM/SGEMM (95%~99% cuBLAS performance), 📖100+ LLM/CUDA Blogs.

Cuda 1,462 160 Updated Nov 18, 2024

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.

C++ 153 10 Updated Nov 18, 2024

Efficient Triton Kernels for LLM Training

Python 3,434 202 Updated Nov 18, 2024

Cataloging released Triton kernels.

134 7 Updated Aug 26, 2024

LLM101n: Let's build a Storyteller

30,177 1,647 Updated Aug 1, 2024

This is a cross-chip platform collection of operators and a unified neural network library.

Python 13 1 Updated Nov 3, 2023

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

Python 1,942 154 Updated Mar 27, 2024

The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++

CSS 42,849 5,439 Updated Oct 24, 2024

A simple network quantization demo using pytorch from scratch.

Python 510 97 Updated Jun 18, 2023

A simple, performant and scalable Jax LLM!

Python 1,529 294 Updated Nov 18, 2024

CUDA Core Compute Libraries

C++ 1,274 163 Updated Nov 18, 2024

Grok open release

Python 49,575 8,320 Updated Aug 30, 2024

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Python 14,266 2,240 Updated Aug 31, 2024

Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.

Go 98,278 7,823 Updated Nov 18, 2024

An Easy-to-understand TensorOp Matmul Tutorial

C++ 290 31 Updated Sep 21, 2024

Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.

Jupyter Notebook 1,535 94 Updated Feb 16, 2024

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

C++ 7,967 413 Updated Sep 6, 2024

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 20,270 2,238 Updated Aug 12, 2024

OneDiff: An out-of-the-box acceleration library for diffusion models.

Jupyter Notebook 1,699 103 Updated Nov 14, 2024

The repo that drives my blog chadbaldwin.net

HTML 33 36 Updated Aug 6, 2024

Lossless Training Speed Up by Unbiased Dynamic Data Pruning

Python 318 18 Updated Sep 24, 2024

An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.

Python 14,056 1,818 Updated Jul 3, 2024

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba

C++ 8,745 1,666 Updated Nov 18, 2024

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.

Python 5,667 514 Updated Oct 18, 2024

Pytorch library for fast transformer implementations

Python 1,642 178 Updated Mar 23, 2023

Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)

Python 59,978 7,374 Updated Nov 7, 2024

A batched offline inference oriented version of segment-anything

Python 1,204 71 Updated Sep 13, 2024

Official Repo For IROS 2023 Accepted Paper "Poly-MOT"

Python 165 29 Updated Mar 20, 2024

PyTorch implementation of MAE https://arxiv.org/abs/2111.06377

Python 7,343 1,224 Updated Jul 23, 2024