Xi'an Jiaotong University - Xi'an - https://t.me/frankwei0109
Stars
📚Modern CUDA Learn Notes with PyTorch: Tensor/CUDA Cores, 📖150+ CUDA Kernels with PyTorch bindings, 📖HGEMM/SGEMM (95%~99% cuBLAS performance), 📖100+ LLM/CUDA Blogs.
TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
Efficient Triton Kernels for LLM Training
A collection of operators spanning multiple chip platforms, together with a unified neural network library.
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++.
A simple network quantization demo implemented from scratch in PyTorch (a minimal quantize/dequantize sketch follows this list).
A simple, performant and scalable JAX LLM!
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet (a usage sketch for PyTorch follows this list).
Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.
An easy-to-understand TensorOp Matmul Tutorial
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
OneDiff: An out-of-the-box acceleration library for diffusion models.
The repo that drives my blog chadbaldwin.net
Lossless Training Speed Up by Unbiased Dynamic Data Pruning
An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyper-parameter tuning.
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python (a minimal greedy decoding sketch follows this list).
PyTorch library for fast transformer implementations
Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
A batched offline inference oriented version of segment-anything
Official Repo For IROS 2023 Accepted Paper "Poly-MOT"
PyTorch implementation of MAE: https://arxiv.org/abs/2111.06377
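
For the from-scratch PyTorch quantization demo listed above, a minimal sketch of per-tensor symmetric int8 quantization; the function names and tensor shapes here are illustrative assumptions, not code taken from that repository.

```python
import torch

def quantize_int8(x: torch.Tensor):
    """Symmetric per-tensor int8 quantization: map [-max|x|, max|x|] onto [-127, 127]."""
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximate float tensor from the int8 values and the scale."""
    return q.to(torch.float32) * scale

w = torch.randn(256, 256)
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
print((w - w_hat).abs().max())  # quantization error is on the order of scale/2
```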
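
The distributed training entry above matches Horovod's description, so here is a hedged sketch of the usual Horovod + PyTorch setup; the model, optimizer, and learning rate are placeholder assumptions, while the hvd.* calls are standard Horovod APIs.

```python
import torch
import horovod.torch as hvd

hvd.init()  # one process per GPU, typically launched with horovodrun/mpirun
if torch.cuda.is_available():
    torch.cuda.set_device(hvd.local_rank())

model = torch.nn.Linear(512, 10)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())  # common LR scaling by worker count

# Allreduce gradients across workers each step and start all workers from identical state.
optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)
```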
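
For the PyTorch-native text generation entry above, a minimal greedy decoding sketch; TinyLM is a toy stand-in assumed here rather than that repository's model, and the loop omits KV caching for brevity.

```python
import torch

@torch.no_grad()
def greedy_generate(model, prompt_ids, max_new_tokens, eos_id=None):
    """Greedy decoding: repeatedly append the argmax of the last position's logits."""
    seq = prompt_ids
    for _ in range(max_new_tokens):
        logits = model(seq)  # (batch, seq_len, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        seq = torch.cat([seq, next_id], dim=1)
        if eos_id is not None and (next_id == eos_id).all():
            break
    return seq

# Toy stand-in for a transformer LM: anything mapping token ids to per-position logits works.
class TinyLM(torch.nn.Module):
    def __init__(self, vocab=100, dim=32):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab, dim)
        self.head = torch.nn.Linear(dim, vocab)

    def forward(self, ids):
        return self.head(self.emb(ids))

out = greedy_generate(TinyLM(), torch.zeros(1, 4, dtype=torch.long), max_new_tokens=8)
print(out.shape)  # torch.Size([1, 12])
```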