abduld
Starred repositories

Cuda 1 Updated Jul 29, 2024

A fast communication-overlapping library for tensor parallelism on GPUs.

C++ 149 12 Updated Jul 25, 2024

AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and versatility of software and hardware.

Python 181 46 Updated Aug 15, 2024

Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton

Python 1,061 53 Updated Aug 18, 2024

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Python 512 37 Updated Aug 15, 2024

List of papers related to neural network quantization in recent AI conferences and journals.

393 38 Updated Jul 4, 2024

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

2,248 146 Updated Aug 17, 2024

A Top-Down Profiler for GPU Applications

Python 12 1 Updated Feb 29, 2024

Puzzles for learning Triton

Jupyter Notebook 919 56 Updated Jul 17, 2024

Solve puzzles. Learn CUDA.

Jupyter Notebook 5,516 321 Updated Jul 5, 2024

An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.

Python 4,420 142 Updated Aug 7, 2024

A framework for few-shot evaluation of language models.

Python 6,160 1,631 Updated Aug 17, 2024

A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators

Python 59 4 Updated Jan 2, 2024

A one-stop repository for generative AI research updates, interview resources, notebooks, and much more!

6,696 1,395 Updated Aug 14, 2024

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing …

Cuda 721 187 Updated Aug 18, 2024

Apple GPU microarchitecture

Metal 341 13 Updated Jun 12, 2024

Library to manipulate Apple Metal Shading Language IR

C++ 46 3 Updated Jan 18, 2023

Python bindings for the Transformer models implemented in C/C++ using GGML library.

C 1,771 136 Updated Jan 28, 2024

Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs.

Python 66 8 Updated Feb 23, 2023

Another rockchip Operating System

Shell 1,443 82 Updated Jul 31, 2024

Home of the JELOS Linux distribution.

Makefile 924 175 Updated May 8, 2024

An easy-to-understand TensorOp Matmul tutorial

C++ 240 27 Updated Jun 15, 2024
Cuda 87 12 Updated Mar 18, 2024

WebUI for Fine-Tuning and Self-hosting of Open-Source Large Language Models for Coding

JavaScript 1,526 103 Updated Aug 18, 2024

row-major matmul optimization

C++ 575 76 Updated Sep 9, 2023

mperf is an operator performance tuning toolbox for mobile and embedded platforms

C++ 168 26 Updated Aug 17, 2023

Yinghan's Code Sample

Cuda 264 49 Updated Jul 25, 2022