Starred repositories

Lightning-fast serving engine for AI models. Flexible. Easy. Enterprise-scale.

Python 1,616 107 Updated Aug 29, 2024

Dynamic Memory Management for Serving LLMs without PagedAttention

C 177 10 Updated Aug 3, 2024

"The gift of mental power comes from God, Divine Being, and if we concentrate our minds on that truth, we become in tune with this great power. My Mother had taught me to seek all truth in the Bibl…

74 6 Updated Sep 28, 2020
Cuda 1 Updated Jul 29, 2024

A fast communication-overlapping library for tensor parallelism on GPUs.

C++ 162 12 Updated Jul 25, 2024

AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and versatility of software and hardware.

Python 185 49 Updated Aug 29, 2024

Efficient implementations of state-of-the-art linear attention models in Pytorch and Triton

Python 1,103 56 Updated Aug 28, 2024

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.

Python 532 40 Updated Aug 15, 2024

List of papers related to neural network quantization in recent AI conferences and journals.

405 37 Updated Jul 4, 2024

📖 A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.

2,321 149 Updated Aug 28, 2024

A Top-Down Profiler for GPU Applications

Python 12 1 Updated Feb 29, 2024

Puzzles for learning Triton

Jupyter Notebook 932 58 Updated Jul 17, 2024

Solve puzzles. Learn CUDA.

Jupyter Notebook 5,564 327 Updated Jul 5, 2024

An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.

Python 4,455 143 Updated Aug 7, 2024

A framework for few-shot evaluation of language models.

Python 6,252 1,648 Updated Aug 28, 2024

A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators

Python 60 5 Updated Jan 2, 2024

A one stop repository for generative AI research updates, interview resources, notebooks and much more!

6,927 1,435 Updated Aug 14, 2024

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing …

Cuda 724 187 Updated Aug 28, 2024

Apple GPU microarchitecture

Metal 344 14 Updated Jun 12, 2024

Library to manipulate Apple Metal Shading Language IR

C++ 46 3 Updated Jan 18, 2023

Python bindings for the Transformer models implemented in C/C++ using GGML library.

C 1,779 135 Updated Jan 28, 2024

Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs.

Python 66 8 Updated Feb 23, 2023

Another rockchip Operating System

Shell 1,455 82 Updated Aug 24, 2024

Home of the JELOS Linux distribution.

Makefile 924 175 Updated May 8, 2024

An Easy-to-understand TensorOp Matmul Tutorial

C++ 241 27 Updated Aug 27, 2024
Cuda 88 12 Updated Mar 18, 2024

WebUI for Fine-Tuning and Self-hosting of Open-Source Large Language Models for Coding

JavaScript 1,529 103 Updated Aug 18, 2024

row-major matmul optimization

C++ 581 77 Updated Sep 9, 2023