abdul dakkak abduld

147 followers · 87 following

dakkak.dev

Achievements

x4 x4

Starred repositories

Lightning-AI / LitServe

Lightning-fast serving engine for AI models. Flexible. Easy. Enterprise-scale.

Python 1,616 107 Updated Aug 29, 2024

microsoft / vattention

Dynamic Memory Management for Serving LLMs without PagedAttention

C 177 10 Updated Aug 3, 2024

manjunath5496 / Supertech-Papers

"The gift of mental power comes from God, Divine Being, and if we concentrate our minds on that truth, we become in tune with this great power. My Mother had taught me to seek all truth in the Bibl…

74 6 Updated Sep 28, 2020

Ratbuyer / h100_features

Cuda 1 Updated Jul 29, 2024

bytedance / flux

A fast communication-overlapping library for tensor parallelism on GPUs.

C++ 162 12 Updated Jul 25, 2024

bytedance / ByteMLPerf

AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and versatility of software and hardware.

Python 185 49 Updated Aug 29, 2024

sustcsonglin / flash-linear-attention

Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton

Python 1,103 56 Updated Aug 28, 2024

S-Lab-System-Group / Awesome-DL-Scheduling-Papers

224 30 Updated Jan 22, 2024

IST-DASLab / marlin

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Python 532 40 Updated Aug 15, 2024

Zhen-Dong / Awesome-Quantization-Papers

List of papers related to neural network quantization in recent AI conferences and journals.

405 37 Updated Jul 4, 2024

DefTruth / Awesome-LLM-Inference

📖 A curated list of Awesome LLM Inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

2,321 149 Updated Aug 28, 2024

FindHao / drgpu

A Top-Down Profiler for GPU Applications

Python 12 1 Updated Feb 29, 2024

srush / Triton-Puzzles

Puzzles for learning Triton

Jupyter Notebook 932 58 Updated Jul 17, 2024

srush / GPU-Puzzles

Solve puzzles. Learn CUDA.

Jupyter Notebook 5,564 327 Updated Jul 5, 2024

XuehaiPan / nvitop

An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.

Python 4,455 143 Updated Aug 7, 2024

EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.

Python 6,252 1,648 Updated Aug 28, 2024

ROCm / amd_matrix_instruction_calculator

A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators

Python 60 5 Updated Jan 2, 2024

aishwaryanr / awesome-generative-ai-guide

A one-stop repository for generative AI research updates, interview resources, notebooks, and much more!

6,927 1,435 Updated Aug 14, 2024

rapidsai / raft

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing …

Cuda 724 187 Updated Aug 28, 2024