A collection of benchmarks to measure basic GPU capabilities.

Jupyter Notebook 220 35 Updated Jun 21, 2024
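
For a flavor of what such benchmarks look like, here is a minimal sketch of a GPU memory-bandwidth micro-benchmark in PyTorch. It is illustrative only and not taken from the repository; the tensor size and iteration count are arbitrary choices.

```python
import torch

def copy_bandwidth_gbps(n_bytes=1 << 30, iters=10):
    """Time device-to-device copies and report effective bandwidth in GB/s."""
    assert torch.cuda.is_available(), "requires a CUDA GPU"
    src = torch.empty(n_bytes, dtype=torch.uint8, device="cuda")
    dst = torch.empty_like(src)
    dst.copy_(src)  # warm-up
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        dst.copy_(src)
    end.record()
    torch.cuda.synchronize()
    seconds = start.elapsed_time(end) / 1000.0  # elapsed_time is in ms
    return 2 * n_bytes * iters / seconds / 1e9  # each copy reads + writes n_bytes

if __name__ == "__main__":
    print(f"~{copy_bandwidth_gbps():.0f} GB/s effective copy bandwidth")
```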

Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.

C++ 15 1 Updated Aug 18, 2024
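
For orientation, this is the computation both flash attention versions implement, written naively in NumPy; the point of flash attention is to produce the same output while tiling the work so the full N x N score matrix never materializes in GPU memory. This sketch is not code from the repository.

```python
import numpy as np

def attention(Q, K, V):
    """Naive scaled dot-product attention for one head; Q, K, V: (N, d)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (N, N): the memory bottleneck
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

Q, K, V = (np.random.randn(128, 64) for _ in range(3))
out = attention(Q, K, V)  # (128, 64)
```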

A stripped-down implementation of flash-attention using cutlass, written for its teaching value.

Cuda 26 Updated Aug 12, 2024

A Chinese-language encyclopedia of modern C++, written by a team led by 小彭老师 (Teacher Peng).

Typst 466 31 Updated Aug 20, 2024

A tutorial series on x86-64 SIMD vector optimization.

C++ 92 8 Updated Jul 7, 2024

Code for Palu: Compressing KV-Cache with Low-Rank Projection

Python 23 1 Updated Aug 10, 2024
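
To illustrate the idea named in the description (not Palu's actual algorithm), a KV-cache block can be compressed with a low-rank projection; the sketch below uses a truncated SVD as the simplest such projection.

```python
import numpy as np

def low_rank_compress(kv, rank):
    """kv: (seq_len, hidden) cache block; returns factors whose product approximates kv."""
    U, S, Vt = np.linalg.svd(kv, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # (seq_len, rank)
    B = Vt[:rank]               # (rank, hidden)
    return A, B                 # store these two factors instead of kv

kv = np.random.randn(512, 128)
A, B = low_rank_compress(kv, rank=32)
rel_err = np.linalg.norm(kv - A @ B) / np.linalg.norm(kv)
print(f"{A.size + B.size} floats stored vs {kv.size}, relative error {rel_err:.3f}")
```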

Demonstration of various hardware effects on CUDA GPUs.

C++ 334 25 Updated Nov 22, 2023

An fp8 flash attention for the Ada architecture, implemented with the cutlass library.

Cuda 39 2 Updated Aug 12, 2024

SGLang is yet another fast serving framework for large language models and vision language models.

Python 4,229 277 Updated Aug 20, 2024

Large Language Model Text Generation Inference

Python 8,613 990 Updated Aug 20, 2024

A way to use CUDA to accelerate the top-k algorithm.

Cuda 29 7 Updated Jul 11, 2017
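
The CPU baseline such a kernel accelerates is partial selection; a minimal NumPy sketch follows (torch.topk is the usual off-the-shelf GPU equivalent). Illustrative only, not the repository's code.

```python
import numpy as np

def top_k(values, k):
    """Return the k largest values and their indices, sorted descending."""
    idx = np.argpartition(values, -k)[-k:]    # O(n) partial selection
    idx = idx[np.argsort(values[idx])[::-1]]  # sort only the k survivors
    return values[idx], idx

vals, idx = top_k(np.random.randn(1_000_000), k=10)
```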

Notes on deep learning systems: the mathematical foundations of deep learning, detailed explanations of basic neural network components, model-tuning strategies, model compression algorithms, and a hands-on guide to implementing a deep learning inference framework.

Python 319 49 Updated Feb 2, 2024

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory

Python 14,463 955 Updated Aug 20, 2024
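
One of the techniques behind this kind of memory-efficient finetuning is a LoRA-style adapter: the pretrained weight is frozen and only a low-rank update is trained. The sketch below is a generic illustration in plain PyTorch; the names, shapes, and hyperparameters are assumptions, and none of it is Unsloth's implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained weight and bias
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # frozen full-rank path + trainable low-rank update
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
out = layer(torch.randn(4, 512))  # only A and B receive gradients
```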

The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression".

Python 65 4 Updated Aug 12, 2024

LLM101n: Let's build a Storyteller

27,091 1,477 Updated Aug 1, 2024

Source code for Twitter's Recommendation Algorithm

Scala 61,935 12,159 Updated Jul 10, 2024

Using a four-layer perceptron, the highest accuracy reaches 97%.

Python 2 Updated Jun 5, 2024
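
A four-layer perceptron of the kind the description refers to can be written in a few lines of PyTorch; the layer sizes below are illustrative guesses (e.g. an MNIST-shaped input), not the repository's configuration.

```python
import torch.nn as nn

mlp = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),  # layer 1
    nn.Linear(256, 128), nn.ReLU(),  # layer 2
    nn.Linear(128, 64), nn.ReLU(),   # layer 3
    nn.Linear(64, 10),               # layer 4: class logits
)
```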

🏠 [ECCV 2024] Pytorch implementation of 'HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression'

Python 181 10 Updated Jul 9, 2024

Learning about CUDA by writing PTX code.

Python 28 Updated Feb 27, 2024

[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs

Python 61 3 Updated Aug 13, 2024

Python 127 16 Updated Jul 23, 2024

Accessible large language models via k-bit quantization for PyTorch.

Python 5,930 600 Updated Aug 20, 2024
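
The core idea of k-bit weight quantization can be sketched as per-tensor absmax rounding; bitsandbytes' actual kernels are block-wise and considerably more sophisticated, so treat this as a conceptual sketch only.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric absmax quantization of a float tensor to int8."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize_int8(q, s)).max())
```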

QQQ is an innovative and hardware-optimized W4A8 quantization solution.

Python 49 3 Updated Aug 2, 2024

[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling

Python 1,330 100 Updated Jul 10, 2024

A repository dedicated to evaluating the performance of quantized LLaMA3 across various quantization methods.

Python 148 6 Updated Aug 9, 2024

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Python 191 16 Updated Aug 16, 2024
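
The primitive named in KIVI's title is asymmetric (zero-point) quantization at 2 bits per value; a minimal sketch follows. KIVI's per-channel/per-token grouping and bit-packing details are not reproduced here.

```python
import numpy as np

def quantize_asym(x, bits=2):
    """Asymmetric quantization: map [min, max] onto 2**bits integer levels."""
    levels = 2**bits - 1
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / levels
    q = np.clip(np.round((x - lo) / scale), 0, levels).astype(np.uint8)
    return q, scale, lo

def dequantize_asym(q, scale, lo):
    return q.astype(np.float32) * scale + lo

kv = np.random.randn(64, 128).astype(np.float32)
q, s, z = quantize_asym(kv)
rel_err = np.linalg.norm(kv - dequantize_asym(q, s, z)) / np.linalg.norm(kv)
print("levels used:", np.unique(q), "relative error:", round(rel_err, 3))
```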

The best library for implementation of all Data Structures and Algorithms - Trees + Graph Algorithms too!

C++ 2,747 993 Updated Mar 16, 2024

Lightning fast C++/CUDA neural network framework

C++ 3,619 443 Updated Jul 31, 2024