HPC💻
Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
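A minimal NumPy sketch of the FP16xINT4 idea (an illustration, not Marlin's fused CUDA kernel; the packing scheme, the shared zero point, and the function names are assumptions). Weights sit in memory as packed 4-bit integers plus per-column FP16 scales and are dequantized on the fly; at small batch sizes the GEMM is memory-bound, so cutting weight traffic 4x versus FP16 is where the ~4x speedup comes from.

```python
import numpy as np

def pack_int4(w_q: np.ndarray) -> np.ndarray:
    """Pack quantized weights (uint8 values in [0, 15]) two per byte."""
    flat = w_q.reshape(-1)
    assert flat.size % 2 == 0
    return (flat[0::2] | (flat[1::2] << 4)).astype(np.uint8)

def unpack_int4(packed: np.ndarray, n: int) -> np.ndarray:
    """Inverse of pack_int4: recover n 4-bit values as uint8."""
    out = np.empty(n, dtype=np.uint8)
    out[0::2] = packed & 0x0F
    out[1::2] = packed >> 4
    return out

def int4_matmul(x, packed_w, scales, zero, k, n):
    """y = x @ dequant(W) for W of shape (k, n), dequantized on the fly."""
    w = unpack_int4(packed_w, k * n).reshape(k, n).astype(np.float16)
    w = (w - zero) * scales          # per-output-column scale, shared zero point
    return (x @ w).astype(np.float16)

# Toy usage: an 8x4 INT4 weight applied to a batch of two FP16 activations.
rng = np.random.default_rng(0)
k, n = 8, 4
w_q = rng.integers(0, 16, size=(k, n), dtype=np.uint8)
scales = (rng.random(n) * 0.1).astype(np.float16)
x = rng.random((2, k)).astype(np.float16)
y = int4_matmul(x, pack_int4(w_q), scales, np.float16(8), k, n)
```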
FlashInfer: Kernel Library for LLM Serving
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
Ongoing research on training transformer models at scale
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max).
Ring attention implementation with flash attention
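A toy, single-process sketch of the ring-attention pattern (assumptions: no causal mask, equally sized shards, and direct list indexing standing in for the send/recv between ring neighbors). Each worker keeps its query shard while K/V shards rotate around the ring, and per-block partials are merged with the same numerically stable online-softmax accumulator that flash attention uses, so no worker ever materializes the full attention matrix.

```python
import numpy as np

def attention_partial(q, k, v):
    """One K/V block: return (exp-weighted values, row max, row sum)."""
    s = q @ k.T / np.sqrt(q.shape[-1])
    m = s.max(axis=-1, keepdims=True)
    p = np.exp(s - m)
    return p @ v, m, p.sum(axis=-1, keepdims=True)

def ring_attention(q_shards, k_shards, v_shards):
    P = len(q_shards)
    outs = []
    for r in range(P):                       # simulate worker r
        acc = m = l = None
        for step in range(P):                # P ring steps see every K/V shard
            idx = (r + step) % P             # stand-in for recv from ring neighbor
            o_i, m_i, l_i = attention_partial(q_shards[r], k_shards[idx], v_shards[idx])
            if acc is None:
                acc, m, l = o_i, m_i, l_i
            else:                            # online-softmax merge of partials
                m_new = np.maximum(m, m_i)
                acc = acc * np.exp(m - m_new) + o_i * np.exp(m_i - m_new)
                l = l * np.exp(m - m_new) + l_i * np.exp(m_i - m_new)
                m = m_new
        outs.append(acc / l)                 # normalize once at the end
    return np.concatenate(outs, axis=0)

# Toy usage: 4 workers, 16 tokens each, head dim 32.
rng = np.random.default_rng(0)
shards = lambda: [rng.standard_normal((16, 32)) for _ in range(4)]
out = ring_attention(shards(), shards(), shards())
```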
An Easy-to-understand TensorOp Matmul Tutorial
A fast communication-overlapping library for tensor parallelism on GPUs.
Fast and memory-efficient exact attention
Standalone Flash Attention v2 kernel without libtorch dependency
MSCCL++: A GPU-driven communication stack for scalable AI applications
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
High performance Transformer implementation in C++.
A low-latency & high-throughput serving engine for LLMs
Transformers with Arbitrarily Large Context
To speed up long-context LLM inference, attention is computed approximately with dynamic sparsity, which reduces pre-filling latency by up to 10x on an A100 while maintaining accuracy.
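A toy sketch of the general dynamic block-sparse idea (an illustration under simplifying assumptions; the repo's actual per-head sparse patterns and estimation step differ): cheaply score key blocks against a pooled query block, then run exact attention over only the top-k key blocks, so pre-fill cost scales with the blocks kept rather than the full sequence length.

```python
import numpy as np

def block_sparse_attention(q, k, v, block=64, keep=4):
    """Dynamic block-sparse attention; assumes len(q) is a multiple of block."""
    n, d = q.shape
    nb = n // block
    k_pooled = k.reshape(nb, block, d).mean(axis=1)      # one summary key per block
    out = np.zeros_like(q)
    for i in range(nb):
        qi = q[i * block:(i + 1) * block]
        score = qi.mean(axis=0) @ k_pooled.T             # cheap block-importance estimate
        top = np.sort(np.argsort(score)[-keep:])         # keep only the top-k key blocks
        ks = np.concatenate([k[j * block:(j + 1) * block] for j in top])
        vs = np.concatenate([v[j * block:(j + 1) * block] for j in top])
        s = qi @ ks.T / np.sqrt(d)                       # exact attention on kept blocks
        p = np.exp(s - s.max(axis=-1, keepdims=True))
        out[i * block:(i + 1) * block] = (p / p.sum(axis=-1, keepdims=True)) @ vs
    return out
```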
Material for cuda-mode lectures
Analyze the inference of Large Language Models (LLMs): computation, storage, transmission, and the hardware roofline model, all in a user-friendly interface.
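A worked example of the roofline arithmetic behind such an analysis (the peak numbers are illustrative assumptions for an A100-class GPU, not tool output): an operation is memory-bound when its arithmetic intensity (FLOPs per byte moved) falls below peak_flops / peak_bandwidth, and its time is then bounded by bytes moved over bandwidth rather than FLOPs over peak compute.

```python
PEAK_FLOPS = 312e12           # assumed FP16 tensor-core peak, FLOP/s
PEAK_BW = 2.0e12              # assumed HBM bandwidth, byte/s
RIDGE = PEAK_FLOPS / PEAK_BW  # ~156 FLOP/byte

def roofline_time(flops: float, bytes_moved: float) -> float:
    """Lower-bound execution time from the roofline model."""
    if flops / bytes_moved < RIDGE:
        return bytes_moved / PEAK_BW   # memory-bound
    return flops / PEAK_FLOPS          # compute-bound

# Batch-1 decode GEMV against one 4096x4096 FP16 weight:
# 2*K*N FLOPs, and weight bytes (2 per element) dominate the traffic.
flops = 2 * 4096 * 4096
bytes_moved = 2 * 4096 * 4096
print(f"intensity = {flops / bytes_moved:.1f} FLOP/B, "
      f"time >= {roofline_time(flops, bytes_moved) * 1e6:.1f} us")  # memory-bound
```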