Starred repositories
Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Analyze the inference of Large Language Models (LLMs): computation, storage, transmission, and the hardware roofline model, in a user-friendly interface.
Material for cuda-mode lectures
To speed up long-context LLM inference, compute attention with approximate and dynamic sparsity, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
Transformers with Arbitrarily Large Context
A low-latency & high-throughput serving engine for LLMs
High-performance Transformer implementation in C++.
OneDiff: An out-of-the-box acceleration library for diffusion models.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
MSCCL++: A GPU-driven communication stack for scalable AI applications
Odysseus: Playground of LLM Sequence Parallelism
Standalone Flash Attention v2 kernel without libtorch dependency
A fast communication-overlapping library for tensor parallelism on GPUs.
A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
An easy-to-understand TensorOp Matmul tutorial
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
Implementation of Ring Attention, from Liu et al. at Berkeley AI, in PyTorch
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
The official home of the Presto distributed SQL query engine for big data
Sequence Parallel Attention for Long Context LLM Model Training and Inference