UranusSeven's starred repositories

The memory layer for Personalized AI

Python 18,965 1,785 Updated Aug 4, 2024

Dynamic Memory Management for Serving LLMs without PagedAttention

C 151 10 Updated Aug 3, 2024

Tile primitives for speedy kernels

Cuda 1,427 53 Updated Aug 4, 2024

A safetensors extension to efficiently store sparse quantized tensors on disk

Python 19 Updated Aug 2, 2024

Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States

Python 315 22 Updated Jul 25, 2024

Analyze the inference of Large Language Models (LLMs), covering computation, storage, transmission, and the hardware roofline model in a user-friendly interface.

Python 242 27 Updated Jul 30, 2024

Material for cuda-mode lectures

Jupyter Notebook 2,036 203 Updated Jun 13, 2024

To speed up long-context LLM inference, approximate and dynamic sparse attention computation reduces pre-filling latency by up to 10x on an A100 while maintaining accuracy.

Python 609 21 Updated Aug 1, 2024

Transformers with Arbitrarily Large Context

Python 588 43 Updated Jul 13, 2024

A low-latency & high-throughput serving engine for LLMs

Python 127 17 Updated Jul 31, 2024

High performance Transformer implementation in C++.

C++ 53 4 Updated Apr 22, 2024

A Survey of AI startups

391 31 Updated Aug 27, 2023

OneDiff: An out-of-the-box acceleration library for diffusion models.

Python 1,501 91 Updated Aug 4, 2024

🐚 OpenDevin: Code Less, Make More

Python 29,424 3,400 Updated Aug 4, 2024

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

948 20 Updated Jul 31, 2024

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 196 30 Updated Jul 26, 2024

Odysseus: Playground of LLM Sequence Parallelism

Python 47 1 Updated Jun 17, 2024

Standalone Flash Attention v2 kernel without libtorch dependency

C++ 79 12 Updated May 21, 2024

A fast communication-overlapping library for tensor parallelism on GPUs.

C++ 121 9 Updated Jul 25, 2024

📖 A curated list of Awesome LLM Inference Papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

2,128 144 Updated Aug 4, 2024

An easy-to-understand TensorOp Matmul Tutorial

C++ 234 26 Updated Jun 15, 2024

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Python 182 15 Updated Jul 27, 2024

The official Meta Llama 3 GitHub site

Python 25,188 2,779 Updated Jul 31, 2024

Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch

Python 423 24 Updated Jul 12, 2024

Microsoft Collective Communication Library

C++ 286 29 Updated Sep 20, 2023

Python 4,611 784 Updated Aug 4, 2024

Python 8,987 1,169 Updated Aug 2, 2024

Structured Text Generation

Python 7,471 382 Updated Aug 1, 2024