Starred repositories

166 results for source starred repositories

OneDiff: An out-of-the-box acceleration library for diffusion models.

Python 1,395 85 Updated Jun 28, 2024

🐚 OpenDevin: Code Less, Make More

Python 28,127 3,210 Updated Jun 29, 2024

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 175 27 Updated Jun 29, 2024

Odysseus: Playground of LLM Sequence Parallelism

Python 31 Updated Jun 17, 2024

Standalone Flash Attention v2 kernel without libtorch dependency

C++ 76 12 Updated May 21, 2024

A fast communication-overlapping library for tensor parallelism on GPUs.

C++ 53 3 Updated Jun 27, 2024

📖 A curated list of Awesome LLM Inference Papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

1,841 129 Updated Jun 29, 2024

An Easy-to-understand TensorOp Matmul Tutorial

C++ 212 21 Updated Jun 15, 2024

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Python 159 14 Updated Jun 16, 2024

The official Meta Llama 3 GitHub site

Python 22,600 2,361 Updated Jun 25, 2024

Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch

Python 397 22 Updated Apr 20, 2024

Microsoft Collective Communication Library

C++ 268 26 Updated Sep 20, 2023
Python 4,451 751 Updated Jun 28, 2024
Python 8,748 1,133 Updated Jun 24, 2024

Structured Text Generation

Python 6,899 356 Updated Jun 29, 2024

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Java 9,830 2,848 Updated Jun 29, 2024

The official home of the Presto distributed SQL query engine for big data

Java 15,739 5,282 Updated Jun 28, 2024

Sequence Parallel Attention for Long Context LLM Model Training and Inference

Python 193 7 Updated Jun 27, 2024

Ring attention implementation with flash attention

Python 406 30 Updated May 20, 2024

Triton-based implementation of Sparse Mixture of Experts.

Python 146 9 Updated Jun 15, 2024

[ICML 2024] CLLMs: Consistency Large Language Models

Python 313 14 Updated Jun 24, 2024

A curated list for Efficient Large Language Models

Python 933 72 Updated Jun 28, 2024

Whisper realtime streaming for long speech-to-text transcription and translation

Python 1,384 171 Updated Jun 6, 2024

🤯 Lobe Chat - an open-source, modern-design LLMs/AI chat framework. Supports Multi AI Providers( OpenAI / Claude 3 / Gemini / Ollama / Bedrock / Azure / Mistral / Perplexity ), Multi-Modals (Vision…

TypeScript 34,203 8,020 Updated Jun 29, 2024

A cross-platform ChatGPT/Gemini UI (Web / PWA / Linux / Win / MacOS). Get your own cross-platform ChatGPT/Gemini app in one click.

TypeScript 72,397 57,536 Updated Jun 27, 2024