Starred repositories
Standalone Flash Attention v2 kernel without libtorch dependency
A fast communication-overlapping library for tensor parallelism on GPUs.
📖 A curated list of Awesome LLM Inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
An Easy-to-understand TensorOp Matmul Tutorial
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
The official home of the Presto distributed SQL query engine for big data
Sequence Parallel Attention for Long Context LLM Model Training and Inference
Ring attention implementation with flash attention
Triton-based implementation of Sparse Mixture of Experts.
[ICML 2024] CLLMs: Consistency Large Language Models
A curated list for Efficient Large Language Models
Whisper realtime streaming for long speech-to-text transcription and translation
🤯 Lobe Chat - an open-source, modern-design LLMs/AI chat framework. Supports Multi AI Providers (OpenAI / Claude 3 / Gemini / Ollama / Bedrock / Azure / Mistral / Perplexity), Multi-Modals (Vision…
A cross-platform ChatGPT/Gemini UI (Web / PWA / Linux / Win / MacOS). Get your own cross-platform ChatGPT/Gemini app with one click.
Enhanced ChatGPT Clone: Features OpenAI, Assistants API, Azure, Groq, GPT-4 Vision, Mistral, Bing, Anthropic, OpenRouter, Vertex AI, Gemini, AI model switching, message search, langchain, DALL-E-3,…
A Blazing Fast AI Gateway. Route to 200+ LLMs with 1 fast & friendly API.