Starred repositories

Standalone Flash Attention v2 kernel without libtorch dependency

C++ 76 12 Updated May 21, 2024

A fast communication-overlapping library for tensor parallelism on GPUs.

C++ 50 3 Updated Jun 14, 2024

📖 A curated list of Awesome LLM Inference Papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

1,808 128 Updated Jun 24, 2024

An Easy-to-understand TensorOp Matmul Tutorial

C++ 209 20 Updated Jun 15, 2024

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Python 157 14 Updated Jun 16, 2024

The official Meta Llama 3 GitHub site

Python 22,445 2,331 Updated Jun 25, 2024

Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch

Python 396 22 Updated Apr 20, 2024

Microsoft Collective Communication Library

C++ 266 26 Updated Sep 20, 2023
Python 4,416 749 Updated Jun 25, 2024
Python 8,719 1,128 Updated Jun 24, 2024

Structured Text Generation

Python 6,829 348 Updated Jun 25, 2024

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Java 9,812 2,842 Updated Jun 25, 2024

The official home of the Presto distributed SQL query engine for big data

Java 15,720 5,279 Updated Jun 25, 2024

Sequence Parallel Attention for Long Context LLM Model Training and Inference

Python 188 7 Updated Jun 25, 2024

Ring attention implementation with flash attention

Python 398 28 Updated May 20, 2024

Triton-based implementation of Sparse Mixture of Experts.

Python 146 9 Updated Jun 15, 2024

[ICML 2024] CLLMs: Consistency Large Language Models

Python 310 14 Updated Jun 24, 2024

A curated list for Efficient Large Language Models

Python 924 70 Updated Jun 24, 2024

Whisper realtime streaming for long speech-to-text transcription and translation

Python 1,369 170 Updated Jun 6, 2024

🤯 Lobe Chat - an open-source, modern-design LLMs/AI chat framework. Supports Multi AI Providers (OpenAI / Claude 3 / Gemini / Ollama / Bedrock / Azure / Mistral / Perplexity), Multi-Modals (Vision…

TypeScript 33,941 7,933 Updated Jun 25, 2024

A cross-platform ChatGPT/Gemini UI (Web / PWA / Linux / Win / MacOS). Get your own cross-platform ChatGPT/Gemini app with one click.

TypeScript 72,196 57,431 Updated Jun 25, 2024

Self-hosted AI coding assistant

Rust 18,203 762 Updated Jun 25, 2024

Enhanced ChatGPT Clone: Features OpenAI, Assistants API, Azure, Groq, GPT-4 Vision, Mistral, Bing, Anthropic, OpenRouter, Vertex AI, Gemini, AI model switching, message search, langchain, DALL-E-3,…

TypeScript 14,671 2,456 Updated Jun 25, 2024

A new bootable USB solution.

C 60,090 3,929 Updated Jun 23, 2024

A Blazing Fast AI Gateway. Route to 200+ LLMs with 1 fast & friendly API.

Jupyter Notebook 5,019 347 Updated Jun 20, 2024

AI chat for every model.

TypeScript 27,185 7,531 Updated Jun 24, 2024