UranusSeven's starred repositories

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with …

Python 3,420 285 Updated Jun 28, 2024

Auditing and relabeling cross-distribution Linux wheels.

Python 423 138 Updated Jun 19, 2024

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Python 12,358 1,002 Updated Jun 27, 2024

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 3,111 281 Updated Jun 28, 2024

Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)

C 1,067 407 Updated Jun 28, 2024

Inference code for CodeLlama models

Python 15,400 1,784 Updated May 21, 2024

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Python 428 32 Updated Apr 22, 2024
Jupyter Notebook 410 22 Updated Jun 25, 2024

FlashInfer: Kernel Library for LLM Serving

Cuda 745 60 Updated Jun 28, 2024

SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.

Python 2,742 176 Updated Jun 28, 2024

A project aimed at measuring the real-world performance of Large Language Model (LLM) inference frameworks, inspired by the concepts in deepspeed-fastgen.

Python 5 Updated Jan 15, 2024

Python packaging and dependency management made easy

Python 30,279 2,234 Updated Jun 25, 2024
Python 1,108 154 Updated May 28, 2024

Official Implementation of EAGLE

Python 610 57 Updated Jun 27, 2024

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Python 3,116 331 Updated Jun 28, 2024

Yet Another Clash Dashboard

TypeScript 3,835 686 Updated Feb 8, 2024

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

Python 1,752 162 Updated Jun 28, 2024

A unified evaluation framework for large language models

Python 2,253 176 Updated May 27, 2024

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

C++ 7,571 401 Updated Jun 28, 2024

Robust Speech Recognition via Large-Scale Weak Supervision

Python 63,798 7,410 Updated Jun 22, 2024

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 79,918 21,498 Updated Jun 29, 2024

Ongoing research training transformer models at scale

Python 9,226 2,082 Updated Jun 27, 2024

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 7,339 794 Updated Jun 27, 2024

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and…

Python 6,213 1,222 Updated Jun 28, 2024

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting yo…

TypeScript 35,562 4,787 Updated Jun 29, 2024

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:

Python 1,392 159 Updated Jun 27, 2024

Sparsity-aware deep learning inference runtime for CPUs

Python 2,925 168 Updated Jun 25, 2024

Implementation of Nougat Neural Optical Understanding for Academic Documents

Python 8,395 538 Updated Apr 16, 2024

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Jupyter Notebook 1,989 130 Updated Jun 25, 2024

A natural language interface for computers

Python 50,478 4,401 Updated Jun 26, 2024