Stars
A throughput-oriented high-performance serving framework for LLMs
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
The most accurate natural language detection library for Python, suitable for short text and mixed-language text
The most accurate natural language detection library for Rust, suitable for short text and mixed-language text
QQQ is an innovative and hardware-optimized W4A8 quantization solution.
Corpus of Te Reo derived from the New Zealand Hansard
The main repository for building Pascal-compatible versions of ML applications and libraries.
RES-Q: Evaluating the Code-Editing Capability of Large Language Model Systems at the Repository Scale
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
2-2000x faster ML algos, 50% less memory usage, works on all hardware - new and old.
A self-generalizing gradient boosting machine which doesn't need hyperparameter optimization
Open source project for data preparation of LLM application builders
An easy-to-use LLM quantization and inference toolkit based on GPTQ algorithm (weight-only quantization).
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
neuralmagic / nm-vllm
Forked from vllm-project/vllm. A high-throughput and memory-efficient inference and serving engine for LLMs
[ICLR 2024] SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
An amazing UI for OpenAI's ChatGPT (Website + Windows + MacOS + Linux)
🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models.
Boosting 4-bit inference kernels with 2:4 Sparsity
Simple chat interface for local AI using llama-cpp-python and llama-cpp-agent
Label, clean and enrich text datasets with LLMs.
A Python package for LLM dynamic routing through the Unify REST API.
Evaluate your LLM's response with Prometheus and GPT-4 💯