Stars
Predict the performance of LLM inference services
A throughput-oriented high-performance serving framework for LLMs
vineyard (v6d): an in-memory immutable data manager. (Project under CNCF, TAG-Storage)
Efficient and easy multi-instance LLM serving
A library developed by Volcano Engine for high-performance reading and writing of PyTorch model files.
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Serverless LLM Serving for Everyone.
Fast Distributed Inference Serving for Large Language Models
Custom controller that extends the Horizontal Pod Autoscaler
SpotServe: Serving Generative Large Language Models on Preemptible Instances
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
[COLM 2024] OpenAgents: An Open Platform for Language Agents in the Wild
Letta (formerly MemGPT) is a framework for creating LLM services with memory.
An efficient and effective few-shot NL2SQL method on GPT-4.
Generate comic panels using an LLM + SDXL. Powered by Hugging Face 🤗
A quick guide, especially for trending instruction fine-tuning datasets
ACL 2023 tutorial on retrieval-based language models and applications: https://acl2023-retrieval-lm.github.io/
Official release of InternLM2.5 base and chat models. 1M context support
A high-throughput and memory-efficient inference and serving engine for LLMs (a usage sketch follows this list)
A user gateway that provides a serverless AIGC experience.
Large Language Model Text Generation Inference
AI Native Data App Development framework with AWEL (Agentic Workflow Expression Language) and Agents
ClearML - Model-Serving Orchestration and Repository Solution
Easy, Fast, Secure and Cost-Efficient LLM Pipelines to generate ChatGPT-like private domain models and knowledgeable agents for your organization.
🦜🔗 Build context-aware reasoning applications
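
To illustrate how the serving engines above are typically driven, here is a minimal sketch of offline inference with vLLM's documented Python API. The prompt and the model name ("facebook/opt-125m") are placeholder assumptions for the example, not part of this list.

    # Minimal offline-inference sketch with vLLM (pip install vllm).
    from vllm import LLM, SamplingParams

    prompts = ["The capital of France is"]          # placeholder prompt
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)

    llm = LLM(model="facebook/opt-125m")            # placeholder model; any HF-hosted model works
    outputs = llm.generate(prompts, params)         # batched generation handled by the engine

    for out in outputs:
        print(out.prompt, "->", out.outputs[0].text)

The same engine can instead be launched as an OpenAI-compatible HTTP server for online serving; the offline API shown here is just the shortest self-contained path to a first generation.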