LoRAX: Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

LoRAX (LoRA eXchange) is a framework that allows users to serve thousands of fine-tuned models on a single GPU, dramatically reducing the cost of serving without compromising on throughput or latency.

🌳 Features

  • 🚅 Dynamic Adapter Loading: include any fine-tuned LoRA adapter from HuggingFace, Predibase, or any filesystem in your request, and it will be loaded just-in-time without blocking concurrent requests. Merge adapters per request to instantly create powerful ensembles.
  • 🏋️‍♀️ Heterogeneous Continuous Batching: packs requests for different adapters together into the same batch, keeping latency and throughput nearly constant as the number of concurrent adapters grows.
  • 🧁 Adapter Exchang