Stars
Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
Universal LLM Deployment Engine with ML Compilation
An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyperparameter tuning.
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
Dockerfiles and scripts for ONNX container images
Software Engineering for AI/ML -- An Annotated Bibliography