Serve, optimize and scale PyTorch models in production
Updated Sep 20, 2024 - Java
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
AI + Data, online. https://vespa.ai
Lightning-fast serving engine for AI models. Flexible. Easy. Enterprise-scale.
A high-performance inference system for large language models, designed for production environments.
A multi-modal vector database that supports upserts and vector queries using unified SQL (MySQL-Compatible) on structured and unstructured data, while meeting the requirements of high concurrency and ultra-low latency.
A flexible, high-performance serving system for machine learning models
A scalable inference server for models optimized with OpenVINO™
A universal scalable machine learning model deployment solution
An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
Lineage metadata API, artifacts streams, sandbox, API, and spaces for Polyaxon
Serving inside PyTorch
Tutorial on serving LLMs via vLLM in Docker containers on Kubernetes clusters
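To make the vLLM entry above concrete: a vLLM server exposes an OpenAI-compatible HTTP API, so a client talks to it by POSTing a JSON body to its /v1/completions route. The sketch below only builds such a request body; the model name and default token limit are illustrative assumptions, not values taken from the tutorial.

```python
import json

def build_completion_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build a JSON payload for vLLM's OpenAI-compatible /v1/completions API.

    The default max_tokens value here is an arbitrary choice for the example.
    """
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
    }

# "facebook/opt-125m" is a small model often used in vLLM examples;
# any model the server was launched with would work the same way.
payload = build_completion_request("facebook/opt-125m", "Hello, world!")
print(json.dumps(payload))
```

In a deployment like the one the tutorial describes, this payload would be sent to the Kubernetes Service fronting the vLLM pods, and the response parsed as standard OpenAI-style completion JSON.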
[Deep learning model deployment framework] Supports tf/torch/trt/trtllm/vllm and more NN frameworks, with dynamic batching and streaming modes, in both Python and C++; rate-limitable, extensible, and high-performance. Helps users quickly deploy models online and serve them via HTTP/RPC interfaces.
⚡️An easy-to-use, fast deep learning model deployment toolkit for ☁️cloud, 📱mobile, and 📹edge, covering 20+ mainstream scenarios across image, video, text, and audio, with 150+ SOTA models, end-to-end optimization, and multi-platform, multi-framework support.
A unified end-to-end machine intelligence platform
Friendli: the fastest serving engine for generative AI
ClearML - Model-Serving Orchestration and Repository Solution