Serve, optimize and scale PyTorch models in production
Updated Sep 20, 2024 - Java
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
AI + Data, online. https://vespa.ai
Lightning-fast serving engine for AI models. Flexible. Easy. Enterprise-scale.
A high-performance inference system for large language models, designed for production environments.
A multi-modal vector database that supports upserts and vector queries using unified SQL (MySQL-Compatible) on structured and unstructured data, while meeting the requirements of high concurrency and ultra-low latency.
A flexible, high-performance serving system for machine learning models
A scalable inference server for models optimized with OpenVINO™
A universal scalable machine learning model deployment solution
An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
Lineage metadata API, artifacts streams, sandbox, API, and spaces for Polyaxon
Serving inside PyTorch
Tutorial on serving LLMs via vLLM in Docker containers on Kubernetes clusters
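To make the vLLM entry above concrete: a vLLM server exposes an OpenAI-compatible HTTP API, so a client talks to it by POSTing a JSON body to its /v1/completions route. The sketch below only builds such a request body; the model name and default token limit are illustrative assumptions, not values taken from the tutorial.

```python
import json

def build_completion_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build a JSON payload for vLLM's OpenAI-compatible /v1/completions API.

    The default max_tokens value here is an arbitrary choice for the example.
    """
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
    }

# "facebook/opt-125m" is a small model often used in vLLM examples;
# any model the server was launched with would work the same way.
payload = build_completion_request("facebook/opt-125m", "Hello, world!")
print(json.dumps(payload))
```

In a deployment like the one the tutorial describes, this payload would be sent to the Kubernetes Service fronting the vLLM pods, and the response parsed as standard OpenAI-style completion JSON.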
[Deep learning model deployment framework] Supports tf/torch/trt/trtllm/vllm and more NN frameworks, with dynamic batching and streaming modes, in both Python and C++; rate-limitable, extensible, and high-performance. Helps users quickly deploy models online and serve them via HTTP/RPC interfaces.
⚡️An easy-to-use, fast deep learning model deployment toolkit for ☁️cloud, 📱mobile, and 📹edge, covering 20+ mainstream scenarios across image, video, text, and audio, with 150+ SOTA models, end-to-end optimization, and multi-platform, multi-framework support.
A unified end-to-end machine intelligence platform
Friendli: the fastest serving engine for generative AI
ClearML - Model-Serving Orchestration and Repository Solution