KServe 2024-2025 Roadmap

Objective: "Support GenAI inference"

  • LLM Serving Runtimes

    • Support Speculative Decoding with vLLM runtime [#3800].
    • Support LoRA adapters [#3750].
    • Support LLM serving runtimes for TensorRT-LLM and TGI, and provide benchmarking comparisons [#3868].
    • Support multi-host, multi-GPU inference runtime [#2145].
  • LLM Autoscaling

    • Support Model Caching with automatic PV/PVC provisioning [#3869].
    • Support Autoscaling settings for serving runtimes.
    • Support Autoscaling based on custom metrics [#3561].
  • LLM RAG/Agent Pipeline Orchestration

    • Support declarative RAG/Agent workflow using KServe Inference Graph [#3829].
  • Extend the Open Inference Protocol to GenAI Task APIs

  • LLM Gateway

    • Support multiple LLM providers.
    • Support token-based rate limiting (see the sketch after this list).
    • Support an LLM router with traffic shaping, fallback, and load balancing.
    • LLM Gateway observability for metrics and cost reporting.
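
On token-based rate limiting: one common shape for it is a token bucket keyed by client, refilled at a fixed rate and debited by each request's LLM-token cost. The sketch below illustrates that idea only; the `TokenBucket` class, budget numbers, and `admit` helper are hypothetical and not KServe's actual gateway design.

```python
import time

class TokenBucket:
    """Hypothetical per-client budget measured in LLM tokens."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # max tokens a client may burst
        self.refill_rate = refill_rate  # tokens replenished per second
        self.tokens = capacity          # start with a full budget
        self.updated = time.monotonic()

    def try_consume(self, n: int) -> bool:
        """Debit n LLM tokens; refuse the request if the budget is spent."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.refill_rate)
        self.updated = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

buckets: dict[str, TokenBucket] = {}

def admit(api_key: str, prompt_tokens: int, max_completion_tokens: int) -> bool:
    """Gate a request on its worst-case token cost before forwarding it."""
    bucket = buckets.setdefault(api_key,
                                TokenBucket(capacity=100_000, refill_rate=1_000))
    return bucket.try_consume(prompt_tokens + max_completion_tokens)
```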

Objective: "Graduate core inference capability to stable/GA"

  • Promote the InferenceService and ClusterServingRuntime/ServingRuntime CRDs to v1

    • Improve the InferenceService CRD for the REST/gRPC protocol interface.
    • Improve the model storage interface.
    • Deprecate the TrainedModel CRD and add support for multiple models per InferenceService (co-hosting, draft models, LoRA adapters).
    • Improve YAML UX for predictor and transformer container collocation.
    • Close the feature gap between RawDeployment and Serverless mode.
  • Open Inference Protocol

    • Support batching for the v2 inference protocol (a sample v2 request follows this list).
    • Make Transformer and Explainer interoperable with the v2 inference protocol.
    • Improve codecs for the v2 inference protocol.
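
For context on the items above, the core v2 (Open Inference Protocol) request shape is already well defined: a POST to `/v2/models/{name}/infer` carrying named, typed tensors. A minimal client call, with placeholder host, model name, and tensor values:

```python
import requests

# Placeholder endpoint and model name; substitute your InferenceService URL.
url = "http://localhost:8080/v2/models/my-model/infer"

# Open Inference Protocol (v2) request: each input is a named, typed tensor.
payload = {
    "inputs": [
        {
            "name": "input-0",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [[6.8, 2.8, 4.8, 1.4]],
        }
    ]
}

resp = requests.post(url, json=payload, timeout=30)
resp.raise_for_status()
# The response mirrors the request: a list of named output tensors.
print(resp.json()["outputs"])
```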

Reference: Control plane issues, Data plane issues, Serving Runtime issues.

Objective: "Graduate KServe Python SDK to 1.0“

  • Create a standardized model packaging API (see the custom model sketch after this list)
  • Improve KServe model server observability with metrics and distributed tracing
  • Support batch inference
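
For reference, the SDK surface these items build on: a custom predictor subclasses `kserve.Model`, implements `load` and `predict`, and hands itself to `ModelServer`. A minimal sketch; the `EchoModel` logic below is illustrative only:

```python
from kserve import Model, ModelServer

class EchoModel(Model):
    """Toy predictor: echoes its input instances back as predictions."""

    def __init__(self, name: str):
        super().__init__(name)
        self.load()

    def load(self):
        # A real model would load weights here (e.g. from a storage URI).
        self.ready = True

    def predict(self, payload: dict, headers: dict = None) -> dict:
        return {"predictions": payload["instances"]}

if __name__ == "__main__":
    # Serves HTTP endpoints such as /v1/models/echo:predict.
    ModelServer().start([EchoModel("echo")])
```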

Reference: Python SDK issues, Storage issues

Objective: "Graduate InferenceGraph"

  • Improve the InferenceGraph spec for replica and concurrency control (a minimal graph sketch follows this list)
  • Support distributed tracing
  • Support gRPC for InferenceGraph
  • Support standalone Transformers in InferenceGraph
  • Support a traffic mirroring node
  • Improve RawDeployment mode for InferenceGraph
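
For orientation, an InferenceGraph is a `serving.kserve.io/v1alpha1` custom resource whose nodes route traffic between InferenceServices. Below is a minimal two-step Sequence graph created with the generic Kubernetes Python client; the service names and namespace are placeholders:

```python
from kubernetes import client, config

# A Sequence node forwards the request through its steps in order;
# "$response" feeds the previous step's output into the next step.
graph = {
    "apiVersion": "serving.kserve.io/v1alpha1",
    "kind": "InferenceGraph",
    "metadata": {"name": "demo-graph"},
    "spec": {
        "nodes": {
            "root": {
                "routerType": "Sequence",
                "steps": [
                    {"serviceName": "preprocess"},  # placeholder InferenceService
                    {"serviceName": "llm", "data": "$response"},
                ],
            }
        }
    },
}

config.load_kube_config()
client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1alpha1",
    namespace="default",
    plural="inferencegraphs",
    body=graph,
)
```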

Reference: InferenceGraph issues

Objective: "Secure InferenceService"

  • Document KServe ServiceMesh setup with mTLS
  • Support programmatic authentication tokens (see the sketch after this list)
  • Implement per-service auth
  • Add support for SPIFFE/SPIRE identity integration with InferenceService
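
A sketch of the programmatic-token item: the client reads a credential (here, assumed to be a Kubernetes service account token) and presents it as a Bearer header when calling a protected InferenceService. The URL is a placeholder, and the enforcement layer (for example, a mesh authorization policy) is out of scope here:

```python
import requests

# Assumption: auth is enforced at the gateway/mesh layer and the caller
# holds a valid credential, e.g. a Kubernetes service account token.
with open("/var/run/secrets/kubernetes.io/serviceaccount/token") as f:
    token = f.read().strip()

resp = requests.post(
    "http://my-model.default.example.com/v1/models/my-model:predict",  # placeholder URL
    headers={"Authorization": f"Bearer {token}"},
    json={"instances": [[1.0, 2.0, 3.0, 4.0]]},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```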

Reference: Auth related issues

Objective: "KServe 1.0 documentation"

  • Add ModelMesh docs and explain the use cases for classic KServe and ModelMesh
  • Unify the data plane v1 and v2 page formats
  • Improve the v2 data plane docs to explain why and what changed
  • Clean up the examples in the kserve repo and unify them with the website's by creating a single source of truth for documentation
  • Update any out-of-date documentation and make sure the website as a whole is consistent and cohesive