Architecture Overview

The InferenceService Data Plane architecture consists of a static graph of components that coordinate requests for a single model. Advanced features such as Ensembling, A/B testing, and Multi-Armed Bandits should compose InferenceServices together.

Data Plane

Concepts

Endpoint: InferenceServices are divided into two endpoints: "default" and "canary". The endpoints allow users to safely make changes using the Pinned and Canary rollout strategies. Canarying is completely optional, enabling users to simply deploy with a BlueGreen deployment strategy on the "default" endpoint.

Component: Each endpoint is composed of multiple components: "predictor", "explainer", and "transformer". The only required component is the predictor, which is the core of the system. As KFServing evolves, we plan to increase the number of supported components to enable use cases like Outlier Detection.

Predictor: The predictor is the workhorse of the InferenceService. It is simply a model and a model server that makes it available at a network endpoint.

Explainer: The explainer enables an optional alternate data plane that provides model explanations in addition to predictions. Users may define their own explanation container, which KFServing configures with relevant environment variables like the prediction endpoint. For common use cases, KFServing provides out-of-the-box explainers like Alibi.

Transformer: The transformer enables users to define pre- and post-processing steps around the prediction and explanation workflows. Like the explainer, it is configured with relevant environment variables. For common use cases, KFServing provides out-of-the-box transformers like Feast.
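As an illustration, a minimal transformer can subclass the KFServing Python SDK's KFModel and override preprocess/postprocess. This is a sketch, not a complete example: the model name, predictor host, and the rescaling logic below are hypothetical.

```python
import kfserving
from typing import Dict


class SimpleTransformer(kfserving.KFModel):
    """Hypothetical transformer that rescales instances before prediction."""

    def __init__(self, name: str, predictor_host: str):
        super().__init__(name)
        # In a deployed InferenceService, KFServing supplies the predictor
        # endpoint via configuration; it is passed explicitly here for clarity.
        self.predictor_host = predictor_host

    def preprocess(self, inputs: Dict) -> Dict:
        # Pre-processing step applied before the request reaches the predictor.
        return {"instances": [[x / 255.0 for x in row] for row in inputs["instances"]]}

    def postprocess(self, inputs: Dict) -> Dict:
        # Post-processing step applied to the predictor's response.
        return inputs


if __name__ == "__main__":
    transformer = SimpleTransformer("my-model", predictor_host="my-model-predictor")
    kfserving.KFServer().start(models=[transformer])
```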

Data Plane (V1)

KFServing has a standardized prediction workflow across all model frameworks.

| API | Verb | Path | Payload |
| --- | --- | --- | --- |
| Readiness | GET | /v1/models/&lt;model_name&gt; | Response: `{"name": <model_name>, "ready": true/false}` |
| Predict | POST | /v1/models/&lt;model_name&gt;:predict | Request: `{"instances": []}` Response: `{"predictions": []}` |
| Explain | POST | /v1/models/&lt;model_name&gt;:explain | Request: `{"instances": []}` Response: `{"predictions": [], "explanations": []}` |

Predict

All InferenceServices speak the Tensorflow V1 HTTP API: https://www.tensorflow.org/tfx/serving/api_rest#predict_api.

Note: Only Tensorflow models support the fields "signature_name" and "inputs".
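For concreteness, requests can be issued with any HTTP client. In the sketch below the host and model name are hypothetical; the payload follows the `{"instances": []}` shape from the table above.

```python
import requests

# Hypothetical InferenceService host and model name.
HOST = "http://my-inferenceservice.example.com"
MODEL = "flowers-sample"

# Readiness: GET /v1/models/<model_name>
ready = requests.get(f"{HOST}/v1/models/{MODEL}").json()
print(ready)  # e.g. {"name": "flowers-sample", "ready": true}

# Predict: POST /v1/models/<model_name>:predict
resp = requests.post(
    f"{HOST}/v1/models/{MODEL}:predict",
    json={"instances": [[1.0, 2.0, 5.0]]},
)
print(resp.json())  # {"predictions": [...]}
```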

Explain

All InferenceServices that are deployed with an Explainer support a standardized explanation API. This interface is identical to the Tensorflow V1 HTTP API with the addition of an ":explain" verb.
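Reusing the hypothetical host and model from the predict sketch above, an explanation request only changes the verb:

```python
# Explain: POST /v1/models/<model_name>:explain (same request body as :predict)
resp = requests.post(
    f"{HOST}/v1/models/{MODEL}:explain",
    json={"instances": [[1.0, 2.0, 5.0]]},
)
print(resp.json())  # {"predictions": [...], "explanations": [...]}
```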

Data Plane (V2)

The second version of the data-plane protocol addresses several issues found with the V1 data-plane protocol, including performance and generality across a large number of model frameworks and servers.

Predict

The V2 protocol proposes both HTTP/REST and gRPC APIs. See the complete proposal for more information.
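As a sketch of the proposed REST form, V2 requests carry explicit tensor metadata (name, shape, datatype) rather than bare instances. The exact schema is defined in the proposal; the host, model name, and tensor name below are hypothetical.

```python
import requests

# Hypothetical InferenceService host and model name.
HOST = "http://my-inferenceservice.example.com"
MODEL = "flowers-sample"

# V2 inference: POST /v2/models/<model_name>/infer
resp = requests.post(
    f"{HOST}/v2/models/{MODEL}/infer",
    json={
        "inputs": [
            {
                "name": "input-0",      # tensor name
                "shape": [1, 3],        # explicit shape metadata
                "datatype": "FP32",     # explicit datatype metadata
                "data": [1.0, 2.0, 5.0],
            }
        ]
    },
)
print(resp.json())  # {"model_name": ..., "outputs": [...]}
```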