This repo deploys [OFA-Sys/small-stable-diffusion-v0](https://huggingface.co/OFA-Sys/small-stable-diffusion-v0) on top of KServe and TorchServe. This model was chosen for its relatively small size.
- Provision an AWS Blank Open Environment in ap-southeast-1, then create an OpenShift cluster with 2 p3.2xlarge worker nodes
- Create a new directory for the install files

  ```
  mkdir demo
  cd demo
  ```
- Generate `install-config.yaml`

  ```
  openshift-install create install-config
  ```
- Set the compute pool to 2 replicas with `p3.2xlarge` instances, and set the control plane to a single master (you will need to have `yq` installed)

  ```
  mv install-config.yaml install-config-old.yaml
  yq '.compute[0].replicas=2' < install-config-old.yaml \
      | yq '.compute[0].platform = {"aws":{"zones":["ap-southeast-1b"], "type":"p3.2xlarge"}}' \
      | yq '.controlPlane.replicas=1' \
      > install-config.yaml
  ```
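  Before running the installer, you can print the modified fields back out to sanity-check the edits (a quick illustrative check):

  ```
  yq '.compute[0].replicas, .compute[0].platform, .controlPlane.replicas' install-config.yaml
  ```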
- Create the cluster

  ```
  openshift-install create cluster
  ```

  You may get a `context deadline exceeded` error - this is expected because there is only a single control-plane node
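  If the installer does time out this way, you can resume waiting for the installation to finish with the standard subcommand (run from the `demo` install directory):

  ```
  openshift-install wait-for install-complete
  ```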
- Set the `KUBECONFIG` environment variable to point to the new cluster
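  With the default installer layout this looks like the following (assuming you are still in the `demo` install directory):

  ```
  export KUBECONFIG="$PWD/auth/kubeconfig"
  oc get nodes   # quick check that the cluster answers
  ```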
- Set up the ingress with certificates from Let's Encrypt

  ```
  ./scripts/setup-letsencrypt
  ```

  Note: After the certificates have been installed, you will need to edit `kubeconfig` and comment out `.clusters[*].cluster.certificate-authority-data`
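  One non-interactive way to do this - a sketch using `yq`; deleting the field has the same effect as commenting it out:

  ```
  yq -i 'del(.clusters[].cluster.certificate-authority-data)' "$KUBECONFIG"
  ```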
- Deploy the `InferenceService` and the frontend to OpenShift

  ```
  make deploy
  ```

  This will:

  - Configure OpenShift for User Workload Monitoring
  - Deploy the NFD and NVIDIA GPU operators
  - Deploy the OpenShift Serverless and Service Mesh operators
  - Deploy OpenShift AI and KServe
  - Deploy MinIO
  - Download the model and create a model archive (`.mar`)
  - Upload the model archive and `config.properties` to an S3 bucket
  - Deploy the model as an `InferenceService`
  - Deploy the frontend application
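  Once `make deploy` returns, the model can take a while to become ready. A quick way to watch progress (illustrative; the service name `sd` and namespace `demo` match the test request below):

  ```
  oc get inferenceservice/sd -n demo   # wait for READY to be True
  oc get pods -n demo                  # watch the predictor pod come up
  ```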
- Send a test request to the `InferenceService`

  ```
  model="$(oc get inferenceservice/sd -n demo -o jsonpath='{.status.url}')"
  curl \
      -sk \
      $model/v2/models/sd/infer \
      -H 'Content-Type: application/json' \
      -d '{"inputs": [{
            "name": "dummy",
            "shape": [-1],
            "datatype": "STRING",
            "data": ["an apple"]
          }]}' \
      | jq -r '.outputs[0].data[0]' \
      | base64 -d > apple.jpg
  ```
These instructions show how to prepare the model archive manually. If you deployed everything to OpenShift with `make deploy`, this should already have been done for you.

We will use [OFA-Sys/small-stable-diffusion-v0](https://huggingface.co/OFA-Sys/small-stable-diffusion-v0).
- Ensure you have `git-lfs` set up, then clone the model repository

  ```
  git clone https://huggingface.co/OFA-Sys/small-stable-diffusion-v0
  ```
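  If `git-lfs` is not set up yet, the usual one-time step is:

  ```
  git lfs install
  ```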
- Prepare `model.zip`

  ```
  cd small-stable-diffusion-v0
  rm -rf .git
  zip -r ../model.zip .
  ```
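  You can spot-check the archive contents before moving on (illustrative):

  ```
  unzip -l ../model.zip | head
  ```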
- Create the `.mar`

  ```
  cd ..   # back to the directory containing model.zip; custom_handler.py and requirements.txt are assumed to be here too
  torch-model-archiver \
      --model-name sd \
      --version 1.0 \
      --serialized-file model.zip \
      --handler custom_handler.py \
      -r requirements.txt
  ```
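  The archiver writes `sd.mar` to the current directory; a quick check (illustrative):

  ```
  ls -lh sd.mar
  ```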
These instructions show how to test the model archive with `torchserve` on your local machine.
- Start `torchserve`

  ```
  torchserve \
      --start \
      --model-store . \
      --models sd=sd.mar \
      --ts-config config.properties
  ```
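  TorchServe exposes a health endpoint on the inference port (8085 here, per `config.properties` and the request below), so you can confirm it is up with:

  ```
  curl -s localhost:8085/ping
  ```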
- Call the inference API

  ```
  curl \
      -s \
      -F 'data=an apple' \
      localhost:8085/predictions/sd \
      | base64 -d > apple.jpg
  ```
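  When you are done testing, stop the local server:

  ```
  torchserve --stop
  ```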
- WaveGlow handler - an example of using zip files