- Overview
- Download the IBM Z Deep Learning compiler container image
- Environment variables
- IBM Z Deep Learning Compiler command line interface help
- Building the code samples
- IBM Z Integrated Accelerator for AI
- Removing IBM Z Deep Learning Compiler
The IBM Z Deep Learning Compiler uses ONNX-MLIR to compile .onnx deep learning AI models into shared libraries. The shared libraries can then be integrated into C, C++, Java, or Python applications.
The compiled models take advantage of IBM zSystems technologies, including SIMD on IBM z13 and later and the Integrated Accelerator for AI on IBM z16, without changes to the original model.
ONNX is an open format for representing AI models. It is open source and vendor neutral. Some AI frameworks directly support exporting to .onnx format. For other frameworks, open source converters are readily available. ONNX Support Tools has links to steps and converters for many popular AI frameworks.
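Many converters can be run from the command line. For example, a TensorFlow SavedModel can be converted with the open source tf2onnx converter. This is an illustrative sketch only; it assumes tf2onnx has been installed with pip, and tensorflow-model-dir is a placeholder for your model's directory:
pip install tf2onnx
# Convert a TensorFlow SavedModel to ONNX format
python -m tf2onnx.convert --saved-model tensorflow-model-dir --output model.onnx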
See Verified ONNX Model Zoo models for the list of models from the ONNX Model Zoo that have been built and verified with the IBM Z Deep Learning Compiler.
These are the general end-to-end steps to use IBM zDLC:
- Create, convert, or download an ONNX model.
- Download the zdlc image from IBM Z and LinuxONE Container Registry.
- Use the image to compile a shared library of the model for your desired language.
- Import the compiled model into your application.
- Run your application.
Downloading the IBM Z Deep Learning Compiler container image requires credentials for the icr.io registry. Information on obtaining the credentials is located at IBM Z and LinuxONE Container Registry.
Determine the desired version of the zdlc image to download from the IBM Z and LinuxONE Container Registry.
Change to the directory where the zdlc image should be downloaded.
Set ZDLC_IMAGE based on the desired version and set ZDLC_DIR to the current working directory where the zdlc image will be downloaded:
ZDLC_IMAGE=icr.io/ibmz/zdlc:[version]
ZDLC_DIR=$(pwd)/zDLC
Pull the image as shown in the following code block:
docker pull ${ZDLC_IMAGE}
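Optionally, verify that the image was pulled successfully by listing it:
docker images ${ZDLC_IMAGE}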
Note that the zdlc examples require ZDLC_IMAGE and ZDLC_DIR to be set.
Variable | Description |
---|---|
ZDLC_IMAGE=icr.io/ibmz/zdlc:[version] | Set [version] based on the desired version in IBM Z and LinuxONE Container Registry. Used in: • IBM Z Deep Learning Compiler command line interface help • Building a model .so using the IBM Z Deep Learning Compiler • Building C++ programs to call the model • Building a model .jar file using the IBM zDLC • Building Java programs to call the model • Running the Python example • Compiling models to utilize the IBM Z Integrated Accelerator for AI |
ZDLC_DIR=$(pwd)/zDLC | $(pwd) resolves to the current working directory. Ensure the current working directory contains the downloaded IBM Z Deep Learning Compiler container image before setting the ZDLC_DIR environment variable. Used in: • Environment variables • Running the Python example |
Set the environment variables for use with the IBM Z Deep Learning Compiler container image. The environment variables simplify the container commands throughout the rest of this document. See the description in the table below for additional information. ZDLC_DIR must be set first; see Download the IBM Z Deep Learning compiler container image if it is not yet set.
GCC_IMAGE_ID=icr.io/ibmz/gcc:12
JDK_IMAGE_ID=icr.io/ibmz/openjdk:11
ZDLC_CODE_DIR=${ZDLC_DIR}/code
ZDLC_LIB_DIR=${ZDLC_DIR}/lib
ZDLC_BUILD_DIR=${ZDLC_DIR}/build
ZDLC_MODEL_DIR=${ZDLC_DIR}/models
ZDLC_MODEL_NAME=mnist-12
Running the IBM Z Deep Learning Compiler container image with no parameters shows the complete help for the IBM Z Deep Learning Compiler.
docker run --rm ${ZDLC_IMAGE}
Note that the command line entry point for the IBM Z Deep Learning Compiler is the zdlc command. The IBM Z Deep Learning Compiler is invoked by running the zdlc image with the docker run command.
Command and Parameters | Description |
---|---|
docker run | Run the container image. |
--rm | Delete the container after running the command. |
The help for the IBM Z Deep Learning Compiler can also be displayed by adding the --help option to the command line.
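For example:
docker run --rm ${ZDLC_IMAGE} --help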
The easiest way to follow the examples is to clone the example code repository:
git clone https://github.com/IBM/zDLC
The code examples are located in the GitHub repository.
The code examples build three deep learning models from the ONNX Model Zoo. You can download just the example model using:
wget --directory-prefix $ZDLC_MODEL_DIR https://github.com/onnx/models/raw/main/vision/classification/mnist/model/$ZDLC_MODEL_NAME.onnx
or see Obtaining the models to download the other models from the model zoo. The examples use $ZDLC_MODEL_DIR as the model directory, and $ZDLC_MODEL_NAME specifies the model name (without the .onnx extension) in that directory.
Use the --EmitLib option to build a .so shared library of the model specified by ZDLC_MODEL_NAME in Environment variables:
docker run --rm -v ${ZDLC_MODEL_DIR}:/workdir:z ${ZDLC_IMAGE} --EmitLib --O3 --mcpu=z14 --mtriple=s390x-ibm-loz ${ZDLC_MODEL_NAME}.onnx
Command and Parameters | Description |
---|---|
ZDLC_MODEL_NAME | Name of the model to compile, without the .onnx suffix. |
docker run | Run the container image. |
--rm | Delete the container after running the command. |
-v ${ZDLC_MODEL_DIR}:/workdir:z | The host bind mount points to the directory with the model ONNX file. :z is required to share the volume if SELinux is installed. |
--EmitLib | Build the .so shared library of the model. |
--O3 | Optimize to the highest level. |
--mcpu=z14 | The minimum CPU architecture (for generated code instructions). |
--mtriple=s390x-ibm-loz | The target architecture for generated code. |
${ZDLC_MODEL_NAME}.onnx | Builds the .so shared library from the specified ONNX file. |
The built .so shared library is written to the host bind mount location.
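As an optional sanity check, verify that the shared library was created and exports the model entry point. Models compiled by ONNX-MLIR export a run_main_graph function; this sketch assumes the nm tool is available on the host:
ls -l ${ZDLC_MODEL_DIR}/${ZDLC_MODEL_NAME}.so
# The model entry point should appear in the dynamic symbol table
nm -D ${ZDLC_MODEL_DIR}/${ZDLC_MODEL_NAME}.so | grep run_main_graph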
The ONNX models for the examples can be found in the ONNX Model Zoo.
The example program is written in the C++ programming language and compiled with the g++ compiler. The example program calls the IBM Z Deep Learning Compiler APIs built into the .so shared library. The source code for the example program is at C++ example.
Some setup steps are required before building the programs to call the model. The ONNX-MLIR Runtime API files first need to be copied from the container image. Run these commands from the command line to copy files.
mkdir -p ${ZDLC_BUILD_DIR}
docker run --rm -v ${ZDLC_BUILD_DIR}:/files:z --entrypoint '/usr/bin/bash' ${ZDLC_IMAGE} -c "cp -r /usr/local/{include,lib} /files"
Command and Parameters | Description |
---|---|
docker run | Run the container image. |
--rm | Delete the container after running the command. |
-v ${ZDLC_BUILD_DIR}:/files:z | The host bind mount points to the directory into which the build files from IBM are copied. :z is required to share the volume if SELinux is installed. |
cp | Run the copy command to copy the build files from IBM into the host bind mount. |
Run this optional step to see the files that were copied.
ls -laR ${ZDLC_BUILD_DIR}
Next, pull a Docker image with the g++ compiler tools installed.
docker pull ${GCC_IMAGE_ID}
The setup steps have been completed. Use the g++ image and the ONNX-MLIR C++ Runtime API files to build the program.
cp ${ZDLC_MODEL_DIR}/${ZDLC_MODEL_NAME}.so ${ZDLC_CODE_DIR}
docker run --rm -v ${ZDLC_CODE_DIR}:/code:z -v ${ZDLC_BUILD_DIR}:/build:z ${GCC_IMAGE_ID} g++ -std=c++11 -O3 -I /build/include /code/deep_learning_compiler_run_model_example.cpp -l:${ZDLC_MODEL_NAME}.so -L/code -Wl,-rpath='$ORIGIN' -o /code/deep_learning_compiler_run_model_example
The following table explains the command line:
Command and Parameters | Description |
---|---|
docker run | Run the container image. |
--rm | Delete the container after running the command. |
-v ${ZDLC_CODE_DIR}:/code:z | The /code host bind mount points to the directory with the calling program. :z is required to share the volume if SELinux is installed. |
-v ${ZDLC_BUILD_DIR}:/build:z | The /build host bind mount points to the directory containing the build files from IBM. :z is required to share the volume if SELinux is installed. |
The following table explains the g++ command line:
Command and Parameters | Description |
---|---|
g++ | Run the g++ compiler from the container command line. |
-std=c++11 -O3 | g++ compiler options (see man g++ for additional information). |
-I /build/include | This is the location of the include header files. |
/code/deep_learning_compiler_run_model_example.cpp | The example program to build. |
-l:${ZDLC_MODEL_NAME}.so | The model .so shared library that was previously built. |
-L/code | Tell the g++ linker where to find the model .so shared library. |
-Wl,-rpath='$ORIGIN' | (This is a very important parameter for correctly building the C++ example program.) The GNU loader (LD) uses the rpath to locate the model .so file when the program is run. (See the man ld.so help for additional information.) |
-o /code/deep_learning_compiler_run_model_example | Tell the g++ linker the name of the built program. |
The program is now ready to be run from the command line. When run, the program will run inference on the model with randomly generated test data values.
docker run --rm -v ${ZDLC_CODE_DIR}:/code:z ${GCC_IMAGE_ID} /code/deep_learning_compiler_run_model_example
With this example, the program is linked to the built model and is run in the container. The expected program output is ten random float values (because the input was random) from the model.
Use the --EmitJNI option to build a jar file of the model specified by ZDLC_MODEL_NAME in Environment variables.
docker run --rm -v ${ZDLC_MODEL_DIR}:/workdir:z ${ZDLC_IMAGE} --EmitJNI --O3 --mcpu=z14 --mtriple=s390x-ibm-loz ${ZDLC_MODEL_NAME}.onnx
Command and Parameters | Description |
---|---|
ZDLC_MODEL_NAME | Name of the model to compile, without the .onnx suffix. |
docker run | Run the container image. |
--rm | Delete the container after running the command. |
-v ${ZDLC_MODEL_DIR}:/workdir:z | The host bind mount points to the directory with the model ONNX file. :z is required to share the volume if SELinux is installed. |
--EmitJNI | Build the jar file of the model. |
${ZDLC_MODEL_NAME}.onnx | Builds the jar file from the specified ONNX file. |
The built jar file is written to the host bind mount location.
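As an optional check, list the contents of the built jar file. This sketch assumes the JDK image from Environment variables has already been pulled (see Building Java programs to call the model):
docker run --rm -v ${ZDLC_MODEL_DIR}:/models:z ${JDK_IMAGE_ID} jar tf /models/${ZDLC_MODEL_NAME}.jar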
The example program is written in the Java programming language and compiled with a Java JDK. The example program calls the ONNX-MLIR Java Runtime APIs through the JNI interfaces built in the model jar file. The source code for the example program is at Java example.
Some setup steps are required before building the programs to call the model. The ONNX-MLIR Runtime API files first need to be copied from the container image. Run these commands from the command line to copy files.
mkdir -p ${ZDLC_BUILD_DIR}
docker run --rm -v ${ZDLC_BUILD_DIR}:/files:z --entrypoint '/usr/bin/bash' ${ZDLC_IMAGE} -c "cp -r /usr/local/{include,lib} /files"
Command and Parameters | Description |
---|---|
docker run | Run the container image. |
--rm | Delete the container after running the command. |
-v ${ZDLC_BUILD_DIR}:/files:z | The host bind mount points to the directory into which the build files from IBM are copied. :z is required to share the volume if SELinux is installed. |
cp | Run the copy command to copy the build files from IBM into the host bind mount. |
Run this optional step to see the files that were copied.
ls -laR ${ZDLC_BUILD_DIR}
Pull a Java JDK image to build and run the Java example:
docker pull ${JDK_IMAGE_ID}
Build the Java calling program using the javac command.
mkdir -p ${ZDLC_CODE_DIR}/class
docker run --rm -v ${ZDLC_CODE_DIR}:/code:z -v ${ZDLC_BUILD_DIR}:/build:z ${JDK_IMAGE_ID} javac -classpath /build/lib/javaruntime.jar -d /code/class /code/deep_learning_compiler_run_model_example.java
Command and Parameters | Description |
---|---|
docker run | Run the container image. |
--rm | Delete the container after running the command. |
-v ${ZDLC_CODE_DIR}:/code:z | The /code host bind mount points to the directory with the calling program. :z is required to share the volume if SELinux is installed. |
-v ${ZDLC_BUILD_DIR}:/build:z | The /build host bind mount points to the directory containing the build files from IBM. :z is required to share the volume if SELinux is installed. |
javac | Run the JDK Java compiler from the container command line. |
-classpath /build/lib/javaruntime.jar | Specifies the path to the runtime jar from IBM. |
-d /code/class | The build class files are stored at ${ZDLC_CODE_DIR}/class. |
The program is now ready to be run from the command line. When run, the program will run inference on the model with randomly generated test data values.
cp ${ZDLC_MODEL_DIR}/${ZDLC_MODEL_NAME}.jar ${ZDLC_CODE_DIR}
docker run --rm -v ${ZDLC_CODE_DIR}:/code:z ${JDK_IMAGE_ID} java -classpath /code/class:/code/${ZDLC_MODEL_NAME}.jar deep_learning_compiler_run_model_example
With this example, the Java classpath contains the paths for the host bind mounts when run within the container. The classpath needs to be adjusted if the Java program is run directly from the command line. The expected program output is a list of random float values (because the input was random) from the model.
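For instance, if a Java 11 or later runtime is installed directly on the IBM zSystems host, a minimal sketch of the adjusted invocation uses the host paths in place of the container bind mounts:
java -classpath ${ZDLC_CODE_DIR}/class:${ZDLC_CODE_DIR}/${ZDLC_MODEL_NAME}.jar deep_learning_compiler_run_model_example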
This example program is written in Python and runs using the Python runtime. The example program calls the ONNX-MLIR Runtime APIs by leveraging pybind and PyExecutionSession, which is best described in the sections Using PyRuntime and PyRuntime Module in the linked documentation.
If not already compiled, compile the model specified by ZDLC_MODEL_NAME in Environment variables to a .so shared library as described previously.
Next, copy the PyRuntime library out of the docker container using:
mkdir -p ${ZDLC_LIB_DIR}
docker run --rm -v ${ZDLC_LIB_DIR}:/files:z --entrypoint '/usr/bin/bash' ${ZDLC_IMAGE} -c "cp /usr/local/lib/PyRuntime.cpython-*-s390x-linux-gnu.so /files"
Command and Parameters | Description |
---|---|
docker run | Run the container image. |
--rm | Delete the container after running the command. |
-v ${ZDLC_LIB_DIR}:/files:z | The /files host bind mount points to the directory that will contain the PyRuntime library. :z is required to share the volume if SELinux is installed. |
--entrypoint '/usr/bin/bash' | Start the container with /usr/bin/bash as the entry process. |
-c "cp" | Tell the entrypoint bash process to copy the PyRuntime library out of the container into the directory bind mounted at /files. |
Run this optional step to see the files that were copied.
ls -laR ${ZDLC_LIB_DIR}
Two configuration approaches are described in onnx-mlir's Configuring and using PyRuntime, but we'll prefer the PYTHONPATH approach so we avoid creating symbolic links for this example.
Build the example Python image with the following command:
docker build -f ${ZDLC_DIR}/docker/Dockerfile.python -t zdlc-python-example .
Command and Parameters | Description |
---|---|
docker build | Build the container image. |
-f ${ZDLC_DIR}/docker/Dockerfile.python | Use the Dockerfile.python file from the zDLC repository for this container build. |
-t zdlc-python-example | Build the image with the image:tag specification of zdlc-python-example:latest. |
Finally, run the Python client with the following command:
docker run --rm -v ${ZDLC_LIB_DIR}:/build/lib:z -v ${ZDLC_CODE_DIR}:/code:z -v ${ZDLC_MODEL_DIR}:/models:z --env PYTHONPATH=/build/lib zdlc-python-example:latest /code/deep_learning_compiler_run_model_python.py /models/${ZDLC_MODEL_NAME}.so
Command and Parameters | Description |
---|---|
docker run | Run the container image. |
--rm | Delete the container after running the command. |
-v ${ZDLC_LIB_DIR}:/build/lib:z | The /build/lib host bind mount points to the directory containing the PyRuntime library. :z is required to share the volume if SELinux is installed. |
-v ${ZDLC_CODE_DIR}:/code:z | The /code host bind mount points to the directory with the calling program. :z is required to share the volume if SELinux is installed. |
-v ${ZDLC_MODEL_DIR}:/models:z | The /models host bind mount points to the directory with the model .so file. :z is required to share the volume if SELinux is installed. |
--env PYTHONPATH=/build/lib | When the container is launched, the PYTHONPATH environment variable is set up to point to the /build/lib directory containing the PyRuntime library needed for execution. |
Once complete, you'll see output like the following:
The input tensor dimensions are:
[1, 3, 224, 224]
A brief overview of the output tensor is:
[[-2.4883294 0.4591511 1.1298141 ... -2.8113475 -1.3842212
2.6721394 ]
[-5.064701 0.17290297 -1.866698 ... 0.39307398 -4.6048536
2.116905 ]
[-3.6744304 1.906144 -2.4807017 ... -0.96054727 -3.919518
0.92789984]]
The dimensions of the output tensor are:
(3, 1000)
Note that the output values will be random since the input values are random.
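If a compatible Python 3 interpreter (with numpy) is installed on the host, a hedged alternative to the example image is to point PYTHONPATH at the copied PyRuntime library and run the client directly. Note that the PyRuntime library is built for a specific Python version, so this sketch only works when the host interpreter matches:
# Run the Python client directly on the host using the copied PyRuntime library
PYTHONPATH=${ZDLC_LIB_DIR} python3 ${ZDLC_CODE_DIR}/deep_learning_compiler_run_model_python.py ${ZDLC_MODEL_DIR}/${ZDLC_MODEL_NAME}.so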
IBM z16 systems include a new Integrated Accelerator for AI to enable real-time AI for transaction processing at scale. The IBM Z Deep Learning Compiler helps your new and existing deep learning models take advantage of this new accelerator.
Any IBM zSystem can be used to compile models to take advantage of the Integrated Accelerator for AI, including IBM z15 and older machines. However, if acceleration is enabled at compile time, the compiled model will only run on IBM zSystems which have the accelerator. Machines which have an accelerator can run models compiled without acceleration but those models will not take advantage of the accelerator.
Like other compilers, the IBM zDLC's default settings compile models so that they run on as many systems as possible. To use machine specific features, such as the Integrated Accelerator for AI, you must specify an additional option when compiling the model.
When set, supported ONNX Operators are directed to the accelerator instead of the CPU. The compile process handles routing the operations between the CPU and accelerator and any required data conversion. No changes are required to your model.
To compile a model to use the Integrated Accelerator for AI, the --maccel=NNPA option needs to be specified on the command line. Additionally, since the accelerator is only available on IBM z16 and later, it is recommended to also use --mcpu=z16.
Using the .so shared library example, the command line to compile models that take advantage of the Integrated Accelerator for AI is:
docker run --rm -v ${ZDLC_MODEL_DIR}:/workdir:z ${ZDLC_IMAGE} --EmitLib --O3 --mcpu=z16 --mtriple=s390x-ibm-loz --maccel=NNPA ${ZDLC_MODEL_NAME}.onnx
Once the model is built to use the IBM Z Integrated Accelerator for AI, no changes are required on the command line to run the model:
cp ${ZDLC_MODEL_DIR}/${ZDLC_MODEL_NAME}.so ${ZDLC_CODE_DIR}
docker run --rm -v ${ZDLC_CODE_DIR}:/code:z ${GCC_IMAGE_ID} /code/deep_learning_compiler_run_model_example
The same flags are required for compiling shared libraries for any language including Java and Python. Likewise, no additional steps are required when running the shared libraries.
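For example, a jar file that targets the accelerator is built by adding the same options to the --EmitJNI command shown earlier:
docker run --rm -v ${ZDLC_MODEL_DIR}:/workdir:z ${ZDLC_IMAGE} --EmitJNI --O3 --mcpu=z16 --mtriple=s390x-ibm-loz --maccel=NNPA ${ZDLC_MODEL_NAME}.onnx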
When compiling models for the IBM Z Integrated Accelerator for AI, IBM zDLC optimizes models to take advantage of the accelerator when possible. In order to support a wide range of models, IBM zDLC will compile models so operators not supported by the accelerator, or operators with unsupported settings, run on the CPU.
When running models with multiple dynamic dimensions (i.e. models with multiple -1 values in their input signatures), using the --shapeInformation flag to set those dimensions to static values may improve model runtime performance. For some models, this allows the IBM zDLC to better determine at compile time which operations will be compatible with the accelerator.
For example, if a vision model has an input tensor with shape (-1, -1, -1, 3) representing (batch, height, width, channels), you may see increased performance by specifying the height and width dimensions at compile time. To do so, add --shapeInformation 0:-1x640x480x3 when compiling the model. If the model has multiple input tensors, those can also be specified using --shapeInformation 0:-1x640x480x3,1:-1x100,2:....
The --shapeInformation flag can be used with --onnx-op-stats to determine if specifying the shape enables more operations to run on the IBM Z Integrated Accelerator for AI. See View operation targets at compile time.
The IBM Z Deep Learning Compiler can optionally report, at compile time, the number of operators that will run on the CPU versus the IBM Z Integrated Accelerator for AI.
When compiling the model, add --onnx-op-stats [TXT|JSON]. Operations that begin with onnx.* will execute on the CPU and operations that begin with zhigh.* are related to the IBM Z Integrated Accelerator for AI.
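For example, the following compiles for the accelerator with a static input shape and prints the operation statistics in text form (the shape values here are illustrative; substitute your model's dimensions):
docker run --rm -v ${ZDLC_MODEL_DIR}:/workdir:z ${ZDLC_IMAGE} --EmitLib --O3 --mcpu=z16 --mtriple=s390x-ibm-loz --maccel=NNPA --shapeInformation 0:-1x640x480x3 --onnx-op-stats TXT ${ZDLC_MODEL_NAME}.onnx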
For the most up-to-date list, see Supported ONNX Operation for Target NNPA in the onnx-mlir repository.
First, find the IMAGE ID for the container image.
docker images
Then delete the image using the IMAGE ID.
docker rmi IMAGE-ID
If an in-use error occurs while attempting to delete the container image, use the docker ps -a command to show any running containers. Use the docker stop and docker rm commands to remove the running instances of the container. Then re-run the docker rmi command.
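For example, where CONTAINER-ID and IMAGE-ID are placeholders for the values reported by docker ps -a and docker images:
# List all containers, stop and remove the running instance, then delete the image
docker ps -a
docker stop CONTAINER-ID
docker rm CONTAINER-ID
docker rmi IMAGE-ID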