Model Inference Runtime - AMD

AMD Inference Server

The AMD Inference Server is an easy-to-use inference solution designed for AMD CPUs, GPUs, and FPGAs. It can be deployed as a standalone executable, run on a Kubernetes cluster with KServe, or linked into custom applications through its C++ API. This example demonstrates how to deploy a TensorFlow GraphDef model on KServe with the AMD Inference Server to run inference on AMD EPYC CPUs.

Prerequisites

This example was tested on an Ubuntu 18.04 host machine using the Bash shell.

These instructions assume:

You have a machine with a modern version of Docker (>=18.09) and sufficient disk space to build the image

You have a Kubernetes cluster set up

KServe has been installed on the Kubernetes cluster

You have some familiarity with Kubernetes and KServe

Refer to the installation instructions for these tools to install them if needed.

Set up the image

This example uses the AMD ZenDNN backend to run inference on TensorFlow models on AMD EPYC CPUs.

Build the image

To build a Docker image for the AMD Inference Server that uses this backend, first download the TF_v2.9_ZenDNN_v3.3_C++_API.zip package from ZenDNN. You must agree to the EULA to download this package. Building the image requires Docker 18.09 or later.

The build produces an image on your host named <username>/amdinfer:latest. To use it with KServe, upload this image to a Docker registry, such as one running on a local server. You will also need to update the YAML files in this example to reference the uploaded image.
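The upload step can be sketched as follows, assuming Docker is already logged in to your registry. The helper names are hypothetical, and the registry address is whatever you pass in; nothing here is specific to the AMD Inference Server.

```shell
# Sketch: retag the locally built image and push it to a registry.
# Assumption: the registry address and image name are supplied by you;
# both helper functions are illustrative, not part of any tool.

# Print the registry-qualified name for a local image reference.
remote_image_name() {
  local registry="$1" image="$2"
  printf '%s/%s\n' "$registry" "$image"
}

# Tag the local image for the registry, then push it.
push_image() {
  local registry="$1" image="$2" remote
  remote="$(remote_image_name "$registry" "$image")"
  docker tag "$image" "$remote" && docker push "$remote"
}
```

For example, push_image registry.example.com:5000 "<username>/amdinfer:latest" would push the image under a placeholder registry address; the resulting name is the one to put in the YAML files.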

More documentation on building a ZenDNN image for KServe is available in the ZenDNN + AMD Inference Server and KServe + AMD Inference Server documentation.

Set up the model

In this example, you will use an MNIST TensorFlow model. The AMD Inference Server also supports PyTorch, ONNX, and Vitis AI models with the appropriate Docker images. To prepare new models, see the KServe + AMD Inference Server documentation for the expected model format.

Make an inference

The AMD Inference Server can be used in single model serving mode in KServe. The code snippets below use the environment variables INGRESS_HOST and INGRESS_PORT to make requests to the cluster. Find the ingress host and port for making requests to your cluster and set these values appropriately.
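One way to set these variables is sketched below. It assumes the cluster uses the Istio ingress gateway in the istio-system namespace behind an external load balancer, which is a common KServe setup but an assumption here; other ingress configurations need different lookups. The function names are hypothetical.

```shell
# Sketch: look up the ingress host and port from the Istio ingress gateway.
# Assumptions: Istio is the ingress, its service is istio-ingressgateway in
# the istio-system namespace, and it exposes a port named "http2".
set_ingress_env() {
  INGRESS_HOST="$(kubectl -n istio-system get service istio-ingressgateway \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}')" || return 1
  INGRESS_PORT="$(kubectl -n istio-system get service istio-ingressgateway \
    -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')" || return 1
  export INGRESS_HOST INGRESS_PORT
}

# Print the base URL that requests should target; fails if a variable is unset.
ingress_base_url() {
  printf 'http://%s:%s\n' \
    "${INGRESS_HOST:?INGRESS_HOST is not set}" \
    "${INGRESS_PORT:?INGRESS_PORT is not set}"
}
```

Calling set_ingress_env once per shell session and then ingress_base_url gives a quick check that both values were found.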

Add the ClusterServingRuntime

To use the AMD Inference Server with KServe, add it as a serving runtime. A ClusterServingRuntime configuration file is included in this example; apply it to the cluster with kubectl.
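A minimal sketch of this step, assuming kubectl is configured for your cluster. The function name is hypothetical, and the manifest path is a placeholder for wherever the ClusterServingRuntime file from this example lives on your machine.

```shell
# Sketch: register the serving runtime and confirm it is listed.
# Assumption: the path passed in points at the ClusterServingRuntime YAML
# file shipped with this example (updated to use your image).
add_serving_runtime() {
  local manifest="$1"
  [ -f "$manifest" ] || { echo "manifest not found: $manifest" >&2; return 1; }
  kubectl apply -f "$manifest" && kubectl get clusterservingruntimes
}
```

The listing at the end is only a sanity check that the new runtime appears alongside any runtimes installed with KServe.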

Single model serving

Once the AMD Inference Server has been added as a serving runtime, you can start a service that uses it.
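Starting the service and deriving its hostname might look like the sketch below. The manifest path and the InferenceService name are placeholders you supply, the function names are hypothetical, and the 300-second timeout is an arbitrary choice.

```shell
# Sketch: create the InferenceService, wait for it, and derive its hostname.
# Assumptions: the manifest path and service name are placeholders for the
# InferenceService definition used in this example.

# Extract the host part from a URL such as http://name.namespace.example.com.
url_host() {
  cut -d "/" -f 3
}

start_service() {
  local manifest="$1" name="$2"
  kubectl apply -f "$manifest" || return 1
  kubectl wait --for=condition=Ready "inferenceservice/$name" --timeout=300s || return 1
  SERVICE_HOSTNAME="$(kubectl get inferenceservice "$name" \
    -o jsonpath='{.status.url}' | url_host)"
  export SERVICE_HOSTNAME
}
```

SERVICE_HOSTNAME is the value later sent as the Host header so the ingress can route requests to this service.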

Make a request with REST

Once the service is ready, you can make requests to it. With INGRESS_HOST and INGRESS_PORT set as described above, and SERVICE_HOSTNAME set to the hostname from the service's URL, you can run an inference over REST against the example MNIST model.
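A REST request of this kind can be sketched with curl against the KServe v2 inference endpoint. The function names are hypothetical, and both the model name and the request file are assumptions: pass the name your model is served under and a JSON file containing a v2 inference request.

```shell
# Sketch: send a KServe v2 inference request over REST.
# Assumptions: the model name (for example "mnist") and the JSON request
# file are supplied by you; neither is defined by this sketch.

# Print the v2 inference endpoint for a model.
infer_url() {
  printf 'http://%s:%s/v2/models/%s/infer\n' "$INGRESS_HOST" "$INGRESS_PORT" "$1"
}

# POST the request file; the Host header routes the request to the service.
send_inference() {
  local model="$1" request_file="$2"
  curl -s -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" \
    -d "@${request_file}" "$(infer_url "$model")"
}
```

For example, send_inference mnist ./input.json would post a hypothetical input.json to the model and print the JSON response.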

The server's response is in KServe's v2 API format: a JSON object whose outputs entry carries the prediction in a data array.

For MNIST, the data holds one score per digit, so the index of the highest value is the predicted digit. The input image is the number 9, and in this response the highest value is at the last index, indicating that the image was correctly classified as nine.
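Picking out the predicted digit can be sketched with standard text tools. This assumes the response contains a single flat data array of numbers; nested arrays or multiple outputs would need a real JSON parser. The function name and the sample values in the usage note are hypothetical.

```shell
# Sketch: print the 0-based index of the largest value in the "data" array
# of a KServe v2 response read from stdin.
# Assumption: the response has one flat "data" array of plain numbers.
argmax_of_data() {
  tr -d ' \n' |
    sed -n 's/.*"data":\[\([^]]*\)].*/\1/p' |
    tr ',' '\n' |
    awk 'NR == 1 || $1 + 0 > max { max = $1 + 0; idx = NR - 1 }
         END { if (NR > 0) print idx }'
}
```

For a made-up response such as {"outputs":[{"data":[0.1, 0.2, 0.05, 0.9]}]}, piping it into argmax_of_data prints 3, the position of the largest score.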
