Model Inference Runtime - AMD
AMD Inference Server
The AMD Inference Server is an easy-to-use inferencing solution specially designed for AMD CPUs, GPUs, and FPGAs. It can be deployed as a standalone executable, run on a Kubernetes cluster with KServe, or used to create custom applications by linking to its C++ API. This example demonstrates how to deploy a TensorFlow GraphDef model on KServe with the AMD Inference Server to run inference on AMD EPYC CPUs.
Prerequisites
This example was tested on an Ubuntu 18.04 host machine using the Bash shell.
These instructions assume:
You have a machine with a modern version of Docker (>=18.09) and sufficient disk space to build the image
You have a Kubernetes cluster set up
KServe has been installed on the Kubernetes cluster
You have some familiarity with Kubernetes and KServe
Refer to the installation instructions for these tools to install them if needed.
Set up the image
This example uses the AMD ZenDNN backend to run inference on TensorFlow models on AMD EPYC CPUs.
Build the image
To build a Docker image for the AMD Inference Server that uses this backend, download the TF_v2.9_ZenDNN_v3.3_C++_API.zip package from ZenDNN. You must agree to the EULA to download this package. You need a modern version of Docker (at least 18.09) to build this image.
The build produces an image on your host named <username>/amdinfer:latest. To use it with KServe, you need to upload this image to a Docker registry server, such as one running on a local server. You will also need to update the YAML files in this example to use this image.
More documentation for building a ZenDNN image for KServe is available in the ZenDNN + AMD Inference Server and KServe + AMD Inference Server guides.
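As a rough illustration of the upload step, the sketch below assembles the `docker tag` and `docker push` commands needed to publish the locally built image to a registry. The username `myuser` and the registry address `localhost:5000` are placeholders, not values from this example, and the helper `registry_push_commands` is hypothetical.

```python
import shlex


def registry_push_commands(local_image, registry):
    """Return the docker commands that retag a local image for a registry and push it."""
    remote_image = f"{registry}/{local_image}"
    return [
        ["docker", "tag", local_image, remote_image],
        ["docker", "push", remote_image],
    ]


# Placeholder values: substitute your own username and registry address.
commands = registry_push_commands("myuser/amdinfer:latest", "localhost:5000")

# Print the commands so they can be reviewed before being run in a shell.
for cmd in commands:
    print(shlex.join(cmd))
```

The resulting remote image name (here `localhost:5000/myuser/amdinfer:latest`) is the value that the YAML files in this example would need to reference.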
Set up the model
In this example, you will use an MNIST TensorFlow model. The AMD Inference Server also supports PyTorch, ONNX, and Vitis AI models with the appropriate Docker images. To prepare new models, see the KServe + AMD Inference Server documentation for more information about the expected model format.
Make an inference
The AMD Inference Server can be used in single model serving mode in KServe. The code snippets below use the environment variables INGRESS_HOST and INGRESS_PORT to make requests to the cluster. Find the ingress host and port for making requests to your cluster and set these values appropriately.
Add the ClusterServingRuntime
To use the AMD Inference Server with KServe, add it as a serving runtime. A ClusterServingRuntime configuration file is included in this example. Apply it to the cluster before starting a service.
Single model serving
Once the AMD Inference Server has been added as a serving runtime, you can start a service that uses it.
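A minimal sketch of what such a service definition might look like is shown below, built as a Python dictionary and printed as JSON (which `kubectl apply -f` accepts alongside YAML). The field layout follows KServe's `serving.kserve.io/v1beta1` InferenceService schema. The service name, runtime name, and storage URI are all placeholders and are not the values shipped with this example. The runtime name must match whatever name your ClusterServingRuntime declares.

```python
import json


def make_inference_service(name, runtime, storage_uri, model_format="tensorflow"):
    """Build a KServe v1beta1 InferenceService manifest as a plain dictionary."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                "model": {
                    # The model format selects which runtimes are eligible.
                    "modelFormat": {"name": model_format},
                    # Explicitly pin the serving runtime to use.
                    "runtime": runtime,
                    # Location of the model artifacts.
                    "storageUri": storage_uri,
                }
            }
        },
    }


manifest = make_inference_service(
    name="example-amdserver-isvc",            # placeholder service name
    runtime="example-amdserver-runtime",      # placeholder; must match your ClusterServingRuntime
    storage_uri="gs://example-bucket/mnist",  # placeholder model location
)
print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and applying it to the cluster would create the service, assuming the referenced runtime and model location exist.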
Make a request with REST
Once the service is ready, you can make requests to it. Assuming that INGRESS_HOST, INGRESS_PORT, and SERVICE_HOSTNAME have been defined as above, the following command runs an inference over REST to the example MNIST model.
This shows the response from the server in KServe's v2 API format. For this example, it will be similar to:
Expected Output
For MNIST, the data indicates the likely classification for the input image, which is the number 9. In this response, the index with the highest value is the last one, indicating that the image was correctly classified as nine.
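The request and the interpretation of the response described in this section can be sketched in Python as follows. The sketch assumes the v2 REST endpoint path `/v2/models/<model_name>/infer` and routes through the ingress by setting the `Host` header to SERVICE_HOSTNAME. The model name `mnist`, the input tensor name and shape, the fallback host values, and the scores in `sample_response` are illustrative placeholders, not output captured from a real server.

```python
import json
import os
import urllib.request


def build_infer_request(host, port, service_hostname, model_name, payload):
    """Build a POST request for the KServe v2 inference endpoint."""
    url = f"http://{host}:{port}/v2/models/{model_name}/infer"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        # The Host header lets the ingress route the request to the service.
        headers={"Host": service_hostname, "Content-Type": "application/json"},
        method="POST",
    )


def predicted_class(response):
    """Return the index of the highest score in the first output tensor."""
    scores = response["outputs"][0]["data"]
    return max(range(len(scores)), key=scores.__getitem__)


# Placeholder input: one blank 28x28 grayscale image (name and shape are assumptions).
payload = {
    "inputs": [
        {"name": "input-0", "shape": [1, 28, 28, 1], "datatype": "FP32", "data": [0.0] * 784}
    ]
}

request = build_infer_request(
    os.environ.get("INGRESS_HOST", "localhost"),
    os.environ.get("INGRESS_PORT", "8080"),
    os.environ.get("SERVICE_HOSTNAME", "example.default.example.com"),
    model_name="mnist",  # placeholder model name
    payload=payload,
)
# To actually send it against a live cluster:
#   with urllib.request.urlopen(request) as resp:
#       result = json.load(resp)

# Made-up response in the v2 format, used only to demonstrate the argmax step.
sample_response = {
    "outputs": [
        {
            "name": "output-0",
            "datatype": "FP32",
            "shape": [10],
            "data": [0.01, 0.0, 0.02, 0.0, 0.03, 0.0, 0.0, 0.05, 0.04, 0.85],
        }
    ]
}
print(predicted_class(sample_response))  # prints 9
```

Taking the index of the largest value is the same reading applied to the server's response above: the highest score sits at the last index, so the predicted digit is nine.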