Model Inference Runtime: AMD

AMD Inference Server

The AMD Inference Server is an easy-to-use inferencing solution designed for AMD CPUs, GPUs, and FPGAs. It can be deployed as a standalone executable, run on a Kubernetes cluster with KServe, or linked into custom applications through its C++ API. This example demonstrates how to deploy a TensorFlow GraphDef model on KServe with the AMD Inference Server to run inference on AMD EPYC CPUs.

Prerequisites

This example was tested on an Ubuntu 18.04 host machine using the Bash shell.
These instructions assume:
    • You have a machine with a modern version of Docker (>=18.09) and sufficient disk space to build the image
    • You have a Kubernetes cluster set up
    • KServe has been installed on the Kubernetes cluster
    • You have some familiarity with Kubernetes and KServe
Refer to the installation instructions for these tools to install them if needed.
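The Docker version requirement above can be checked with a small script. This is a minimal sketch, assuming you pass in a version string obtained yourself (for example from `docker version`); the helper names `parse_docker_version` and `meets_minimum` are illustrative and not part of any tool mentioned here.

```python
import re
from typing import Tuple

# From the prerequisite list above: Docker >= 18.09.
MINIMUM_DOCKER = (18, 9)


def parse_docker_version(text: str) -> Tuple[int, int]:
    """Extract (major, minor) from a string such as '20.10.21' or '18.06.1-ce'."""
    match = re.match(r"\s*(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized Docker version: {text!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(text: str, minimum: Tuple[int, int] = MINIMUM_DOCKER) -> bool:
    """Return True when the version is at least the required major.minor."""
    return parse_docker_version(text) >= minimum


if __name__ == "__main__":
    # Hypothetical version strings, used only to exercise the check.
    for candidate in ("18.06.1-ce", "18.09.0", "24.0.7"):
        print(candidate, meets_minimum(candidate))
```

Comparing `(major, minor)` tuples avoids the pitfalls of comparing version strings lexically.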

Set up the image

This example uses the AMD ZenDNN backend to run inference on TensorFlow models on AMD EPYC CPUs.

Build the image

To build a Docker image for the AMD Inference Server that uses this backend, download the TF_v2.9_ZenDNN_v3.3_C++_API.zip package from ZenDNN. You must agree to the EULA to download this package. As noted in the prerequisites, building the image requires Docker 18.09 or later.
The build produces an image on your host tagged <username>/amdinfer:latest. To use it with KServe, upload the image to a Docker registry, such as one hosted on a local server, and update the YAML files in this example to reference the uploaded image.
For more on building a ZenDNN image for KServe, see the ZenDNN + AMD Inference Server and KServe + AMD Inference Server documentation.
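The upload step described above can be sketched as follows. The script only assembles the standard `docker tag` and `docker push` invocations and prints them; it does not run them. The user name `alice` and the registry address `registry.example.com:5000` are placeholders chosen for illustration, not values from this example.

```python
import shlex
from typing import List


def push_commands(local_image: str, registry: str) -> List[List[str]]:
    """Return the `docker tag` and `docker push` invocations for one image."""
    # The remote reference is the registry address followed by the local name.
    remote_image = f"{registry}/{local_image}"
    return [
        ["docker", "tag", local_image, remote_image],
        ["docker", "push", remote_image],
    ]


if __name__ == "__main__":
    # Placeholder user name and registry; substitute your own.
    for command in push_commands("alice/amdinfer:latest", "registry.example.com:5000"):
        print(shlex.join(command))
```

The remote reference printed here is the image name that the YAML files would then need to point at.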

Set up the model

In this example, you will use an MNIST TensorFlow model. With the appropriate Docker images, the AMD Inference Server also supports PyTorch, ONNX, and Vitis AI models. To prepare new models, see the KServe + AMD Inference Server documentation for the expected model format.

Make an inference

The AMD Inference Server can be used in single model serving mode in KServe. The code snippets below use the environment variables INGRESS_HOST and INGRESS_PORT to make requests to the cluster. Find the ingress host and port for making requests to your cluster and set these values appropriately.
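If you make requests from Python rather than the shell, the same two environment variables can be read and validated up front. This is a minimal sketch; the helper name `ingress_base_url` is illustrative, and it assumes the ingress is reachable over plain HTTP.

```python
import os
from typing import Mapping


def ingress_base_url(env: Mapping[str, str] = os.environ) -> str:
    """Build the base URL for cluster requests from INGRESS_HOST and INGRESS_PORT."""
    host = env.get("INGRESS_HOST", "").strip()
    port = env.get("INGRESS_PORT", "").strip()
    if not host or not port:
        raise RuntimeError("set INGRESS_HOST and INGRESS_PORT before making requests")
    if not port.isdigit():
        raise ValueError(f"INGRESS_PORT must be a number, got {port!r}")
    # Assumption: plain HTTP; adjust the scheme if your ingress terminates TLS.
    return f"http://{host}:{port}"


if __name__ == "__main__":
    # Hypothetical values, used only to show the resulting URL.
    print(ingress_base_url({"INGRESS_HOST": "10.0.0.5", "INGRESS_PORT": "8080"}))
```

Failing early on a missing variable gives a clearer error than a malformed URL at request time.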

Add the ClusterServingRuntime

To use the AMD Inference Server with KServe, add it as a serving runtime. A ClusterServingRuntime configuration file is included in this example; apply it to your cluster before starting a service.

Single model serving

Once the AMD Inference Server has been added as a serving runtime, you can start a service that uses it.
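To illustrate what such a service definition can look like, the sketch below builds a KServe InferenceService manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The service name, runtime name, and storage URI are placeholders, not the values shipped with this example; the runtime name must match the ClusterServingRuntime you added, and the model format must be one that runtime declares.

```python
import json
from typing import Any, Dict


def inference_service(name: str, runtime: str, storage_uri: str) -> Dict[str, Any]:
    """Return an InferenceService manifest that pins the model to a named runtime."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                "model": {
                    # Assumption: the runtime registers the "tensorflow" format.
                    "modelFormat": {"name": "tensorflow"},
                    "runtime": runtime,
                    "storageUri": storage_uri,
                }
            }
        },
    }


if __name__ == "__main__":
    # All three arguments are hypothetical placeholders.
    manifest = inference_service(
        name="mnist-amd",
        runtime="amdserver-runtime",
        storage_uri="https://example.com/models/mnist.zip",
    )
    print(json.dumps(manifest, indent=2))
```

Naming the runtime explicitly avoids relying on automatic runtime selection when several runtimes support the same model format.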

Make a request with REST

Once the service is ready, you can make requests to it. With INGRESS_HOST, INGRESS_PORT, and SERVICE_HOSTNAME defined, you can run an inference over REST against the example MNIST model.
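A REST request of this kind can be sketched in Python with only the standard library. The URL path follows KServe's v2 inference protocol (`/v2/models/<model>/infer`), and the Host header carries SERVICE_HOSTNAME so the ingress can route the call. The input tensor name `input_0`, the `FP32` datatype, the `[1, 28, 28, 1]` shape, and the model name are assumptions for illustration; check your model's metadata for the real values.

```python
import json
import urllib.request
from typing import Sequence


def build_infer_request(
    base_url: str,
    service_hostname: str,
    model_name: str,
    pixels: Sequence[float],
    shape: Sequence[int],
) -> urllib.request.Request:
    """Assemble a v2-protocol inference request without sending it."""
    expected = 1
    for dim in shape:
        expected *= dim
    if len(pixels) != expected:
        raise ValueError(f"shape {list(shape)} needs {expected} values, got {len(pixels)}")
    body = {
        "inputs": [
            {
                "name": "input_0",  # hypothetical tensor name
                "shape": list(shape),
                "datatype": "FP32",  # assumed input datatype
                "data": list(pixels),
            }
        ]
    }
    return urllib.request.Request(
        url=f"{base_url}/v2/models/{model_name}/infer",
        data=json.dumps(body).encode("utf-8"),
        headers={"Host": service_hostname, "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs a live cluster)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Separating request construction from sending lets you inspect the payload before any traffic reaches the cluster.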
The server returns its response in KServe's v2 API format.
For MNIST, the output data indicates the likely classification for the input image, which shows the number 9. In this response, the index with the highest value is the last one (index 9), indicating that the image was correctly classified as a nine.
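Reading the classification out of a v2 response amounts to taking the index of the largest value in the output data. The sketch below assumes the response has a single output tensor holding one score per digit; the scores in the sample are invented for illustration and are not real server output.

```python
from typing import Any, Dict, Tuple


def classify(response: Dict[str, Any]) -> Tuple[int, float]:
    """Return (index, score) of the highest value in the first output tensor."""
    scores = response["outputs"][0]["data"]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    # Invented response in the v2 layout; names and values are placeholders.
    sample = {
        "model_name": "mnist",
        "outputs": [
            {
                "name": "output_0",
                "shape": [1, 10],
                "datatype": "FP32",
                "data": [0.01, 0.0, 0.02, 0.03, 0.05, 0.01, 0.0, 0.08, 0.1, 0.7],
            }
        ],
    }
    digit, score = classify(sample)
    print(f"predicted digit: {digit} (score {score})")
```

With ten scores indexed 0 through 9, the last index being the maximum corresponds to the digit nine.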