The ROCm PyTorch Docker image offers a prebuilt, optimized environment for validating model inference performance on the AMD Instinct™ MI300X accelerator.
This Docker image packages PyTorch optimized for the AMD Instinct™ MI300X accelerator. With this Docker image, users can quickly validate the expected inference performance numbers on the MI300X accelerator. This guide also provides tips and techniques to help users achieve optimal performance with popular AI models.
Use the following instructions to reproduce the benchmark results on an MI300X accelerator with a prebuilt Docker image.
To optimize performance, disable automatic NUMA balancing. Otherwise, the GPU might hang until the periodic balancing completes. For further details, refer to the AMD Instinct MI300X system optimization guide.
# disable automatic NUMA balancing
sudo sh -c 'echo 0 > /proc/sys/kernel/numa_balancing'
# check if NUMA balancing is disabled (returns 0 if disabled)
cat /proc/sys/kernel/numa_balancing
0

The following command pulls the Docker image from Docker Hub.
docker pull rocm/pytorch:latest

Clone the ROCm Model Automation and Dashboarding (MAD) repository to a local directory and install the required packages on the host machine.
git clone https://github.com/ROCm/MAD
cd MAD
pip install -r requirements.txt

Use the following command to run a performance benchmark test of the Chai-1 model on one GPU with the float16 data type on the host machine.
export MAD_SECRETS_HFTOKEN="your personal Hugging Face token to access gated models"
madengine run --tags pyt_chai1_inference --keep-model-dir --live-output --timeout 28800

ROCm MAD launches a Docker container named container_ci-pyt_chai1_inference. The latency and throughput reports for the model are collected in the perf.csv file.
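As a quick sanity check, the generated report can be pretty-printed on the command line. The snippet below creates a mock perf.csv purely for illustration; the column names are assumptions, so check the header row of your actual report for the exact fields.

```shell
# Create a mock perf.csv for illustration -- your real report is
# generated by madengine and will have more rows and columns.
cat > perf.csv <<'EOF'
model,performance,metric
pyt_chai1_inference,12.3,seconds
EOF

# Align the comma-separated columns for easy reading
awk -F, '{ printf "%-28s %-12s %s\n", $1, $2, $3 }' perf.csv
```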
| model_name |
|---|
| pyt_mochi_video_inference |
| pyt_chai1_inference |
| pyt_clip_inference (ViT-B-32, laion2b_s34b_b79k) |
| pyt_wan2.1_inference (Wan2.1-T2V-14B) |
| pyt_janus_pro_inference (Janus-Pro-7B) |
| pyt_hy_video |
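Any tag from the table above can be substituted into the same madengine invocation. As a sketch, the loop below prints the command for each tag (a dry run via echo); remove the echo to actually launch the benchmarks sequentially.

```shell
# Model tags from the table above
tags="pyt_mochi_video_inference pyt_chai1_inference pyt_clip_inference \
pyt_wan2.1_inference pyt_janus_pro_inference pyt_hy_video"

# Dry run: print each benchmark command instead of executing it.
for tag in $tags; do
    echo madengine run --tags "$tag" --keep-model-dir --live-output --timeout 28800
done
```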
To collect performance data using PyTorch’s Tunable Operators feature, include the --tunableop on argument in your run.
By default, the pyt_clip_inference and pyt_janus_pro_inference models include --tunableop off in their configurations. To change this behavior, edit models.json, find the pyt_clip_inference or pyt_janus_pro_inference entry, and change the args field to --tunableop on.
This triggers a two-pass run: a warm-up followed by a performance-collection run, generating a gemm_result_<dataset>.csv file for analysis.
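The edit itself is a one-line substitution. The snippet below demonstrates it on a minimal mock entry; the real models.json in the MAD repository has more fields and entries, so the structure shown here is illustrative only.

```shell
# Mock models.json entry for illustration only -- the real file in the
# MAD repo contains the full model configurations.
cat > models_example.json <<'EOF'
[
  { "name": "pyt_clip_inference", "args": "--tunableop off" }
]
EOF

# Flip --tunableop off to --tunableop on for the selected model entry
sed -i 's/--tunableop off/--tunableop on/' models_example.json
grep '"args"' models_example.json
```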
madengine run --tags [model tag] --keep-model-dir --live-output --timeout 28800

To learn how to run LLM models from Hugging Face or your own model, see the Using ROCm for AI section of the ROCm documentation.
To learn how to optimize inference on LLMs, see the Fine-tuning LLMs and inference optimization section of the ROCm documentation.
For a list of other ready-made Docker images for ROCm, see the ROCm Docker image support matrix.
Your use of this application is subject to the terms of the applicable component-level license identified below. To the extent any subcomponent in this container requires an offer for corresponding source code, AMD hereby makes such an offer for corresponding source code form, which will be made available upon request. By accessing and using this application, you are agreeing to fully comply with the terms of this license. If you do not agree to the terms of this license, do not access or use this application.
The application is provided in a container image format that includes the following separate and independent components:
| Package | License | URL |
|---|---|---|
| Ubuntu | Creative Commons CC-BY-SA Version 3.0 UK License | Ubuntu Legal |
| ROCm | Custom/MIT/Apache V2.0/UIUC OSL | ROCm Licensing Terms |
| PyTorch | Modified BSD | PyTorch License |
The information contained herein is for informational purposes only and is subject to change without notice. In addition, any stated support is planned and is also subject to change. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for purposes, with respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD's products are as set forth in a signed agreement between the parties or in AMD's Standard Terms and Conditions of Sale.
© 2025 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, Instinct, Radeon Instinct, ROCm, and combinations thereof are trademarks of Advanced Micro Devices, Inc.
Docker and the Docker logo are trademarks or registered trademarks of Docker, Inc. in the United States and/or other countries. Docker, Inc. and other parties may also have trademark rights in other terms used herein. Linux® is the registered trademark of Linus Torvalds in the U.S. and other countries.
All other trademarks and copyrights are property of their respective owners and are only mentioned for informative purposes.
You can report bugs through our GitHub issue tracker.