* Added ONNX support: Integrated ONNX exporter and ONNX Runtime for model conversion, inference, and benchmarking. Updated Dockerfile, requirements, and documentation to reflect the enhancements.
- [Explanation of Results](#explanation-of-results)
6. [ONNX Exporter](#onnx-exporter)
7. [Author](#author)
8. [References](#references)
9. [Notes](#notes)
## Overview
This project demonstrates how to perform inference with a PyTorch model and optimize it using NVIDIA TensorRT. The script loads a pre-trained ResNet-50 model from torchvision, performs inference on a user-provided image, and prints the top-K predicted classes. Additionally, the script benchmarks the model's performance in the following configurations: CPU, CUDA, TensorRT-FP32, and TensorRT-FP16, providing insights into the speedup gained through optimization.
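The "average batch time" reported by the benchmark can be sketched as a simple timing loop. This is a minimal illustration, not the project's actual benchmark code; the function name and the warm-up/iteration defaults are assumptions.

```python
import time

def average_batch_time_ms(run_batch, warmup=3, iters=10):
    """Time a zero-argument callable and return the mean duration in ms.

    warmup runs are executed first and excluded from timing, since the
    first iterations of a model (allocation, JIT, caching) are slower.
    """
    for _ in range(warmup):
        run_batch()
    start = time.perf_counter()
    for _ in range(iters):
        run_batch()
    elapsed = time.perf_counter() - start
    return elapsed / iters * 1000.0
```

The same loop can wrap a CPU, CUDA, or TensorRT-compiled model, which is how the configurations below become directly comparable.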
This command runs predictions on the image at the specified path, shows the top 3 predictions, and displays the image. If you do not want to display the image, omit the `--show_image` flag. For the default top-5 predictions, omit the `--topk` argument or set it to 5.
This command runs predictions on the image at the specified path and shows the top 3 predictions from both the PyTorch and ONNX Runtime models. For the default top-5 predictions, omit the `--topk` argument or set it to 5.
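Producing a `My prediction: %NN class` line from raw model outputs amounts to a softmax followed by a top-k selection. The sketch below uses illustrative labels and logits, not real ResNet-50 output, and the helper name is an assumption:

```python
import numpy as np

def topk_predictions(logits, labels, k=3):
    """Return (confidence, label) pairs for the k highest-scoring classes."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                  # softmax over the raw scores
    order = np.argsort(probs)[::-1][:k]   # indices of the top-k classes
    return [(float(probs[i]), labels[i]) for i in order]

# Illustrative inputs only:
labels = ["tabby", "Egyptian cat", "tiger cat"]
logits = np.array([2.0, 1.7, 0.5])
for p, name in topk_predictions(logits, labels, k=2):
    print(f"My prediction: %{round(p * 100)} {name}")
```

Both the PyTorch and ONNX Runtime paths feed into the same post-processing, so their top-k lists can be compared side by side.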
## Inference Benchmark Results
```
My prediction: %33 tabby
My prediction: %26 Egyptian cat
Running Benchmark for CPU
Average batch time: 942.47 ms
Average ONNX inference time: 15.59 ms
Running Benchmark for CUDA
Average batch time: 41.02 ms
Compiling and Running Inference Benchmark for TensorRT with precision: torch.float32
Average batch time: 19.20 ms
Compiling and Running Inference Benchmark for TensorRT with precision: torch.float16
Average batch time: 7.25 ms
```
- The first k lines show the top-k predictions. For example, `My prediction: %33 tabby` is the model's highest-confidence prediction for the input image: the confidence level (`%33`) and the predicted class (`tabby`).
- The following lines provide information about the average batch time for running the model in different configurations:
- `Running Benchmark for CPU` and `Average batch time: 942.47 ms` indicate the average batch time when running the model on the CPU.
- `Average ONNX inference time: 15.59 ms` indicates the average inference time when running the ONNX model on the CPU.
- `Running Benchmark for CUDA` and `Average batch time: 41.02 ms` indicate the average batch time when running the model on CUDA.
- `Compiling and Running Inference Benchmark for TensorRT with precision: torch.float32` and `Average batch time: 19.20 ms` show the average batch time when running the model with TensorRT using `float32` precision.
- `Compiling and Running Inference Benchmark for TensorRT with precision: torch.float16` and `Average batch time: 7.25 ms` indicate the average batch time when running the model with TensorRT using `float16` precision.
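The timings above imply the following rough speedups relative to the CPU baseline (a quick back-of-the-envelope check, using the numbers as reported):

```python
# Average batch times reported by the benchmark, in milliseconds.
timings_ms = {
    "CPU": 942.47,
    "ONNX (CPU)": 15.59,
    "CUDA": 41.02,
    "TensorRT FP32": 19.20,
    "TensorRT FP16": 7.25,
}

baseline = timings_ms["CPU"]
for name, t in timings_ms.items():
    print(f"{name}: {baseline / t:.1f}x vs. CPU")
```

TensorRT in `float16` comes out roughly 130x faster than the CPU baseline on this hardware.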
## ONNX Exporter
The ONNX Exporter utility is integrated into this project to allow the conversion of the PyTorch model to ONNX format, enabling inference and benchmarking using ONNX Runtime. The ONNX model can provide hardware-agnostic optimizations and is widely supported across various platforms and devices.
- The project uses a Docker container built on top of the NVIDIA TensorRT image to ensure that all dependencies, including CUDA and TensorRT, are correctly installed and configured.
- Please ensure you have the NVIDIA Container Toolkit installed to run the container with GPU support.