feat: Integrate OpenVINO Exporter and Benchmarking
- Added OVExporter to convert ONNX models to OpenVINO's Intermediate Representation (IR) format.
- Introduced OVBenchmark to measure the performance of OpenVINO models.
- Refactored `main.py` for better organization and flow across PyTorch, ONNX, TensorRT, and OpenVINO modes.
- Enhanced benchmark result visualization with a comparative horizontal bar plot.
- Updated README with detailed flow explanations for each mode.
- Addressed various bugs and improved error handling.
This integration provides an additional benchmarking mode, enabling comprehensive performance comparisons.
This project demonstrates how to perform inference with a PyTorch model and optimize it using ONNX, OpenVINO, or NVIDIA TensorRT. The script loads a pre-trained ResNet-50 model from torchvision, performs inference on a user-provided image, and prints the top-K predicted classes. Additionally, the script benchmarks the model's performance in the following configurations: CPU, CUDA, ONNX (CPU), OpenVINO (CPU), TensorRT-FP32, and TensorRT-FP16, providing insights into the speedup gained through optimization.
## Requirements
- This repo cloned
- Python 3.x
- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#install-guide) (for running the Docker container with GPU support)
### Steps to Run
```sh
# 1. Build the Docker Image
# ...
python src/main.py
```
### Arguments
- `--image_path`: (Optional) Specifies the path to the image you want to predict.
- `--topk`: (Optional) Specifies the number of top predictions to show. Defaults to 5 if not provided.
- `--mode`: Specifies the mode for exporting and running the model. Choices are: `onnx`, `ov`, `all`.
For example, running the script with `--topk 3` and `--mode all` will run predictions on the default image (`./inference/cat3.jpg`), show the top 3 predictions, and run all models (PyTorch CPU, CUDA, ONNX, OV, TRT-FP16, TRT-FP32). At the end, a results plot is saved to `./inference/plot.png`. For the default of 5 top predictions, omit the `--topk` argument or set it to 5.
## RESULTS
### Inference Benchmark Results
<img src="./inference/plot.png" width="70%">
The results of the predictions and benchmarks are saved to `model.log`. This log file contains information about the predicted class for the input image and the average batch time for the different configurations during the benchmark.
### Results explanation
- `PyTorch_cpu: 973.52 ms` indicates the average batch time when running the `PyTorch` model on the `CPU` device.
- `PyTorch_cuda: 41.11 ms` indicates the average batch time when running the `PyTorch` model on the `CUDA` device.
- `TRT_fp32: 19.10 ms` shows the average batch time when running the model with `TensorRT` using `float32` precision.
- `TRT_fp16: 7.22 ms` indicates the average batch time when running the model with `TensorRT` using `float16` precision.
- `ONNX: 15.38 ms` indicates the average batch inference time when running the `PyTorch` model converted to `ONNX` on the `CPU` device.
- `OpenVINO: 14.04 ms` indicates the average batch inference time when running the `ONNX` model converted to `OpenVINO` on the `CPU` device.
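For reference, a comparative horizontal bar plot like the one shown above can be produced from these timings with matplotlib; the sketch below uses the example values listed in this section, and the project's actual plotting code may differ.

```python
# Minimal sketch of a comparative horizontal bar plot of the benchmark timings.
# Values are the example numbers from this section; labels and styling are illustrative.
import matplotlib.pyplot as plt

results = {
    "PyTorch_cpu": 973.52,
    "PyTorch_cuda": 41.11,
    "ONNX": 15.38,
    "OpenVINO": 14.04,
    "TRT_fp32": 19.10,
    "TRT_fp16": 7.22,
}

fig, ax = plt.subplots()
ax.barh(list(results.keys()), list(results.values()))
ax.set_xlabel("Average batch time (ms)")
ax.set_title("Inference benchmark results")
fig.tight_layout()
fig.savefig("./inference/plot.png")
```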
### Example Input
Here is an example of the input image to run predictions and benchmarks on:
<img src="./inference/cat3.jpg" width="20%">
## Benchmark Implementation Details
Here you can see the flow for each model and benchmark.
### PyTorch CPU & CUDA
In the provided code, we perform inference using the native PyTorch framework on both CPU and GPU (CUDA) configurations. This serves as a baseline to compare the performance improvements gained from other optimization techniques.
#### Flow:
1. The ResNet-50 model is loaded from torchvision and, if available, transferred to the GPU.
2. Inference is performed on the provided image using the specified model.
3. Benchmark results, including average inference time, are logged for both the CPU and CUDA setups.
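Below is a minimal sketch of what such a baseline timing loop can look like; the input size, warm-up count, and iteration count are illustrative and may differ from the project's actual benchmark settings.

```python
# Sketch of the PyTorch CPU/CUDA baseline benchmark: load ResNet-50, run warm-up
# iterations, then report the average batch time in milliseconds.
import time
import torch
from torchvision.models import resnet50, ResNet50_Weights

def benchmark(model, device, n_iters=50):
    model = model.to(device).eval()
    dummy = torch.randn(1, 3, 224, 224, device=device)
    with torch.no_grad():
        for _ in range(10):                      # warm-up
            model(dummy)
        if device == "cuda":
            torch.cuda.synchronize()             # wait for warm-up kernels to finish
        start = time.time()
        for _ in range(n_iters):
            model(dummy)
        if device == "cuda":
            torch.cuda.synchronize()
    return (time.time() - start) / n_iters * 1000  # average batch time in ms

model = resnet50(weights=ResNet50_Weights.DEFAULT)
print(f"PyTorch_cpu: {benchmark(model, 'cpu'):.2f} ms")
if torch.cuda.is_available():
    print(f"PyTorch_cuda: {benchmark(model, 'cuda'):.2f} ms")
```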
### TensorRT FP32 & FP16
TensorRT offers significant performance improvements by optimizing the neural network model. In this code, we utilize TensorRT's capabilities to run benchmarks in both FP32 (single precision) and FP16 (half precision) modes.
#### Flow:
1. Load the ResNet-50 model.
2. Convert the PyTorch model to TensorRT format with the specified precision.
3. Perform inference on the provided image.
4. Log the benchmark results for the specified TensorRT precision mode.
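Below is a rough sketch of steps 1-4 using the `torch_tensorrt` API; the exact compile options (input specs, workspace size, batching) used by the project are assumptions here.

```python
# Sketch of compiling ResNet-50 with torch_tensorrt in FP32 and FP16 and running inference.
import torch
import torch_tensorrt
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.DEFAULT).eval().cuda()
dummy = torch.randn(1, 3, 224, 224, device="cuda")

for precision in (torch.float32, torch.float16):
    trt_model = torch_tensorrt.compile(
        model,
        inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
        enabled_precisions={precision},          # build the engine with FP32 or FP16 kernels
    )
    with torch.no_grad():
        out = trt_model(dummy)                   # inference on the TensorRT-optimized module
    print(precision, out.shape)
```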
### ONNX
The code includes an exporter that converts the PyTorch ResNet-50 model to ONNX format, allowing it to be inferred using ONNX Runtime. This provides a flexible, cross-platform solution for deploying the model.
#### Flow:
1. The ResNet-50 model is loaded.
2. Using the ONNX exporter utility, the PyTorch model is converted to ONNX format.
3. An ONNX Runtime session is created.
4. Inference is performed on the provided image using the ONNX model.
5. Benchmark results are logged for the ONNX model.
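Below is a rough sketch of this flow using `torch.onnx.export` and ONNX Runtime directly; the project wraps the export step in its `ONNXExporter` utility, and the file name used here is illustrative.

```python
# Sketch of exporting ResNet-50 to ONNX and running it with ONNX Runtime on CPU.
import numpy as np
import torch
import onnxruntime as ort
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
dummy = torch.randn(1, 3, 224, 224)

# 1-2. Export the PyTorch model to ONNX.
torch.onnx.export(model, dummy, "resnet50.onnx",
                  input_names=["input"], output_names=["output"])

# 3. Create an ONNX Runtime session on the CPU.
session = ort.InferenceSession("resnet50.onnx", providers=["CPUExecutionProvider"])

# 4-5. Run inference; ONNX Runtime takes numpy arrays keyed by input name.
logits = session.run(None, {"input": dummy.numpy().astype(np.float32)})[0]
print(logits.shape)  # (1, 1000) class scores
```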
### OpenVINO
OpenVINO is a toolkit from Intel that optimizes deep learning model inference for Intel CPUs, GPUs, and other hardware. In the code, we convert the ONNX model to OpenVINO's format and then run benchmarks using the OpenVINO runtime.
#### Flow:
1. The ONNX model (created in the previous step) is loaded.
2. Convert the ONNX model to OpenVINO's IR format.
3. Create an inference engine using OpenVINO's runtime.
4. Perform inference on the provided image using the OpenVINO model.
5. Benchmark results, including average inference time, are logged for the OpenVINO model.
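Below is a rough sketch of this flow using the OpenVINO Python API (2023+ style); the project performs the ONNX-to-IR conversion through its `OVExporter` utility, and the file paths here are illustrative.

```python
# Sketch of converting the ONNX model to OpenVINO IR, compiling it for CPU,
# and running inference. Requires openvino >= 2023.1.
import numpy as np
import openvino as ov

core = ov.Core()

# 1-2. Convert the ONNX model produced earlier and save it as IR (.xml/.bin).
ov_model = ov.convert_model("resnet50.onnx")
ov.save_model(ov_model, "resnet50.xml")

# 3. Compile the model for the CPU device.
compiled = core.compile_model(ov_model, "CPU")

# 4-5. Run inference on a dummy batch (replace with the preprocessed image tensor).
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
logits = compiled([dummy])[compiled.output(0)]
print(logits.shape)
```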
## Used methodologies
### TensorRT Optimization
TensorRT is a high-performance deep learning inference optimizer and runtime library developed by NVIDIA. It is designed for optimizing and deploying trained neural network models in production environments. This project supports TensorRT optimizations in both FP32 (single precision) and FP16 (half precision) modes, offering different trade-offs between inference speed and model accuracy.
#### Features
- **Performance Boost**: TensorRT can significantly accelerate the inference of neural network models, making it suitable for deployment in resource-constrained environments.
- **Precision Modes**: Supports FP32 for maximum accuracy and FP16 for faster performance with a minor trade-off in accuracy.
- **Layer Fusion**: TensorRT fuses layers and tensors in the neural network to reduce memory access overhead and improve execution speed.
- **Dynamic Tensor Memory**: Efficiently handles varying batch sizes without re-optimization.
#### Usage
To employ TensorRT optimizations in the project, use the `--mode all` argument when running the main script.
This will run all models, including the PyTorch model compiled to TensorRT with `FP16` and `FP32` precision modes. One of the steps then runs inference on the specified image using the TensorRT-optimized model.
Example:
```sh
python src/main.py --mode all
```
#### Requirements
Ensure you have the TensorRT library and the torch_tensorrt package installed in your environment. Also, for FP16 optimizations, it's recommended to have a GPU that supports half-precision arithmetic (like NVIDIA GPUs with Tensor Cores).
### ONNX Exporter
The ONNX Model Exporter (`ONNXExporter`) utility is incorporated in this project to enable conversion of the native PyTorch model into the ONNX format.
Using the ONNX format, inference and benchmarking can be performed with the ONNX Runtime, which offers platform-agnostic optimizations and is widely supported across numerous platforms and devices.
#### Features
- **Standardized Format**: ONNX provides an open-source format for AI models. It defines an extensible computation graph model, as well as definitions of built-in operators and standard data types.
- **Interoperability**: Models in ONNX format can be used across a variety of frameworks, tools, runtimes, and compilers.
- **Optimizations**: The ONNX Runtime provides performance optimizations for both cloud and edge devices.
#### Usage
To leverage the `ONNXExporter` and conduct inference using the ONNX Runtime, utilize the `--mode onnx` argument when executing the main script.
This will initiate the conversion process and then run inference on the specified image using the ONNX model.
Example:
```sh
python src/main.py --mode onnx
```
#### Requirements
Ensure the ONNX library is installed in your environment to use the ONNXExporter. Additionally, if you want to run inference using the ONNX model, make sure you have the ONNX Runtime installed.
### OV Exporter
The OpenVINO Model Exporter utility (`OVExporter`) has been integrated into this project to facilitate conversion of the ONNX model to the OpenVINO format.
This enables inference and benchmarking using OpenVINO, a framework optimized for Intel hardware, providing substantial speed improvements especially on CPUs.
#### Features
- **Model Optimization**: Converts the ONNX model to OpenVINO's Intermediate Representation (IR) format. This optimized format allows for faster inference times on Intel hardware.
- **Versatility**: OpenVINO can target a variety of Intel hardware devices such as CPUs, integrated GPUs, FPGAs, and VPUs.
- **Ease of Use**: The `OVExporter` provides a seamless transition from ONNX to OpenVINO, abstracting the conversion details and providing a straightforward interface.
#### Usage
To utilize `OVExporter` and perform inference using OpenVINO, use the `--mode ov` argument when running the main script.
This will trigger the conversion process and subsequently run inference on the provided image using the optimized OpenVINO model.
Example:
```sh
python src/main.py --mode ov
```
#### Requirements
Ensure you have the OpenVINO Toolkit installed and the necessary dependencies set up to use OpenVINO's model optimizer and inference engine.