
Commit 4d951ff

Adding OpenVINO Exporter (#2)
feat: Integrate OpenVINO Exporter and Benchmarking

- Added OVExporter to convert ONNX models to OpenVINO's Intermediate Representation (IR) format.
- Introduced OVBenchmark to measure the performance of OpenVINO models.
- Refactored `main.py` for better organization and flow across PyTorch, ONNX, TensorRT, and OpenVINO modes.
- Enhanced benchmark result visualization with a comparative horizontal bar plot.
- Updated README with detailed flow explanations for each mode.
- Addressed various bugs and improved error handling.

This integration provides an additional benchmarking mode, enabling comprehensive performance comparisons.
1 parent 3aceebb commit 4d951ff

8 files changed

Lines changed: 420 additions & 73 deletions


Dockerfile

Lines changed: 13 additions & 5 deletions
@@ -4,13 +4,21 @@ FROM nvcr.io/nvidia/tensorrt:23.08-py3
 # Install system packages
 RUN apt-get update && apt-get install -y \
     python3-pip \
-    git
+    git \
+    libjpeg-dev \
+    libpng-dev
+
+# Copy the requirements.txt file into the container
+COPY requirements.txt /workspace/requirements.txt
+
+# Install Python packages
+RUN pip3 install --no-cache-dir -r /workspace/requirements.txt
+
+# Install torch-tensorrt from the special location
+RUN pip3 install torch-tensorrt -f https://github.com/NVIDIA/Torch-TensorRT/releases
 
 # Set the working directory
 WORKDIR /workspace
 
 # Copy local project files to /workspace in the image
-COPY . /workspace
-
-# Install Python packages
-RUN pip3 install --no-cache-dir -r /workspace/requirements.txt
+COPY . /workspace

README.md

Lines changed: 137 additions & 38 deletions
@@ -1,18 +1,31 @@
-# ResNet-50 Inference with ONNX/TensorRT
+
+<img src="./inference/logo.png" width="60%">
+
 ## Table of Contents
 1. [Overview](#overview)
 2. [Requirements](#requirements)
-3. [Steps to Run](#steps-to-run)
-4. [Example Command](#example-command)
-5. [Inference Benchmark Results](#inference-benchmark-results)
-   - [Example of Results](#example-of-results)
-   - [Explanation of Results](#explanation-of-results)
-6. [ONNX Exporter](#onnx-exporter) ![New](https://img.shields.io/badge/-New-red)
-7. [Author](#author)
-8. [References](#references)
+   - [Steps to Run](#steps-to-run)
+   - [Example Command](#example-command)
+5. [RESULTS](#results) ![Static Badge](https://img.shields.io/badge/update-yellow)
+   - [Results explanation](#results-explanation)
+   - [Example Input](#example-input)
+6. [Benchmark Implementation Details](#benchmark-implementation-details) ![New](https://img.shields.io/badge/-New-red)
+   - [PyTorch CPU & CUDA](#pytorch-cpu--cuda)
+   - [TensorRT FP32 & FP16](#tensorrt-fp32--fp16)
+   - [ONNX](#onnx)
+   - [OpenVINO](#openvino)
+7. [Used methodologies](#used-methodologies) ![New](https://img.shields.io/badge/-New-red)
+   - [TensorRT Optimization](#tensorrt-optimization)
+   - [ONNX Exporter](#onnx-exporter)
+   - [OV Exporter](#ov-exporter)
+10. [Author](#author)
+11. [References](#references)
+
+
+<img src="./inference/plot.png" width="70%">
 
 ## Overview
-This project demonstrates how to perform inference with a PyTorch model and optimize it using ONNX or NVIDIA TensorRT. The script loads a pre-trained ResNet-50 model from torchvision, performs inference on a user-provided image, and prints the top-K predicted classes. Additionally, the script benchmarks the model's performance in the following configurations: CPU, CPU (ONNX), CUDA, TensorRT-FP32, and TensorRT-FP16, providing insights into the speedup gained through optimization.
+This project demonstrates how to perform inference with a PyTorch model and optimize it using ONNX, OpenVINO, and NVIDIA TensorRT. The script loads a pre-trained ResNet-50 model from torchvision, performs inference on a user-provided image, and prints the top-K predicted classes. Additionally, the script benchmarks the model's performance in the following configurations: CPU, CUDA, TensorRT-FP32, and TensorRT-FP16, providing insights into the speedup gained through optimization.
 
 ## Requirements
 - This repo cloned
@@ -21,7 +34,7 @@ This project demonstrates how to perform inference with a PyTorch model and opti
 - Python 3.x
 - [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#install-guide) (for running the Docker container with GPU support)
 
-## Steps to Run
+### Steps to Run
 
 ```sh
 # 1. Build the Docker Image
@@ -37,46 +50,132 @@ python src/main.py
 ### Arguments
 - `--image_path`: (Optional) Specifies the path to the image you want to predict.
 - `--topk`: (Optional) Specifies the number of top predictions to show. Defaults to 5 if not provided.
-- `--onnx`: (Optional) Specifies if we want export ResNet50 model to ONNX and run benchmark only for this model
+- `--mode`: Specifies the mode for exporting and running the model. Choices are: `onnx`, `ov`, `all`.
 
-## Example Command
+### Example Command
 ```sh
-python src/main.py --image_path ./inference/cat3.jpg --topk 3 --onnx
+python src/main.py --topk 3 --mode=all
 ```
 
-This command will run predictions on the image at the specified path and show the top 3 predictions using both PyTorch and ONNX Runtime models. For the default 5 top predictions, omit the --topk argument or set it to 5.
+This command will run predictions on the default image (`./inference/cat3.jpg`), show the top 3 predictions, and run all models (PyTorch CPU, CUDA, ONNX, OV, TRT-FP16, TRT-FP32). At the end, the results plot is saved to `./inference/plot.png`.
 
-## Inference Benchmark Results
+## RESULTS
+### Inference Benchmark Results
+<img src="./inference/plot.png" width="70%">
 
-The results of the predictions and benchmarks are saved to `model.log`. This log file contains information about the predicted class for the input image and the average batch time for the different configurations during the benchmark.
+### Results explanation
+- `PyTorch_cpu: 973.52 ms` indicates the average batch time when running the `PyTorch` model on the `CPU` device.
+- `PyTorch_cuda: 41.11 ms` indicates the average batch time when running the `PyTorch` model on the `CUDA` device.
+- `TRT_fp32: 19.10 ms` shows the average batch time when running the model with `TensorRT` using `float32` precision.
+- `TRT_fp16: 7.22 ms` indicates the average batch time when running the model with `TensorRT` using `float16` precision.
+- `ONNX: 15.38 ms` indicates the average batch inference time when running the `PyTorch` model converted to `ONNX` on the `CPU` device.
+- `OpenVINO: 14.04 ms` indicates the average batch inference time when running the `ONNX` model converted to `OpenVINO` on the `CPU` device.
 
-### Example of Results
-Here is an example of the contents of `model.log` after running predictions and benchmarks on this image:
+### Example Input
+Here is an example of the input image to run predictions and benchmarks on:
 
 <img src="./inference/cat3.jpg" width="20%">

+## Benchmark Implementation Details
+Here you can see the flow for each model and benchmark.
+
+### PyTorch CPU & CUDA
+In the provided code, we perform inference using the native PyTorch framework on both CPU and GPU (CUDA) configurations. This serves as a baseline to compare the performance improvements gained from other optimization techniques.
+
+#### Flow:
+1. The ResNet-50 model is loaded from torchvision and, if available, transferred to the GPU.
+2. Inference is performed on the provided image using the specified model.
+3. Benchmark results, including average inference time, are logged for both the CPU and CUDA setups.
+
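The warm-up-then-measure loop that such a baseline typically uses can be sketched as follows. This is a simplified stand-alone version, not the repository's actual benchmark class; the tiny stand-in network is hypothetical so the sketch runs without downloading ResNet-50:

```python
import time

import numpy as np
import torch


def benchmark(model: torch.nn.Module, device: str,
              input_shape=(1, 3, 224, 224),
              nwarmup: int = 5, nruns: int = 20) -> float:
    """Return the average per-batch inference time in milliseconds."""
    model = model.to(device).eval()
    dummy = torch.randn(*input_shape, device=device)
    with torch.no_grad():
        # Warm-up: let allocator caches and kernels settle before timing.
        for _ in range(nwarmup):
            model(dummy)
        if device == "cuda":
            torch.cuda.synchronize()  # CUDA kernels are async; sync before timing
        timings = []
        for _ in range(nruns):
            start = time.time()
            model(dummy)
            if device == "cuda":
                torch.cuda.synchronize()
            timings.append(time.time() - start)
    return float(np.mean(timings) * 1000)


# Tiny stand-in model so the sketch runs quickly anywhere.
net = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU())
cpu_ms = benchmark(net, "cpu")
print(f"PyTorch_cpu: {cpu_ms:.2f} ms")
```

The same function would be called with `device="cuda"` for the GPU baseline when a CUDA device is available.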
+### TensorRT FP32 & FP16
+TensorRT offers significant performance improvements by optimizing the neural network model. In this code, we utilize TensorRT's capabilities to run benchmarks in both FP32 (single precision) and FP16 (half precision) modes.
+
+#### Flow:
+1. Load the ResNet-50 model.
+2. Convert the PyTorch model to TensorRT format with the specified precision.
+3. Perform inference on the provided image.
+4. Log the benchmark results for the specified TensorRT precision mode.
+
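Step 2 above is typically a single `torch_tensorrt.compile` call. A hedged sketch, not the project's exact code: the helper name and input shape are illustrative, and it falls back to the plain model on machines without an NVIDIA GPU or a TensorRT install:

```python
import torch


def compile_trt(model: torch.nn.Module, precision=torch.float32) -> torch.nn.Module:
    """Try to compile a model with Torch-TensorRT at the given precision.

    Returns the original model unchanged when torch_tensorrt or a CUDA
    device is unavailable, so the rest of the pipeline can still run.
    """
    try:
        import torch_tensorrt  # requires an NVIDIA GPU and TensorRT
        return torch_tensorrt.compile(
            model.eval().cuda(),
            inputs=[torch_tensorrt.Input((1, 3, 224, 224), dtype=precision)],
            enabled_precisions={precision},  # {torch.float16} for the FP16 mode
        )
    except Exception:  # ImportError or no CUDA device: graceful fallback
        return model


# Tiny stand-in model; the project compiles ResNet-50 instead.
net = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
trt_net = compile_trt(net, precision=torch.float16)
```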
+### ONNX
+The code includes an exporter that converts the PyTorch ResNet-50 model to ONNX format, allowing it to be inferred using ONNX Runtime. This provides a flexible, cross-platform solution for deploying the model.
+
+#### Flow:
+1. The ResNet-50 model is loaded.
+2. Using the ONNX exporter utility, the PyTorch model is converted to ONNX format.
+3. An ONNX Runtime session is created.
+4. Inference is performed on the provided image using the ONNX model.
+5. Benchmark results are logged for the ONNX model.
+
+### OpenVINO
+OpenVINO is a toolkit from Intel that optimizes deep learning model inference for Intel CPUs, GPUs, and other hardware. In the code, we convert the ONNX model to OpenVINO's format and then run benchmarks using the OpenVINO runtime.
+
+#### Flow:
+1. The ONNX model (created in the previous step) is loaded.
+2. Convert the ONNX model to OpenVINO's IR format.
+3. Create an inference engine using OpenVINO's runtime.
+4. Perform inference on the provided image using the OpenVINO model.
+5. Benchmark results, including average inference time, are logged for the OpenVINO model.
+
+## Used methodologies
+### TensorRT Optimization
+TensorRT is a high-performance deep learning inference optimizer and runtime library developed by NVIDIA. It is designed for optimizing and deploying trained neural network models in production environments. This project supports TensorRT optimizations in both FP32 (single precision) and FP16 (half precision) modes, offering different trade-offs between inference speed and model accuracy.
+
+#### Features
+- **Performance Boost**: TensorRT can significantly accelerate the inference of neural network models, making it suitable for deployment in resource-constrained environments.
+- **Precision Modes**: Supports FP32 for maximum accuracy and FP16 for faster performance with a minor trade-off in accuracy.
+- **Layer Fusion**: TensorRT fuses layers and tensors in the neural network to reduce memory access overhead and improve execution speed.
+- **Dynamic Tensor Memory**: Efficiently handles varying batch sizes without re-optimization.
+
+#### Usage
+To employ TensorRT optimizations in the project, use the `--mode all` argument when running the main script.
+This will run all models, including the PyTorch models compiled to TensorRT with `FP16` and `FP32` precision modes; one of the steps then runs inference on the specified image using the TensorRT-optimized model.
+Example:
+```sh
+python src/main.py --mode all
 ```
-My prediction: %33 tabby
-My prediction: %26 Egyptian cat
-Running Benchmark for CPU
-Average batch time: 942.47 ms
-Average ONNX inference time: 15.59 ms
-Running Benchmark for CUDA
-Average batch time: 41.02 ms
-Compiling and Running Inference Benchmark for TensorRT with precision: torch.float32
-Average batch time: 19.20 ms
-Compiling and Running Inference Benchmark for TensorRT with precision: torch.float16
-Average batch time: 7.25 ms
+#### Requirements
+Ensure you have the TensorRT library and the torch_tensorrt package installed in your environment. Also, for FP16 optimizations, it's recommended to have a GPU that supports half-precision arithmetic (like NVIDIA GPUs with Tensor Cores).
+
+### ONNX Exporter
+The ONNX Model Exporter (`ONNXExporter`) utility is incorporated within this project to enable the conversion of the native PyTorch model into the ONNX format.
+Using the ONNX format, inference and benchmarking can be performed with ONNX Runtime, which offers platform-agnostic optimizations and is widely supported across numerous platforms and devices.
+
+#### Features
+- **Standardized Format**: ONNX provides an open-source format for AI models. It defines an extensible computation graph model, as well as definitions of built-in operators and standard data types.
+- **Interoperability**: Models in ONNX format can be used across a variety of frameworks, tools, runtimes, and compilers.
+- **Optimizations**: ONNX Runtime provides performance optimizations for both cloud and edge devices.
+
+#### Usage
+To leverage the `ONNXExporter` and conduct inference using ONNX Runtime, use the `--mode onnx` argument when executing the main script.
+This will initiate the conversion process and then run inference on the specified image using the ONNX model.
+Example:
+```sh
+python src/main.py --mode onnx
 ```
 
-### Explanation of Results
-- First k lines show the topk predictions. For example, `My prediction: %33 tabby` displays the highest confidence prediction made by the model for the input image, confidence level (`%33`), and the predicted class (`tabby`).
-- The following lines provide information about the average batch time for running the model in different configurations:
-  - `Running Benchmark for CPU` and `Average batch time: 942.47 ms` indicate the average batch time when running the model on the CPU.
-  - `Average ONNX inference time: 15.59 ms` indicate the average batch time when running the ONNX model on the CPU.
-  - `Running Benchmark for CUDA` and `Average batch time: 41.02 ms` indicate the average batch time when running the model on CUDA.
-  - `Compiling and Running Inference Benchmark for TensorRT with precision: torch.float32` and `Average batch time: 19.20 ms` show the average batch time when running the model with TensorRT using `float32` precision.
-  - `Compiling and Running Inference Benchmark for TensorRT with precision: torch.float16` and `Average batch time: 7.25 ms` indicate the average batch time when running the model with TensorRT using `float16` precision.
+#### Requirements
+Ensure the ONNX library is installed in your environment to use the ONNXExporter. Additionally, if you want to run inference using the ONNX model, make sure you have ONNX Runtime installed.
+
+### OV Exporter
+The OpenVINO Model Exporter utility (`OVExporter`) has been integrated into this project to facilitate the conversion of the ONNX model to the OpenVINO format.
+This enables inference and benchmarking using OpenVINO, a framework optimized for Intel hardware, providing substantial speed improvements, especially on CPUs.
+
+#### Features
+- **Model Optimization**: Converts the ONNX model to OpenVINO's Intermediate Representation (IR) format. This optimized format allows for faster inference times on Intel hardware.
+- **Versatility**: OpenVINO can target a variety of Intel hardware devices such as CPUs, integrated GPUs, FPGAs, and VPUs.
+- **Ease of Use**: The `OVExporter` provides a seamless transition from ONNX to OpenVINO, abstracting the conversion details and providing a straightforward interface.
+
+#### Usage
+To utilize `OVExporter` and perform inference using OpenVINO, use the `--mode ov` argument when running the main script.
+This will trigger the conversion process and subsequently run inference on the provided image using the optimized OpenVINO model.
+Example:
+```sh
+python src/main.py --mode ov
+```
+
+#### Requirements
+Ensure you have the OpenVINO Toolkit installed and the necessary dependencies set up to use OpenVINO's model optimizer and inference engine.
+
 
 ## ONNX Exporter
 The ONNX Exporter utility is integrated into this project to allow the conversion of the PyTorch model to ONNX format, enabling inference and benchmarking using ONNX Runtime. The ONNX model can provide hardware-agnostic optimizations and is widely supported across various platforms and devices.

inference/logo.png

40.8 KB

inference/plot.png

32 KB

requirements.txt

Lines changed: 4 additions & 2 deletions
@@ -1,9 +1,11 @@
 torch
 torchvision
-torch-tensorrt
 pandas
 Pillow
 numpy
 packaging
 onnx
-onnxruntime
+onnxruntime
+openvino==2023.1.0.dev20230811
+seaborn
+matplotlib

src/benchmark.py

Lines changed: 67 additions & 3 deletions
@@ -7,6 +7,7 @@
 import torch.backends.cudnn as cudnn
 import logging
 import onnxruntime as ort
+import openvino as ov
 
 # Configure logging
 logging.basicConfig(filename="model.log", level=logging.INFO)
@@ -22,7 +23,7 @@ def __init__(self, nruns: int = 100, nwarmup: int = 50):
         self.nwarmup = nwarmup
 
     @abstractmethod
-    def run(self) -> None:
+    def run(self):
         """
         Abstract method to run the benchmark.
         """
@@ -58,7 +59,7 @@ def __init__(
 
         cudnn.benchmark = True  # Enable cuDNN benchmarking optimization
 
-    def run(self) -> None:
+    def run(self):
         """
         Run the benchmark with the given model, input shape, and other parameters.
         Log the average batch time and print the input shape and output feature size.
@@ -93,6 +94,7 @@ def run(self) -> None:
         print(f"Input shape: {input_data.size()}")
         print(f"Output features size: {features.size()}")
         logging.info(f"Average batch time: {np.mean(timings) * 1000:.2f} ms")
+        return np.mean(timings) * 1000
 
 
 class ONNXBenchmark(Benchmark):
@@ -113,7 +115,8 @@ def __init__(
         self.nwarmup = nwarmup
         self.nruns = nruns
 
-    def run(self) -> None:
+
+    def run(self):
         print("Warming up ...")
         # Adjusting the batch size in the input shape to match the expected input size of the model.
         input_shape = (1,) + self.input_shape[1:]
@@ -133,3 +136,64 @@ def run(self) -> None:
 
         avg_time = np.mean(timings) * 1000
         logging.info(f"Average ONNX inference time: {avg_time:.2f} ms")
+        return avg_time
+
+
+class OVBenchmark(Benchmark):
+    def __init__(
+        self, model: ov.frontend.FrontEnd, input_shape: Tuple[int, int, int, int]
+    ):
+        """
+        Initialize the OVBenchmark with the OpenVINO model and the input shape.
+
+        :param model: ov.frontend.FrontEnd
+            The OpenVINO model.
+        :param input_shape: Tuple[int, int, int, int]
+            The shape of the model input.
+        """
+        self.ov_model = model
+        self.core = ov.Core()
+        self.compiled_model = None
+        self.input_shape = input_shape
+        self.warmup_runs = 50
+        self.num_runs = 100
+        self.dummy_input = np.random.randn(*input_shape).astype(np.float32)
+
+    def warmup(self):
+        """
+        Compile the OpenVINO model for optimal execution on available hardware.
+        """
+        self.compiled_model = self.core.compile_model(self.ov_model, "AUTO")
+
+    def inference(self, input_data) -> dict:
+        """
+        Perform inference on the input data using the compiled OpenVINO model.
+
+        :param input_data: np.ndarray
+            The input data for the model.
+        :return: dict
+            The model's output as a dictionary.
+        """
+        outputs = self.compiled_model(inputs={"input": input_data})
+        return outputs
+
+    def run(self):
+        """
+        Run the benchmark on the OpenVINO model. It first warms up by compiling the model and then measures
+        the average inference time over a set number of runs.
+        """
+        # Warm-up runs
+        logging.info("Warming up ...")
+        for _ in range(self.warmup_runs):
+            self.warmup()
+
+        # Benchmarking
+        total_time = 0
+        for _ in range(self.num_runs):
+            start_time = time.time()
+            _ = self.inference(self.dummy_input)
+            total_time += time.time() - start_time
+
+        avg_time = total_time / self.num_runs
+        logging.info(f"Average inference time: {avg_time * 1000:.2f} ms")
+        return avg_time * 1000
