
Commit 58813ba

Dev (#1)
* Added ONNX support: Integrated ONNX exporter and ONNX Runtime for model conversion, inference, and benchmarking. Updated Dockerfile, requirements, and documentation to reflect the enhancements.
1 parent 3d1c226 commit 58813ba

10 files changed: 215 additions & 84 deletions

Dockerfile

Lines changed: 3 additions & 3 deletions

```diff
@@ -6,11 +6,11 @@ RUN apt-get update && apt-get install -y \
     python3-pip \
     git
 
-# Install Python packages
-RUN pip3 install torch torchvision torch-tensorrt pandas Pillow numpy packaging onnx
-
 # Set the working directory
 WORKDIR /workspace
 
 # Copy local project files to /workspace in the image
 COPY . /workspace
+
+# Install Python packages
+RUN pip3 install --no-cache-dir -r /workspace/requirements.txt
```

README.md

Lines changed: 13 additions & 11 deletions

```diff
@@ -7,9 +7,9 @@
 5. [Inference Benchmark Results](#inference-benchmark-results)
   - [Example of Results](#example-of-results)
   - [Explanation of Results](#explanation-of-results)
-6. [Author](#author)
-7. [References](#references)
-8. [Notes](#notes)
+6. [ONNX Exporter](#onnx-exporter)
+7. [Author](#author)
+8. [References](#references)
 
 ## Overview
 This project demonstrates how to perform inference with a PyTorch model and optimize it using NVIDIA TensorRT. The script loads a pre-trained ResNet-50 model from torchvision, performs inference on a user-provided image, and prints the top-K predicted classes. Additionally, the script benchmarks the model's performance in the following configurations: CPU, CUDA, TensorRT-FP32, and TensorRT-FP16, providing insights into the speedup gained through optimization.
@@ -30,19 +30,20 @@ docker build -t awesome-tesnorrt .
 docker run --gpus all --rm -it awesome-tesnorrt
 
 # 3. Run the Script inside the Container
-python src/main.py --image_path /path-to-image/image.jpg --topk 2
+python src/main.py
 ```
 
 ### Arguments
-- `--image_path`: Specifies the path to the image you want to predict.
+- `--image_path`: (Optional) Specifies the path to the image you want to predict.
 - `--topk`: (Optional) Specifies the number of top predictions to show. Defaults to 5 if not provided.
+- `--onnx`: (Optional) Exports the ResNet50 model to ONNX and runs the benchmark only for that model.
 
 ## Example Command
 ```sh
-python src/main.py --image_path ./inference/cat3.jpg --topk 3 --show_image
+python src/main.py --image_path ./inference/cat3.jpg --topk 3 --onnx
 ```
 
-This command will run predictions on the image at the specified path, show the top 3 predictions, and display the image. If you do not want to display the image, omit the `--show_image` flag. For the default 5 top predictions, omit the `--topk` argument or set it to 5.
+This command runs predictions on the image at the specified path and shows the top 3 predictions using both the PyTorch and ONNX Runtime models. For the default 5 top predictions, omit the `--topk` argument or set it to 5.
 
 ## Inference Benchmark Results
 
@@ -58,6 +59,7 @@ My prediction: %33 tabby
 My prediction: %26 Egyptian cat
 Running Benchmark for CPU
 Average batch time: 942.47 ms
+Average ONNX inference time: 15.59 ms
 Running Benchmark for CUDA
 Average batch time: 41.02 ms
 Compiling and Running Inference Benchmark for TensorRT with precision: torch.float32
@@ -70,16 +72,16 @@ Average batch time: 7.25 ms
 - First k lines show the topk predictions. For example, `My prediction: %33 tabby` displays the highest confidence prediction made by the model for the input image, confidence level (`%33`), and the predicted class (`tabby`).
 - The following lines provide information about the average batch time for running the model in different configurations:
   - `Running Benchmark for CPU` and `Average batch time: 942.47 ms` indicate the average batch time when running the model on the CPU.
+  - `Average ONNX inference time: 15.59 ms` indicates the average batch time when running the ONNX model on the CPU.
   - `Running Benchmark for CUDA` and `Average batch time: 41.02 ms` indicate the average batch time when running the model on CUDA.
   - `Compiling and Running Inference Benchmark for TensorRT with precision: torch.float32` and `Average batch time: 19.20 ms` show the average batch time when running the model with TensorRT using `float32` precision.
   - `Compiling and Running Inference Benchmark for TensorRT with precision: torch.float16` and `Average batch time: 7.25 ms` indicate the average batch time when running the model with TensorRT using `float16` precision.
 
+## ONNX Exporter
+The ONNX Exporter utility is integrated into this project to allow conversion of the PyTorch model to ONNX format, enabling inference and benchmarking with ONNX Runtime. The ONNX model offers hardware-agnostic optimizations and is widely supported across platforms and devices.
+
 ## Author
 [DimaBir](https://github.com/DimaBir)
 
 ## References
 - [ResNetTensorRT Project](https://github.com/DimaBir/ResNetTensorRT/tree/main)
-
-## Notes
-- The project uses a Docker container built on top of the NVIDIA TensorRT image to ensure that all dependencies, including CUDA and TensorRT, are correctly installed and configured.
-- Please ensure you have the NVIDIA Container Toolkit installed to run the container with GPU support.
```

inference/cat3.jpg

4.21 MB (binary file)

inference/fan.jpg

2.74 MB (binary file)

inference/image-2.jpg

-29.8 KB (binary file not shown)

inference/vase.jpg

2.66 MB (binary file)

requirements.txt

Lines changed: 9 additions & 0 deletions

```diff
@@ -0,0 +1,9 @@
+torch
+torchvision
+torch-tensorrt
+pandas
+Pillow
+numpy
+packaging
+onnx
+onnxruntime
```

src/benchmark.py

Lines changed: 60 additions & 1 deletion

```diff
@@ -1,16 +1,35 @@
 import time
 from typing import Tuple
 
+from abc import ABC, abstractmethod
 import numpy as np
 import torch
 import torch.backends.cudnn as cudnn
 import logging
+import onnxruntime as ort
 
 # Configure logging
 logging.basicConfig(filename="model.log", level=logging.INFO)
 
 
-class Benchmark:
+class Benchmark(ABC):
+    """
+    Abstract class representing a benchmark.
+    """
+
+    def __init__(self, nruns: int = 100, nwarmup: int = 50):
+        self.nruns = nruns
+        self.nwarmup = nwarmup
+
+    @abstractmethod
+    def run(self) -> None:
+        """
+        Abstract method to run the benchmark.
+        """
+        pass
+
+
+class PyTorchBenchmark:
     def __init__(
         self,
         model: torch.nn.Module,
@@ -74,3 +93,43 @@ def run(self) -> None:
         print(f"Input shape: {input_data.size()}")
         print(f"Output features size: {features.size()}")
         logging.info(f"Average batch time: {np.mean(timings) * 1000:.2f} ms")
+
+
+class ONNXBenchmark(Benchmark):
+    """
+    A class used to benchmark the performance of an ONNX model.
+    """
+
+    def __init__(
+        self,
+        ort_session: ort.InferenceSession,
+        input_shape: tuple,
+        nruns: int = 100,
+        nwarmup: int = 50,
+    ):
+        super().__init__(nruns)
+        self.ort_session = ort_session
+        self.input_shape = input_shape
+        self.nwarmup = nwarmup
+        self.nruns = nruns
+
+    def run(self) -> None:
+        print("Warming up ...")
+        # Adjust the batch size in the input shape to match the model's expected input.
+        input_shape = (1,) + self.input_shape[1:]
+        input_data = np.random.randn(*input_shape).astype(np.float32)
+
+        for _ in range(self.nwarmup):  # Warm-up runs
+            _ = self.ort_session.run(None, {"input": input_data})
+
+        print("Starting benchmark ...")
+        timings = []
+
+        for _ in range(self.nruns):
+            start_time = time.time()
+            _ = self.ort_session.run(None, {"input": input_data})
+            end_time = time.time()
+            timings.append(end_time - start_time)
+
+        avg_time = np.mean(timings) * 1000
+        logging.info(f"Average ONNX inference time: {avg_time:.2f} ms")
```
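The warm-up-then-measure pattern that `ONNXBenchmark.run` uses can be sketched independently of ONNX Runtime (a minimal sketch; `dummy_infer` is a hypothetical stand-in for `ort_session.run`):

```python
import time
import statistics

def benchmark(infer, nruns=100, nwarmup=50):
    """Time an inference callable: discard warm-up runs, then average the rest."""
    for _ in range(nwarmup):
        infer()  # warm-up: populate caches, trigger any lazy initialization
    timings = []
    for _ in range(nruns):
        start = time.perf_counter()
        infer()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings) * 1000  # average latency in milliseconds

# Usage with a stand-in workload; in practice infer() would call ort_session.run
def dummy_infer():
    sum(range(1000))

avg_ms = benchmark(dummy_infer, nruns=10, nwarmup=2)
print(f"Average inference time: {avg_ms:.2f} ms")
```

Discarding warm-up iterations matters because the first few calls typically pay one-time costs (memory allocation, kernel/graph initialization) that would otherwise skew the average.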
