## Enhanced Inference Benchmarking
- Handled batched input data for benchmarking.
- Added throughput measurement in benchmark results.
- Fixed TensorRT precision-specific issues.
- Updated visualization to include both time and throughput.
This project showcases inference with a PyTorch ResNet-50 model and its optimization using ONNX, OpenVINO, and NVIDIA TensorRT. The script runs inference on a user-specified image and displays the top-K predictions. Benchmarking covers configurations such as PyTorch CPU, ONNX CPU, OpenVINO CPU, PyTorch CUDA, TensorRT-FP32, and TensorRT-FP16.
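Top-K selection over the model's output scores can be sketched in plain NumPy. This is an illustrative helper, not the repository's actual code; the function names and the softmax normalization step are assumptions.

```python
import numpy as np

def softmax(logits):
    # Shift by the max logit for numerical stability before exponentiating.
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

def topk_predictions(logits, k=5):
    # Return (class_index, probability) pairs, highest probability first.
    probs = softmax(np.asarray(logits, dtype=np.float64))
    order = np.argsort(probs)[::-1][:k]
    return [(int(i), float(probs[i])) for i in order]

preds = topk_predictions([2.0, 1.0, 0.1, 3.0], k=2)
```

The same idea applies regardless of backend: every configuration (PyTorch, ONNX, OpenVINO, TensorRT) produces a score vector from which the top-K entries are ranked.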
Refer to the [Steps to Run](#steps-to-run) section for Docker instructions.

1. **CPU Deployment**:

   For systems without a GPU or CUDA support, simply use the default base image.

   ```bash
   docker build -t cpu_img .
   ```

2. **GPU Deployment**:

   If your system has GPU and CUDA support, you can use the TensorRT base image to leverage GPU acceleration.
- `--image_path`: (Optional) Specifies the path to the image you want to predict.
- `--topk`: (Optional) Specifies the number of top predictions to show. Defaults to 5 if not provided.
- `--mode`: (Optional) Specifies the model's mode for exporting and running. Choices are: `onnx`, `ov`, `cpu`, `cuda`, `tensorrt`, and `all`. If not provided, it defaults to `all`.
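The flags above could be wired up with `argparse` roughly as follows. This is a sketch, not the repository's actual code; the parser description and the `--image_path` default are assumptions.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="ResNet-50 inference benchmark")
    parser.add_argument("--image_path", type=str, default="./inference/train.jpg",
                        help="path to the image to predict")
    parser.add_argument("--topk", type=int, default=5,
                        help="number of top predictions to show")
    parser.add_argument("--mode", type=str, default="all",
                        choices=["onnx", "ov", "cpu", "cuda", "tensorrt", "all"],
                        help="which backend(s) to export and run")
    return parser

# Example invocation matching the documented defaults and choices.
args = build_parser().parse_args(["--topk", "3", "--mode", "all"])
```

Restricting `--mode` via `choices` makes invalid backends fail fast with a usage message instead of surfacing as a runtime error deep in the export path.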
This command will run predictions on the chosen image (`./inference/train.jpg`), show the top 3 predictions, and run all available models. Note: the plot is created only for `--mode=all`, and the results are plotted and saved to `./inference/plot.png`.
- `PyTorch_cpu: 32.83 ms` indicates the average batch time when running the `PyTorch` model on the `CPU` device.
- `PyTorch_cuda: 5.59 ms` indicates the average batch time when running the `PyTorch` model on the `CUDA` device.
- `TRT_fp32: 1.69 ms` shows the average batch time when running the model with `TensorRT` using `float32` precision.
- `TRT_fp16: 1.69 ms` indicates the average batch time when running the model with `TensorRT` using `float16` precision.
- `ONNX: 16.01 ms` indicates the average batch inference time when running the `PyTorch` model converted to `ONNX` on the `CPU` device.
- `OpenVINO: 15.65 ms` indicates the average batch inference time when running the `ONNX` model converted to `OpenVINO` on the `CPU` device.
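The changelog above mentions throughput measurement alongside batch time. Throughput follows directly from the average batch time as `batch_size / seconds_per_batch`; the sketch below illustrates the conversion (the batch size of 32 is an assumed example, not a figure from the benchmark).

```python
def throughput_ips(avg_batch_time_ms, batch_size):
    # Images per second = images per batch / seconds per batch.
    return batch_size * 1000.0 / avg_batch_time_ms

trt_fp16_tp = throughput_ips(1.69, 32)   # assumed batch size of 32
cpu_tp = throughput_ips(32.83, 32)
```

Reporting both metrics is useful because batch time compares latency at a fixed batch size, while throughput makes runs with different batch sizes comparable.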
### Example Input
Here is an example of the input image to run predictions and benchmarks on:
OpenVINO is a toolkit from Intel that optimizes deep learning model inference fo…
4. Perform inference on the provided image using the OpenVINO model.
5. Benchmark results, including average inference time, are logged for the OpenVINO model.
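The average inference time logged in step 5 can be measured with a simple timing loop. This is a generic sketch, assuming a callable `infer` and a few warm-up iterations; it is not the repository's exact benchmarking code.

```python
import time

def benchmark(infer, inputs, warmup=3, iters=10):
    # Warm-up iterations exclude one-time setup costs (caches, lazy init)
    # from the measurement.
    for _ in range(warmup):
        infer(inputs)
    start = time.perf_counter()
    for _ in range(iters):
        infer(inputs)
    elapsed = time.perf_counter() - start
    return elapsed / iters * 1000.0  # average milliseconds per call

# Stand-in workload; in practice `infer` would wrap the OpenVINO model call.
avg_ms = benchmark(lambda batch: sum(batch), list(range(1000)))
```

Averaging over several iterations after warm-up is what makes the per-backend numbers in the benchmark table comparable.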