
Commit 9366cc4

Refactor Docker Deployment and Add Linux Server CPU Results (#5)
Refactor Docker setup, update README, and add Linux Server CPU benchmarks.
1 parent: 84678db, commit: 9366cc4

7 files changed: 111 additions & 43 deletions


Dockerfile

Lines changed: 15 additions & 12 deletions
````diff
@@ -1,24 +1,27 @@
-# Use an official TensorRT base image
-FROM nvcr.io/nvidia/tensorrt:23.08-py3
+# Argument for base image. Default is a neutral Python image.
+ARG BASE_IMAGE=python:3.8-slim
 
-# Install system packages
-RUN apt-get update && apt-get install -y \
-    python3-pip \
-    git \
-    libjpeg-dev \
-    libpng-dev
+# Use the base image specified by the BASE_IMAGE argument
+FROM $BASE_IMAGE
 
-# Copy the requirements.txt file into the container
+# Argument to determine environment: cpu or gpu (default is cpu)
+ARG ENVIRONMENT=cpu
+
+# Install required system packages conditionally
+RUN apt-get update && apt-get install -y python3-pip git && \
+    if [ "$ENVIRONMENT" = "gpu" ] ; then apt-get install -y libjpeg-dev libpng-dev ; fi
+
+# Copy the requirements file based on the environment into the container
 COPY requirements.txt /workspace/requirements.txt
 
 # Install Python packages
 RUN pip3 install --no-cache-dir -r /workspace/requirements.txt
 
-# Install torch-tensorrt from the special location
-RUN pip3 install torch-tensorrt -f https://github.com/NVIDIA/Torch-TensorRT/releases
+# Only install torch-tensorrt for GPU environment
+RUN if [ "$ENVIRONMENT" = "gpu" ] ; then pip3 install torch-tensorrt -f https://github.com/NVIDIA/Torch-TensorRT/releases ; fi
 
 # Set the working directory
 WORKDIR /workspace
 
 # Copy local project files to /workspace in the image
-COPY . /workspace
+COPY . /workspace
````
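The conditional `RUN` steps above rely on a plain POSIX `if` executed by the shell Docker invokes. A minimal sketch of that branch outside Docker (the `ENVIRONMENT` default and package list mirror the Dockerfile; the `extras` variable is illustrative, not part of the repo):

```shell
# Emulate the Dockerfile's ENVIRONMENT switch in plain POSIX sh.
ENVIRONMENT="${ENVIRONMENT:-cpu}"    # docker build --build-arg ENVIRONMENT=gpu overrides this

if [ "$ENVIRONMENT" = "gpu" ]; then
    extras="libjpeg-dev libpng-dev"  # image libraries only needed on the GPU path
else
    extras=""
fi

echo "extra packages: ${extras:-<none>}"
```

Note that `ARG BASE_IMAGE` is declared before `FROM`, which is the documented way to parameterize the base image; `ARG ENVIRONMENT` must be re-declared after `FROM` to be visible in later `RUN` steps.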

README.md

Lines changed: 75 additions & 25 deletions
````diff
@@ -6,44 +6,73 @@
 2. [Requirements](#requirements)
    - [Steps to Run](#steps-to-run)
    - [Example Command](#example-command)
-3. [RESULTS](#results) ![Static Badge](https://img.shields.io/badge/update-orange)
+3. [GPU-CUDA Results](#gpu-cuda-results) ![Static Badge](https://img.shields.io/badge/update-orange)
    - [Results explanation](#results-explanation)
    - [Example Input](#example-input)
    - [Example prediction results](#example-prediction-results)
+   - [PC Setup](#pc-setup)
 4. [Benchmark Implementation Details](#benchmark-implementation-details) ![New](https://img.shields.io/badge/-New-842E5B)
    - [PyTorch CPU & CUDA](#pytorch-cpu--cuda)
    - [TensorRT FP32 & FP16](#tensorrt-fp32--fp16)
    - [ONNX](#onnx)
    - [OpenVINO](#openvino)
-5. [Benchmarking and Visualization](#benchmarking-and-visualization) ![New](https://img.shields.io/badge/-New-842E5B)
+5. [Extra](#extra) ![New](https://img.shields.io/badge/-New-842E5B)
+   - [Remote Linux Server - CPU only - Inference](#remote-linux-server-cpu-only-inference)
+   - [Prediction results](#prediction-results)
 6. [Author](#author)
-7. [PC Setup](#pc-setup)
-8. [References](#references)
+7. [References](#references)
 
 
 <img src="./inference/plot_latest.png" width="100%">
 
 ## Overview
-This project demonstrates how to perform inference with a PyTorch model and optimize it using ONNX, OpenVINO, and NVIDIA TensorRT. The script loads a pre-trained ResNet-50 model from torch-vision, performs inference on a user-provided image, and prints the top-K predicted classes. Additionally, the script benchmarks the model's performance in the following configurations: PyTorch CPU, ONNX CPU, OpenVINO CPU, PyTorch CUDA, TensorRT-FP32, and TensorRT-FP16, providing insights into the speedup gained through optimization.
+This project showcases inference with a PyTorch ResNet-50 model and its optimization using ONNX, OpenVINO, and NVIDIA TensorRT. The script runs inference on a user-specified image and displays the top-K predictions. Benchmarking covers the PyTorch CPU, ONNX CPU, OpenVINO CPU, PyTorch CUDA, TensorRT-FP32, and TensorRT-FP16 configurations.
+
+The project is Dockerized for easy deployment:
+1. **CPU-only Deployment** - Suitable for non-GPU systems (supports the `PyTorch CPU`, `ONNX CPU`, and `OpenVINO CPU` models only).
+2. **GPU Deployment** - Optimized for NVIDIA GPUs (supports all models: `PyTorch CPU`, `ONNX CPU`, `OpenVINO CPU`, `PyTorch CUDA`, `TensorRT-FP32`, and `TensorRT-FP16`).
+
+For Docker instructions, refer to the [Steps to Run](#steps-to-run) section.
+
 
 ## Requirements
 - This repo cloned
 - Docker
 - NVIDIA GPU (for CUDA and TensorRT benchmarks and optimizations)
 - Python 3.x
-- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#install-guide) (for running the Docker container with GPU support)
-- ![New](https://img.shields.io/badge/-New-842E5B)[OpenVINO Toolkit](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html) (for running OpenVINO model)
-
-### Steps to Run
-
+- NVIDIA drivers installed on the host machine.
+- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#install-guide) (for running the Docker container with GPU support). Pre-installed within the GPU Docker image.
+
+## Steps to Run
+### Building the Docker Image
+
+Depending on the target environment (CPU or GPU), choose a different base image.
+
+1. **CPU Deployment**:
+   For systems without a GPU or CUDA support, simply use the default base image.
+   ```bash
+   docker build -t my_image_cpu .
+   ```
+
+2. **GPU Deployment**:
+   If your system has GPU support and the NVIDIA Docker runtime installed, use the TensorRT base image to leverage GPU acceleration.
+   ```bash
+   docker build --build-arg ENVIRONMENT=gpu --build-arg BASE_IMAGE=nvcr.io/nvidia/tensorrt:23.08-py3 -t my_image_gpu .
+   ```
+
+### Running the Docker Container
+1. **CPU Version**:
+   ```bash
+   docker run -it --rm my_image_cpu
+   ```
+
+2. **GPU Version**:
+   ```bash
+   docker run --gpus all -it --rm my_image_gpu
+   ```
+
+### Run the Script inside the Container
 ```sh
-# 1. Build the Docker Image
-docker build -t awesome-tensorrt
-
-# 2. Run the Docker Container
-docker run --gpus all --rm -it awesome-tensorrt
-
-# 3. Run the Script inside the Container
 python main.py [--mode all]
 ```
 
````
````diff
@@ -59,7 +88,7 @@ python main.py --topk 3 --mode=all --image_path="./inference/train.jpg"
 
 This command will run predictions on the chosen image (`./inference/train.jpg`), show the top 3 predictions, and run all available models. Note: plot created only for `--mode=all` and results plotted and saved to `./inference/plot.png`
 
-## RESULTS
+## GPU-CUDA Results
 ### Inference Benchmark Results
 <img src="./inference/plot_latest.png" width="70%">
 
````
````diff
@@ -85,6 +114,11 @@ Here is an example of the input image to run predictions and benchmarks on:
 #5: 2% lynx
 ```
 
+### PC Setup
+- CPU: Intel(R) Core(TM) i7-10700K CPU @ 3.80GHz
+- RAM: 32 GB
+- GPU: GeForce RTX 3070
+
 ## Benchmark Implementation Details
 Here you can see the flow for each model and benchmark.
 
````
````diff
@@ -125,16 +159,32 @@ OpenVINO is a toolkit from Intel that optimizes deep learning model inference fo
 4. Perform inference on the provided image using the OpenVINO model.
 5. Benchmark results, including average inference time, are logged for the OpenVINO model.
 
-## Benchmarking and Visualization
-The results of the benchmarks for all modes are saved and visualized in a bar chart, showcasing the average inference times across different backends. The visualization aids in comparing the performance gains achieved with different optimizations.
+## Extra
+### Remote Linux Server - CPU only - Inference
+<img src="./inference/plot_linux_server.png" width="70%">
+
+### Prediction results
+`model.log` file content
+```
+Running prediction for OV model
+#1: 15% Egyptian cat
+#2: 14% tiger cat
+#3: 9% tabby
+#4: 2% doormat
+#5: 2% lynx
+
+
+Running prediction for ONNX model
+#1: 15% Egyptian cat
+#2: 14% tiger cat
+#3: 9% tabby
+#4: 2% doormat
+#5: 2% lynx
+```
+
 
 ## Author
 [DimaBir](https://github.com/DimaBir)
-
-## PC Setup
-- CPU: Intel(R) Core(TM) i7-10700K CPU @ 3.80GHz
-- RAM: 32 GB
-- GPU: GeForce RTX 3070
 
 ## References
 - **PyTorch**: [Official Documentation](https://pytorch.org/docs/stable/index.html)
````

benchmark/benchmark_models.py

Lines changed: 5 additions & 2 deletions
````diff
@@ -72,7 +72,9 @@ def run(self):
         with torch.no_grad():
             for _ in range(self.nwarmup):
                 features = self.model(input_data)
-        torch.cuda.synchronize()
+
+        if self.device == "cuda":
+            torch.cuda.synchronize()
 
         # Start timing
         print("Start timing ...")
@@ -81,7 +83,8 @@ def run(self):
         for i in range(1, self.nruns + 1):
             start_time = time.time()
             features = self.model(input_data)
-            torch.cuda.synchronize()
+            if self.device == "cuda":
+                torch.cuda.synchronize()
             end_time = time.time()
             timings.append(end_time - start_time)
 
````
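The guarded-synchronize pattern above generalizes to any backend: inject a `synchronize` callable and make it a no-op on CPU. A torch-free sketch of the idea (the `time_inference` helper and the dummy model are illustrative, not from the repo; on CUDA you would pass `torch.cuda.synchronize`):

```python
import time

def time_inference(model, inputs, nruns=100, synchronize=None):
    """Time nruns forward passes; call synchronize() after each if given.

    On CUDA, kernels launch asynchronously, so timing without a synchronize
    measures launch overhead rather than execution; on CPU there is nothing
    to synchronize, which is exactly what the guard in the diff above handles.
    """
    timings = []
    for _ in range(nruns):
        start = time.time()
        model(inputs)
        if synchronize is not None:  # only sync when the device needs it
            synchronize()
        timings.append(time.time() - start)
    return sum(timings) / len(timings)

# Usage with a dummy "model" on the CPU path (no synchronize callable).
avg = time_inference(lambda x: [v * 2 for v in x], [1, 2, 3], nruns=10)
print(f"avg inference time: {avg:.6f}s")
```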

benchmark/benchmark_utils.py

Lines changed: 3 additions & 1 deletion
````diff
@@ -6,7 +6,6 @@
 import seaborn as sns
 from typing import Dict, Any
 import torch
-import onnxruntime as ort
 
 from benchmark.benchmark_models import PyTorchBenchmark, ONNXBenchmark, OVBenchmark
 
@@ -43,6 +42,9 @@ def run_all_benchmarks(
         ("cuda", torch.float16, True),
     ]
     for device, precision, is_trt in configs:
+        if not torch.cuda.is_available() and device == "cuda":
+            continue
+
         model_to_use = models[f"PyTorch_{device}"].to(device)
 
         if not is_trt:
````
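The `continue` guard above can equivalently be written as an upfront filter over the config list. A torch-free sketch under stated assumptions (the `runnable_configs` helper is hypothetical; `configs` entries are `(device, precision, is_trt)` tuples as in `run_all_benchmarks`, and `cuda_available` stands in for `torch.cuda.is_available()`):

```python
def runnable_configs(configs, cuda_available):
    """Drop CUDA benchmark configs on CPU-only hosts.

    Mirrors the `continue` guard in the diff above, but as a single
    comprehension so the loop body never sees unrunnable configs.
    """
    return [c for c in configs if cuda_available or c[0] != "cuda"]

configs = [
    ("cpu", "float32", False),
    ("cuda", "float32", False),
    ("cuda", "float16", True),
]
print(runnable_configs(configs, cuda_available=False))  # only the CPU entry remains
```

Filtering up front keeps the skip logic in one place, at the cost of evaluating availability once rather than per iteration; either form behaves the same here.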

inference/plot_linux_server.png

25.4 KB (binary file added)

main.py

Lines changed: 13 additions & 1 deletion
````diff
@@ -1,6 +1,14 @@
 import logging
 import os.path
-import torch_tensorrt
+import torch
+
+CUDA_AVAILABLE = False
+if torch.cuda.is_available():
+    try:
+        import torch_tensorrt
+        CUDA_AVAILABLE = True
+    except ImportError:
+        print("torch-tensorrt is not installed. Running on CPU mode only.")
 
 from benchmark.benchmark_models import benchmark_onnx_model, benchmark_ov_model
 from benchmark.benchmark_utils import run_all_benchmarks, plot_benchmark_results
@@ -79,6 +87,10 @@ def main():
         precision = config["precision"]
         is_trt = config["is_trt"]
 
+        # check if CUDA is available
+        if device.lower() == "cuda" and not CUDA_AVAILABLE:
+            continue
+
         model = init_cuda_model(model_loader, device, precision)
 
         # If the configuration is not for TensorRT, store the model under a PyTorch key
````
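The `main.py` change is the standard optional-import guard: attempt the GPU-only import, and fall back to a flag when it is absent. A self-contained sketch of the same pattern (`some_gpu_only_lib` is a stand-in module name, assumed absent, not a real dependency of this repo):

```python
# Optional-import guard, as main.py now does for torch_tensorrt.
FEATURE_AVAILABLE = False
try:
    import some_gpu_only_lib  # noqa: F401 -- stand-in for a GPU-only dependency
    FEATURE_AVAILABLE = True
except ImportError:
    # The program keeps running; downstream code checks the flag instead
    # of assuming the import succeeded.
    print("some_gpu_only_lib is not installed. Running in CPU mode only.")

print(f"FEATURE_AVAILABLE={FEATURE_AVAILABLE}")
```

Guarding the import at module load, and then checking the flag per config (`if device.lower() == "cuda" and not CUDA_AVAILABLE: continue`), is what lets the same `main.py` run unmodified in both the CPU-only and GPU Docker images.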

prediction/prediction_utils.py

Lines changed: 0 additions & 2 deletions
````diff
@@ -4,8 +4,6 @@
 import torch
 import onnxruntime as ort
 import numpy as np
-import torch_tensorrt
-
 
 
 def make_prediction(
````
