This project focuses on vehicle detection, multi-object tracking, and vehicle counting in traffic videos using multiple YOLO versions combined with ByteTrack.
The objective was to evaluate detection performance across different YOLO versions, apply preprocessing techniques to improve robustness (especially in low-light and night scenes), and analyze performance before and after custom dataset training.
The following YOLO versions were evaluated:
- YOLOv8
- YOLOv9
- YOLOv10
- YOLOv11
Model variants analyzed:
- n (nano)
- s (small)
- m (medium)
- l (large)
- x (extra large)
Higher-capacity variants (l, x) achieved better accuracy at the cost of slower inference.
Tracking was performed using ByteTrack, which improves ID consistency by leveraging both high-confidence and low-confidence detections.
Two counting methods were implemented:
1. ID-based counting: each detected vehicle was assigned a unique ID by ByteTrack, and vehicles were counted as the number of unique tracked IDs across frames.
2. Line-crossing counting: a predefined virtual line was placed in the frame, and vehicles were counted when they crossed this line.
Both methods were evaluated before and after retraining.
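The two counting strategies can be sketched as follows, assuming tracked detections arrive per frame as `(track_id, cx, cy)` centroid tuples and the counting line is horizontal at `y = line_y` (both layout choices are illustrative assumptions, not the project's actual data format):

```python
def count_vehicles(frames, line_y):
    """Count vehicles two ways: by unique track ID and by line crossings."""
    unique_ids = set()   # Method 1: every distinct track ID seen
    crossed = set()      # Method 2: IDs whose centroid crossed line_y
    last_y = {}          # previous y-position per track ID

    for detections in frames:
        for track_id, cx, cy in detections:
            unique_ids.add(track_id)
            prev = last_y.get(track_id)
            # A crossing happens when consecutive positions straddle the line
            if prev is not None and (prev - line_y) * (cy - line_y) < 0:
                crossed.add(track_id)
            last_y[track_id] = cy

    return len(unique_ids), len(crossed)
```

ID-based counting overcounts when a track is lost and re-identified under a new ID; the line-crossing variant is more robust to that failure mode, which is one reason both were compared.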
To improve detection performance, especially in challenging lighting conditions, several preprocessing techniques were applied.
Contrast Limited Adaptive Histogram Equalization (CLAHE) was used with:
- clipLimit = 2.0
- tileGridSize = (8, 8)
This improved local contrast in low-contrast frames.
Brightness (V-channel) normalization steps:
- Convert image to HSV color space
- Normalize the V channel using:
cv2.normalize(v, None, alpha=0, beta=255, norm_type=cv2.NORM_MINMAX)
This reduced dark shadow regions and improved detection stability.
Gamma correction (gamma = 2.2) was applied using a lookup-table transformation to enhance brightness in dark scenes.
Noise-reduction filters applied:
- Gaussian Blur
- Bilateral Filter
These reduced noise while preserving object boundaries.
Custom dataset creation was performed using Roboflow.
- 1 frame extracted per second from each video
- Manual annotation
- Dataset split:
- 80% Train
- 10% Validation
- 10% Test
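The 1-frame-per-second extraction amounts to sampling every `fps`-th frame of each video. The helper below shows just the index arithmetic (reading frames with OpenCV's `VideoCapture` would wrap around this; the function name is illustrative):

```python
def frames_to_extract(total_frames, fps):
    """Indices of the frames to keep when sampling one frame per second."""
    step = max(1, round(fps))
    return list(range(0, total_frames, step))

# e.g. a 3-second clip at 30 fps yields 3 sampled frames
indices = frames_to_extract(total_frames=90, fps=30)
```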
- Base model: YOLOv8n
- Epochs: 100
- Training environment: Google Colab
Results for each of the five individual datasets:

| Dataset | Precision | Recall | F1   | mAP50 |
|---------|-----------|--------|------|-------|
| 1       | 0.87      | 0.62   | 0.72 | 0.66  |
| 2       | 0.88      | 0.50   | 0.64 | 0.57  |
| 3       | 1.00      | 0.97   | 0.99 | 0.99  |
| 4       | 0.93      | 0.61   | 0.74 | 0.96  |
| 5       | 0.95      | 0.64   | 0.77 | 0.97  |
After merging all five datasets: Precision = 0.96, Recall = 0.64, F1 = 0.77, mAP50 = 0.75.
Merged training improved generalization across different traffic conditions.
Final processing steps applied before detection:
- Convert image to grayscale
- Apply CLAHE
- Convert back to RGB
- Apply sharpening filter
- Apply Gaussian Blur
- Run detection using the retrained merged model
This final pipeline improved robustness, particularly in challenging lighting scenarios.
The original training and experimentation code was developed in Google Colab. Due to loss of access to the original notebook, only the project report and results are available in this repository.