A thesis project for detecting and classifying aerial objects (birds, drones, and unknowns) using an ensemble of object detection models with advanced post-processing and multi-object tracking.
This repository implements a full pipeline for aerial object detection:
- Dataset Preprocessing — label remapping, OOD filtering, and format conversion
- Model Training — multiple YOLO variants and Faster R-CNN (ResNet-50 FPN)
- Model Evaluation — per-class and macro-averaged metrics with confusion matrices
- Ensemble Fusion — Weighted Boxes Fusion (WBF) and WC-NMS across all models
- Hyperparameter Tuning — Bayesian optimisation (Optuna) for fusion and confidence thresholds
- Video Inference — real-time detection with Kalman filter + LSTM multi-object tracking
| ID | Label | Description |
|---|---|---|
| 0 | bird | Bird (known class) |
| 1 | drone | Drone (known class) |
| 2 | unknown | Ambiguous / out-of-set |
thesis/
├── data.yaml  # Dataset configuration (paths + class names)
├── requirements.txt  # Python dependencies
│
├── dataset/
│   ├── train/ images/ labels/
│   ├── validation/ images/ labels/
│   └── test/ images/ labels/ videos/
│
├── scripts/
│   ├── dataset_preprocess/
│   │   ├── prepare_training_yolo.py  # Full preprocessing pipeline for YOLO
│   │   ├── prepare_training_fasterrcnn.py  # Full preprocessing pipeline for Faster R-CNN
│   │   └── prepare_validating_yolo_fasterrcnn.py
│   ├── model_training/
│   │   ├── training_yolo.py  # Train any YOLO model
│   │   ├── training_fasterrcnn.py  # Train Faster R-CNN
│   │   ├── training_yolo_kaggle.ipynb  # Kaggle notebook (YOLO)
│   │   └── training_fasterrcnn_kaggle.ipynb  # Kaggle notebook (Faster R-CNN)
│   ├── model_evaluation/
│   │   ├── eval_yolo.py  # Evaluate individual YOLO models
│   │   ├── eval_fasterrcnn.py  # Evaluate Faster R-CNN
│   │   ├── eval_wbf.py  # Evaluate WBF ensemble
│   │   ├── eval_wc-nms.py  # Evaluate WC-NMS ensemble
│   │   ├── plot_confusion_from_cache.py  # Plot confusion matrices from cache
│   │   └── plot_wcnms_confusion_from_cache.py
│   └── inference_video/
│       ├── inference_video.py  # Basic video inference (YOLO)
│       ├── inference_video_wbf_tracking.py  # WBF ensemble + Kalman/LSTM tracking
│       └── inference_video_wbf_tracking_stl.py  # WBF + tracking + STL/RTM overrides
│
├── tools/
│   ├── labels_train.py / labels_validation.py  # Label ID remapping
│   ├── convert_labels_fasterrcnn.py  # Convert YOLO labels → Faster R-CNN format
│   ├── tune_wbf_3.py / tune_wbf_6.py  # Bayesian tuning for 3/6-model WBF
│   ├── tune_wbf_tracking.py  # Bayesian tuning for WBF + tracking
│   ├── tune_yolo_f1.py / tune_yolo_f1_sweep.py  # Confidence threshold tuning for YOLO
│   ├── tune_fasterrcnn_f1.py  # Confidence threshold tuning for Faster R-CNN
│   ├── plot_pr_curves.py  # PR curve plotting
│   ├── plot_yolo_f1_curve.py  # F1 curve plotting
│   ├── check_dataset_overlap.py  # Detect train/test overlap
│   ├── split_by_class.py  # Split dataset by class
│   ├── eval_yolo_map50.py  # mAP@50 evaluation (YOLO)
│   ├── eval_fasterrcnn_map50.py  # mAP@50 evaluation (Faster R-CNN)
│   └── gui_remap.py / gui_delete_empty.py  # GUI utilities
│
└── runs/
    ├── detect/  # YOLO training runs and ensemble outputs
    ├── fasterrcnn/  # Faster R-CNN training runs
    └── eval_wcnms/  # WC-NMS evaluation results per model
| Model | Backbone / Variant | Framework |
|---|---|---|
| YOLOv8n | YOLOv8 nano | Ultralytics |
| YOLOv8m | YOLOv8 medium | Ultralytics |
| YOLOv8s | YOLOv8 small | Ultralytics |
| YOLOv9t | YOLOv9 tiny | Ultralytics |
| YOLOv10n | YOLOv10 nano | Ultralytics |
| YOLOv11n | YOLO11 nano | Ultralytics |
| YOLOv12n | YOLO12 nano | Ultralytics |
| YOLO26n | YOLO26 nano | Ultralytics |
| Faster R-CNN | ResNet-50 FPN | torchvision |
All YOLO models are fine-tuned from pretrained weights. Faster R-CNN is initialised from torchvision's `FasterRCNN_ResNet50_FPN_Weights.DEFAULT` (COCO-pretrained detector weights) before fine-tuning.
# Clone the repository
git clone <repo-url>
cd thesis
# Install dependencies (CUDA 12.6 build of PyTorch)
pip install -r requirements.txt
Note: `requirements.txt` installs PyTorch with CUDA 12.6 support. Adjust the `--index-url` line for your CUDA version or for CPU-only use.
# 1. Preprocess dataset
python scripts/dataset_preprocess/prepare_training_yolo.py
# 2. Train a single YOLO model (edit model choice inside script)
python scripts/model_training/training_yolo.py
# 3. Evaluate the trained model
python scripts/model_evaluation/eval_yolo.py
# 4. Run inference on a video with WBF ensemble
python scripts/inference_video/inference_video_wbf_tracking.py
The dataset follows the YOLO label format (class x_center y_center width height, normalised). Class IDs are remapped during preprocessing:
| Original ID | Original Label | Remapped ID | Remapped Label |
|---|---|---|---|
| 1 | bird | 0 | bird |
| 2 | drone | 1 | drone |
| 0, 3 | airplane / helicopter | 2 | unknown |
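The remapping in the table can be sketched in a few lines of Python. This is a minimal illustration, not the code in `prepare_training_yolo.py`; the names `ID_MAP`, `remap_label_line`, `remap_label_text`, and `remap_label_dir` are hypothetical, and only the ID mapping itself comes from the table.

```python
from pathlib import Path

# Original dataset ID -> remapped ID, taken from the table above.
ID_MAP = {1: 0, 2: 1, 0: 2, 3: 2}


def remap_label_line(line: str) -> str:
    """Remap the class ID of one YOLO label line, leaving the box untouched."""
    parts = line.split()
    parts[0] = str(ID_MAP[int(parts[0])])
    return " ".join(parts)


def remap_label_text(text: str) -> str:
    """Remap every non-empty line of a YOLO label file's contents."""
    lines = [remap_label_line(ln) for ln in text.splitlines() if ln.strip()]
    return "\n".join(lines) + ("\n" if lines else "")


def remap_label_dir(src: Path, dst: Path) -> int:
    """Write remapped copies of all *.txt files in src to dst (hypothetical helper)."""
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(src.glob("*.txt")):
        (dst / path.name).write_text(remap_label_text(path.read_text()))
        count += 1
    return count
```

The sketch writes to a separate output directory on purpose: the mapping is not idempotent (an original `1` becomes `0`, and a second pass would turn that `0` into `2`), so remapping files in place twice would silently corrupt the labels.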
Update data.yaml to point to your dataset root before training or evaluation.
Before training or evaluation, remap label IDs to the standardised format (bird=0, drone=1, unknown=2):
# For YOLO training
python scripts/dataset_preprocess/prepare_training_yolo.py
# For Faster R-CNN training
python scripts/dataset_preprocess/prepare_training_fasterrcnn.py
# For evaluation only (validation/test remapping)
python scripts/dataset_preprocess/prepare_validating_yolo_fasterrcnn.py
Configuration: Update data.yaml to point to your dataset root before running any preprocessing or training scripts.
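For Faster R-CNN, the normalised YOLO boxes have to become absolute-pixel corner boxes, since torchvision detection models expect `[x1, y1, x2, y2]` and reserve label 0 for background. The sketch below shows one way to do that conversion. It is an assumption about what `convert_labels_fasterrcnn.py` needs to achieve, not a copy of it; the function names and the `label_offset` parameter are hypothetical.

```python
def yolo_to_xyxy(cx, cy, w, h, img_w, img_h):
    """Convert a normalised YOLO box (centre, size) to pixel [x1, y1, x2, y2]."""
    x1 = (cx - w / 2) * img_w
    y1 = (cy - h / 2) * img_h
    x2 = (cx + w / 2) * img_w
    y2 = (cy + h / 2) * img_h
    # Clamp to the image so slightly out-of-range labels stay valid.
    return [
        max(0.0, x1),
        max(0.0, y1),
        min(float(img_w), x2),
        min(float(img_h), y2),
    ]


def yolo_line_to_target(line, img_w, img_h, label_offset=1):
    """Parse one YOLO label line into a box and a background-shifted label.

    label_offset=1 is an assumption: it shifts bird/drone/unknown from
    0/1/2 to 1/2/3 so that 0 can stay the background class.
    """
    cls, cx, cy, w, h = line.split()
    box = yolo_to_xyxy(float(cx), float(cy), float(w), float(h), img_w, img_h)
    return {"box": box, "label": int(cls) + label_offset}
```

For example, `yolo_line_to_target("1 0.5 0.5 0.5 0.5", 640, 480)` describes a drone box covering the central quarter of a 640x480 image.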
YOLO (nano, small, medium, etc.):
# Edit `training_yolo.py` to select model variant:
# MODEL = "yolov8n.pt" # nano
# MODEL = "yolov8m.pt" # medium
# MODEL = "yolov8s.pt" # small
#
# Optionally adjust hyperparameters: epochs, batch_size, learning_rate, device
python scripts/model_training/training_yolo.py
Faster R-CNN:
# Edit `CONFIG` dict inside training_fasterrcnn.py for:
# - batch_size
# - num_epochs
# - learning_rate
# - device (cuda/cpu)
python scripts/model_training/training_fasterrcnn.py
Training output (model weights, logs) is saved to runs/detect/ and runs/fasterrcnn/.
# Individual YOLO model evaluation (generates confusion matrices, metrics)
python scripts/model_evaluation/eval_yolo.py
# Faster R-CNN evaluation
python scripts/model_evaluation/eval_fasterrcnn.py
# WBF ensemble evaluation (fuses detections from multiple models)
python scripts/model_evaluation/eval_wbf.py
# WC-NMS ensemble evaluation (alternative fusion strategy)
python scripts/model_evaluation/eval_wc-nms.py
Output: Confusion matrix PNGs and metrics CSVs are saved under the runs/ directory.
# Tune WBF per-class weights (6-model ensemble, ~100 trials)
python tools/tune_wbf_6.py --trials 100 --seed 42
# Tune WBF per-class weights (3-model ensemble, ~50 trials)
python tools/tune_wbf_3.py --trials 50 --seed 42
# Tune per-model YOLO confidence thresholds (for F1-score maximisation)
python tools/tune_yolo_f1.py
# Tune Faster R-CNN confidence threshold
python tools/tune_fasterrcnn_f1.py
# Tune WBF + Kalman/LSTM tracking parameters
python tools/tune_wbf_tracking.py --trials 100
Storage: Optuna databases are saved as SQLite files in the project root or runs/ directory (see /memories/repo/optuna_storage_paths.md for details).
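To make the idea of confidence-threshold tuning for F1 concrete, here is a minimal sketch that uses a plain grid sweep rather than Optuna. It assumes each detection has already been matched against ground truth and flagged as a true or false positive; the input format and the function names are hypothetical and are not taken from `tune_yolo_f1.py`.

```python
def f1_at_threshold(dets, num_gt, thresh):
    """F1 when only detections with confidence >= thresh are kept.

    dets: list of (confidence, is_true_positive) pairs (assumed input format).
    num_gt: number of ground-truth objects.
    """
    kept = [is_tp for conf, is_tp in dets if conf >= thresh]
    tp = sum(kept)
    fp = len(kept) - tp
    fn = num_gt - tp
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(dets, num_gt, steps=99):
    """Return the grid threshold in (0, 1) with the highest F1 (first on ties)."""
    candidates = [i / (steps + 1) for i in range(1, steps + 1)]
    return max(candidates, key=lambda t: f1_at_threshold(dets, num_gt, t))
```

A one-dimensional sweep like this is cheap enough to run exhaustively. The Bayesian search becomes more useful for the WBF tuning scripts, where per-class weights for several models are tuned jointly.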
# WBF ensemble + Kalman filter + LSTM smoothing (standard inference)
python scripts/inference_video/inference_video_wbf_tracking.py
# WBF + tracking + STL/RTM physical requirement overrides
python scripts/inference_video/inference_video_wbf_tracking_stl.py
Interaction: Scripts prompt for a video file via an interactive file selector. Output (annotated video + JSON detections) is saved to runs/detect/inference_video/.
| Task | Script | Notes |
|---|---|---|
| Check train/test overlap | tools/check_dataset_overlap.py | Detects duplicate images across splits |
| Split dataset by class | tools/split_by_class.py | Organise images by bird/drone/unknown |
| Label statistics | tools/labels_*.py | Generate class distribution reports |
| Delete empty labels | tools/gui_delete_empty.py | Interactive GUI for label cleanup |
| Remap label IDs | tools/gui_remap.py | Interactive GUI for ID remapping |
| Plot PR curves | tools/plot_pr_curves.py | Visualise precision-recall trade-offs |
| Plot F1 curves | tools/plot_yolo_f1_curve.py | Visualise F1 vs. confidence threshold |
| Evaluate mAP@50 | tools/eval_yolo_map50.py | Compute mean average precision |
| Issue | Solution |
|---|---|
| CUDA out of memory during training | Reduce batch_size in training scripts |
| Models not found during inference | Ensure model weights are in runs/detect/ or specify full path |
| Label format errors | Run prepare_training_yolo.py or prepare_training_fasterrcnn.py to remap IDs |
| Optuna resuming fails | Check /memories/repo/optuna_resume_notes.md for storage and session recovery |
| Tracking inaccurate | Tune Kalman/LSTM parameters with tools/tune_wbf_tracking.py |
Boxes from all enabled models are clustered by IoU and merged into a single fused detection. Each model has per-class confidence weights tuned via Bayesian optimisation. An unknown fallback is applied when:
- Fewer than `MIN_MODEL_SUPPORT` models agree on a detection
- The fused confidence is below `KNOWN_FUSED_CONF_THRESH`
- The score margin between the top two classes is below `SCORE_MARGIN_THRESH`
- The disagreement ratio exceeds `DISAGREEMENT_RATIO_THRESH`
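The four fallback conditions can be read as a single decision function applied to each fused cluster. The sketch below is one plausible reading, not the repository's implementation. The threshold values are placeholders, "models agree" is interpreted as the number of models voting for the top class, and the disagreement ratio is taken as the share of contributing models that voted for a different class.

```python
UNKNOWN = 2

# Placeholder values for illustration only; the tuned values live in the repo.
MIN_MODEL_SUPPORT = 2
KNOWN_FUSED_CONF_THRESH = 0.5
SCORE_MARGIN_THRESH = 0.1
DISAGREEMENT_RATIO_THRESH = 0.5


def resolve_class(class_scores, votes, fused_conf):
    """Return the fused class ID, or UNKNOWN if any fallback rule fires.

    class_scores: dict mapping class ID -> fused score for this cluster.
    votes: one class ID per contributing model (assumed representation).
    fused_conf: confidence of the fused box.
    """
    if not class_scores or not votes:
        return UNKNOWN
    ranked = sorted(class_scores.items(), key=lambda kv: kv[1], reverse=True)
    top_cls, top_score = ranked[0]
    second_score = ranked[1][1] if len(ranked) > 1 else 0.0
    support = sum(1 for v in votes if v == top_cls)
    disagreement = 1.0 - support / len(votes)
    if (
        support < MIN_MODEL_SUPPORT
        or fused_conf < KNOWN_FUSED_CONF_THRESH
        or top_score - second_score < SCORE_MARGIN_THRESH
        or disagreement > DISAGREEMENT_RATIO_THRESH
    ):
        return UNKNOWN
    return top_cls
```

With these placeholder thresholds, a cluster where three models all vote "bird" with a clear score margin stays a bird, while a drone detection backed by a single model falls back to unknown.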
An alternative ensemble approach that operates on raw pre-NMS class scores from multiple YOLO models, clusters overlapping boxes, and applies weighted NMS to produce final detections.
Each video frame is processed as follows:
- All enabled ensemble models run inference on the frame.
- Detections are clustered and merged with WBF.
- Fused detections are matched to existing tracks via greedy IoU matching.
- Each track is smoothed with an 8-D constant-velocity Kalman filter.
- An LSTM (window = 8 frames) predicts the next bounding box centre.
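The greedy IoU matching step can be illustrated as follows. This is a generic sketch of the technique under assumed conventions (boxes as `[x1, y1, x2, y2]`, a hypothetical `iou_thresh` of 0.3), not the matching code in `inference_video_wbf_tracking.py`.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, detections, iou_thresh=0.3):
    """Pair tracks with detections, best IoU first.

    Returns (matches, unmatched_track_indices, unmatched_detection_indices),
    where matches is a list of (track_index, detection_index) tuples.
    """
    pairs = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True,
    )
    matches, used_t, used_d = [], set(), set()
    for score, ti, di in pairs:
        if score < iou_thresh:
            break  # all remaining pairs overlap even less
        if ti in used_t or di in used_d:
            continue  # each track and detection is used at most once
        matches.append((ti, di))
        used_t.add(ti)
        used_d.add(di)
    unmatched_tracks = [i for i in range(len(tracks)) if i not in used_t]
    unmatched_dets = [i for i in range(len(detections)) if i not in used_d]
    return matches, unmatched_tracks, unmatched_dets
```

Unmatched detections would typically start new tracks and unmatched tracks would age out after some number of missed frames. How the repository handles those two cases is not stated here.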
Visual output per track includes a coloured bounding box, class label + track ID, a fading trail (last 30 frames), and the LSTM-predicted next position.
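As a rough illustration of the constant-velocity Kalman smoothing used per track, the sketch below filters each box coordinate with its own (position, velocity) pair. Four such pairs give an 8-D state, which amounts to an 8-D filter whose process and measurement noise are uncorrelated across coordinates. The state layout `(cx, cy, w, h)` plus velocities, the noise values, and the class names are all assumptions; the repository's filter may differ.

```python
from dataclasses import dataclass


@dataclass
class AxisFilter:
    """Kalman filter for one coordinate: state (pos, vel), 2x2 covariance."""
    pos: float
    vel: float = 0.0
    p00: float = 10.0   # position variance
    p01: float = 0.0
    p10: float = 0.0
    p11: float = 100.0  # velocity variance (velocity unknown at start)

    def predict(self, dt=1.0, q_pos=1e-2, q_vel=1e-2):
        """Propagate state and covariance one step: P = F P F^T + Q."""
        self.pos += self.vel * dt
        a = self.p00 + dt * self.p10
        b = self.p01 + dt * self.p11
        self.p00 = a + dt * b + q_pos
        self.p01 = b
        self.p10 = self.p10 + dt * self.p11
        self.p11 = self.p11 + q_vel

    def update(self, z, r=1.0):
        """Correct with a position measurement z of variance r."""
        s = self.p00 + r
        k0, k1 = self.p00 / s, self.p10 / s  # Kalman gain
        y = z - self.pos                     # innovation
        self.pos += k0 * y
        self.vel += k1 * y
        p00, p01 = self.p00, self.p01
        self.p00 = (1 - k0) * p00
        self.p01 = (1 - k0) * p01
        self.p10 -= k1 * p00
        self.p11 -= k1 * p01


class BoxKalman:
    """Four independent AxisFilters over (cx, cy, w, h)."""

    def __init__(self, box):
        self.axes = [AxisFilter(v) for v in box]

    def predict(self):
        for axis in self.axes:
            axis.predict()
        return [axis.pos for axis in self.axes]

    def update(self, box):
        for axis, z in zip(self.axes, box):
            axis.update(z)
        return [axis.pos for axis in self.axes]
```

Per frame, a tracker would call `predict()` for every track, match the predicted boxes to the fused detections, and call `update()` on the tracks that received a match.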
inference_video_wbf_tracking_stl.py additionally applies physical requirement-based overrides:
| Requirement | Rule | Override |
|---|---|---|
| REQ-03 | Fused confidence < 0.65 | → bird |
| REQ-04 | Shape deformation > ε over 10 frames | → bird |
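The two overrides can be sketched as a post-processing step on each track. The 0.65 confidence threshold and the 10-frame window come from the table. Everything else is an assumption made for illustration: the table does not define "shape deformation" or the value of ε, so this sketch measures deformation as the relative spread of the box aspect ratio over the window and uses a placeholder ε of 0.2.

```python
BIRD = 0

REQ03_CONF_THRESH = 0.65  # from the table (REQ-03)
REQ04_WINDOW = 10         # from the table (REQ-04)
REQ04_EPSILON = 0.2       # hypothetical: the source does not give a value for ε


def shape_deformation(boxes):
    """Relative spread of aspect ratio (w / h) across [x1, y1, x2, y2] boxes.

    This definition is an assumption; the repository may measure it differently.
    """
    ratios = [(x2 - x1) / max(y2 - y1, 1e-6) for x1, y1, x2, y2 in boxes]
    mean = sum(ratios) / len(ratios)
    return (max(ratios) - min(ratios)) / max(mean, 1e-6)


def apply_overrides(cls, fused_conf, recent_boxes):
    """Return the class after applying REQ-03 and REQ-04 to one track."""
    # REQ-03: low fused confidence is overridden to bird.
    if fused_conf < REQ03_CONF_THRESH:
        return BIRD
    # REQ-04: strong shape change over the last 10 frames is overridden to bird.
    window = list(recent_boxes)[-REQ04_WINDOW:]
    if len(window) == REQ04_WINDOW and shape_deformation(window) > REQ04_EPSILON:
        return BIRD
    return cls
```

In this reading, a track whose box keeps a stable shape retains its class, while one whose aspect ratio oscillates (as a flapping bird's might) is relabelled.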
All evaluation scripts compute and save:
- Confusion matrix (normalised by ground-truth column totals or raw counts)
- Per-class: Precision, Recall, F1-score
- Macro-averaged: Precision, Recall, F1-score
- Summary: Overall accuracy, mAP@50 (where applicable)
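The per-class and macro-averaged metrics follow directly from the confusion matrix. The sketch below assumes the matrix is indexed as `cm[predicted][ground_truth]`, so that columns hold ground-truth totals as in the normalisation described above. It ignores any background row or column that a detection confusion matrix may carry. The function names are hypothetical.

```python
def per_class_metrics(cm):
    """Precision, recall and F1 per class from cm[predicted][ground_truth]."""
    n = len(cm)
    metrics = []
    for c in range(n):
        tp = cm[c][c]
        pred_total = sum(cm[c])                      # row: everything predicted as c
        gt_total = sum(cm[r][c] for r in range(n))   # column: everything truly c
        precision = tp / pred_total if pred_total else 0.0
        recall = tp / gt_total if gt_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append({"precision": precision, "recall": recall, "f1": f1})
    return metrics


def macro_average(metrics):
    """Unweighted mean of each metric across classes."""
    keys = ("precision", "recall", "f1")
    return {k: sum(m[k] for m in metrics) / len(metrics) for k in keys}


def accuracy(cm):
    """Share of all samples that sit on the diagonal."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0
```

Macro averaging weights bird, drone, and unknown equally regardless of how many samples each class has, which matters when one class is much rarer than the others.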
Results are cached as JSON files under runs/<model>/eval_cache/ for fast re-plotting without re-running inference.
- Python 3.10+
- PyTorch (CUDA 12.6 recommended, CPU supported)
- ultralytics
- torchvision
- numpy, pillow, pyyaml
- questionary (interactive CLI prompts)
- optuna (Bayesian hyperparameter tuning)
- fpdf2, reportlab (report generation)