DeepSORT From Scratch

A complete DeepSORT multi-object tracking pipeline implemented from scratch — training, evaluation, fine-tuning, ONNX export, and real-time inference on webcam or video files. The Re-Identification (Re-ID) backbone is OSNet (Zhou et al., 2019) trained with the "Bag of Tricks" recipe (Luo et al., 2019). The full tracker (Kalman filter + Hungarian matching + cascade by age) runs without any PyTorch dependency at inference time.

Description

This project is a complete re-implementation of the DeepSORT tracker (Wojke et al., 2017) with a modern, configurable training pipeline for the Re-Identification backbone. The Re-ID model is OSNet (Zhou et al., 2019) trained following the "Bag of Tricks" recipe (Luo et al., 2019): BNNeck, label-smoothed cross-entropy combined with batch-hard triplet loss, PK sampling, warmup learning rate, no weight decay on BatchNorm and biases, and Random Erasing augmentation.
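To make the batch-hard triplet term of that recipe concrete, here is a minimal pure-Python sketch. It is not the code in lossfn.py: the function names, the Euclidean distance, and the margin of 0.3 are assumptions chosen for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    """Average hinge loss over anchors, using the hardest positive and negative.

    For every anchor, the hardest positive is the farthest sample with the
    same identity and the hardest negative is the closest sample with a
    different identity.
    """
    losses = []
    for i, anchor in enumerate(embeddings):
        positives, negatives = [], []
        for j, other in enumerate(embeddings):
            if i == j:
                continue
            dist = euclidean(anchor, other)
            (positives if labels[j] == labels[i] else negatives).append(dist)
        if positives and negatives:  # anchors without a valid triplet are skipped
            losses.append(max(0.0, margin + max(positives) - min(negatives)))
    return sum(losses) / len(losses) if losses else 0.0


# A P=2, K=2 batch (2 identities, 2 images each), as a PK sampler would yield.
batch = [[0.0, 0.0], [3.0, 0.0], [1.0, 0.0], [4.0, 0.0]]
pids = [0, 0, 1, 1]
print(batch_hard_triplet_loss(batch, pids))
```

PK sampling matters here because every anchor needs at least one positive and one negative in the same batch; anchors without both contribute nothing to the loss.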

The tracker itself follows the original DeepSORT algorithm: an 8-state Kalman filter for motion prediction, cosine distance on a per-track feature gallery for appearance matching, Mahalanobis gating for impossible motions, and a cascade matching strategy that prioritizes recently-seen tracks. The detector (YOLOv8 ONNX) and the Re-ID feature extractor (OSNet ONNX) are both loaded via onnxruntime, so no PyTorch is required at inference time.
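As an illustration of the appearance term, the sketch below builds a track-by-detection cost matrix from per-track galleries. Taking the minimum distance over the gallery follows the original DeepSORT paper; whether matching.py does exactly this is an assumption, and all function names are hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def cosine_distance(a, b):
    """1 - cosine similarity, so 0 means identical direction and 2 opposite."""
    a, b = l2_normalize(a), l2_normalize(b)
    return 1.0 - sum(x * y for x, y in zip(a, b))


def gallery_distance(gallery, feature):
    """Smallest cosine distance between a detection and any stored feature."""
    return min(cosine_distance(stored, feature) for stored in gallery)


def appearance_cost_matrix(track_galleries, detection_features):
    """Rows are tracks, columns are detections."""
    return [
        [gallery_distance(gallery, feature) for feature in detection_features]
        for gallery in track_galleries
    ]


# Two tracks (the first has two stored features) and two detections.
galleries = [[[1.0, 0.0], [0.0, 1.0]], [[-1.0, 0.0]]]
detections = [[0.0, 2.0], [1.0, 0.0]]
cost = appearance_cost_matrix(galleries, detections)
```

In a full tracker, entries of this matrix that fail the Mahalanobis gate would be set to a large value before the Hungarian assignment is solved.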

Features

  • Train from scratch any OSNet variant on a folder-based Re-ID dataset.
  • Automatic dataset validation with persistent cache: corrupted images, files at the wrong location, text files renamed with image extensions, and other malformed inputs are detected and excluded before any training starts. The result is fingerprint-cached so subsequent runs reuse the validated file list at zero cost.
  • Validation during training with optional separate test directory (Market-1501-style protocols supported).
  • Fine-tune a pretrained model on a new dataset: keep the backbone, rebuild the classifier head, optionally freeze layers or apply discriminative learning rates per layer.
  • Evaluate with the full Market-1501 protocol: mAP, CMC at multiple ranks, distance distributions, per-query and per-identity AP, retrieval visualizations.
  • Export to ONNX with a dynamic batch axis, graph simplification (onnxsim), embedded metadata, and post-export numerical verification.
  • Inference with the complete DeepSORT pipeline (YOLO + OSNet + Kalman + Hungarian) on webcam or video file, with FPS overlay and optional video recording.
  • "Bag of Tricks" training: BNNeck, label-smoothed CE + batch-hard triplet, no-decay on BN/biases, warmup + multi-step / cosine schedulers.
  • PK identity sampler, Random Erasing, ImageNet-normalized eval transforms.
  • Layer-wise learning rate multipliers and partial freezing via glob patterns on parameter names.
  • Automatic checkpoint rotation, comprehensive evaluation reports (CSV + plots + summary), training history plots regenerated each epoch.
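
The "warmup + multi-step" schedule listed above can be sketched as a plain function of the epoch index. This is not the WarmupMultiStepLR class from schedulers.py; the function name and every default value (base LR, milestones, warmup length, warmup factor) are illustrative assumptions rather than values read from configs/train.yaml.

```python
from bisect import bisect_right


def warmup_multistep_lr(
    epoch,
    base_lr=3.5e-4,
    milestones=(40, 70),
    gamma=0.1,
    warmup_epochs=10,
    warmup_factor=0.01,
):
    """Learning rate for a given epoch: linear warmup, then step decay."""
    if epoch < warmup_epochs:
        # Interpolate linearly from warmup_factor up to 1.0.
        alpha = epoch / warmup_epochs
        factor = warmup_factor * (1.0 - alpha) + alpha
    else:
        factor = 1.0
    # Multiply by gamma once for every milestone already reached.
    decay = gamma ** bisect_right(list(milestones), epoch)
    return base_lr * factor * decay


for epoch in (0, 5, 10, 40, 70):
    print(epoch, warmup_multistep_lr(epoch))
```

The warmup phase keeps early updates small while the randomly initialized classifier head settles, and the milestones then lower the rate in steps.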

Project structure

.
├── src/deepsort/
│   ├── reid/                       # Re-ID module (PyTorch)
│   │   ├── models/
│   │   │   └── osnet.py            # OSNet (5 variants, BNNeck, optional IBN)
│   │   ├── dataset.py              # ReIDDataset + RandomIdentitySampler
│   │   ├── data_scan.py            # Dataset validation + fingerprint cache
│   │   ├── augmentations.py        # Train/eval transforms (Bag of Tricks)
│   │   ├── lossfn.py               # CE label-smooth + triplet batch-hard + wrapper
│   │   ├── optimizers.py           # Param groups, freezing, layer-wise LR
│   │   ├── schedulers.py           # WarmupMultiStepLR + WarmupCosineLR
│   │   ├── metrics.py              # CMC, mAP, top-K, feature extraction
│   │   ├── utils.py                # set_seed, AverageMeter, checkpointing, splits
│   │   └── entrypoints/
│   │       ├── train.py            # Training loop with validation
│   │       ├── evaluate.py         # Full evaluation with CSV + plots + report
│   │       ├── finetuning.py       # Prepare a fine-tunable checkpoint
│   │       ├── scan.py             # Audit a dataset before training
│   │       └── export.py           # Export to ONNX
│   ├── tracker/                    # DeepSORT tracker (NumPy + SciPy, no PyTorch)
│   │   ├── kalman_filter.py        # 8-state KF with size-aware noise
│   │   ├── track.py                # Track + lifecycle (Tentative/Confirmed/Deleted)
│   │   ├── matching.py             # IoU + cosine + Hungarian + cascade by age
│   │   └── tracker.py              # Top-level Tracker
│   ├── detection.py                # YOLOv8 ONNX detector (onnxruntime + NMS)
│   ├── extractor.py                # Re-ID ONNX feature extractor
│   └── entrypoints/
│       └── infer.py                # Webcam / video inference orchestrator
├── configs/
│   ├── train.yaml
│   ├── eval.yaml
│   ├── finetune.yaml
│   ├── export.yaml
│   └── infer.yaml
├── docs/
│   ├── CONCEPTS.en.md              # Academic documentation (English)
│   └── CONCEPTS.fr.md              # Academic documentation (French)
├── scripts/
│   └── reorganize_reid_dataset.py  # Convert Market-1501 / Duke flat layout to id_xxxx/
├── tests/
└── pyproject.toml

Installation

Quick install (without cloning)

pip install git+https://github.com/cacybernetic/deepSORT
# or with uv:
uv pip install git+https://github.com/cacybernetic/deepSORT

After installation, the following CLI tools are available: scanrid, trainrid, evalrid, finetunerid, exportw, runds. See Usage.

Python — Linux

1. Install uv (fast Python package manager):

curl -LsSf https://astral.sh/uv/install.sh | sh

2. Clone the repository:

git clone https://github.com/cacybernetic/deepSORT
cd deepSORT

3. Create a virtual environment with Python 3.10:

uv venv --python 3.10
source .venv/bin/activate

4. Install the package:

# CPU-only training (or pure inference):
make install

# With CUDA support (GPU training):
make gpu_install

The CPU target installs PyTorch wheels from the CPU index of pytorch.org. The GPU target uses the default PyPI wheels (CUDA 12.x by default — match your driver).

Note — headless server: install OpenGL system libraries first:

sudo apt-get install libgl1-mesa-glx libglib2.0-0

Python — Windows

  1. Install Python 3.10 from python.org.
  2. Open PowerShell in the project folder.
  3. Install uv and create the venv:
    pip install uv
    uv venv --python 3.10
    .venv\Scripts\activate
    uv pip install -e .

Dataset format

The project expects a Re-ID dataset where each identity has its own sub-directory:

dataset/
├── id_0001/
│   ├── img_001.jpg
│   ├── img_002.jpg
│   └── ...
├── id_0002/
│   ├── img_001.jpg
│   └── ...
└── ...

Each id_xxxx/ folder contains all images of one person (one identity). Supported extensions: .jpg, .jpeg, .png, .bmp, .webp.

Public datasets (Market-1501, DukeMTMC-reID, MSMT17)

These datasets are distributed with a flat layout: all images of bounding_box_train/, bounding_box_test/ and query/ sit in single directories, with the identity encoded as the leading field of the filename (for example <pid>_c<cam>_f<frame>.jpg in DukeMTMC-reID). A helper script reorganizes them into the expected id_xxxx/ layout in seconds:

python scripts/reorganize_reid_dataset.py \
    --src /path/to/Market-1501/bounding_box_train \
    --dst /path/to/market1501_train \
    --mode link

--mode link creates symbolic links instead of copying files (no extra disk space). Use --mode copy if symlinks aren't supported on your filesystem.
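
The core of such a reorganization can be sketched in a few lines. This is not the contents of scripts/reorganize_reid_dataset.py: the function names are hypothetical, and skipping negative IDs (used by Market-1501 for junk images) is an assumption of this sketch.

```python
import os
import re
import shutil
from pathlib import Path

# The identity is the leading integer field of the filename.
PID_PREFIX = re.compile(r"^(-?\d+)_")


def target_path(filename, dst_root):
    """Return dst_root/id_XXXX/filename, or None if the file should be skipped."""
    match = PID_PREFIX.match(filename)
    if match is None:
        return None  # no identity prefix
    pid = int(match.group(1))
    if pid < 0:
        return None  # assumption: negative IDs mark junk images
    return Path(dst_root) / f"id_{pid:04d}" / filename


def place(src, dst, mode="link"):
    """Create dst as a symlink to src, or as a copy of it."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    if mode == "link":
        os.symlink(os.path.abspath(src), dst)
    elif mode == "copy":
        shutil.copy2(src, dst)
    else:
        raise ValueError(f"unknown mode: {mode!r}")


def reorganize(src_dir, dst_root, mode="link"):
    """Place every identifiable file of a flat directory; return the count."""
    placed = 0
    for src in sorted(Path(src_dir).iterdir()):
        dst = target_path(src.name, dst_root) if src.is_file() else None
        if dst is not None and not dst.exists():
            place(src, dst, mode)
            placed += 1
    return placed
```

Links are made to absolute paths so they stay valid regardless of the directory the script is run from.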

Automatic validation and cache

Every entry point that reads the dataset (trainrid, evalrid, finetunerid) runs a validation pass before doing anything else. Each file is checked for a recognized image extension, a parseable header, a supported PIL format, and reasonable dimensions. Anything that fails (corrupted images, truncated downloads, text files renamed with .jpg, files at the wrong location) is excluded with a clear diagnostic instead of crashing the training loop hours later.
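
The extension and header checks can be approximated with the standard library alone. The sketch below is a simplified stand-in, not the logic of data_scan.py: it inspects magic bytes instead of using PIL, does not check dimensions, and its function names and status strings are hypothetical.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def sniff_format(header):
    """Guess the image format from the first bytes of a file, or return None."""
    if header.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if header.startswith(b"BM"):
        return "bmp"
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"
    return None


def check_file(path):
    """Classify a file as 'valid', 'invalid' or 'skipped' with a short reason."""
    path = Path(path)
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        return "skipped: unsupported extension"
    with path.open("rb") as handle:
        header = handle.read(12)
    if sniff_format(header) is None:
        return "invalid: unrecognized image header"
    return "valid"
```

A text file renamed to .jpg passes the extension check but fails the header check, which is the case the validation pass is meant to catch early.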

The validated file list is fingerprinted (SHA1 over relative paths and file sizes) and cached under <root>/.reid_cache/scan_*.json. The cache is rebuilt automatically when:

  • a file is added, removed, or replaced in the dataset, or
  • the scanner version is bumped (very rare; only when validation rules themselves change).

Copying or moving the dataset does not invalidate the cache (relative paths and file sizes don't change), so a one-time scan covers all downstream runs. To audit your dataset manually before a long training run, see "1. Audit your dataset" under Usage.
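
A fingerprint of this kind can be sketched as follows. The exact byte layout hashed by the project is not documented here, so the separator format, the function name, and the exclusion of the .reid_cache directory are assumptions.

```python
import hashlib
import os


def dataset_fingerprint(root):
    """SHA1 over the sorted (relative path, file size) pairs under root."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: the cache directory itself is not part of the dataset.
        dirnames[:] = [d for d in dirnames if d != ".reid_cache"]
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            relative = os.path.relpath(full_path, root).replace(os.sep, "/")
            entries.append((relative, os.path.getsize(full_path)))
    digest = hashlib.sha1()
    for relative, size in sorted(entries):
        digest.update(f"{relative}\0{size}\n".encode("utf-8"))
    return digest.hexdigest()
```

Because only relative paths and sizes are hashed, the same tree yields the same fingerprint wherever it is stored, while adding or removing a file changes it. A file replaced by another of identical size would go unnoticed by this particular sketch.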


Usage

After installation, six commands are available:

| Command | Role |
| --- | --- |
| scanrid | Audit a dataset for corrupted/invalid files (optional pre-flight) |
| trainrid | Train the Re-ID model |
| evalrid | Evaluate a checkpoint and produce a full report |
| finetunerid | Prepare a checkpoint for fine-tuning on a new dataset |
| exportw | Export weights to ONNX |
| runds | Run the complete DeepSORT tracker on webcam/video |

Each command takes a single --config argument pointing to its YAML file (scanrid also accepts --root directly).

1. Audit your dataset

Optional but recommended before launching a long training. Validates every file under the dataset root, builds the cache, and produces a human-readable report so you know exactly which files were excluded and why:

# Standalone usage (no config needed):
scanrid --root /path/to/dataset

# Read the dataset root from a YAML config:
scanrid --config configs/train.yaml

# Strict mode: fully decode each image (slower but catches mid-file corruption):
scanrid --root /path/to/dataset --paranoid

# Force a rebuild of the cache (e.g. after a scanner-version bump):
scanrid --root /path/to/dataset --force-rescan

# Also drop identities with fewer than K samples (required for batch-hard triplet):
scanrid --root /path/to/dataset --min-samples 4

# CI/CD-friendly: exit with code 2 if any invalid file is found:
scanrid --root /path/to/dataset --fail-on-invalid

Output is a printed summary plus a JSON cache file under <root>/.reid_cache/. The full per-file list (valid / invalid / skipped) is preserved in the cache for later inspection. Subsequent calls to trainrid and evalrid automatically reuse this cache, so the dataset is not scanned twice.

2. Train the Re-ID model

Edit configs/train.yaml to point to your dataset, then run:

trainrid --config configs/train.yaml

Outputs in output_dir/ (default ./runs/reid_osnet_x1_0/):

  • checkpoints/epoch_*.pth — automatic rotation, keeps the N most recent
  • checkpoints/best.pth — best mAP checkpoint (never deleted by rotation)
  • plots/training_curves.png — losses + validation mAP regenerated every epoch
  • logs/train_*.log — full training log
  • history.json — training history in JSON for external analysis
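
The rotation rule for checkpoints/ (keep the N most recent epoch_*.pth, never touch best.pth) can be sketched like this. The function name and the keep parameter are hypothetical; the project's own helper in utils.py may differ.

```python
import re
from pathlib import Path

EPOCH_FILE = re.compile(r"^epoch_(\d+)\.pth$")


def rotate_checkpoints(checkpoint_dir, keep=3):
    """Delete all but the `keep` highest-numbered epoch_*.pth files.

    Files that do not match the epoch pattern (such as best.pth) are never
    considered. Returns the names of the deleted files, oldest first.
    """
    numbered = []
    for path in Path(checkpoint_dir).iterdir():
        match = EPOCH_FILE.match(path.name)
        if match:
            numbered.append((int(match.group(1)), path))
    numbered.sort()  # numeric order, so epoch_10 sorts after epoch_9
    stale = numbered[:-keep] if keep > 0 else numbered
    for _, path in stale:
        path.unlink()
    return [path.name for _, path in stale]
```

Sorting on the parsed epoch number rather than on the file name avoids deleting epoch_10.pth before epoch_9.pth.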

The training loop supports:

  • Gradient accumulation (accum_steps) for large effective batches on small GPUs
  • Layer freezing (freeze_feature_layers: true) or per-layer LR multipliers (layer_lr_multipliers)
  • Resume from checkpoint (auto by default), with priority checkpoint > pretrained_weights > fresh
  • Validation source: same root as training (val carved out) OR separate test_root for proper Re-ID protocol

3. Evaluate a checkpoint

Edit configs/eval.yaml (dataset, checkpoint path, output dir), then run:

evalrid --config configs/eval.yaml

Each run creates a unique timestamped subfolder under output_dir (default ./results/):

  • summary.txt — human-readable report with metric interpretations and recommendations
  • metrics_explanation.md — academic definition of each metric
  • csv/summary_metrics.csv, per_query_ap.csv, per_identity_ap.csv, cmc_curve.csv, top_k_retrievals.csv
  • plots/ — CMC curve, per-query AP histogram, distance distributions, per-identity bar chart, PCA embeddings, retrieval examples grid

Two evaluation modes:

  • Auto-split (single root + val_ratio): for custom datasets without official query/gallery split
  • Explicit (separate query_root + gallery_root): for Market-1501-style protocols
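
For readers new to the reported metrics, the sketch below computes average precision and a CMC curve from ranked match lists. It assumes the gallery has already been sorted by distance for each query and that any protocol-specific filtering (for example, dropping same-camera matches in Market-1501) was applied beforehand. The function names are hypothetical and this is not the code in metrics.py.

```python
def average_precision(ranked_matches):
    """AP of one query; ranked_matches[i] is True if the i-th result is correct."""
    hits = 0
    precisions = []
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / rank)  # precision at each correct result
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(all_ranked_matches):
    """mAP over the queries that have at least one correct gallery item."""
    valid = [ranked for ranked in all_ranked_matches if any(ranked)]
    if not valid:
        return 0.0
    return sum(average_precision(ranked) for ranked in valid) / len(valid)


def cmc_curve(all_ranked_matches, max_rank=5):
    """cmc[k-1] is the fraction of queries with a correct result in the top k."""
    counts = [0] * max_rank
    valid = 0
    for ranked in all_ranked_matches:
        if not any(ranked):
            continue  # queries without any correct gallery item are ignored
        valid += 1
        first_hit = next(i for i, is_match in enumerate(ranked) if is_match)
        for k in range(first_hit, max_rank):
            counts[k] += 1
    return [count / valid for count in counts] if valid else [0.0] * max_rank


queries = [[False, True, False], [True, False, False]]
print(mean_average_precision(queries), cmc_curve(queries, max_rank=3))
```

CMC only looks at the first correct result, whereas mAP rewards ranking all correct results early, which is why the report lists both.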

4. Fine-tune on a new dataset

Step 1 — prepare a compatible checkpoint:

Edit configs/finetune.yaml with the source checkpoint and new target dataset, then run:

finetunerid --config configs/finetune.yaml

This rebuilds the model with the new number of identities, keeps the backbone (and optionally FC + BNNeck), and re-initializes the classifier head. Outputs a .pth file ready for training.

Step 2 — train on the new dataset:

In configs/train.yaml, set model.pretrained_weights to the output of step 1 and configure layer-wise LR or freezing:

model:
  pretrained_weights: ./runs/finetune_custom/pretrained_for_finetune.pth

optimizer:
  layer_lr_multipliers:
    conv1.*: 0.1            # 10% of base LR on the backbone
    conv2.*: 0.1
    classifier.*: 1.0       # full LR on the fresh classifier

training:
  freeze_feature_layers: false   # or true for small datasets

Then:

trainrid --config configs/train.yaml
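
How glob patterns on parameter names can translate into optimizer parameter groups is sketched below. It works on names only so that it runs without PyTorch. The first-match-wins rule, the default multiplier of 1.0, the frozen_patterns argument, and the example parameter names are all assumptions, not the behavior of optimizers.py.

```python
from fnmatch import fnmatchcase


def lr_multiplier(param_name, multipliers):
    """Multiplier of the first matching pattern, or 1.0 if none matches."""
    for pattern, factor in multipliers.items():
        if fnmatchcase(param_name, pattern):
            return factor
    return 1.0


def build_param_groups(param_names, base_lr, multipliers, frozen_patterns=()):
    """Group parameter names by effective learning rate, skipping frozen ones."""
    groups = {}
    for name in param_names:
        if any(fnmatchcase(name, pattern) for pattern in frozen_patterns):
            continue  # frozen parameters are not handed to the optimizer
        lr = base_lr * lr_multiplier(name, multipliers)
        groups.setdefault(lr, []).append(name)
    return [{"lr": lr, "params": names} for lr, names in groups.items()]


# Hypothetical parameter names in the style of model.named_parameters().
names = ["conv1.conv.weight", "conv2.0.conv1.weight", "fc.0.weight", "classifier.weight"]
multipliers = {"conv1.*": 0.1, "conv2.*": 0.1, "classifier.*": 1.0}
groups = build_param_groups(names, base_lr=1e-3, multipliers=multipliers)
```

In a real training script each group would hold the parameter tensors rather than their names, and the resulting list would be passed to the optimizer constructor.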

5. Export to ONNX

Edit configs/export.yaml, then run:

exportw --config configs/export.yaml

The exported .onnx file includes L2-normalization in the graph, supports dynamic batch axis, and is simplified with onnxsim. The output is verified numerically against the PyTorch model by default (max absolute diff reported in logs).

6. Run the complete DeepSORT tracker

You need two ONNX files:

  • A detector (YOLOv8 ONNX): export one yourself with Ultralytics, e.g.:
    python -c "from ultralytics import YOLO; YOLO('yolov8n.pt').export(format='onnx', opset=17, simplify=True, dynamic=False)"
  • The Re-ID extractor (best.onnx produced by step 5)

Edit configs/infer.yaml to point to both files and the video source, then run:

runds --config configs/infer.yaml

  • source: 0 opens the first webcam; source: path/to/video.mp4 uses a file
  • output.display: true shows a live OpenCV window (press q or ESC to quit)
  • output.save_video: path.mp4 records the annotated stream
  • output.show_fps: true overlays the current FPS

Configuration files

All behavior is controlled through YAML files in configs/. Key fields summary:

| File | Key fields |
| --- | --- |
| train.yaml | model.name, data.root, data.test_root, data.scan.*, optimizer.lr, scheduler.milestones, training.batch_size, training.freeze_feature_layers |
| eval.yaml | checkpoint, data.root or data.query_root + data.gallery_root, data.scan.*, evaluation.metric, reports.* |
| finetune.yaml | source_checkpoint, target_dataset.root, reset.classifier, output_path |
| export.yaml | checkpoint, output_path, export.dynamic_batch, export.simplify, export.verify |
| infer.yaml | source, reid.model_path, detector.model_path, tracker.* |

Dataset scanning options

The train.yaml and eval.yaml files accept an optional data.scan block to control dataset validation and caching:

data:
  root: ./dataset/train
  scan:
    cache_dir: null          # null = <root>/.reid_cache/ ; or a global path
    paranoid: false          # full pixel decode (slower but bulletproof)
    workers: 8               # parallel validation threads
    force_rescan: false      # set true once to rebuild a stale cache
    use_cache: true

Defaults are sensible for most cases; the block can be omitted entirely. Set paranoid: true for a one-time deep audit of a new dataset, then leave it false afterwards.
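
One way to read such an optional block is to map it onto a small dataclass whose defaults mirror the values shown above. This is a sketch of the idea, not the project's config loader: the class name, the function names, and the rejection of unknown keys are assumptions.

```python
import os
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ScanOptions:
    """Defaults mirror the data.scan example block."""
    cache_dir: Optional[str] = None
    paranoid: bool = False
    workers: int = 8
    force_rescan: bool = False
    use_cache: bool = True


def scan_options_from_config(config):
    """Build ScanOptions from a parsed YAML dict; the scan block is optional."""
    raw = (config.get("data") or {}).get("scan") or {}
    known = {field.name for field in fields(ScanOptions)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"unknown data.scan keys: {sorted(unknown)}")
    return ScanOptions(**raw)


def resolve_cache_dir(root, options):
    """cache_dir: null means <root>/.reid_cache/."""
    return options.cache_dir or os.path.join(root, ".reid_cache")


# A config that omits the scan block entirely still yields usable options.
options = scan_options_from_config({"data": {"root": "./dataset/train"}})
```

Rejecting unknown keys is a design choice of this sketch: it turns a typo such as "paranoid_mode" into an immediate error instead of a silently ignored setting.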


Documentation

For an in-depth academic walkthrough of every concept used in this project (Re-ID problem, OSNet architecture, BNNeck, batch-hard triplet, Kalman filter equations, Hungarian assignment, cascade matching, etc.), see:

  • docs/CONCEPTS.en.md (English)
  • docs/CONCEPTS.fr.md (French)

These documents cover the theory in detail with mathematical formulations, references to the relevant code files, and practical considerations for each design choice.


To contribute

Contributions are welcome. Please follow these steps:

  1. Fork the repository and clone it locally.
  2. Create a feature branch: git checkout -b feature/my-feature
  3. Commit your changes: git commit -m 'Add a new feature'
  4. Push to the branch: git push origin feature/my-feature
  5. Open a Pull Request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

This project would not exist without the work of several outstanding researchers and engineers:

  • Nicolai Wojke for the original deep_sort repository, the canonical reference implementation of the DeepSORT algorithm. The structure of our Kalman filter, the cascade matching strategy, and the gating mechanism are directly inspired by this work.
  • Kaiyang Zhou for the deep-person-reid library and the OSNet architecture. Our OSNet implementation closely follows the reference code from this library, with the addition of the BNNeck head from Luo et al.
  • Hao Luo and the authors of the "Bag of Tricks" baseline, whose training recipe (BNNeck, warmup, no-decay on BN/biases) is the basis of our training pipeline.

If you find this project useful, please consider starring the repositories above to support the educational and research work that made it possible.

References

The implementation is based on the following papers:

Tracking

  • DeepSORT — Wojke, N., Bewley, A., & Paulus, D. (2017). Simple Online and Realtime Tracking with a Deep Association Metric. ICIP 2017. arXiv:1703.07402
  • SORT — Bewley, A., Ge, Z., Ott, L., Ramos, F., & Upcroft, B. (2016). Simple Online and Realtime Tracking. ICIP 2016. arXiv:1602.00763
  • Kalman Filter — Kalman, R. E. (1960). A New Approach to Linear Filtering and Prediction Problems. Journal of Basic Engineering, 82(1), 35–45.
  • Hungarian algorithm — Kuhn, H. W. (1955). The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1–2), 83–97.

Re-Identification architecture

  • OSNet — Zhou, K., Yang, Y., Cavallaro, A., & Xiang, T. (2019). Omni-Scale Feature Learning for Person Re-Identification. ICCV 2019. arXiv:1905.00953
  • IBN-Net — Pan, X., Luo, P., Shi, J., & Tang, X. (2018). Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net. ECCV 2018. arXiv:1807.09441

Training methodology

  • Bag of Tricks — Luo, H., Gu, Y., Liao, X., Lai, S., & Jiang, W. (2019). Bag of Tricks and A Strong Baseline for Deep Person Re-Identification. CVPRW 2019. arXiv:1903.07071
  • Triplet loss with batch-hard mining — Hermans, A., Beyer, L., & Leibe, B. (2017). In Defense of the Triplet Loss for Person Re-Identification. arXiv preprint. arXiv:1703.07737
  • Label smoothing — Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the Inception Architecture for Computer Vision. CVPR 2016. arXiv:1512.00567
  • Random Erasing — Zhong, Z., Zheng, L., Kang, G., Li, S., & Yang, Y. (2017). Random Erasing Data Augmentation. arXiv preprint. arXiv:1708.04896
  • No weight decay on BN — Goyal, P., et al. (2017). Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. arXiv:1706.02677

Evaluation protocol and datasets

  • Market-1501 — Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., & Tian, Q. (2015). Scalable Person Re-identification: A Benchmark. ICCV 2015. Source of the CMC + mAP evaluation protocol used in this project.
  • DukeMTMC-reID — Ristani, E., Solera, F., Zou, R., Cucchiara, R., & Tomasi, C. (2016). Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking. ECCV Workshops 2016.

Reference implementations

  • nwojke/deep_sort — Wojke, N. (2017). Original DeepSORT reference implementation.
  • KaiyangZhou/deep-person-reid — Zhou, K. (2019). Torchreid library and OSNet reference code.
  • michuanhaohao/reid-strong-baseline — Luo, H. (2019). Bag of Tricks reference implementation.

Contact

For questions or suggestions:
