A complete DeepSORT multi-object tracking pipeline implemented from scratch — training, evaluation, fine-tuning, ONNX export, and real-time inference on webcam or video files. The Re-Identification (Re-ID) backbone is OSNet (Zhou et al., 2019) trained with the "Bag of Tricks" recipe (Luo et al., 2019). The full tracker (Kalman filter + Hungarian matching + cascade by age) runs without any PyTorch dependency at inference time.
Table of Contents
- Description
- Features
- Project structure
- Installation
- Dataset format
- Usage
- Configuration files
- Documentation
- Contributing
- License
- Acknowledgments
- References
- Contact
This project is a complete re-implementation of the DeepSORT tracker (Wojke et al., 2017) with a modern, configurable training pipeline for the Re-Identification backbone. The Re-ID model is OSNet (Zhou et al., 2019) trained following the "Bag of Tricks" recipe (Luo et al., 2019): BNNeck, label-smoothed cross-entropy combined with batch-hard triplet loss, PK sampling, warmup learning rate, no weight decay on BatchNorm and biases, and Random Erasing augmentation.
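As a rough illustration of PK sampling, the sketch below builds one batch from P identities with K images each, which is what batch-hard triplet mining needs. The function name `sample_pk_batch` and the sample-with-replacement fallback for small identities are assumptions made for this example; they are not taken from the project's RandomIdentitySampler.

```python
import random
from collections import defaultdict


def sample_pk_batch(labels, p, k, rng=None):
    """Return dataset indices for one batch of p identities x k images each.

    labels[i] is the identity of sample i. Identities with fewer than k
    images are sampled with replacement (an assumption of this sketch).
    """
    rng = rng or random.Random()
    by_identity = defaultdict(list)
    for index, pid in enumerate(labels):
        by_identity[pid].append(index)
    if len(by_identity) < p:
        raise ValueError("need at least p distinct identities")

    batch = []
    for pid in rng.sample(sorted(by_identity), p):
        pool = by_identity[pid]
        if len(pool) >= k:
            batch.extend(rng.sample(pool, k))   # k distinct images
        else:
            batch.extend(rng.choices(pool, k=k))  # repeat images if needed
    return batch


# Toy label list: identity 1 has only two images.
labels = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
batch = sample_pk_batch(labels, p=3, k=4, rng=random.Random(0))
```

Every batch then contains at least one positive and several negatives for each anchor, so the hardest positive and hardest negative can be mined inside the batch.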
The tracker itself follows the original DeepSORT algorithm: an 8-state Kalman filter for motion prediction, cosine distance on a per-track feature gallery for appearance matching, Mahalanobis gating for impossible motions, and a cascade matching strategy that prioritizes recently-seen tracks. The detector (YOLOv8 ONNX) and the Re-ID feature extractor (OSNet ONNX) are both loaded via onnxruntime — no PyTorch is required at inference time.
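The appearance term can be sketched in a few lines of plain Python. The example below assumes the cost between a track and a detection is the smallest cosine distance between the detection feature and any feature stored in the track's gallery, which is how the original DeepSORT reference code defines its nearest-neighbour metric; whether this project computes it in exactly this way is an assumption. All function names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_distance(a, b):
    """1 - cosine similarity, in [0, 2]."""
    a, b = l2_normalize(a), l2_normalize(b)
    return 1.0 - sum(x * y for x, y in zip(a, b))


def appearance_cost(gallery, detection_feature):
    """Smallest cosine distance between a detection and any stored track feature."""
    return min(cosine_distance(f, detection_feature) for f in gallery)


def cost_matrix(track_galleries, detection_features):
    """Rows are tracks, columns are detections."""
    return [[appearance_cost(g, d) for d in detection_features] for g in track_galleries]


track_galleries = [
    [[1.0, 0.0], [0.9, 0.1]],  # track 0: two stored features
    [[0.0, 1.0]],              # track 1: one stored feature
]
detections = [[0.0, 2.0], [3.0, 0.0]]
costs = cost_matrix(track_galleries, detections)
```

In this toy case detection 1 matches track 0 and detection 0 matches track 1 with zero cost; a full tracker would pass such a matrix, after gating, to an assignment solver.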
- Train from scratch any OSNet variant on a folder-based Re-ID dataset.
- Automatic dataset validation with persistent cache: corrupted images, files at the wrong location, text files renamed with image extensions, and other malformed inputs are detected and excluded before any training starts. The result is fingerprint-cached so subsequent runs reuse the validated file list at zero cost.
- Validation during training with optional separate test directory (Market-1501-style protocols supported).
- Fine-tune a pretrained model on a new dataset: keep the backbone, rebuild the classifier head, optionally freeze layers or apply discriminative learning rates per layer.
- Evaluate with the full Market-1501 protocol: mAP, CMC at multiple ranks, distance distributions, per-query and per-identity AP, retrieval visualizations.
- Export to ONNX with a dynamic batch axis, graph simplification (onnxsim), embedded metadata, and post-export numerical verification.
- Inference with the complete DeepSORT pipeline (YOLO + OSNet + Kalman + Hungarian) on webcam or video file, with FPS overlay and optional video recording.
- "Bag of Tricks" training: BNNeck, label-smoothed CE + batch-hard triplet, no-decay on BN/biases, warmup + multi-step / cosine schedulers.
- PK identity sampler, Random Erasing, ImageNet-normalized eval transforms.
- Layer-wise learning rate multipliers and partial freezing via glob patterns on parameter names.
- Automatic checkpoint rotation, comprehensive evaluation reports (CSV + plots + summary), training history plots regenerated each epoch.
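To make the glob-based layer-wise learning rates concrete, here is a minimal sketch that maps parameter names to learning rates with `fnmatch`. The parameter names, the first-match-wins rule, and the function name `lr_for_parameter` are assumptions for illustration; the project's optimizers.py may resolve patterns differently.

```python
from fnmatch import fnmatchcase


def lr_for_parameter(name, base_lr, multipliers):
    """Return the learning rate for one parameter name.

    multipliers maps glob patterns to factors; the first matching pattern
    wins (an assumption of this sketch) and unmatched names keep base_lr.
    """
    for pattern, factor in multipliers.items():
        if fnmatchcase(name, pattern):
            return base_lr * factor
    return base_lr


multipliers = {"conv1.*": 0.1, "conv2.*": 0.1, "classifier.*": 1.0}
# Hypothetical parameter names, in the style of model.named_parameters().
names = ["conv1.conv.weight", "conv2.0.conv1.weight", "classifier.weight", "fc.0.weight"]
rates = {n: lr_for_parameter(n, base_lr=0.01, multipliers=multipliers) for n in names}
```

Freezing can be seen as the limiting case of the same idea: a frozen parameter is simply left out of the optimizer (or given a factor of 0).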
.
├── src/deepsort/
│ ├── reid/ # Re-ID module (PyTorch)
│ │ ├── models/
│ │ │ └── osnet.py # OSNet (5 variants, BNNeck, optional IBN)
│ │ ├── dataset.py # ReIDDataset + RandomIdentitySampler
│ │ ├── data_scan.py # Dataset validation + fingerprint cache
│ │ ├── augmentations.py # Train/eval transforms (Bag of Tricks)
│ │ ├── lossfn.py # CE label-smooth + triplet batch-hard + wrapper
│ │ ├── optimizers.py # Param groups, freezing, layer-wise LR
│ │ ├── schedulers.py # WarmupMultiStepLR + WarmupCosineLR
│ │ ├── metrics.py # CMC, mAP, top-K, feature extraction
│ │ ├── utils.py # set_seed, AverageMeter, checkpointing, splits
│ │ └── entrypoints/
│ │ ├── train.py # Training loop with validation
│ │ ├── evaluate.py # Full evaluation with CSV + plots + report
│ │ ├── finetuning.py # Prepare a fine-tunable checkpoint
│ │ ├── scan.py # Audit a dataset before training
│ │ └── export.py # Export to ONNX
│ ├── tracker/ # DeepSORT tracker (NumPy + SciPy, no PyTorch)
│ │ ├── kalman_filter.py # 8-state KF with size-aware noise
│ │ ├── track.py # Track + lifecycle (Tentative/Confirmed/Deleted)
│ │ ├── matching.py # IoU + cosine + Hungarian + cascade by age
│ │ └── tracker.py # Top-level Tracker
│ ├── detection.py # YOLOv8 ONNX detector (onnxruntime + NMS)
│ ├── extractor.py # Re-ID ONNX feature extractor
│ └── entrypoints/
│ └── infer.py # Webcam / video inference orchestrator
├── configs/
│ ├── train.yaml
│ ├── eval.yaml
│ ├── finetune.yaml
│ ├── export.yaml
│ └── infer.yaml
├── docs/
│ ├── CONCEPTS.en.md # Academic documentation (English)
│ └── CONCEPTS.fr.md # Academic documentation (French)
├── scripts/
│ └── reorganize_reid_dataset.py # Convert Market-1501 / Duke flat layout to id_xxxx/
├── tests/
└── pyproject.toml
pip install git+https://github.com/cacybernetic/deepSORT
# or with uv:
uv pip install git+https://github.com/cacybernetic/deepSORT
After installation, the following CLI tools are available: scanrid, trainrid, evalrid, finetunerid, exportw, runds. See Usage.
1. Install uv (fast Python package manager):
curl -LsSf https://astral.sh/uv/install.sh | sh
2. Clone the repository:
git clone https://github.com/cacybernetic/deepSORT
cd deepSORT
3. Create a virtual environment with Python 3.10:
uv venv --python 3.10
source .venv/bin/activate
4. Install the package:
# CPU-only training (or pure inference):
make install
# With CUDA support (GPU training):
make gpu_install
The CPU target installs PyTorch wheels from the CPU index of pytorch.org. The GPU target uses the default PyPI wheels (CUDA 12.x by default; match your driver).
Note — headless server: install OpenGL system libraries first:
sudo apt-get install libgl1-mesa-glx libglib2.0-0
On Windows:
- Install Python 3.10 from python.org.
- Open PowerShell in the project folder.
- Install uv and create the venv:
pip install uv
uv venv --python 3.10
.venv\Scripts\activate
uv pip install -e .
The project expects a Re-ID dataset where each identity has its own sub-directory:
dataset/
├── id_0001/
│ ├── img_001.jpg
│ ├── img_002.jpg
│ └── ...
├── id_0002/
│ ├── img_001.jpg
│ └── ...
└── ...
Each id_xxxx/ folder contains all images of one person (one identity). Supported extensions: .jpg, .jpeg, .png, .bmp, .webp.
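A loader for this layout only has to walk one level of folders. The sketch below is a minimal stand-in, not the project's ReIDDataset: the function name `index_identities` is hypothetical, and the temporary directory exists only to make the example runnable.

```python
import tempfile
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def index_identities(root):
    """Map each identity folder name to its sorted list of image paths."""
    index = {}
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir() or folder.name.startswith("."):
            continue  # skip stray files and hidden folders such as caches
        images = sorted(
            p for p in folder.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
        )
        if images:
            index[folder.name] = images
    return index


# Build a tiny throwaway dataset to show the result.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ["id_0001/img_001.jpg", "id_0001/img_002.PNG",
                "id_0001/notes.txt", "id_0002/img_001.jpg"]:
        path = Path(tmp, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(b"")
    index = index_identities(tmp)
    summary = {pid: [p.name for p in paths] for pid, paths in index.items()}
```

The extension check is case-insensitive here, and `notes.txt` is ignored because its extension is not in the supported set.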
Market-1501 and DukeMTMC-reID are distributed with a flat layout: all images of bounding_box_train/, bounding_box_test/ and query/ sit in single directories, with the identity encoded in the filename (<pid>_c<cam>_f<frame>.jpg). A helper script reorganizes them into the expected id_xxxx/ layout in seconds:
python scripts/reorganize_reid_dataset.py \
--src /path/to/Market-1501/bounding_box_train \
--dst /path/to/market1501_train \
--mode link
--mode link creates symbolic links instead of copying files (no extra disk space). Use --mode copy if symlinks aren't supported on your filesystem.
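The core of such a reorganization is parsing the identity out of the filename. The sketch below is an illustration, not the code of reorganize_reid_dataset.py: it reads the leading person ID and camera number, which also covers Market-1501 names such as 0002_c1s1_000451_03.jpg, and it assumes that negative IDs (the -1 junk images of Market-1501) should be skipped. Function names are hypothetical.

```python
import re

# Leading person ID (possibly negative), then "_c" and the camera number.
_NAME = re.compile(r"^(-?\d+)_c(\d+)")


def parse_reid_filename(filename):
    """Return (person_id, camera_id) parsed from the filename prefix, or None."""
    match = _NAME.match(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def target_folder(filename):
    """Name of the id_xxxx folder for a file, or None if it should be skipped."""
    parsed = parse_reid_filename(filename)
    if parsed is None or parsed[0] < 0:
        return None  # unparseable name or junk identity (-1)
    return f"id_{parsed[0]:04d}"


duke_style = parse_reid_filename("0001_c2_f0046182.jpg")
market_style = parse_reid_filename("0002_c1s1_000451_03.jpg")
```

A full script would then create `target_folder(name)` under the destination and either symlink or copy each file into it.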
Every entry point that reads the dataset (trainrid, evalrid, finetunerid) runs a validation pass before doing anything else. Each file is checked for a recognized image extension, a parseable header, supported PIL format, and reasonable dimensions. Anything that fails (corrupted images, truncated downloads, text files renamed with .jpg, files at the wrong location) is excluded with a clear diagnostic instead of crashing the training loop hours later.
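The idea of a cheap header check can be shown without any imaging library. The sketch below is a standard-library stand-in for the PIL-based validation described above, not the project's scanner: it only compares the first bytes of a file against well-known magic numbers, and the function name `check_image_file` and the reason strings are made up for this example.

```python
import tempfile
from pathlib import Path

# Magic-number tests for the supported formats.
SIGNATURES = {
    "jpeg": lambda h: h.startswith(b"\xff\xd8\xff"),
    "png": lambda h: h.startswith(b"\x89PNG\r\n\x1a\n"),
    "bmp": lambda h: h.startswith(b"BM"),
    "webp": lambda h: h[:4] == b"RIFF" and h[8:12] == b"WEBP",
}
EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def check_image_file(path):
    """Return (ok, reason) using the extension and the first 12 bytes only."""
    path = Path(path)
    if path.suffix.lower() not in EXTENSIONS:
        return False, "unsupported extension"
    with path.open("rb") as handle:
        header = handle.read(12)
    if not header:
        return False, "empty file"
    if not any(matches(header) for matches in SIGNATURES.values()):
        return False, "header is not a known image format"
    return True, "ok"


with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "real.png": b"\x89PNG\r\n\x1a\n" + b"\x00" * 16,
        "renamed_text.jpg": b"this is a text file",
        "truncated.jpg": b"",
        "notes.txt": b"hello",
    }
    results = {}
    for name, payload in samples.items():
        path = Path(tmp, name)
        path.write_bytes(payload)
        results[name] = check_image_file(path)
```

A check like this cannot detect corruption in the middle of a file; that is the case a full decode, such as the paranoid mode described later, is meant to catch.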
The validated file list is fingerprinted (SHA1 over relative paths and file sizes) and cached under <root>/.reid_cache/scan_*.json. The cache is rebuilt automatically when:
- a file is added, removed, or replaced in the dataset, or
- the scanner version is bumped (very rare; only when validation rules themselves change).
Copying or moving the dataset does not invalidate the cache (relative paths and file sizes don't change), so a one-time scan covers all downstream runs. To audit your dataset manually before a long training, see the scanrid section under Usage.
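A fingerprint of this kind is straightforward to sketch with `hashlib`. The byte layout fed to SHA1 below (path, NUL, size, newline) and the function name `dataset_fingerprint` are assumptions; the project's actual cache key format is not documented here.

```python
import hashlib
import tempfile
from pathlib import Path


def dataset_fingerprint(root, relative_paths):
    """SHA1 over sorted relative paths and file sizes (file content is not read)."""
    root = Path(root)
    digest = hashlib.sha1()
    for rel in sorted(relative_paths):
        size = (root / rel).stat().st_size
        digest.update(f"{rel}\0{size}\n".encode("utf-8"))
    return digest.hexdigest()


def _make(root, files):
    """Write {relative path: bytes} under root (helper for the demo only)."""
    for rel, data in files.items():
        path = Path(root, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)


files = {"id_0001/a.jpg": b"12345", "id_0002/b.jpg": b"123"}
with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    _make(first, files)
    _make(second, files)  # same dataset at another location
    same_after_copy = dataset_fingerprint(first, files) == dataset_fingerprint(second, files)
    _make(second, {"id_0002/b.jpg": b"1234"})  # replaced by a file of another size
    changed_after_edit = dataset_fingerprint(first, files) != dataset_fingerprint(second, files)
```

Because only paths and sizes enter the digest, the check costs one `stat` call per file. In this particular sketch, a replacement that keeps the exact byte size would not change the digest.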
After installation, six commands are available:
| Command | Role |
|---|---|
| scanrid | Audit a dataset for corrupted/invalid files (optional pre-flight) |
| trainrid | Train the Re-ID model |
| evalrid | Evaluate a checkpoint and produce a full report |
| finetunerid | Prepare a checkpoint for fine-tuning on a new dataset |
| exportw | Export weights to ONNX |
| runds | Run the complete DeepSORT tracker on webcam/video |
Each command takes a single --config argument pointing to its YAML file (scanrid also accepts --root directly).
Optional but recommended before launching a long training. Validates every file under the dataset root, builds the cache, and produces a human-readable report so you know exactly which files were excluded and why:
# Standalone usage (no config needed):
scanrid --root /path/to/dataset
# Read the dataset root from a YAML config:
scanrid --config configs/train.yaml
# Strict mode: fully decode each image (slower but catches mid-file corruption):
scanrid --root /path/to/dataset --paranoid
# Force a rebuild of the cache (e.g. after a scanner-version bump):
scanrid --root /path/to/dataset --force-rescan
# Also drop identities with fewer than K samples (required for batch-hard triplet):
scanrid --root /path/to/dataset --min-samples 4
# CI/CD-friendly: exit with code 2 if any invalid file is found:
scanrid --root /path/to/dataset --fail-on-invalid
Output is a printed summary plus a JSON cache file under <root>/.reid_cache/. The full file list (valid / invalid / skipped) is preserved in the cache for later inspection. Subsequent calls to trainrid and evalrid automatically reuse this cache, so the dataset is not scanned twice.
Edit configs/train.yaml to point to your dataset, then run:
trainrid --config configs/train.yaml
Outputs in output_dir/ (default ./runs/reid_osnet_x1_0/):
- checkpoints/epoch_*.pth: automatic rotation, keeps the N most recent
- checkpoints/best.pth: best mAP checkpoint (never deleted by rotation)
- plots/training_curves.png: losses + validation mAP, regenerated every epoch
- logs/train_*.log: full training log
- history.json: training history in JSON for external analysis
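Checkpoint rotation of this kind can be sketched as follows. This is an illustration under stated assumptions, not the project's utils.py: the function name `rotate_checkpoints` is hypothetical, and the epoch number is assumed to be the integer between "epoch_" and ".pth".

```python
import re
import tempfile
from pathlib import Path

_EPOCH = re.compile(r"^epoch_(\d+)\.pth$")


def rotate_checkpoints(checkpoint_dir, keep):
    """Delete all but the `keep` highest-numbered epoch_*.pth files.

    Files that do not match epoch_<number>.pth, such as best.pth, are never
    touched. Returns the names of the deleted files, oldest first.
    """
    numbered = []
    for path in Path(checkpoint_dir).iterdir():
        match = _EPOCH.match(path.name)
        if match:
            numbered.append((int(match.group(1)), path))
    numbered.sort(key=lambda item: item[0])  # numeric order: 9 before 10
    stale = numbered[:-keep] if keep > 0 else numbered
    for _, path in stale:
        path.unlink()
    return [path.name for _, path in stale]


with tempfile.TemporaryDirectory() as tmp:
    for name in ["epoch_8.pth", "epoch_9.pth", "epoch_10.pth", "epoch_11.pth", "best.pth"]:
        Path(tmp, name).write_bytes(b"")
    deleted = rotate_checkpoints(tmp, keep=2)
    remaining = sorted(p.name for p in Path(tmp).iterdir())
```

Sorting on the parsed integer rather than on the filename matters once epoch numbers reach two digits, since "epoch_10.pth" sorts before "epoch_9.pth" as a string.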
The training loop supports:
- Gradient accumulation (accum_steps) for large effective batches on small GPUs
- Layer freezing (freeze_feature_layers: true) or per-layer LR multipliers (layer_lr_multipliers)
- Resume from checkpoint (auto by default), with priority checkpoint > pretrained_weights > fresh
- Validation source: same root as training (val carved out) OR separate test_root for proper Re-ID protocol
Edit configs/eval.yaml (dataset, checkpoint path, output dir), then run:
evalrid --config configs/eval.yaml
Each run creates a unique timestamped subfolder under output_dir (default ./results/):
- summary.txt: human-readable report with metric interpretations and recommendations
- metrics_explanation.md: academic definition of each metric
- csv/: summary_metrics.csv, per_query_ap.csv, per_identity_ap.csv, cmc_curve.csv, top_k_retrievals.csv
- plots/: CMC curve, per-query AP histogram, distance distributions, per-identity bar chart, PCA embeddings, retrieval examples grid
Two evaluation modes:
- Auto-split (single root + val_ratio): for custom datasets without official query/gallery split
- Explicit (separate query_root + gallery_root): for Market-1501-style protocols
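For readers new to the two headline metrics, the sketch below computes average precision and a CMC curve from ranked match lists. It assumes the gallery has already been sorted by distance for each query and that any filtering required by the protocol (Market-1501 discards gallery images showing the query identity from the query's own camera) has been applied beforehand. Function names are hypothetical and this is not the project's metrics.py.

```python
def average_precision(matches):
    """AP for one query; matches[i] is True if the i-th ranked item is correct."""
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / hits if hits else 0.0


def cmc_curve(all_matches, max_rank):
    """curve[k - 1] = fraction of queries with a correct match in the top k."""
    curve = [0] * max_rank
    for matches in all_matches:
        first_hit = next((i for i, m in enumerate(matches) if m), None)
        if first_hit is not None:
            for k in range(first_hit, max_rank):
                curve[k] += 1
    return [count / len(all_matches) for count in curve]


queries = [
    [True, False, True, False],   # AP = (1/1 + 2/3) / 2
    [False, True, False, False],  # AP = 1/2
]
mean_ap = sum(average_precision(m) for m in queries) / len(queries)
cmc = cmc_curve(queries, max_rank=3)
```

CMC only looks at the first correct match, while mAP rewards retrieving all images of the identity early, which is why the two are usually reported together.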
Step 1 — prepare a compatible checkpoint:
Edit configs/finetune.yaml with the source checkpoint and new target dataset, then run:
finetunerid --config configs/finetune.yaml
This rebuilds the model with the new number of identities, keeps the backbone (and optionally FC + BNNeck), and re-initializes the classifier head. Outputs a .pth file ready for training.
Step 2 — train on the new dataset:
In configs/train.yaml, set model.pretrained_weights to the output of step 1 and configure layer-wise LR or freezing:
model:
  pretrained_weights: ./runs/finetune_custom/pretrained_for_finetune.pth
optimizer:
  layer_lr_multipliers:
    conv1.*: 0.1       # 10% of base LR on the backbone
    conv2.*: 0.1
    classifier.*: 1.0  # full LR on the fresh classifier
training:
  freeze_feature_layers: false  # or true for small datasets
Then:
trainrid --config configs/train.yaml

Edit configs/export.yaml, then run:
exportw --config configs/export.yaml
The exported .onnx file includes L2-normalization in the graph, supports a dynamic batch axis, and is simplified with onnxsim. The output is verified numerically against the PyTorch model by default (max absolute diff reported in logs).
You need two ONNX files:
- A detector (YOLOv8 ONNX): export one yourself with Ultralytics, e.g.:
python -c "from ultralytics import YOLO; YOLO('yolov8n.pt').export(format='onnx', opset=17, simplify=True, dynamic=False)"
- The Re-ID extractor (best.onnx produced by the export step)
Edit configs/infer.yaml to point to both files and the video source, then run:
runds --config configs/infer.yaml
- source: 0 opens the first webcam; source: path/to/video.mp4 uses a file
- output.display: true shows a live OpenCV window (press q or ESC to quit)
- output.save_video: path.mp4 records the annotated stream
- output.show_fps: true overlays the current FPS
All behavior is controlled through YAML files in configs/. Key fields summary:
| File | Key fields |
|---|---|
| train.yaml | model.name, data.root, data.test_root, data.scan.*, optimizer.lr, scheduler.milestones, training.batch_size, training.freeze_feature_layers |
| eval.yaml | checkpoint, data.root or data.query_root + data.gallery_root, data.scan.*, evaluation.metric, reports.* |
| finetune.yaml | source_checkpoint, target_dataset.root, reset.classifier, output_path |
| export.yaml | checkpoint, output_path, export.dynamic_batch, export.simplify, export.verify |
| infer.yaml | source, reid.model_path, detector.model_path, tracker.* |
The train.yaml and eval.yaml files accept an optional data.scan block to control dataset validation and caching:
data:
  root: ./dataset/train
  scan:
    cache_dir: null       # null = <root>/.reid_cache/ ; or a global path
    paranoid: false       # full pixel decode (slower but bulletproof)
    workers: 8            # parallel validation threads
    force_rescan: false   # set true once to rebuild a stale cache
    use_cache: true
Defaults are sensible for most cases; the block can be omitted entirely. Set paranoid: true for a one-time deep audit of a new dataset, then leave it false afterwards.
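One way an optional block like this can be handled in code is a dataclass whose field defaults apply when the block is missing. The sketch below works on an already-parsed config dictionary; the class and function names are hypothetical, and the default values simply copy the example above, which is an assumption about the real defaults.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class ScanOptions:
    # Defaults mirror the example block above (assumed, not confirmed).
    cache_dir: Optional[str] = None
    paranoid: bool = False
    workers: int = 8
    force_rescan: bool = False
    use_cache: bool = True


def scan_options_from_config(config):
    """Build ScanOptions from a parsed YAML dict; no data.scan block means defaults."""
    block = (config.get("data") or {}).get("scan") or {}
    known = {f.name for f in fields(ScanOptions)}
    unknown = set(block) - known
    if unknown:
        raise KeyError(f"unknown data.scan keys: {sorted(unknown)}")
    return ScanOptions(**block)


def resolve_cache_dir(options, root):
    """A null cache_dir falls back to <root>/.reid_cache."""
    return options.cache_dir or f"{root.rstrip('/')}/.reid_cache"


defaults = scan_options_from_config({"data": {"root": "./dataset/train"}})
custom = scan_options_from_config(
    {"data": {"root": "./dataset/train", "scan": {"paranoid": True, "workers": 4}}}
)
cache_dir = resolve_cache_dir(defaults, "./dataset/train")
```

Rejecting unknown keys is a design choice of this sketch: it turns a typo such as `paranoïd` into an immediate error instead of a silently ignored setting.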
For an in-depth academic walkthrough of every concept used in this project — Re-ID problem, OSNet architecture, BNNeck, batch-hard triplet, Kalman filter equations, Hungarian assignment, cascade matching, etc. — see:
- English: docs/CONCEPTS.en.md
- French: docs/CONCEPTS.fr.md
These documents cover the theory in detail with mathematical formulations, references to the relevant code files, and practical considerations for each design choice.
Contributions are welcome. Please follow these steps:
- Fork the repository and clone it locally.
- Create a feature branch: git checkout -b feature/my-feature
- Commit your changes: git commit -m 'Add a new feature'
- Push to the branch: git push origin feature/my-feature
- Open a Pull Request.
This project is licensed under the MIT License. See the LICENSE file for details.
This project would not exist without the work of several outstanding researchers and engineers:
- Nicolai Wojke for the original deep_sort repository, the canonical reference implementation of the DeepSORT algorithm. The structure of our Kalman filter, the cascade matching strategy, and the gating mechanism are directly inspired by this work.
- Kaiyang Zhou for the deep-person-reid library and the OSNet architecture. Our OSNet implementation closely follows the reference code from this library, with the addition of the BNNeck head from Luo et al.
- Hao Luo and the authors of the "Bag of Tricks" baseline, whose training recipe (BNNeck, warmup, no-decay on BN/biases) is the basis of our training pipeline.
If you find this project useful, please consider starring the repositories above to support the educational and research work that made it possible.
The implementation is based on the following papers:
- DeepSORT — Wojke, N., Bewley, A., & Paulus, D. (2017). Simple Online and Realtime Tracking with a Deep Association Metric. ICIP 2017. arXiv:1703.07402
- SORT — Bewley, A., Ge, Z., Ott, L., Ramos, F., & Upcroft, B. (2016). Simple Online and Realtime Tracking. ICIP 2016. arXiv:1602.00763
- Kalman Filter — Kalman, R. E. (1960). A New Approach to Linear Filtering and Prediction Problems. Journal of Basic Engineering, 82(1), 35–45.
- Hungarian algorithm — Kuhn, H. W. (1955). The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1–2), 83–97.
- OSNet — Zhou, K., Yang, Y., Cavallaro, A., & Xiang, T. (2019). Omni-Scale Feature Learning for Person Re-Identification. ICCV 2019. arXiv:1905.00953
- IBN-Net — Pan, X., Luo, P., Shi, J., & Tang, X. (2018). Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net. ECCV 2018. arXiv:1807.09441
- Bag of Tricks — Luo, H., Gu, Y., Liao, X., Lai, S., & Jiang, W. (2019). Bag of Tricks and A Strong Baseline for Deep Person Re-Identification. CVPRW 2019. arXiv:1903.07071
- Triplet loss with batch-hard mining — Hermans, A., Beyer, L., & Leibe, B. (2017). In Defense of the Triplet Loss for Person Re-Identification. arXiv preprint. arXiv:1703.07737
- Label smoothing — Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the Inception Architecture for Computer Vision. CVPR 2016. arXiv:1512.00567
- Random Erasing — Zhong, Z., Zheng, L., Kang, G., Li, S., & Yang, Y. (2017). Random Erasing Data Augmentation. arXiv preprint. arXiv:1708.04896
- No weight decay on BN — Goyal, P., et al. (2017). Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. arXiv:1706.02677
- Market-1501 — Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., & Tian, Q. (2015). Scalable Person Re-identification: A Benchmark. ICCV 2015. Source of the CMC + mAP evaluation protocol used in this project.
- DukeMTMC-reID — Ristani, E., Solera, F., Zou, R., Cucchiara, R., & Tomasi, C. (2016). Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking. ECCV Workshops 2016.
- nwojke/deep_sort — Wojke, N. (2017). Original DeepSORT reference implementation.
- KaiyangZhou/deep-person-reid — Zhou, K. (2019). Torchreid library and OSNet reference code.
- michuanhaohao/reid-strong-baseline — Luo, H. (2019). Bag of Tricks reference implementation.
For questions or suggestions:
- Author: DOCTOR MOKIRA — dr.mokira@gmail.com
- Maintainer: CONSOLE ART CYBERNETIC — ca.cybernetic@gmail.com
- GitHub: cacybernetic/deepSORT