Sentinel Vision is a real-time safety analytics pipeline for webcam feeds, video files, or RTSP streams. It detects people and vehicles, tracks them across frames, evaluates safety events, saves operator-friendly alert evidence, and exposes a persistent alert backend with filtering and dashboard views.
Real-time detection, tracking, and intrusion alerting on a MOT17 clip. Each pedestrian is tracked with persistent IDs; entering the restricted polygon triggers an alert event.
- Input: webcam, local video file, or RTSP stream
- Detection: YOLO-based people and vehicle detection
- Tracking: ByteTrack by default, with an optional BoT-SORT-style learned re-ID path for harder occlusion cases
- Event logic: intrusion, loitering, line crossing, wrong-way, after-hours occupancy, vehicle zone violations, abandoned-object detection
- Output: annotated video, snapshots, pre/post alert clips, JSON events, SQLite alert history
- Backend: FastAPI alert API plus Streamlit dashboard
Short demo clip from a real pipeline run on a MOT17 scene with tracked pedestrians, a restricted zone, and intrusion alerts.
| Evaluation slice | MOTA | IDF1 | ID switches | Effective FPS | Note |
|---|---|---|---|---|---|
| Bundled micro-benchmark | 0.833 | 0.778 | 2 | n/a | Checked-in score-only artifact from bundled reference predictions |
| Public-dataset CPU baseline | 0.228 | 0.347 | 89 | 19.869 | edge_cpu stress-test baseline on MOT17 + VisDrone subset |
| Public-dataset tuned tracking pass | 0.247 | 0.369 | 89 | 19.159 | benchmark_association_tuned profile with stricter association |
Current checked-in public-dataset results are a CPU baseline stress test used to guide tracker and event-logic tuning, not a final tuned deployment profile.
More detail lives in docs/architecture.md.
- Detector, tracker, event logic, rendering, and persistence are split into separate modules so the runtime can be tuned or swapped without rewriting the whole pipeline.
- Config-driven profiles under `configs/profiles/` make it easy to compare low-latency, edge CPU, crowded-scene, and benchmark-oriented settings on the same code path (sketched after this list).
- Benchmarking is wired into the repo with MOT17/VisDrone subset manifests, JSON result artifacts, markdown reports, and reproducible commands instead of one-off notebook screenshots.
- The backend is intentionally simple: FastAPI handles persistence and filtering, while Streamlit provides operator-facing triage on top of the same alert store.
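A rough mental model for these profiles (not the repo's actual config loader, and assuming PyYAML is available): a profile is a partial YAML overlay that is deep-merged onto `configs/default.yaml` before the pipeline starts.

```python
# Illustrative deep-merge of a profile over the base config; the repo's real
# loader (schema validation, CLI overrides, etc.) may behave differently.
from copy import deepcopy
import yaml

def merge_profile(base: dict, profile: dict) -> dict:
    """Recursively overlay profile values onto the base config."""
    merged = deepcopy(base)
    for key, value in profile.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_profile(merged[key], value)
        else:
            merged[key] = value
    return merged

with open("configs/default.yaml") as f:
    base_cfg = yaml.safe_load(f)
with open("configs/profiles/edge_cpu.yaml") as f:  # profile path assumed for the example
    profile_cfg = yaml.safe_load(f)

cfg = merge_profile(base_cfg, profile_cfg)
```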
- YOLO-backed object detection via Ultralytics
- ByteTrack-based multi-object tracking for stronger ID continuity
- Optional BoT-SORT-style tracking with lightweight learned re-ID embeddings for harder crossings, occlusion recovery, and lower ID-switch rates
- Configurable trajectory prediction that smooths recent motion for short-occlusion association and renders forecast paths on annotated video
- Optional feature-based camera motion compensation with affine or homography estimation
- Optional image-to-ground-plane homography for normalized zone reasoning and occupancy estimates
- Polygon and line zone definitions with tags and metadata
- Multi-event safety analytics for restricted areas, tripwires, motion direction, occupancy policy, and unattended asset detection (a minimal intrusion check is sketched after this list)
- Snapshot plus pre/post event alert clip generation with metadata sidecars
- SQLite-backed alert API with filterable history and stats
- Streamlit dashboard for operator triage
- Multi-camera runtime with per-camera outputs and a central alert service
- Benchmark/evaluation pipeline for detection, tracking, and event quality
- HTML metrics dashboard report for evaluation results
- Docker packaging for API, dashboard, and pipeline services
- Camera health status snapshots with reconnect/degradation tracking
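To make the rule-based event logic concrete, here is a minimal intrusion-style check. The zone coordinates and helper function are invented for illustration; the repo's actual event logic is richer (dwell times, tags, duplicate suppression).

```python
# Minimal sketch of an intrusion check, not the repo's implementation.
# Assumes a restricted polygon from config and a track's latest box in pixels.
import numpy as np
import cv2

restricted_zone = np.array([[420, 180], [900, 180], [900, 620], [420, 620]], dtype=np.float32)

def is_intrusion(bbox_xyxy):
    """Treat the bottom-center of the box as the subject's ground contact point."""
    x1, y1, x2, y2 = bbox_xyxy
    foot_point = ((x1 + x2) / 2.0, float(y2))
    # pointPolygonTest returns > 0 inside, < 0 outside, 0 on the boundary.
    return cv2.pointPolygonTest(restricted_zone, foot_point, measureDist=False) >= 0

if is_intrusion((455.0, 300.0, 520.0, 610.0)):
    print("intrusion candidate: raise an alert event for this track")
```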
- Demo still: readme_demo_mot17_09_intrusion.jpg
- Demo GIF: readme_demo_mot17_09_intrusion.gif
- Annotated demo video: readme_demo_mot17_09.mp4
- Benchmark/evaluation assets: data/eval/README.md
- Public benchmark report: results_public_dataset_benchmark_association_tuned.md
- Additional checked-in benchmark clips live under data/eval/videos
- Evaluation bundle and reference clips: data/eval/README.md
Create a virtual environment and install the base requirements:

```
python -m venv .venv
.venv\Scripts\activate
.\.venv\Scripts\python.exe -m pip install -r requirements.txt
```

For linting, tests, and local CI-style checks:

```
.\.venv\Scripts\python.exe -m pip install -r requirements-dev.txt
```

Run repo commands with the project virtual environment to avoid Windows PATH conflicts between system Python, Conda, and `.venv`:

```
.\.venv\Scripts\python.exe -m pytest -q
```

Run against a local video file:

```
python -m src.main --config configs/default.yaml --source data/eval/videos/office_intrusion_short.mp4
```

Use a webcam:

```
python -m src.main --config configs/default.yaml --source 0
```

Use an RTSP stream:

```
python -m src.main --config configs/default.yaml --source rtsp://user:pass@camera/stream
```

Run multiple cameras from one config:

```
python -m src.main --config configs/multi_camera.yaml
```

Run the API:

```
uvicorn src.api.app:app --reload
```

Run the dashboard:

```
streamlit run src/api/dashboard.py
```

Run the stack with Docker Compose:

```
docker compose up --build
```

Use a deployment profile with explicit device and model overrides:

```
python -m src.main --config configs/default.yaml --profile edge_cpu --device cpu --model yolo11n.pt
```

Use the learned-embedding tracking profile for crowded or occlusion-heavy scenes:

```
python -m src.main --config configs/default.yaml --profile crowded_tracking --source data/eval/videos/mot17_04_clip.mp4
```

An example alert event payload:

```json
{
"event_id": "evt_000001",
"timestamp": "2026-03-08T18:42:31Z",
"camera_id": "office_cam_1",
"event_type": "intrusion",
"track_id": 7,
"class": "person",
"confidence": 0.91,
"zone": "restricted_lab",
"frame_index": 3812,
"snapshot_path": "data/outputs/alerts/20260308T184231Z_intrusion_restricted_lab_track7_evt_000001.jpg",
"clip_path": "data/outputs/alerts/20260308T184231Z_intrusion_restricted_lab_track7_evt_000001.mp4",
"metadata_path": "data/outputs/alerts/20260308T184231Z_intrusion_restricted_lab_track7_evt_000001.json"
}
```

The default YAML config is at `configs/default.yaml`.
- Tracking backends: `bytetrack`, `botsort`, `simple`
- Zones can be `polygon` or `line`
- Zone `tags` and `metadata` route zones to specific event logic
- `abandoned_object` zones detect stationary assets whose nearest owner has moved beyond a configurable distance for a configurable dwell time
- `perspective.image_points` and `perspective.world_points` enable ground-plane projection for normalized-space intrusion/loitering reasoning (see the projection sketch after this list)
- Event payloads include `world_position` and zone occupancy stats when perspective calibration is configured
- Alert output settings include `buffer_seconds`, `post_event_seconds`, and `duplicate_suppression_seconds`
- Recorder writes clips on a background queue, guards against invalid FPS, and records codec fallback status in metadata
- Runtime controls include RTSP reconnect attempts, frame skip policy, inference resize, and per-stage timing logs
- Motion compensation can be toggled per camera with `runtime.motion_compensation`, including a `static_camera_assumption` switch and `affine`/`homography` modes
- Deployment profiles under `configs/profiles/` can tune runtime, model, and output settings for edge CPU, edge GPU, or low-latency use cases
- `configs/profiles/crowded_tracking.yaml` switches tracking to BoT-SORT with a lightweight MobileNetV3 appearance encoder for harder multi-object tracking cases
- `configs/profiles/benchmark_association_tuned.yaml` applies stricter ByteTrack association and threshold tuning for the checked-in MOT17/VisDrone public benchmark subset
- `--device` and `--model` CLI flags override inference backend and model selection without editing YAML
- Multi-camera configs can define a top-level `cameras` list; each camera automatically gets its own event log, health file, annotated output, and alert clip directory
- `tracking.trajectory_*` tunes the tracker-side motion predictor used for short-gap association and predicted path overlays
- `output.zone_heatmap` can render and persist a zone occupancy heatmap PNG plus a JSON summary for demos and traffic-pattern analysis
- The API and dashboard use a persistent SQLite DB at `data/outputs/alerts.db` by default
- Supported detector classes now include `backpack`, `handbag`, and `suitcase` for unattended-object workflows
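As a rough illustration of the ground-plane projection enabled by the `perspective.image_points` / `perspective.world_points` calibration, the sketch below maps an image-space foot point into world coordinates with OpenCV. The point values and helper are invented for the example and do not mirror the repo's internals.

```python
# Illustrative image-to-ground-plane projection from four point correspondences.
import numpy as np
import cv2

image_points = np.float32([[100, 700], [1180, 700], [900, 380], [380, 380]])   # pixels (example)
world_points = np.float32([[0.0, 0.0], [8.0, 0.0], [8.0, 12.0], [0.0, 12.0]])  # metres (example)

H = cv2.getPerspectiveTransform(image_points, world_points)

def to_world(foot_point_xy):
    """Project an image-space foot point onto the calibrated ground plane."""
    pt = np.float32([[foot_point_xy]])            # shape (1, 1, 2) for perspectiveTransform
    return cv2.perspectiveTransform(pt, H)[0, 0]  # -> (world_x, world_y)

print(to_world((640.0, 650.0)))
```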
Config loading now uses schema validation with explicit checks for malformed zones, invalid thresholds, unsupported classes, and invalid dashboard URLs.
Minimum operator API:
- `GET /health`
- `GET /cameras`
- `GET /alerts?camera_id=...&event_type=...&zone=...&start_time=...&end_time=...`
- `GET /alerts/{event_id}`
- `GET /stats`
- `POST /ingest`
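A minimal example of filtering alerts from a locally running API instance (the host, port, and response shape are assumptions; the endpoint and query parameters come from the list above, and the `requests` package is assumed to be installed):

```python
# Sketch of querying the alert API; adjust the base URL to your deployment.
import requests

BASE_URL = "http://localhost:8000"  # assumed uvicorn host/port

params = {
    "camera_id": "office_cam_1",
    "event_type": "intrusion",
    "start_time": "2026-03-08T00:00:00Z",
}
resp = requests.get(f"{BASE_URL}/alerts", params=params, timeout=10)
resp.raise_for_status()
for alert in resp.json():  # assumes a JSON list of event payloads
    print(alert.get("event_id"), alert.get("event_type"), alert.get("zone"))
```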
Reproducibility details live in docs/reproducibility.md.
Generate predictions and score the benchmark in one pass:
```
python -m scripts.run_benchmark --manifest data/eval/benchmark_manifest.json --config configs/default.yaml --device cpu --output-json data/eval/results/latest.json --output-markdown docs/results.md
```

For quicker local smoke runs on larger clips, you can subsample the benchmark:

```
python -m scripts.run_benchmark --manifest data/eval/benchmark_manifest_public_datasets.json --config configs/default.yaml --profile edge_cpu --device cpu --frame-skip 2 --max-frames 200 --output-json data/eval/results/quickcheck.json --output-markdown docs/results_quickcheck.md
```

Use `--frame-skip` and `--max-frames` for iteration speed only. Full benchmark numbers in the README and reports should come from unskipped runs.
Or score an existing set of prediction JSON files:
```
python -m scripts.evaluate_events --manifest data/eval/benchmark_manifest.json --output-json data/eval/results/latest.json --output-markdown docs/results.md
```

The current benchmark summary is in docs/results.md. Visual metrics dashboard report: docs/results_dashboard.html
For the checked-in docs/results.md artifact:
- Reproduction path: score-only from checked-in prediction JSON files
- Manifest: `data/eval/benchmark_manifest.json`
- Config used for live benchmark runs: `configs/default.yaml`
- Default model for live benchmark runs: `yolo11n.pt`
- Hardware: not applicable for the checked-in score-only artifact because no live inference was run
- Detection (`person`): precision `1.000`, recall `0.944`
- Tracking: MOTA `0.833`, MOTP `1.000`, IDF1 `0.778`, ID switches `2`
- Events overall: precision `0.667`, recall `1.000`, false alerts `2.308`/min
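For reference when reading the tracking numbers above, MOTA and IDF1 follow the standard CLEAR-MOT and IDF1 definitions; this sketch states those formulas directly and is not tied to this repo's evaluator.

```python
# Standard CLEAR-MOT / IDF1 formulas, shown for reference only.
def mota(false_negatives: int, false_positives: int, id_switches: int, num_gt: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / total ground-truth objects."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt

def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    return 2.0 * idtp / (2.0 * idtp + idfp + idfn)

# Toy example with made-up counts, not repo results:
print(round(mota(false_negatives=10, false_positives=5, id_switches=2, num_gt=120), 3))
```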
The benchmark manifest now supports per-clip `scene_types`, `challenge_tags`, `subject_classes`, clip-specific `config_override`, and optional `runtime` payloads so CPU and GPU passes can be compared directly.
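A purely hypothetical manifest entry showing how those per-clip fields might sit together; every key name other than the five listed above is invented for illustration.

```python
# Hypothetical manifest entry; the actual schema in
# data/eval/benchmark_manifest_public_datasets.json may differ.
example_entry = {
    "clip": "data/eval/videos/mot17_04_clip.mp4",       # key name assumed
    "scene_types": ["crowded_street"],
    "challenge_tags": ["occlusion"],
    "subject_classes": ["person"],
    "config_override": {"runtime": {"frame_skip": 0}},  # illustrative override
    "runtime": {"device": "cpu"},                        # optional runtime payload
}
```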
Exact reproduction commands:
```
python -m scripts.evaluate_events --manifest data/eval/benchmark_manifest.json --output-json data/eval/results/latest.json --output-markdown docs/results.md
python -m scripts.render_metrics_report --input-json data/eval/results/latest.json --output-html docs/results_dashboard.html
```

The repo now includes a curated public-dataset tracking subset built from:
- MOT17-04-FRCNN
- MOT17-09-FRCNN
- VisDrone2019-MOT-train/uav0000099_02109_v
- VisDrone2019-MOT-val/uav0000268_05773_v
Artifacts:
- Manifest: benchmark_manifest_public_datasets.json
- CPU results: public_dataset_cpu.json
- CPU report: results_public_dataset_cpu.md
- CPU dashboard: results_public_dataset_cpu.html
Important distinction:
- Tracking metrics on this subset are real dataset-derived results.
- Event metrics are currently false-alert-only stress numbers because MOT17 and VisDrone do not ship event-level ground truth for this repo's rule-based events.
Profile comparison snapshot:
| Profile | MOTA | IDF1 | ID switches | Effective FPS | Notes |
|---|---|---|---|---|---|
| `edge_cpu` / public_dataset_cpu | 0.228 | 0.347 | 89 | 19.869 | Checked-in CPU baseline |
| `benchmark_association_tuned` | 0.247 | 0.369 | 89 | 19.159 | Global assignment + stricter association thresholds |
Tracking baseline context:
- These public-dataset numbers come from the `edge_cpu` profile on CPU with `yolo11n.pt` and the repo's default ByteTrack-style tracker.
- The subset mixes crowded MOT17 street scenes with moving-camera VisDrone aerial scenes and does not use dataset-specific detector or tracker tuning.
- Read the public-dataset results as a baseline stress test, not as a tuned SOTA tracking claim.
- For harder occlusion cases, switch to `--profile crowded_tracking` to enable the learned-embedding BoT-SORT path. That mode is intended to improve IDF1 and reduce ID switches rather than maximize raw throughput.
Current public-dataset CPU baseline:
- MOTA `0.228`
- IDF1 `0.347`
- ID switches `89`
- Effective CPU runtime `19.869` FPS
Checked-in tuned public-dataset profile:
- Profile: `benchmark_association_tuned`
- Result artifact: public_dataset_benchmark_association_tuned.json
- MOTA `0.247`
- IDF1 `0.369`
- ID switches `89`
- Effective CPU runtime `19.159` FPS
The checked-in VisDrone clip `visdrone_uav0000268_05773_clip.mp4` was re-encoded with H.264 to keep the repo lighter. If you re-encode benchmark videos, rerun `python -m scripts.run_benchmark` and regenerate the reports so predictions and published metrics stay aligned.
The repeatable compression workflow is documented in data/eval/README.md and implemented by `python -m scripts.compress_benchmark_videos`.
Original datasets are not included in Git. Raw extractions live under data/external/, which is ignored by the repo.
The pipeline writes camera health status to the configured `output.health_status_path`. In the default single-camera config that is `data/outputs/camera_health.json`; in multi-camera mode each camera gets its own namespaced health file automatically.
Health status includes:
- online/degraded/offline state
- read failure count
- detector failure count
- reconnect attempts
- last successful frame timestamp
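A small watchdog-style check against that health file might look like the sketch below; the JSON key names are assumptions inferred from the fields listed above, not a documented schema.

```python
# Illustrative watchdog check; key names are assumed, not documented.
import json
from pathlib import Path

health = json.loads(Path("data/outputs/camera_health.json").read_text())

state = health.get("state", "unknown")            # assumed key: online / degraded / offline
reconnects = health.get("reconnect_attempts", 0)  # assumed key
last_frame = health.get("last_successful_frame")  # assumed key: timestamp

if state != "online" or reconnects > 3:
    print(f"camera degraded: state={state}, reconnects={reconnects}, last_frame={last_frame}")
```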
Run the test suite:

```
.\.venv\Scripts\python.exe -m pytest -q
```

Local lint and format checks:

```
.\.venv\Scripts\python.exe -m ruff check src tests scripts
.\.venv\Scripts\python.exe -m black --check src tests scripts
```

GitHub Actions runs:
- `ruff check src tests scripts`
- `black --check src tests scripts`
- `pytest -q`
- an offline smoke path that evaluates the bundled benchmark, renders the HTML report, and checks `python -m scripts.run_benchmark --help`
Deployment notes for Docker, Compose profiles, CPU/GPU selection, config profiles, and systemd startup are in docs/deployment.md.
- The bundled asset set is still a starter benchmark; for a credible external evaluation pass you should expand it to roughly 8 to 15 labeled clips with occlusion, false-positive traps, and both person and vehicle footage.
- The BoT-SORT-style tracker uses lightweight appearance features, not a full learned re-identification model.
- The Streamlit dashboard is intentionally minimal and optimized for speed of implementation over UI polish.
- Event logic is rule-based and depends on well-configured zones and camera placement.
- Motion compensation is intended for mild camera jitter, not large PTZ moves or scene cuts.
- Ground-plane occupancy is approximate and depends on usable camera calibration and reasonably planar scenes.
- Multi-camera mode centralizes alerts and per-camera outputs, but it does not yet do cross-camera re-identification.
- Add stronger learned re-identification for BoT-SORT-style tracking
- Expand the checked-in benchmark beyond the current MOT17 and VisDrone subset with more labeled clips, harder occlusions, and richer event-ground-truth coverage

