Vision Hub is a multi-demo platform to discover and run real-time computer vision + ML demos from a single UI. It’s designed for lab / kiosk / demo-room setups: one web frontend, a small orchestrator API, and multiple independent demo backends.
Made by: Tom Burellier (Maintainer / Integrator)
Context: Demo platform for applied AI / computer vision experiments (LTU lab-friendly).
- A web UI (Astro) listing demos and opening a live viewer page
- A lightweight orchestrator API that can start/stop demos on demand (and manage “consent” tokens)
- A video hub (MediaMTX) to route streams (RTSP / WebRTC WHEP) between capture + demos + browsers
- Independent demo backends:
  - YOLO Object Detection (boxes via WebSocket overlay)
  - YOLO Pose Estimation (keypoints overlay)
  - Chang demo (Arabic line detection pipeline; ONNX Runtime + C++/Qt, integrated into Vision Hub)
  - House Price Estimation (FastAPI + LightGBM; trained on data provided by Booli)
Vision Hub separates video transport (MediaMTX) from inference (demos) and from UI rendering (browser + canvas overlay).
```
UI (Astro)  http://<host>:4321
  /projects   /demo/<id>   /docs
      |
      | start/stop/status + consent + heartbeat
      v
Orchestrator API  http://<host>:8090
  /demos/<id>/start|stop|status
      |
      | starts / stops demo containers
      v
      +--------------------+----------------------+--------------------------+
      |                    |                      |                          |
  Capture              YOLO / Pose            Chang demo                 Price demo
  (USB cam -> RTSP)    (RTSP in + WS out)     (Arabic line detection)    (tabular ML)
      |                    |                      |                          |
      | RTSP publish       | RTSP subscribe       | RTSP publish/relay       | HTTP (FastAPI)
      v                    v                      v                          v
            MediaMTX (video hub)                                    (still goes via UI)
            RTSP <-> WebRTC (WHEP) <-> HLS
               |
               | WHEP (low-latency playback)
               v
            Browser <video> + <canvas> overlay
              - Boxes: WS detections -> canvas
              - Pose:  WS keypoints  -> canvas
```
You’ll typically find these top-level folders:
- `ui-astro/` — the web UI (Astro + Starlight docs)
- `API/` — orchestrator API (start/stop/status/consent/heartbeat)
- `mediamtx/` — MediaMTX configuration (RTSP/WebRTC hub)
- `capture/` — camera ingest (USB cam → RTSP)
- `yolo/` — YOLO object detection service (WS boxes + optional debug)
- `pose/` — YOLO pose estimation service (WS keypoints)
- `chang-demo/` — Chang's demo integration (Arabic line detection pipeline)
- `price-estimation/` — house price estimation (training + FastAPI serving)
Folder names may vary slightly depending on how you publish images in your stack, but the roles above are stable.
The UI does not directly start containers.
Instead, it talks to the orchestrator:
- `GET /demos/<id>/status` → is it running?
- `POST /demos/<id>/start?wait=1` → bring it up if needed
- `POST /demos/<id>/stop` → stop it
- `POST /consent?demo=<id>` → mint a short-lived consent token for viewer start
- `POST /demos/<id>/heartbeat` → keep-alive while the viewer page stays open
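For illustration, here is a minimal sketch of how a viewer page might drive these endpoints from the browser. The base URL, the `running` field in the status response, and the 15-second heartbeat interval are assumptions, not the actual contract:

```ts
// Hypothetical viewer-side helpers for the orchestrator API.
// ORCH, the `running` field, and the heartbeat interval are assumptions.
const ORCH = "http://localhost:8090";

async function ensureDemoRunning(id: string): Promise<void> {
  const status = await fetch(`${ORCH}/demos/${id}/status`).then((r) => r.json());
  if (!status.running) {
    // Start the demo and wait for it to come up before returning.
    await fetch(`${ORCH}/demos/${id}/start?wait=1`, { method: "POST" });
  }
}

// Camera-based demos first mint a short-lived consent token.
async function requestConsent(id: string): Promise<unknown> {
  const res = await fetch(`${ORCH}/consent?demo=${id}`, { method: "POST" });
  return res.json();
}

// Keep-alive while the viewer page stays open.
function startHeartbeat(id: string): number {
  return window.setInterval(
    () => fetch(`${ORCH}/demos/${id}/heartbeat`, { method: "POST" }),
    15_000,
  );
}
```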
This keeps the UI simple, and lets you enforce:
- per-demo start rules
- timeouts / auto-stop policies
- “consent required” gating for camera-based demos
Most demos are viewed via WebRTC WHEP exposed by MediaMTX.
Typical flow:
- Capture publishes RTSP to MediaMTX (raw camera)
- A demo service subscribes (RTSP) and may publish an annotated stream (RTSP)
- MediaMTX exposes a WHEP endpoint for the stream
- The UI viewer page uses `connectWhep()` to play it in `<video>`
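As a rough idea of what `connectWhep()` does under the hood, here is a stand-alone WHEP playback sketch. The actual helper in `ui-astro/` may differ; the endpoint URL format and the non-trickle ICE handling are assumptions:

```ts
// Minimal WHEP playback sketch (not the actual connectWhep() implementation).
// whepUrl is whatever MediaMTX exposes for the stream, e.g. http://<host>:8889/<path>/whep.
async function playWhep(videoEl: HTMLVideoElement, whepUrl: string): Promise<RTCPeerConnection> {
  const pc = new RTCPeerConnection();
  pc.addTransceiver("video", { direction: "recvonly" });
  pc.ontrack = (ev) => {
    videoEl.srcObject = ev.streams[0] ?? new MediaStream([ev.track]);
  };

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  // Non-trickle for simplicity: wait for ICE gathering so the offer carries our candidates.
  await new Promise<void>((resolve) => {
    if (pc.iceGatheringState === "complete") return resolve();
    pc.addEventListener("icegatheringstatechange", () => {
      if (pc.iceGatheringState === "complete") resolve();
    });
  });

  // WHEP: POST the SDP offer, receive the SDP answer.
  const res = await fetch(whepUrl, {
    method: "POST",
    headers: { "Content-Type": "application/sdp" },
    body: pc.localDescription!.sdp,
  });
  await pc.setRemoteDescription({ type: "answer", sdp: await res.text() });
  return pc;
}
```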
This gives:
- very low latency
- good network behavior in a lab LAN
- one consistent “viewer” implementation across demos
The video stream stays clean.
Overlays are drawn in the browser using a `<canvas>` on top of the `<video>`:
- Boxes mode: the demo backend pushes detections over WebSocket; the viewer converts them to `{x, y, w, h, label}` and draws them on a canvas.
- Pose mode: the demo backend pushes keypoints over WebSocket; the viewer draws skeleton edges + points with smoothing/linger.
This is why the UI stays fast:
the browser draws lightweight geometry instead of decoding a fully annotated stream.
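A minimal sketch of what a boxes-mode overlay loop can look like, assuming the backend sends JSON arrays of `{x, y, w, h, label}` in canvas pixel coordinates (the WebSocket URL, message shape, and styling are illustrative; pose mode is analogous with keypoints and skeleton edges):

```ts
// Illustrative boxes-mode overlay; message shape and coordinate space are assumptions.
interface Box { x: number; y: number; w: number; h: number; label: string; }

function attachBoxOverlay(canvas: HTMLCanvasElement, wsUrl: string): WebSocket {
  const ctx = canvas.getContext("2d")!;
  const ws = new WebSocket(wsUrl);
  ws.onmessage = (ev) => {
    const boxes: Box[] = JSON.parse(ev.data);
    // Redraw only lightweight geometry; the <video> underneath stays untouched.
    ctx.clearRect(0, 0, canvas.width, canvas.height);
    ctx.strokeStyle = "lime";
    ctx.fillStyle = "lime";
    ctx.font = "14px sans-serif";
    for (const b of boxes) {
      ctx.strokeRect(b.x, b.y, b.w, b.h);
      ctx.fillText(b.label, b.x, Math.max(12, b.y - 4));
    }
  };
  return ws;
}
```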
The Chang demo integration focuses on an Arabic line detection pipeline.
High level:
- Model(s) are distilled/exported to ONNX for portability
- Inference runs with ONNX Runtime
- The pipeline is implemented in C++/Qt to stay cross-platform friendly
- Vision Hub wraps it into a demo: start/stop via orchestrator + view as a stream
The house price estimation demo predicts Swedish home prices with uncertainty bounds.
- Data: provided by Booli (Sweden). Huge thanks to Booli for making this dataset available.
- Model: LightGBM (median + quantile bounds)
- Serving: FastAPI endpoint consumed by the UI
In the UI, this is a “tabular ML” demo (not video), but it lives in Vision Hub the same way: a project page + a runnable demo route.
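For illustration only, here is a hypothetical browser-side call to the serving endpoint; the path, port, request fields, and response shape are made up for the sketch and are not the actual API:

```ts
// Hypothetical client for the price-estimation service; path, port, and field
// names are illustrative, not the real contract.
interface PriceEstimate {
  median: number; // point estimate
  lower: number;  // lower quantile bound
  upper: number;  // upper quantile bound
}

async function estimatePrice(features: Record<string, number | string>): Promise<PriceEstimate> {
  const res = await fetch("http://localhost:8000/predict", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(features),
  });
  return res.json();
}
```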
This repo is designed to run via container orchestration (Docker Swarm in the lab, but Compose also works).
At a minimum you need:
- UI
- Orchestrator
- MediaMTX
- One demo (YOLO / Pose / Chang / Price)
Then add:
- capture, if you want live camera input
Credits:
- Maintainer / Integrator: Tom Burellier
- Chang demo contributor: Chang (Arabic line detection pipeline integration)
- Data partner (price demo): Booli (Sweden)
Upstream projects are listed per-demo in the UI project pages.