Vision Hub is a multi-demo platform to discover and run real-time computer vision + ML demos from a single UI. It’s designed for lab / kiosk / demo-room setups: one web frontend, a small orchestrator API, and multiple independent demo backends.
Made by: Tom Burellier (Maintainer / Integrator)
Context: Demo platform for applied AI / computer vision experiments (LTU lab-friendly).
- A web UI (Astro) listing demos and opening a live viewer page
- A lightweight orchestrator API that can start/stop demos on demand (and manage “consent” tokens)
- A video hub (MediaMTX) to route streams (RTSP / WebRTC WHEP) between capture + demos + browsers
- Independent demo backends:
  - YOLO Object Detection (boxes via WebSocket overlay)
  - YOLO Pose Estimation (keypoints overlay)
  - Chang demo (Arabic line detection pipeline; ONNX Runtime + C++/Qt, integrated into Vision Hub)
  - House Price Estimation (FastAPI + LightGBM; trained on data provided by Booli)
Vision Hub separates video transport (MediaMTX) from inference (demos) and from UI rendering (browser + canvas overlay).
```
UI (Astro)  http://<host>:4321
  /projects   /demo/<id>   /docs
      |
      | start/stop/status + consent + heartbeat
      v
Orchestrator API  http://<host>:8090
  /demos/<id>/start|stop|status
      |
      | starts / stops demo containers
      v
      +--------------------+----------------------+--------------------------+
      |                    |                      |                          |
  Capture              YOLO / Pose            Chang demo                 Price demo
  (USB cam -> RTSP)    (RTSP in + WS out)     (Arabic line detection)    (tabular ML)
      |                    |                      |                          |
      | RTSP publish       | RTSP subscribe       | RTSP publish/relay       | HTTP (FastAPI)
      v                    v                      v                          v
            MediaMTX (video hub)                                    (still goes via UI)
            RTSP <-> WebRTC (WHEP) <-> HLS
               |
               | WHEP (low-latency playback)
               v
            Browser <video> + <canvas> overlay
              - Boxes: WS detections -> canvas
              - Pose:  WS keypoints  -> canvas
```
You’ll typically find these top-level folders:
- `ui-astro/` — the web UI (Astro + Starlight docs)
- `API/` — orchestrator API (start/stop/status/consent/heartbeat)
- `mediamtx/` — MediaMTX configuration (RTSP/WebRTC hub)
- `capture/` — camera ingest (USB cam → RTSP)
- `yolo/` — YOLO object detection service (WS boxes + optional debug)
- `pose/` — YOLO pose estimation service (WS keypoints)
- `chang-demo/` — Chang's demo integration (Arabic line detection pipeline)
- `price-estimation/` — house price estimation (training + FastAPI serving)
Folder names may vary slightly depending on how you publish images in your stack, but the roles above are stable.
The UI does not directly start containers.
Instead, it talks to the orchestrator:
- `GET /demos/<id>/status` → is it running?
- `POST /demos/<id>/start?wait=1` → bring it up if needed
- `POST /demos/<id>/stop` → stop it
- `POST /consent?demo=<id>` → mint a short-lived consent token for viewer start
- `POST /demos/<id>/heartbeat` → keep-alive while the viewer page stays open
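For illustration, here is a minimal sketch of how a viewer page might drive these endpoints from the browser. The base URL, the `running` field in the status response, and the 15-second heartbeat interval are assumptions, not the actual contract:

```ts
// Hypothetical viewer-side helpers for the orchestrator API.
// ORCH, the `running` field, and the heartbeat interval are assumptions.
const ORCH = "http://localhost:8090";

async function ensureDemoRunning(id: string): Promise<void> {
  const status = await fetch(`${ORCH}/demos/${id}/status`).then((r) => r.json());
  if (!status.running) {
    // Start the demo and wait for it to come up before returning.
    await fetch(`${ORCH}/demos/${id}/start?wait=1`, { method: "POST" });
  }
}

// Camera-based demos first mint a short-lived consent token.
async function requestConsent(id: string): Promise<unknown> {
  const res = await fetch(`${ORCH}/consent?demo=${id}`, { method: "POST" });
  return res.json();
}

// Keep-alive while the viewer page stays open.
function startHeartbeat(id: string): number {
  return window.setInterval(
    () => fetch(`${ORCH}/demos/${id}/heartbeat`, { method: "POST" }),
    15_000,
  );
}
```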
This keeps the UI simple, and lets you enforce:
- per-demo start rules
- timeouts / auto-stop policies
- “consent required” gating for camera-based demos
Most demos are viewed via WebRTC WHEP exposed by MediaMTX.
Typical flow:
- Capture publishes RTSP to MediaMTX (raw camera)
- A demo service subscribes (RTSP) and may publish an annotated stream (RTSP)
- MediaMTX exposes a WHEP endpoint for the stream
- The UI viewer page uses `connectWhep()` to play it in `<video>`
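As a rough idea of what `connectWhep()` does under the hood, here is a stand-alone WHEP playback sketch. The actual helper in `ui-astro/` may differ; the endpoint URL format and the non-trickle ICE handling are assumptions:

```ts
// Minimal WHEP playback sketch (not the actual connectWhep() implementation).
// whepUrl is whatever MediaMTX exposes for the stream, e.g. http://<host>:8889/<path>/whep.
async function playWhep(videoEl: HTMLVideoElement, whepUrl: string): Promise<RTCPeerConnection> {
  const pc = new RTCPeerConnection();
  pc.addTransceiver("video", { direction: "recvonly" });
  pc.ontrack = (ev) => {
    videoEl.srcObject = ev.streams[0] ?? new MediaStream([ev.track]);
  };

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  // Non-trickle for simplicity: wait for ICE gathering so the offer carries our candidates.
  await new Promise<void>((resolve) => {
    if (pc.iceGatheringState === "complete") return resolve();
    pc.addEventListener("icegatheringstatechange", () => {
      if (pc.iceGatheringState === "complete") resolve();
    });
  });

  // WHEP: POST the SDP offer, receive the SDP answer.
  const res = await fetch(whepUrl, {
    method: "POST",
    headers: { "Content-Type": "application/sdp" },
    body: pc.localDescription!.sdp,
  });
  await pc.setRemoteDescription({ type: "answer", sdp: await res.text() });
  return pc;
}
```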
This gives:
- very low latency
- good network behavior in a lab LAN
- one consistent “viewer” implementation across demos
The video stream stays clean.
Overlays are drawn in the browser using a `<canvas>` on top of the `<video>`:
- Boxes mode: the demo backend pushes detections over WebSocket; the viewer converts them to `{x, y, w, h, label}` and draws them on a canvas.
- Pose mode: the demo backend pushes keypoints over WebSocket; the viewer draws skeleton edges + points with smoothing/linger.
This is why the UI stays fast:
the browser draws lightweight geometry instead of decoding a fully annotated stream.
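A minimal sketch of what a boxes-mode overlay loop can look like, assuming the backend sends JSON arrays of `{x, y, w, h, label}` in canvas pixel coordinates (the WebSocket URL, message shape, and styling are illustrative; pose mode is analogous with keypoints and skeleton edges):

```ts
// Illustrative boxes-mode overlay; message shape and coordinate space are assumptions.
interface Box { x: number; y: number; w: number; h: number; label: string; }

function attachBoxOverlay(canvas: HTMLCanvasElement, wsUrl: string): WebSocket {
  const ctx = canvas.getContext("2d")!;
  const ws = new WebSocket(wsUrl);
  ws.onmessage = (ev) => {
    const boxes: Box[] = JSON.parse(ev.data);
    // Redraw only lightweight geometry; the <video> underneath stays untouched.
    ctx.clearRect(0, 0, canvas.width, canvas.height);
    ctx.strokeStyle = "lime";
    ctx.fillStyle = "lime";
    ctx.font = "14px sans-serif";
    for (const b of boxes) {
      ctx.strokeRect(b.x, b.y, b.w, b.h);
      ctx.fillText(b.label, b.x, Math.max(12, b.y - 4));
    }
  };
  return ws;
}
```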
The Chang demo integration focuses on an Arabic line detection pipeline.
High level:
- Model(s) are distilled/exported to ONNX for portability
- Inference runs with ONNX Runtime
- The pipeline is implemented in C++/Qt to stay cross-platform friendly
- Vision Hub wraps it into a demo: start/stop via orchestrator + view as a stream
The house price estimation demo predicts Swedish home prices with uncertainty bounds.
- Data: provided by Booli (Sweden). Huge thanks to Booli for making this dataset available.
- Model: LightGBM (median + quantile bounds)
- Serving: FastAPI endpoint consumed by the UI
In the UI, this is a “tabular ML” demo (not video), but it lives in Vision Hub the same way: a project page + a runnable demo route.
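For illustration only, here is a hypothetical browser-side call to the serving endpoint; the path, port, request fields, and response shape are made up for the sketch and are not the actual API:

```ts
// Hypothetical client for the price-estimation service; path, port, and field
// names are illustrative, not the real contract.
interface PriceEstimate {
  median: number; // point estimate
  lower: number;  // lower quantile bound
  upper: number;  // upper quantile bound
}

async function estimatePrice(features: Record<string, number | string>): Promise<PriceEstimate> {
  const res = await fetch("http://localhost:8000/predict", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(features),
  });
  return res.json();
}
```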
This repo is designed to run via container orchestration (Docker Swarm in the lab, but Compose also works).
At a minimum you need:
- UI
- Orchestrator
- MediaMTX
- One demo (YOLO / Pose / Chang / Price)
Then add:
- capture, if you want live camera input
Credits:
- Maintainer / Integrator: Tom Burellier
- Chang demo contributor: Chang (Arabic line detection pipeline integration)
- Data partner (price demo): Booli (Sweden)
Upstream projects are listed per-demo in the UI project pages.