A multi-tenant resource monitoring platform that collects real system metrics from registered machines, persists them for historical analysis, and displays them live in a mission-control-style browser dashboard. Built as a TypeScript monorepo using npm workspaces.
Current release: v0.2.0 (historical metrics, time-range charts, threshold alerting, profile management, platform hardening) · Spec · Architecture

Previous: v0.1.0 · v0.1.1 · v0.1.2
overwatch-homelab/
├── apps/
│ ├── hub-server/ # Express API + Socket.IO (Prisma + JWT + Vitest)
│ ├── hub-client/ # React + Tailwind CSS dashboard (nginx)
│ └── lab-agent/ # Lightweight agent that runs on each monitored machine
├── packages/
│ └── shared-types/ # Shared Zod schemas & TypeScript interfaces
├── spec/
│ ├── architecture.md # Living architecture document
│ ├── 0.1.0/ 0.1.1/ 0.1.2/
│ └── 0.2.0/ # current release spec
├── docker-compose.yaml # Full local stack (postgres, hub-server, hub-client)
├── tsconfig.base.json
└── package.json # npm workspaces root
- Node.js ≥ 20
- Docker & Docker Compose
- npm ≥ 9
v0.2.0 requires JWT_SECRET. Copy the template once — docker-compose auto-loads .env from the repo root:
cp .env.example .env
# Edit .env — replace the placeholder with: openssl rand -hex 32
docker compose up --build

| Service | URL |
|---|---|
| Dashboard | http://localhost:5174 |
| API | http://localhost:3002 |
| Database | localhost:5432 |
Open the dashboard and use the Sign up tab to create your first account. You'll be shown a one-time recovery token — save it in your password manager; it's what lets you reset the password if you forget it.
Note: The `lab-agent` service is gated behind the `agent` compose profile, so a default `docker compose up` only starts postgres + hub-server + hub-client. On macOS, run the agent natively (see below). On Linux, opt in with `docker compose --profile agent up -d` after setting `LAB_ID` in `.env`.
To monitor real host hardware (required on macOS, recommended everywhere), run the agent as a native Node.js process directly on the machine you want to monitor.
Log in, click New Resource, complete the 3-step wizard (name, type, labels) and create it. You'll be taken to the Resource detail page.
The Configuration tab shows the agent panel with your LAB_ID (UUID) and HUB_URL pre-filled. Copy the .env snippet.
cp apps/lab-agent/.env.example apps/lab-agent/.env
# Edit and set LAB_ID and HUB_URL
LAB_ID=<your-homelab-uuid>
HUB_URL=http://<hub-server-host>:3002
HEARTBEAT_INTERVAL_MS=15000
METRICS_INTERVAL_MS=60000

npm install
npm run build --workspace=packages/shared-types
npm run build --workspace=apps/lab-agent
cd apps/lab-agent && node dist/index.js

Or for development (auto-reloads):

cd apps/lab-agent && npm run dev

v0.2.0 adds an exponential reconnect backoff (2 s → 30 s cap with jitter) and a structured startup banner so it's easier to see the agent's configuration at a glance.
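For readers curious how a 2 s → 30 s backoff with jitter might behave, here is a minimal sketch. The function name, the doubling factor, the symmetric 25 % jitter, and the choice to clamp after jitter are all assumptions for illustration; the agent's actual implementation may differ.

```typescript
// Hypothetical helper, not the agent's actual code.
const BASE_DELAY_MS = 2000;  // first retry waits about 2 s
const MAX_DELAY_MS = 30000;  // delays never exceed 30 s
const JITTER_RATIO = 0.25;   // assumed spread of +/- 25 %

function reconnectDelayMs(
  attempt: number,
  random: () => number = Math.random,
): number {
  // Double the delay on every attempt, clamped to the cap.
  const exponential = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt);
  // Scale by a factor in [1 - JITTER_RATIO, 1 + JITTER_RATIO) so that many
  // agents reconnecting at once do not hit the hub in lockstep.
  const jittered = exponential * (1 + JITTER_RATIO * (random() * 2 - 1));
  // Clamp again so jitter cannot push the delay past the cap.
  return Math.min(MAX_DELAY_MS, Math.round(jittered));
}
```

Passing a fixed `random` function makes the schedule deterministic, which is convenient in unit tests.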
# 1. Start just the database
docker compose up -d postgres
# 2. Copy and configure env files
cp apps/hub-server/.env.example apps/hub-server/.env # set JWT_SECRET, DATABASE_URL, CORS_ORIGIN
cp apps/lab-agent/.env.example apps/lab-agent/.env # set LAB_ID, HUB_URL
# 3. Install dependencies
npm install
# 4. Push the DB schema (v0.2.0 adds MetricSnapshot + Alert tables)
cd apps/hub-server && npx prisma db push && cd ../..
# 5. Start hub-server and hub-client in separate terminals
cd apps/hub-server && npm run dev
cd apps/hub-client && npm run dev
# 6. Run the agent natively
cd apps/lab-agent && npm run dev

Dashboard: http://localhost:5173 | API: http://localhost:3001
npm run test --workspace=apps/hub-server # 47 Vitest unit/middleware tests (~1.1 s)
npm run typecheck # cascades across all workspaces; rebuilds shared-types first

Coverage in v0.2.0: cursor pagination, time-bucket downsampling, password policy, env validator, alert evaluator, retention pruner, socket auth middleware, recovery tokens, reset-password schema. Client + lab-agent tests are deferred to v0.3.0.
Express API with:
- JWT authentication: `POST /api/auth/register` (returns a one-time `recoveryToken`), `POST /api/auth/login`, `PATCH /api/auth/profile`
- Password reset (no email): `POST /api/auth/reset-password` takes `{ email, recoveryToken, newPassword }` and rotates the token on success; `POST /api/auth/recovery-token` regenerates the token from an authenticated session
- Resource CRUD (`/api/homelabs`): create, list (cursor-paginated), get, update, delete
- Historical metrics: `GET /api/homelabs/:id/metrics` with `from`/`to`/`resolution` query params and server-side time-bucket averaging
- Alerts: `GET /api/homelabs/:id/alerts`, `POST /api/homelabs/:id/alerts/:alertId/acknowledge`
- Socket.IO: JWT-authenticated dashboard handshake; ownership-checked `dashboard:subscribe`; `agent:register` validates that the labId exists (rejects unknown labs with `hub:error UNKNOWN_LAB`); `lab:alert` / `lab:alert-resolved` broadcasts; stale-agent pruner (60 s)
- Background jobs: retention pruner (6 h) deletes snapshots older than each lab's `retentionDays`
- Startup env validation: fails fast with a diagnostic list if `DATABASE_URL`, `JWT_SECRET`, or `CORS_ORIGIN` are missing or malformed
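The time-bucket averaging mentioned for the historical metrics endpoint can be illustrated with a small in-memory sketch. The `SamplePoint` shape, the `bucketAverage` name, and the use of epoch milliseconds are assumptions; hub-server may well perform this aggregation in the database instead.

```typescript
// Hypothetical shape: one numeric reading with an epoch-millisecond timestamp.
interface SamplePoint {
  recordedAt: number;
  value: number;
}

// Group points into fixed-width buckets and return one averaged point per
// bucket, stamped with the bucket's start time and sorted chronologically.
function bucketAverage(points: SamplePoint[], bucketMs: number): SamplePoint[] {
  const buckets = new Map<number, { sum: number; count: number }>();
  for (const point of points) {
    // Round the timestamp down to the start of its bucket.
    const start = Math.floor(point.recordedAt / bucketMs) * bucketMs;
    const bucket = buckets.get(start) ?? { sum: 0, count: 0 };
    bucket.sum += point.value;
    bucket.count += 1;
    buckets.set(start, bucket);
  }
  return Array.from(buckets.entries())
    .sort((a, b) => a[0] - b[0])
    .map(([start, { sum, count }]) => ({ recordedAt: start, value: sum / count }));
}
```

With a one-minute bucket, for example, two readings taken within the same minute collapse into a single point carrying their mean, which keeps long time ranges cheap to chart.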
React dashboard with:
- Auth page: Sign in / Sign up / Reset password tabs. Signup and reset each show the one-time recovery token in a dedicated modal (copy-once, warning framing).
- `/profile` page: change display name, change password (with current-password confirm), regenerate recovery token
- Overview: Resource cards with type badge and labels
- Resource detail with three tabs:
  - Monitor: summary cards (avg CPU 1h, peak mem 24h, worst disk %), time-range selector, historical line charts for CPU/memory/disk, live ring gauges + sparklines
  - Alerts: filterable alert log (active/resolved/all) with acknowledge action
  - Configuration: agent config panel + alert thresholds & retention settings
- Global alert banner: dismissible toast when a new alert fires on any owned resource
- Sidebar badges: red pip on any resource with active alerts
- TanStack Query for data fetching and cache management
- Socket.IO client presents a JWT on handshake
- Tailwind CSS, Lucide icons, `recharts`
Lightweight service that runs natively on each monitored machine:
- Connects to `hub-server` via Socket.IO with exponential reconnect backoff (2 s → 30 s cap, ~25 % jitter)
- Registers with a `LAB_ID` (UUID of an existing HomeLab)
- Sends heartbeats every 15 s (configurable per-lab)
- Collects and pushes system metrics every 60 s (configurable per-lab) using `systeminformation`:
  - CPU (model, cores, usage %, temperature)
  - Memory (RAM + swap: total, used, free, available)
  - Filesystems (per-mount: size, used, type)
  - Network interfaces (IP, MAC, operstate, speed)
  - OS info (platform, distro, hostname, arch)
  - Uptime
- Prints a structured startup banner with `HUB_URL`, `LAB_ID`, and interval settings

macOS / Docker note: Docker on macOS runs inside a Linux VM. Running the agent inside Docker on macOS will report VM specs, not real host hardware. Run the agent natively for accurate metrics on macOS (and on Linux for full `/proc` access).
Shared Zod schemas and TypeScript types for:
- `User`, `HomeLab` (Resource) models, `ResourceType` enum
- `LabMetrics`, `MetricPoint`, `MetricsRangeResponse`
- `Alert`, `AlertMetric`, `AlertThresholds`
- `CreateUserSchema`, `PasswordPolicySchema`, `UpdateProfileSchema`, `ResetPasswordSchema`
- `CursorPageSchema`, `PaginationQuerySchema`
- Agent ↔ Hub Socket.IO event payloads
- Generic API response wrappers
Exports both ESM and CJS via dual exports in package.json.
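To make the cursor-pagination types listed above more concrete, here is a rough sketch of how a cursor page could be produced. The `CursorPage` shape, its field names, and the `paginate` helper are hypothetical and are not taken from the actual `CursorPageSchema`; the sketch also assumes rows are already sorted by `id` and that `limit` is at least 1.

```typescript
// Hypothetical response shape: a slice of items plus the cursor for the next page.
interface CursorPage<T> {
  items: T[];
  nextCursor: string | null;
}

// Return up to `limit` rows that come after `cursor`.
// Assumes `sortedRows` is ordered by ascending id and `limit >= 1`.
function paginate<T extends { id: string }>(
  sortedRows: T[],
  limit: number,
  cursor?: string,
): CursorPage<T> {
  const remaining =
    cursor === undefined
      ? sortedRows
      : sortedRows.filter((row) => row.id > cursor);
  // Fetch one extra row to learn whether another page exists.
  const peek = remaining.slice(0, limit + 1);
  const hasMore = peek.length > limit;
  const items = hasMore ? peek.slice(0, limit) : peek;
  return {
    items,
    nextCursor: hasMore ? items[items.length - 1].id : null,
  };
}
```

A client would pass the returned `nextCursor` back on the next request and stop when it is `null`. Unlike offset pagination, this stays stable when rows are inserted or deleted between requests.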
User
├── id, email, name, password
├── recoveryTokenHash? (bcrypt hash of 64-char hex recovery token)
└── homelabs → HomeLab[]
HomeLab (Resource)
├── id, name, description, ownerId
├── resourceType (HOMELAB | SERVER | PC), labels (string[])
├── agentHubUrl, heartbeatIntervalMs, metricsIntervalMs
├── retentionDays (default 30)
├── alertThresholds? { cpuPercent, memPercent, diskPercent, consecutiveBreaches }
├── owner → User
├── snapshots → MetricSnapshot[]
└── alerts → Alert[]
MetricSnapshot (v0.2.0)
├── id, labId, recordedAt
├── cpuPercent, memTotalBytes, memActiveBytes
├── diskSnapshots (JSON), rawPayload (JSON)
└── @@index([labId, recordedAt DESC])
Alert (v0.2.0)
├── id, labId, metric (cpu|memory|disk)
├── threshold, peakValue
├── firedAt, resolvedAt, acknowledgedAt
└── @@index([labId, firedAt DESC])
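The `consecutiveBreaches` field in `alertThresholds` suggests that an alert fires only after several readings in a row exceed a threshold. The sketch below shows one way that logic could work. The `createBreachTracker` name, the strict `>` comparison, and the fire/resolve semantics are assumptions, not the hub-server's actual evaluator.

```typescript
type AlertEvent = "fire" | "resolve" | "none";

// Hypothetical tracker for a single metric (for example CPU percent) on one lab.
function createBreachTracker(threshold: number, consecutiveBreaches: number) {
  let streak = 0;      // consecutive readings above the threshold
  let active = false;  // whether an alert is currently firing

  return function observe(value: number): AlertEvent {
    if (value > threshold) {
      streak += 1;
      // Fire once, when the streak first reaches the required length.
      if (!active && streak >= consecutiveBreaches) {
        active = true;
        return "fire";
      }
      return "none";
    }
    // Any reading at or below the threshold breaks the streak.
    streak = 0;
    if (active) {
      active = false;
      return "resolve";
    }
    return "none";
  };
}
```

Requiring a streak rather than a single reading avoids alerting on short spikes, at the cost of a delay of roughly `consecutiveBreaches` metric intervals before an alert fires.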
Metrics are persisted per lab. A pruner runs every 6 h and deletes snapshots older than each lab's configured retentionDays.
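The retention rule reduces to a simple cutoff calculation, sketched here. The helper names are hypothetical, and the real pruner presumably applies the cutoff in a database delete query rather than in application code.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Snapshots recorded before this instant are eligible for deletion.
function retentionCutoff(now: Date, retentionDays: number): Date {
  return new Date(now.getTime() - retentionDays * DAY_MS);
}

// True when a snapshot is older than the lab's retention window.
function isExpired(recordedAt: Date, now: Date, retentionDays: number): boolean {
  return recordedAt.getTime() < retentionCutoff(now, retentionDays).getTime();
}
```

With the default `retentionDays` of 30, a pruner run at 2024-01-31T00:00:00Z would remove snapshots recorded before 2024-01-01T00:00:00Z.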
| Version | Theme | Status |
|---|---|---|
| v0.1.0 | Core platform: auth, resource management, live metrics, mission control UI | ✅ Released |
| v0.1.1 | macOS seamless agent launch + collapsible sidebar | ✅ Released |
| v0.1.2 | In-app Help Center with markdown rendering | ✅ Released |
| v0.2.0 | Historical metrics, time-range charts, threshold alerting, profile management, hardening | ✅ Released |
| v0.3.0+ | Email/webhook alert delivery, multi-user roles, resource sharing, TLS hardening, mobile layout | 🔮 Future |
MIT