High availability — what happens when the broker restarts

**Current state:** Single broker instance, SQLite on disk. The signing key is persistent, so tokens already issued survive a restart. The audit trail and revocation lists are persisted to SQLite and reloaded on startup. What's lost: challenge nonces (30s TTL) and in-memory agent records.
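A minimal sketch of that durable/transient split, for the docs task below. The `BrokerState` class, the `revoked_tokens` table, and the `nonces` dict are hypothetical names for illustration, not the broker's actual schema. It writes one revocation to a SQLite file and one nonce to memory, then reopens the file to simulate a restart:

```python
import os
import sqlite3
import tempfile


class BrokerState:
    """Toy model of broker state; table and attribute names are hypothetical."""

    def __init__(self, db_path):
        # Durable state: lives in the SQLite file and survives a restart.
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS revoked_tokens (jti TEXT PRIMARY KEY)"
        )
        self.db.commit()
        # Transient state: lives only in this process and is lost on restart.
        self.nonces = {}

    def revoke(self, jti):
        self.db.execute(
            "INSERT OR IGNORE INTO revoked_tokens (jti) VALUES (?)", (jti,)
        )
        self.db.commit()

    def is_revoked(self, jti):
        row = self.db.execute(
            "SELECT 1 FROM revoked_tokens WHERE jti = ?", (jti,)
        ).fetchone()
        return row is not None

    def close(self):
        self.db.close()


def simulate_restart():
    """Return (revocation_survived, nonce_survived) across a simulated restart."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "broker.db")

        before = BrokerState(path)
        before.revoke("token-123")
        before.nonces["agent-a"] = "pending-challenge"
        before.close()  # the "broker" goes down

        after = BrokerState(path)  # the "broker" comes back up
        result = (after.is_revoked("token-123"), "agent-a" in after.nonces)
        after.close()
        return result
```

Under these assumptions the revocation is still present after the reopen and the nonce is gone, which is the behavior described above: an agent that was mid-challenge during the restart has to start its challenge again.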

**The question from the community:** "Broker goes down, every agent loses its credential source. What is the HA plan?"

**The real answer today:** Agents with valid tokens keep working during a restart. Their tokens are self-contained JWTs, verified against the persistent key, so verification does not need the broker to be up. What stops working is registration: NEW agents can't register until the broker is back.
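To illustrate why verification doesn't depend on the broker being reachable, here is a rough sketch using only the Python standard library. It assumes HS256 (a shared secret) purely so it runs without third-party packages; the broker's real algorithm and claim names aren't stated in this issue, and `sign_token` / `verify_token` are hypothetical helpers, not the broker's API. The sketch also doesn't consult the revocation list.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url_encode(raw):
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(text):
    # JWTs strip base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims, key):
    """Stand-in for the broker issuing a token (HS256 is an assumption)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    sig = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url_encode(sig)


def verify_token(token, key, now=None):
    """Return the claims if signature and expiry check out, else None.

    Needs only the token, the key, and a clock: no call to the broker.
    """
    now = time.time() if now is None else now
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(b64url_decode(header_b64))
        claims = json.loads(b64url_decode(payload_b64))
        signature = b64url_decode(sig_b64)
    except ValueError:
        return None  # wrong number of segments, bad base64, or bad JSON
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        return None
    expected = hmac.new(
        key, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(signature, expected):
        return None
    if not isinstance(claims, dict):
        return None
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return None
    return claims
```

The point of the sketch is the dependency list of `verify_token`: as long as the key is the same before and after the restart, a token issued earlier keeps verifying until its `exp` passes. The flip side is that a token expiring during an outage can't be replaced until the broker returns.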

**What's needed:**
- [ ] Document the restart story clearly (what survives, what doesn't)
- [ ] Investigate PostgreSQL backend as alternative to SQLite (enables multi-instance)
- [ ] Investigate Redis for transient state (nonces, agent records) to enable shared state across instances
- [ ] Health check integration with orchestrators (Kubernetes readiness/liveness probes already work via `/v1/health`)

**Who needs this:** Any small company running agents in production where broker downtime means agents can't authenticate.
