46 changes: 23 additions & 23 deletions book/src/architecture.md
@@ -28,43 +28,43 @@ The renewal pipeline composes generically across vendors. Adding support for a n

### `CABackend`

Issues certificates from a Certificate Authority. The trait has two methods:
Issues certificates from a Certificate Authority. Two methods:

- `submit(domains, csr_pem, preferred_kinds)` — Submit a CSR. Returns one or more `DcvChallenge` values the caller must satisfy via the DCV backend. `preferred_kinds` lets the caller hint at DNS-01 vs HTTP-01 (CAs that offer a choice walk the list and pick).
- `await_issuance(domains)` — Poll until the cert is signed.
* `submit(domains, csr_pem, preferred_kinds)`. Submits a CSR. Returns one or more `DcvChallenge` values the caller must satisfy via the DCV backend. `preferred_kinds` lets the caller hint at DNS-01 vs HTTP-01; CAs that offer a choice walk the list and pick.
* `await_issuance(domains)`. Polls until the cert is signed.

Today: `NamecheapCa` (traditional reissue, DNS-01 only) and `AcmeCa` (RFC 8555: Let's Encrypt, ZeroSSL with EAB, BuyPass, any directory that speaks the spec).

### `DcvBackend`

Solves the CA's domain-control challenge. The trait:
Solves the CA's domain-control challenge.

- `supported_kinds()`: Which `ChallengeKind`s the backend can satisfy (DNS-01, HTTP-01).
- `supports(challenge)`: Whether this specific challenge is satisfiable. Default impl matches against `supported_kinds()`.
- `publish(challenge)`: Make the response visible to the CA. Idempotent.
- `remove(challenge)`: Clean up after issuance. Idempotent.
* `supported_kinds()`. Which `ChallengeKind`s the backend can satisfy: `Dns01`, `Http01`.
* `supports(challenge)`. Whether this specific challenge is satisfiable. Default impl matches against `supported_kinds()`.
* `publish(challenge)`. Make the response visible to the CA. Idempotent.
* `remove(challenge)`. Clean up after issuance. Idempotent.

`DcvChallenge` is a tagged enum:

- `Dns01 { record_name, record_value, ttl }`: TXT record at `record_name`. Solvers: `NamecheapDcv`, `CloudflareDcv`.
- `Http01 { domain, token, key_authorization }`: `key_authorization` body served at `http://<domain>/.well-known/acme-challenge/<token>`. Solver: `WebrootDcv` (drops the file under a directory served by an existing webserver).
* `Dns01 { record_name, record_value, ttl }`. TXT record at `record_name`. Solvers: `NamecheapDcv`, `CloudflareDcv`.
* `Http01 { domain, token, key_authorization }`. `key_authorization` body served at `http://<domain>/.well-known/acme-challenge/<token>`. Solver: `WebrootDcv` (drops the file under a directory served by an existing webserver).
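
The shape described above can be sketched in Rust roughly as follows. This is a minimal illustration, not the project's actual definitions: the field types, the synchronous method signatures, the `String` error type, the `kind()` helper, and the `MemoryDnsSolver` test double are all assumptions made for the example.

```rust
/// Which flavour of domain-control validation a challenge uses.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChallengeKind {
    Dns01,
    Http01,
}

/// Tagged enum with the two variants listed above (field types assumed).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DcvChallenge {
    Dns01 { record_name: String, record_value: String, ttl: u32 },
    Http01 { domain: String, token: String, key_authorization: String },
}

impl DcvChallenge {
    /// Hypothetical helper: map a challenge to its kind.
    pub fn kind(&self) -> ChallengeKind {
        match self {
            DcvChallenge::Dns01 { .. } => ChallengeKind::Dns01,
            DcvChallenge::Http01 { .. } => ChallengeKind::Http01,
        }
    }
}

pub trait DcvBackend {
    fn supported_kinds(&self) -> Vec<ChallengeKind>;

    /// Default impl: a challenge is satisfiable if its kind is supported.
    fn supports(&self, challenge: &DcvChallenge) -> bool {
        self.supported_kinds().contains(&challenge.kind())
    }

    fn publish(&mut self, challenge: &DcvChallenge) -> Result<(), String>;
    fn remove(&mut self, challenge: &DcvChallenge) -> Result<(), String>;
}

/// Hypothetical in-memory DNS-01 solver, used only to exercise the trait.
#[derive(Default)]
pub struct MemoryDnsSolver {
    pub records: Vec<(String, String)>,
}

impl DcvBackend for MemoryDnsSolver {
    fn supported_kinds(&self) -> Vec<ChallengeKind> {
        vec![ChallengeKind::Dns01]
    }

    fn publish(&mut self, challenge: &DcvChallenge) -> Result<(), String> {
        if let DcvChallenge::Dns01 { record_name, record_value, .. } = challenge {
            let entry = (record_name.clone(), record_value.clone());
            // Idempotent: publishing the same record twice stores it once.
            if !self.records.contains(&entry) {
                self.records.push(entry);
            }
            Ok(())
        } else {
            Err("unsupported challenge kind".to_string())
        }
    }

    fn remove(&mut self, challenge: &DcvChallenge) -> Result<(), String> {
        if let DcvChallenge::Dns01 { record_name, record_value, .. } = challenge {
            // Idempotent: removing an absent record is a no-op.
            self.records.retain(|(n, v)| !(n == record_name && v == record_value));
        }
        Ok(())
    }
}
```

A backend that only overrides `supported_kinds()` gets a working `supports()` for free, which is what lets the renewer do its pre-flight check without knowing which solver is configured.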

### `InstallBackend`

Places the issued cert + chain where the system serving the domain can read them. Implementations may also trigger a service reload.
Places the issued cert and chain where the system serving the domain can read them. Implementations may also trigger a service reload.

- `install(cert, private_key_pem, domains)`: Land the artifacts.
- `current_cert_pem(cert_id)`: Read back the installed leaf cert for the scheduler's days-until-expiry calculation. Default returns `None`; backends opt in.
* `install(cert, private_key_pem, domains)`. Land the artifacts.
* `current_cert_pem(cert_id)`. Read back the installed leaf cert for the scheduler's days-until-expiry calculation. Default returns `None`; backends opt in.

Today: `DsmInstall` (Synology), `FilesystemInstall` (mode-600 key + mode-644 cert + chain + fullchain), `NginxInstall` (filesystem + reload subprocess), `HaproxyInstall` (filesystem + runtime API hot-swap), `K8sSecretInstall` (server-side-apply a `kubernetes.io/tls` Secret).

### `AlertBackend`

Daemon-wide notification sinks. Every renewal failure fans out to every configured sink.

- `dispatch(event)`: Deliver. Errors are logged but never break the renewal pipeline.
* `dispatch(event)`. Deliver. Errors are logged but never break the renewal pipeline.

Today: `EmailAlert` (lettre, STARTTLS / implicit TLS / plaintext), `WebhookAlert` (generic JSON envelope POST).
Today: `EmailAlert` (lettre, STARTTLS / implicit TLS / plaintext) and `WebhookAlert` (generic JSON envelope POST).

## Renewer pipeline

@@ -73,29 +73,29 @@ For one cert, one renewal:
1. Load (or generate) the persistent private key from `key_path`.
2. Generate a CSR against that key.
3. `ca.submit()` returns DCV challenges.
4. Pre-flight check: `dcv.supports()` for each challenge; fast-fail if the configured solver can't handle what the CA returned.
4. Pre-flight check: `dcv.supports()` for each challenge. Fast-fail if the configured solver can't handle what the CA returned.
5. `dcv.publish()` every challenge.
6. `ca.await_issuance()` waits for the CA to validate and sign.
7. `dcv.remove()` cleans up — runs unconditionally, even if issuance failed, so a partial run doesn't leave stray records.
8. Persist the issued cert + chain to the audit store (for cluster cert distribution).
7. `dcv.remove()` cleans up. Runs unconditionally, even if issuance failed, so a partial run doesn't leave stray records.
8. Persist the issued cert and chain to the audit store (for cluster cert distribution).
9. `install.install()` writes locally.
10. Audit log records every step.
10. The audit log records every step.
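
Steps 4 through 7 are where ordering matters most, so here is a rough sketch of that slice. Closures stand in for the `dcv` and `ca` backends. The function name `run_dcv_phase`, the closure-based signature, and the `String` error type are hypothetical. The sketch also cleans up after a failed publish, which is an assumption: the text above only states that cleanup runs even if issuance failed.

```rust
/// Sketch of steps 4-7: pre-flight, publish, await issuance, always clean up.
/// Returns the issued cert (as a String here) or the first error encountered.
pub fn run_dcv_phase<C>(
    challenges: &[C],
    supports: impl Fn(&C) -> bool,
    mut publish: impl FnMut(&C) -> Result<(), String>,
    await_issuance: impl FnOnce() -> Result<String, String>,
    mut remove: impl FnMut(&C),
) -> Result<String, String> {
    // Step 4: fast-fail before touching DNS or disk if any challenge
    // is outside what the configured solver can handle.
    if !challenges.iter().all(|c| supports(c)) {
        return Err("configured DCV solver cannot satisfy a challenge".to_string());
    }

    // Step 5: publish every challenge, stopping at the first failure.
    let mut published = Ok(());
    for challenge in challenges {
        if let Err(e) = publish(challenge) {
            published = Err(e);
            break;
        }
    }

    // Step 6: only wait on the CA if everything was published.
    let outcome = published.and_then(|()| await_issuance());

    // Step 7: cleanup runs regardless of the outcome. This relies on
    // remove being idempotent, so removing an unpublished record is harmless.
    for challenge in challenges {
        remove(challenge);
    }

    outcome
}
```

Capturing the outcome in a variable and returning it only after the cleanup loop is what keeps an early `?` from skipping `remove`.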

## Scheduler

Ticks every `check_interval_seconds`. For each cert: read the install backend's `current_cert_pem`, parse `notAfter`, compare to `renew_threshold_days`, queue a renewal if due. Per-cert failure cooldown prevents a flaky CA from getting hammered every tick.
Ticks every `check_interval_seconds`. For each cert: read the install backend's `current_cert_pem`, parse `notAfter`, compare to `renew_threshold_days`, queue a renewal if due. A per-cert failure cooldown prevents a flaky CA from getting hammered every tick.

In a cluster, the scheduler's sweep is gated on `cluster.is_leader()`. Followers skip silently; the leader keeps doing the work.
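
The per-tick decision can be sketched as a pure function. This is an illustration under assumptions: the struct and field names (including `failure_cooldown_secs`), the use of plain seconds for timestamps, and the inclusive `<=` comparison against the threshold are all hypothetical and not taken from the codebase.

```rust
/// Hypothetical per-cert state the scheduler would track.
pub struct CertState {
    /// Days until the installed cert's notAfter.
    pub days_until_expiry: i64,
    /// Timestamp (seconds) of the last failed renewal, if any.
    pub last_failure_secs: Option<u64>,
}

/// Hypothetical scheduler settings.
pub struct SchedulerConfig {
    pub renew_threshold_days: i64,
    pub failure_cooldown_secs: u64,
}

/// Decide whether this tick should queue a renewal for one cert.
pub fn should_queue(
    cfg: &SchedulerConfig,
    cert: &CertState,
    now_secs: u64,
    is_leader: bool,
) -> bool {
    // Followers skip silently; only the leader sweeps.
    if !is_leader {
        return false;
    }
    // Due once the remaining lifetime is at or below the threshold (assumed inclusive).
    let due = cert.days_until_expiry <= cfg.renew_threshold_days;
    // A recent failure suppresses retries until the cooldown has elapsed.
    let cooling_down = cert
        .last_failure_secs
        .map_or(false, |failed_at| {
            now_secs.saturating_sub(failed_at) < cfg.failure_cooldown_secs
        });
    due && !cooling_down
}
```

For a single-node deployment the caller would pass `is_leader = true` unconditionally.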

## Audit store

Every renewal opens a row, appends step events, and closes with a status. Two backends:

- **`SqliteAuditStore`** (default) — single-file SQLite, no external service. Good for single-node deployments.
- **`SurrealAuditStore`** — connects to an existing SurrealDB. Required for cluster federation (the lock and cert distribution rows live in the same database).
* `SqliteAuditStore` (default). Single-file SQLite, no external service. Good for single-node deployments.
* `SurrealAuditStore`. Connects to an existing SurrealDB. Required for cluster federation: the lock and cert distribution rows live in the same database.

## Cluster

When `cluster.enabled = true` and audit is SurrealDB, the daemon runs a `SurrealClusterCoordinator` that holds a lock at `cluster_lock:singleton` with a TTL refresh. The leader's renewer pipeline writes successful issuances to `issued_cert` rows; an `InstallSyncTask` on every node polls the table and runs the local install backend with the operator-pre-provisioned private key when the audit cert is fresher than what's installed locally. Private keys are NEVER distributed through the audit store.
When `cluster.enabled = true` and audit is SurrealDB, the daemon runs a `SurrealClusterCoordinator` that holds a lock at `cluster_lock:singleton` with a TTL refresh. The leader's renewer pipeline writes successful issuances to `issued_cert` rows. An `InstallSyncTask` on every node polls the table and runs the local install backend with the operator-pre-provisioned private key when the audit cert is fresher than what's installed locally. Private keys are never distributed through the audit store.

See the [federation runbook](./federation.md) for the operator-side walkthrough.
30 changes: 14 additions & 16 deletions book/src/backends.md
@@ -6,15 +6,15 @@ What ships today, what's coming, and the design choices behind each.

### Namecheap (traditional reissue)

`kind: namecheap` with `ssl_id: <numeric SSL id>`. Uses Namecheap's `namecheap.ssl.reissue` + `namecheap.ssl.getInfo` flow. Activation is one-shot: rota only handles **reissue** within an existing SSL subscription. First-time activation requires a long list of admin-contact fields rota does not model in the config — operators activate once by hand in the Namecheap dashboard, and rota handles every renewal after that.
`kind: namecheap` with `ssl_id: <numeric SSL id>`. Uses Namecheap's `namecheap.ssl.reissue` and `namecheap.ssl.getInfo` flow. Activation is one-shot: rota only handles **reissue** within an existing SSL subscription. First-time activation requires a long list of admin-contact fields rota does not model in the config. Operators activate once by hand in the Namecheap dashboard; rota handles every renewal after that.

DNS-01 only. The Namecheap reissue API folds every SAN under one DCV record, so rota's multi-challenge trait sees a single-element vec.

### ACME (RFC 8555)

`kind: acme`. Speaks Let's Encrypt, ZeroSSL (with External Account Binding), BuyPass, and any directory that follows the spec. Uses the [`instant-acme`](https://crates.io/crates/instant-acme) crate.

rota manages its own persistent ECDSA key per cert (operators rely on key continuity for cert pinning). The ACME submit path uses `finalize_csr(csr_der)` so the operator's key stays canonical across renewals.
rota manages its own persistent ECDSA key per cert because operators rely on key continuity for cert pinning. The ACME submit path uses `finalize_csr(csr_der)` so the operator's key stays canonical across renewals.

The ACME backend walks the configured DCV solver's `supported_kinds()` to pick a challenge type per authorization. So `dcv: { kind: webroot }` automatically gets HTTP-01; `dcv: { kind: cloudflare }` automatically gets DNS-01.

@@ -24,19 +24,19 @@ The ACME backend walks the configured DCV solver's `supported_kinds()` to pick a

`kind: namecheap`. DNS-01 via `namecheap.domains.dns.{getHosts,setHosts}`.

Critical gotcha: Namecheap's `setHosts` is a **full replacement** of every record on the domain, not a per-record edit. Publishing one TXT therefore requires reading every existing record first, merging the new one in, and writing the merged set back. rota does this transparently.
Watch out: Namecheap's `setHosts` is a **full replacement** of every record on the domain, not a per-record edit. Publishing one TXT therefore requires reading every existing record first, merging the new one in, and writing the merged set back. rota does this transparently.
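
The read-merge-write step can be sketched as a pure function over the record set. The `HostRecord` struct, its fields, and the `merge_txt` name are hypothetical; the real `getHosts` and `setHosts` payloads carry more attributes than shown here.

```rust
/// Hypothetical, simplified view of one host record on a domain.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct HostRecord {
    pub name: String,
    pub record_type: String,
    pub value: String,
    pub ttl: u32,
}

/// Merge one TXT record into the full set previously read from the domain.
/// The returned set is what would be written back as the full replacement.
pub fn merge_txt(existing: Vec<HostRecord>, name: &str, value: &str, ttl: u32) -> Vec<HostRecord> {
    let mut merged = existing;
    // Idempotency: skip the push if an identical TXT record is already present.
    let already_present = merged
        .iter()
        .any(|r| r.record_type == "TXT" && r.name == name && r.value == value);
    if !already_present {
        merged.push(HostRecord {
            name: name.to_string(),
            record_type: "TXT".to_string(),
            value: value.to_string(),
            ttl,
        });
    }
    merged
}
```

Every pre-existing record passes through untouched, which is the property that matters when the write is a full replacement: forgetting the merge would wipe the domain's other records.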

### Cloudflare DNS

`kind: cloudflare`. DNS-01 via Cloudflare's v4 API with Bearer-token auth. Token scopes: `Zone.DNS:Edit` on every zone rota will manage. rota does not support the legacy Global API Key.

Cloudflare's per-record edit API means rota doesn't have to read every record on the zone first. The flow is: resolve the apex zone for the record name, look for an existing TXT match (idempotency), POST the record if absent. Removal mirrors the lookup + delete shape.
Cloudflare's per-record edit API means rota doesn't have to read every record on the zone first. The flow: resolve the apex zone for the record name, look for an existing TXT match (idempotency), POST the record if absent. Removal mirrors the lookup-and-delete shape.

### Webroot (HTTP-01)

`kind: webroot` with `directory: <document root>`. rota writes the key authorization to `<directory>/.well-known/acme-challenge/<token>` (mode 644) and removes it after issuance. The operator's existing webserver (nginx, Caddy, Apache, anything that serves static files over HTTP on port 80) is responsible for actually exposing the directory.

Why webroot instead of a daemon-internal listener: most self-hosters already run a webserver on 80/443. Asking rota to bind 80 means coordinating port handoff (or running rota as root) for one purpose: serving a five-byte file the existing webserver could serve in its sleep.
Why webroot rather than a daemon-internal listener: most self-hosters already run a webserver on 80 and 443. Asking rota to bind 80 means coordinating port handoff (or running rota as root) for one purpose: serving a five-byte file the existing webserver could serve in its sleep.

Defensive against malformed challenge tokens: rota refuses path-shaped tokens (`/`, `\`, `..`, empty) so a misbehaving CA can't traverse out of the challenge directory.
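
A minimal sketch of that token check, assuming the four rejection rules listed above are the whole policy. The function names are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Reject empty tokens and anything containing `/`, `\`, or `..`.
pub fn is_safe_token(token: &str) -> bool {
    !token.is_empty()
        && !token.contains('/')
        && !token.contains('\\')
        && !token.contains("..")
}

/// Build the on-disk challenge path, or None if the token is path-shaped.
pub fn challenge_path(webroot: &Path, token: &str) -> Option<PathBuf> {
    if !is_safe_token(token) {
        return None;
    }
    Some(
        webroot
            .join(".well-known")
            .join("acme-challenge")
            .join(token),
    )
}
```

Validating before joining matters because `Path::join` with an absolute component replaces the base path entirely, so a token such as `/etc/passwd` would otherwise escape the webroot.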

@@ -48,19 +48,19 @@ Defensive against malformed challenge tokens: rota refuses path-shaped tokens (`

### Filesystem

`kind: filesystem` with `directory: <path>`. Lays the issued cert + chain + private key down under predictable filenames so any service that reads disk-based PEM (nginx, HAProxy, Caddy, custom Rust + rustls) can pick them up.
`kind: filesystem` with `directory: <path>`. Lays the issued cert, chain, and private key down under predictable filenames so any service that reads disk-based PEM (nginx, HAProxy, Caddy, custom Rust + rustls) can pick them up.

Filenames mirror the certbot convention so existing reload scripts that grep for `fullchain.pem` / `privkey.pem` work unchanged. Writes are atomic per file: each artifact goes to a sibling `.tmp`, fsync, rename.
Filenames mirror the certbot convention so existing reload scripts that grep for `fullchain.pem` and `privkey.pem` work unchanged. Writes are atomic per file: each artifact goes to a sibling `.tmp`, fsync, rename.
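
The tmp, fsync, rename sequence can be sketched with the standard library alone. The `write_atomic` name is hypothetical, and the sketch leaves out file modes and error cleanup of a stale `.tmp`, both of which a real install backend would need to handle.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Write `contents` to `path` via a sibling `.tmp` file, fsync, then rename.
pub fn write_atomic(path: &Path, contents: &[u8]) -> io::Result<()> {
    // Sibling temp file: same directory, so the rename stays on one filesystem.
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp_path = PathBuf::from(tmp_name);

    let mut file = File::create(&tmp_path)?;
    file.write_all(contents)?;
    // Flush file data and metadata to disk before the rename makes it visible.
    file.sync_all()?;

    // On POSIX filesystems the rename replaces the destination atomically,
    // so readers see either the old file or the new one, never a partial write.
    fs::rename(&tmp_path, path)
}
```

Because each artifact is renamed independently, atomicity holds per file, not across the set; a reader could briefly see a new `fullchain.pem` next to an old `privkey.pem` only if the key also changed between runs.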

### nginx

`kind: nginx` with `directory: <path>` and optional `reload_command: [<argv>]`. Filesystem write + nginx reload subprocess.
`kind: nginx` with `directory: <path>` and optional `reload_command: [<argv>]`. Filesystem write plus an nginx reload subprocess.

Default reload is `["nginx", "-s", "reload"]`. Operators on systemd typically override with `["systemctl", "reload", "nginx"]` and a sudoers rule that keeps the daemon unprivileged. The reload runs without a shell wrapper, so argv entries are not interpreted (no globbing, no env interpolation). A non-zero exit surfaces as an `Install` error so the renewer records the failure on the audit log.
Default reload is `["nginx", "-s", "reload"]`. Operators on systemd typically override with `["systemctl", "reload", "nginx"]` and a sudoers rule that keeps the daemon unprivileged. The reload runs without a shell wrapper, so argv entries are not interpreted: no globbing, no env interpolation. A non-zero exit surfaces as an `Install` error so the renewer records the failure on the audit log.
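
Running the reload as a bare argv, with no shell in between, might look like the following sketch. The `run_reload` name and the `String` error type are hypothetical; the real backend surfaces failures as an `Install` error.

```rust
use std::process::Command;

/// Run a reload command given as an argv list, without a shell wrapper.
pub fn run_reload(argv: &[String]) -> Result<(), String> {
    // First element is the program; the rest are passed verbatim as arguments.
    let (program, args) = argv
        .split_first()
        .ok_or_else(|| "reload_command is empty".to_string())?;

    // Command does not invoke a shell, so `*`, `$VAR`, and `;` in an
    // argument reach the program as literal bytes.
    let status = Command::new(program)
        .args(args)
        .status()
        .map_err(|e| format!("failed to spawn {program}: {e}"))?;

    if status.success() {
        Ok(())
    } else {
        // A non-zero exit becomes an error the caller can record.
        Err(format!("reload exited with {status}"))
    }
}
```

With the documented default, the caller would pass `["nginx", "-s", "reload"]` as three separate strings.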

### HAProxy

`kind: haproxy` with `directory:`, `socket_path:`, and `cert_storage_name:`. Filesystem write + HAProxy runtime API hot-swap.
`kind: haproxy` with `directory:`, `socket_path:`, and `cert_storage_name:`. Filesystem write plus HAProxy runtime API hot-swap.

The runtime API sequence:

@@ -71,7 +71,7 @@ EOL
commit ssl cert <storage_name>
```

No reload, no dropped TCP connections. HAProxy hands the new certificate to live SNI lookups on next handshake. Requires HAProxy 2.x or later with the admin socket exposed:
No reload, no dropped TCP connections. HAProxy hands the new certificate to live SNI lookups on the next handshake. Requires HAProxy 2.x or later with the admin socket exposed:

```text
global
@@ -84,8 +84,8 @@ global

Auth resolution:

- `kubeconfig_path` omitted: in-cluster ServiceAccount (run rotad as a Pod).
- `kubeconfig_path` set: load the named kubeconfig (run rotad outside the cluster).
* `kubeconfig_path` omitted: in-cluster ServiceAccount (run rotad as a Pod).
* `kubeconfig_path` set: load the named kubeconfig (run rotad outside the cluster).

Required RBAC on `secrets` in the target namespace: `get`, `create`, `patch`. Server-side apply with FieldManager `"rota"` so concurrent managers (cert-manager, helm, etc.) get clean conflict signaling rather than silent overwrites.

@@ -109,6 +109,4 @@ Optional Bearer token auth from a file. Per-request timeout defaults to 10 secon

## Roadmap

- More CAs: Sectigo direct, GoDaddy.
- More DNS-01 solvers: Route 53, DigitalOcean, Porkbun.
- More install targets: native HTTP-01 listener (instead of webroot), more reload integrations as operators surface needs.
More CAs: Sectigo direct, GoDaddy. More DNS-01 solvers: Route 53, DigitalOcean, Porkbun. More install targets: a native HTTP-01 listener (instead of webroot), more reload integrations as operators surface needs.
22 changes: 11 additions & 11 deletions book/src/configuration.md
@@ -48,7 +48,7 @@ audit:
password_file: /etc/rota/secrets/surreal.password
```

`endpoint` accepts `mem://`, `file://path`, `ws://`, `wss://`, `http://`, `https://`. Embedded engines (`mem://`, `file://`) skip auth; remote engines need `username` + `password_file`.
`endpoint` accepts `mem://`, `file://path`, `ws://`, `wss://`, `http://`, `https://`. Embedded engines (`mem://`, `file://`) skip auth; remote engines need `username` and `password_file`.

## CA accounts

@@ -62,7 +62,7 @@ namecheap:
client_ip: 192.0.2.1
```

`client_ip` must be on the account's whitelisted IPs in Namecheap, or the API rejects every call. Same credentials authenticate both the CA backend (reissue) and the DCV backend (DNS).
`client_ip` must be on the account's whitelisted IPs in Namecheap, or the API rejects every call. The same credentials authenticate both the CA backend (reissue) and the DCV backend (DNS).

### `cloudflare`

@@ -87,10 +87,10 @@ acme:

Common directory URLs:

- Let's Encrypt prod: `https://acme-v02.api.letsencrypt.org/directory`
- Let's Encrypt staging: `https://acme-staging-v02.api.letsencrypt.org/directory`
- ZeroSSL: `https://acme.zerossl.com/v2/DV90`
- BuyPass: `https://api.buypass.com/acme/directory`
* Let's Encrypt prod: `https://acme-v02.api.letsencrypt.org/directory`
* Let's Encrypt staging: `https://acme-staging-v02.api.letsencrypt.org/directory`
* ZeroSSL: `https://acme.zerossl.com/v2/DV90`
* BuyPass: `https://api.buypass.com/acme/directory`

`account_credentials_file` is created on first run; treat like a private key (mode 0o600).

@@ -105,7 +105,7 @@ cluster:
lease_seconds: 60 # refresh cadence is lease/3 (~20s here)
```

Requires `audit.kind: surrealdb` because the lock + cert blobs live in that database. See the [federation runbook](./federation.md) for end-to-end setup.
Requires `audit.kind: surrealdb` because the lock and cert blobs live in that database. See the [federation runbook](./federation.md) for end-to-end setup.

## `alerts`

@@ -183,8 +183,8 @@ install:

## Migration from earlier versions

### v0.5 → v0.6
### v0.5 to v0.6

- `rota.yaml`: rename `registrar:` → `dcv:` on every cert. The kind values (`namecheap`, `cloudflare`) are unchanged; only the parent field name moves.
- New optional `cluster:` block enables multi-host federation.
- Wire protocol bumped from 1 to 2 (`CertSummary.registrar_backend` → `dcv_backend`). The `rota` CLI must upgrade alongside `rotad`; older clients hit a clean version-mismatch error rather than silent misparse.
* `rota.yaml`: rename `registrar:` to `dcv:` on every cert. The kind values (`namecheap`, `cloudflare`) are unchanged; only the parent field name moves.
* New optional `cluster:` block enables multi-host federation.
* Wire protocol bumped from 1 to 2 (`CertSummary.registrar_backend` becomes `dcv_backend`). The `rota` CLI must upgrade alongside `rotad`; older clients hit a clean version-mismatch error rather than silent misparse.