From f0858c568ef1a0eea6d01a81dc32376d5eb5e9bc Mon Sep 17 00:00:00 2001
From: Shon Thomas
Date: Fri, 8 May 2026 10:21:31 -0800
Subject: [PATCH] docs: strip em/en dashes and tighten AI-tell prose

User flag: docs were leaning AI-generated, which is a turn-off for
operators in 2026. Pass through every chapter to:

* Remove every em-dash (U+2014) and en-dash (U+2013). Bullet-list
  definitions go from "name() -- desc" to "name(). desc." Sentence
  asides go from "X -- Y" to either "X. Y." or "X (Y)".
* Cut hedging adjectives (comprehensive, robust, seamless, etc.).
* Drop section transitions ("Now let's look at...", "Here's...").
* Replace triplet bullet rhythms with prose where the list pattern
  was filler.
* Match the README's voice: terse, opinionated, period-heavy.

No content changes. Same chapters, same examples, same migration
notes. Just voice and punctuation.
---
 book/src/architecture.md    | 46 ++++++++++++++++++-------------------
 book/src/backends.md        | 30 +++++++++++-------------
 book/src/configuration.md   | 22 +++++++++---------
 book/src/federation.md      | 37 ++++++++++++++---------------
 book/src/getting-started.md | 25 ++++++++------------
 book/src/introduction.md    | 19 ++++-----------
 6 files changed, 82 insertions(+), 97 deletions(-)

diff --git a/book/src/architecture.md b/book/src/architecture.md
index bb84c32..e7ab4aa 100644
--- a/book/src/architecture.md
+++ b/book/src/architecture.md
@@ -28,33 +28,33 @@ The renewal pipeline composes generically across vendors. Adding support for a n

 ### `CABackend`

-Issues certificates from a Certificate Authority. The trait has two methods:
+Issues certificates from a Certificate Authority. Two methods:

-- `submit(domains, csr_pem, preferred_kinds)` — Submit a CSR. Returns one or more `DcvChallenge` values the caller must satisfy via the DCV backend. `preferred_kinds` lets the caller hint at DNS-01 vs HTTP-01 (CAs that offer a choice walk the list and pick).
-- `await_issuance(domains)` — Poll until the cert is signed.
+* `submit(domains, csr_pem, preferred_kinds)`. Submits a CSR. Returns one or more `DcvChallenge` values the caller must satisfy via the DCV backend. `preferred_kinds` lets the caller hint at DNS-01 vs HTTP-01; CAs that offer a choice walk the list and pick.
+* `await_issuance(domains)`. Polls until the cert is signed.

 Today: `NamecheapCa` (traditional reissue, DNS-01 only) and `AcmeCa` (RFC 8555: Let's Encrypt, ZeroSSL with EAB, BuyPass, any directory that speaks the spec).

 ### `DcvBackend`

-Solves the CA's domain-control challenge. The trait:
+Solves the CA's domain-control challenge.

-- `supported_kinds()` — Which `ChallengeKind`s the backend can satisfy (DNS-01, HTTP-01).
-- `supports(challenge)` — Whether this specific challenge is satisfiable. Default impl matches against `supported_kinds()`.
-- `publish(challenge)` — Make the response visible to the CA. Idempotent.
-- `remove(challenge)` — Clean up after issuance. Idempotent.
+* `supported_kinds()`. Which `ChallengeKind`s the backend can satisfy: `Dns01`, `Http01`.
+* `supports(challenge)`. Whether this specific challenge is satisfiable. Default impl matches against `supported_kinds()`.
+* `publish(challenge)`. Make the response visible to the CA. Idempotent.
+* `remove(challenge)`. Clean up after issuance. Idempotent.

 `DcvChallenge` is a tagged enum:

-- `Dns01 { record_name, record_value, ttl }` — TXT record at `record_name`. Solvers: `NamecheapDcv`, `CloudflareDcv`.
-- `Http01 { domain, token, key_authorization }` — `key_authorization` body served at `http:///.well-known/acme-challenge/`. Solver: `WebrootDcv` (drops the file under a directory served by an existing webserver).
+* `Dns01 { record_name, record_value, ttl }`. TXT record at `record_name`. Solvers: `NamecheapDcv`, `CloudflareDcv`.
+* `Http01 { domain, token, key_authorization }`. `key_authorization` body served at `http:///.well-known/acme-challenge/`. Solver: `WebrootDcv` (drops the file under a directory served by an existing webserver).

 ### `InstallBackend`

-Places the issued cert + chain where the system serving the domain can read them. Implementations may also trigger a service reload.
+Places the issued cert and chain where the system serving the domain can read them. Implementations may also trigger a service reload.

-- `install(cert, private_key_pem, domains)` — Land the artifacts.
-- `current_cert_pem(cert_id)` — Read back the installed leaf cert for the scheduler's days-until-expiry calculation. Default returns `None`; backends opt in.
+* `install(cert, private_key_pem, domains)`. Land the artifacts.
+* `current_cert_pem(cert_id)`. Read back the installed leaf cert for the scheduler's days-until-expiry calculation. Default returns `None`; backends opt in.

 Today: `DsmInstall` (Synology), `FilesystemInstall` (mode-600 key + mode-644 cert + chain + fullchain), `NginxInstall` (filesystem + reload subprocess), `HaproxyInstall` (filesystem + runtime API hot-swap), `K8sSecretInstall` (server-side-apply a `kubernetes.io/tls` Secret).

@@ -62,9 +62,9 @@ Today: `DsmInstall` (Synology), `FilesystemInstall` (mode-600 key + mode-644 cer

 Daemon-wide notification sinks. Every renewal failure fans out to every configured sink.

-- `dispatch(event)` — Deliver. Errors are logged but never break the renewal pipeline.
+* `dispatch(event)`. Deliver. Errors are logged but never break the renewal pipeline.

-Today: `EmailAlert` (lettre, STARTTLS / implicit TLS / plaintext), `WebhookAlert` (generic JSON envelope POST).
+Today: `EmailAlert` (lettre, STARTTLS / implicit TLS / plaintext) and `WebhookAlert` (generic JSON envelope POST).

 ## Renewer pipeline

@@ -73,17 +73,17 @@ For one cert, one renewal:
 1. Load (or generate) the persistent private key from `key_path`.
 2. Generate a CSR against that key.
 3. `ca.submit()` returns DCV challenges.
-4. Pre-flight check: `dcv.supports()` for each challenge; fast-fail if the configured solver can't handle what the CA returned.
+4. Pre-flight check: `dcv.supports()` for each challenge. Fast-fail if the configured solver can't handle what the CA returned.
 5. `dcv.publish()` every challenge.
 6. `ca.await_issuance()` waits for the CA to validate and sign.
-7. `dcv.remove()` cleans up — runs unconditionally, even if issuance failed, so a partial run doesn't leave stray records.
-8. Persist the issued cert + chain to the audit store (for cluster cert distribution).
+7. `dcv.remove()` cleans up. Runs unconditionally, even if issuance failed, so a partial run doesn't leave stray records.
+8. Persist the issued cert and chain to the audit store (for cluster cert distribution).
 9. `install.install()` writes locally.
-10. Audit log records every step.
+10. The audit log records every step.

 ## Scheduler

-Ticks every `check_interval_seconds`. For each cert: read the install backend's `current_cert_pem`, parse `notAfter`, compare to `renew_threshold_days`, queue a renewal if due. Per-cert failure cooldown prevents a flaky CA from getting hammered every tick.
+Ticks every `check_interval_seconds`. For each cert: read the install backend's `current_cert_pem`, parse `notAfter`, compare to `renew_threshold_days`, queue a renewal if due. A per-cert failure cooldown prevents a flaky CA from getting hammered every tick.

 In a cluster, the scheduler's sweep is gated on `cluster.is_leader()`. Followers skip silently; the leader keeps doing the work.

@@ -91,11 +91,11 @@ In a cluster, the scheduler's sweep is gated on `cluster.is_leader()`. Followers

 Every renewal opens a row, appends step events, and closes with a status. Two backends:

-- **`SqliteAuditStore`** (default) — single-file SQLite, no external service. Good for single-node deployments.
-- **`SurrealAuditStore`** — connects to an existing SurrealDB. Required for cluster federation (the lock and cert distribution rows live in the same database).
+* `SqliteAuditStore` (default). Single-file SQLite, no external service. Good for single-node deployments.
+* `SurrealAuditStore`. Connects to an existing SurrealDB. Required for cluster federation: the lock and cert distribution rows live in the same database.

 ## Cluster

-When `cluster.enabled = true` and audit is SurrealDB, the daemon runs a `SurrealClusterCoordinator` that holds a lock at `cluster_lock:singleton` with a TTL refresh. The leader's renewer pipeline writes successful issuances to `issued_cert` rows; an `InstallSyncTask` on every node polls the table and runs the local install backend with the operator-pre-provisioned private key when the audit cert is fresher than what's installed locally. Private keys are NEVER distributed through the audit store.
+When `cluster.enabled = true` and audit is SurrealDB, the daemon runs a `SurrealClusterCoordinator` that holds a lock at `cluster_lock:singleton` with a TTL refresh. The leader's renewer pipeline writes successful issuances to `issued_cert` rows. An `InstallSyncTask` on every node polls the table and runs the local install backend with the operator-pre-provisioned private key when the audit cert is fresher than what's installed locally. Private keys are never distributed through the audit store.

 See the [federation runbook](./federation.md) for the operator-side walkthrough.
diff --git a/book/src/backends.md b/book/src/backends.md
index d48573e..eff9657 100644
--- a/book/src/backends.md
+++ b/book/src/backends.md
@@ -6,7 +6,7 @@ What ships today, what's coming, and the design choices behind each.

 ### Namecheap (traditional reissue)

-`kind: namecheap` with `ssl_id: `. Uses Namecheap's `namecheap.ssl.reissue` + `namecheap.ssl.getInfo` flow. Activation is one-shot: rota only handles **reissue** within an existing SSL subscription. First-time activation requires a long list of admin-contact fields rota does not model in the config — operators activate once by hand in the Namecheap dashboard, and rota handles every renewal after that.
+`kind: namecheap` with `ssl_id: `. Uses Namecheap's `namecheap.ssl.reissue` and `namecheap.ssl.getInfo` flow. Activation is one-shot: rota only handles **reissue** within an existing SSL subscription. First-time activation requires a long list of admin-contact fields rota does not model in the config. Operators activate once by hand in the Namecheap dashboard; rota handles every renewal after that.

 DNS-01 only. The Namecheap reissue API folds every SAN under one DCV record, so rota's multi-challenge trait sees a single-element vec.

@@ -14,7 +14,7 @@ DNS-01 only. The Namecheap reissue API folds every SAN under one DCV record, so

 `kind: acme`. Speaks Let's Encrypt, ZeroSSL (with External Account Binding), BuyPass, and any directory that follows the spec. Uses the [`instant-acme`](https://crates.io/crates/instant-acme) crate.

-rota manages its own persistent ECDSA key per cert (operators rely on key continuity for cert pinning). The ACME submit path uses `finalize_csr(csr_der)` so the operator's key stays canonical across renewals.
+rota manages its own persistent ECDSA key per cert because operators rely on key continuity for cert pinning. The ACME submit path uses `finalize_csr(csr_der)` so the operator's key stays canonical across renewals.

 The ACME backend walks the configured DCV solver's `supported_kinds()` to pick a challenge type per authorization. So `dcv: { kind: webroot }` automatically gets HTTP-01; `dcv: { kind: cloudflare }` automatically gets DNS-01.

@@ -24,19 +24,19 @@ The ACME backend walks the configured DCV solver's `supported_kinds()` to pick a

 `kind: namecheap`. DNS-01 via `namecheap.domains.dns.{getHosts,setHosts}`.

-Critical gotcha: Namecheap's `setHosts` is a **full replacement** of every record on the domain, not a per-record edit. Publishing one TXT therefore requires reading every existing record first, merging the new one in, and writing the merged set back. rota does this transparently.
+Watch out: Namecheap's `setHosts` is a **full replacement** of every record on the domain, not a per-record edit. Publishing one TXT therefore requires reading every existing record first, merging the new one in, and writing the merged set back. rota does this transparently.

 ### Cloudflare DNS

 `kind: cloudflare`. DNS-01 via Cloudflare's v4 API with Bearer-token auth. Token scopes: `Zone.DNS:Edit` on every zone rota will manage. rota does not support the legacy Global API Key.

-Cloudflare's per-record edit API means rota doesn't have to read every record on the zone first. The flow is: resolve the apex zone for the record name, look for an existing TXT match (idempotency), POST the record if absent. Removal mirrors the lookup + delete shape.
+Cloudflare's per-record edit API means rota doesn't have to read every record on the zone first. The flow: resolve the apex zone for the record name, look for an existing TXT match (idempotency), POST the record if absent. Removal mirrors the lookup-and-delete shape.

 ### Webroot (HTTP-01)

 `kind: webroot` with `directory: `. rota writes the key authorization to `/.well-known/acme-challenge/` (mode 644) and removes it after issuance. The operator's existing webserver (nginx, Caddy, Apache, anything that serves static files over HTTP on port 80) is responsible for actually exposing the directory.

-Why webroot instead of a daemon-internal listener: most self-hosters already run a webserver on 80/443. Asking rota to bind 80 means coordinating port handoff (or running rota as root) for one purpose: serving a five-byte file the existing webserver could serve in its sleep.
+Why webroot rather than a daemon-internal listener: most self-hosters already run a webserver on 80 and 443. Asking rota to bind 80 means coordinating port handoff (or running rota as root) for one purpose: serving a five-byte file the existing webserver could serve in its sleep.

 Defensive against malformed challenge tokens: rota refuses path-shaped tokens (`/`, `\`, `..`, empty) so a misbehaving CA can't traverse out of the challenge directory.

@@ -48,19 +48,19 @@ Defensive against malformed challenge tokens: rota refuses path-shaped tokens (`

 ### Filesystem

-`kind: filesystem` with `directory: `. Lays the issued cert + chain + private key down under predictable filenames so any service that reads disk-based PEM (nginx, HAProxy, Caddy, custom Rust + rustls) can pick them up.
+`kind: filesystem` with `directory: `. Lays the issued cert, chain, and private key down under predictable filenames so any service that reads disk-based PEM (nginx, HAProxy, Caddy, custom Rust + rustls) can pick them up.

-Filenames mirror the certbot convention so existing reload scripts that grep for `fullchain.pem` / `privkey.pem` work unchanged. Writes are atomic per file: each artifact goes to a sibling `.tmp`, fsync, rename.
+Filenames mirror the certbot convention so existing reload scripts that grep for `fullchain.pem` and `privkey.pem` work unchanged. Writes are atomic per file: each artifact goes to a sibling `.tmp`, fsync, rename.

 ### nginx

-`kind: nginx` with `directory: ` and optional `reload_command: []`. Filesystem write + nginx reload subprocess.
+`kind: nginx` with `directory: ` and optional `reload_command: []`. Filesystem write plus an nginx reload subprocess.

-Default reload is `["nginx", "-s", "reload"]`. Operators on systemd typically override with `["systemctl", "reload", "nginx"]` and a sudoers rule that keeps the daemon unprivileged. The reload runs without a shell wrapper, so argv entries are not interpreted (no globbing, no env interpolation). A non-zero exit surfaces as an `Install` error so the renewer records the failure on the audit log.
+Default reload is `["nginx", "-s", "reload"]`. Operators on systemd typically override with `["systemctl", "reload", "nginx"]` and a sudoers rule that keeps the daemon unprivileged. The reload runs without a shell wrapper, so argv entries are not interpreted: no globbing, no env interpolation. A non-zero exit surfaces as an `Install` error so the renewer records the failure on the audit log.

 ### HAProxy

-`kind: haproxy` with `directory:`, `socket_path:`, and `cert_storage_name:`. Filesystem write + HAProxy runtime API hot-swap.
+`kind: haproxy` with `directory:`, `socket_path:`, and `cert_storage_name:`. Filesystem write plus HAProxy runtime API hot-swap.

 The runtime API sequence:

@@ -71,7 +71,7 @@ EOL
 commit ssl cert
 ```

-No reload, no dropped TCP connections. HAProxy hands the new certificate to live SNI lookups on next handshake. Requires HAProxy 2.x or later with the admin socket exposed:
+No reload, no dropped TCP connections. HAProxy hands the new certificate to live SNI lookups on the next handshake. Requires HAProxy 2.x or later with the admin socket exposed:

 ```text
 global
@@ -84,8 +84,8 @@ global

 Auth resolution:

-- `kubeconfig_path` omitted: in-cluster ServiceAccount (run rotad as a Pod).
-- `kubeconfig_path` set: load the named kubeconfig (run rotad outside the cluster).
+* `kubeconfig_path` omitted: in-cluster ServiceAccount (run rotad as a Pod).
+* `kubeconfig_path` set: load the named kubeconfig (run rotad outside the cluster).

 Required RBAC on `secrets` in the target namespace: `get`, `create`, `patch`. Server-side apply with FieldManager `"rota"` so concurrent managers (cert-manager, helm, etc.) get clean conflict signaling rather than silent overwrites.

@@ -109,6 +109,4 @@ Optional Bearer token auth from a file. Per-request timeout defaults to 10 secon

 ## Roadmap

-- More CAs: Sectigo direct, GoDaddy.
-- More DNS-01 solvers: Route 53, DigitalOcean, Porkbun.
-- More install targets: native HTTP-01 listener (instead of webroot), more reload integrations as operators surface needs.
+More CAs: Sectigo direct, GoDaddy. More DNS-01 solvers: Route 53, DigitalOcean, Porkbun. More install targets: a native HTTP-01 listener (instead of webroot), more reload integrations as operators surface needs.
diff --git a/book/src/configuration.md b/book/src/configuration.md
index 1faa152..bc655eb 100644
--- a/book/src/configuration.md
+++ b/book/src/configuration.md
@@ -48,7 +48,7 @@ audit:
   password_file: /etc/rota/secrets/surreal.password
 ```

-`endpoint` accepts `mem://`, `file://path`, `ws://`, `wss://`, `http://`, `https://`. Embedded engines (`mem://`, `file://`) skip auth; remote engines need `username` + `password_file`.
+`endpoint` accepts `mem://`, `file://path`, `ws://`, `wss://`, `http://`, `https://`. Embedded engines (`mem://`, `file://`) skip auth; remote engines need `username` and `password_file`.

 ## CA accounts

@@ -62,7 +62,7 @@ namecheap:
   client_ip: 192.0.2.1
 ```

-`client_ip` must be on the account's whitelisted IPs in Namecheap, or the API rejects every call. Same credentials authenticate both the CA backend (reissue) and the DCV backend (DNS).
+`client_ip` must be on the account's whitelisted IPs in Namecheap, or the API rejects every call. The same credentials authenticate both the CA backend (reissue) and the DCV backend (DNS).

 ### `cloudflare`

@@ -87,10 +87,10 @@ acme:

 Common directory URLs:

-- Let's Encrypt prod: `https://acme-v02.api.letsencrypt.org/directory`
-- Let's Encrypt staging: `https://acme-staging-v02.api.letsencrypt.org/directory`
-- ZeroSSL: `https://acme.zerossl.com/v2/DV90`
-- BuyPass: `https://api.buypass.com/acme/directory`
+* Let's Encrypt prod: `https://acme-v02.api.letsencrypt.org/directory`
+* Let's Encrypt staging: `https://acme-staging-v02.api.letsencrypt.org/directory`
+* ZeroSSL: `https://acme.zerossl.com/v2/DV90`
+* BuyPass: `https://api.buypass.com/acme/directory`

 `account_credentials_file` is created on first run; treat like a private key (mode 0o600).

@@ -105,7 +105,7 @@ cluster:
   lease_seconds: 60 # refresh cadence is lease/3 (~20s here)
 ```

-Requires `audit.kind: surrealdb` because the lock + cert blobs live in that database. See the [federation runbook](./federation.md) for end-to-end setup.
+Requires `audit.kind: surrealdb` because the lock and cert blobs live in that database. See the [federation runbook](./federation.md) for end-to-end setup.

 ## `alerts`

@@ -183,8 +183,8 @@ install:

 ## Migration from earlier versions

-### v0.5 → v0.6
+### v0.5 to v0.6

-- `rota.yaml`: rename `registrar:` → `dcv:` on every cert. The kind values (`namecheap`, `cloudflare`) are unchanged; only the parent field name moves.
-- New optional `cluster:` block enables multi-host federation.
-- Wire protocol bumped from 1 to 2 (`CertSummary.registrar_backend` → `dcv_backend`). The `rota` CLI must upgrade alongside `rotad`; older clients hit a clean version-mismatch error rather than silent misparse.
+* `rota.yaml`: rename `registrar:` to `dcv:` on every cert. The kind values (`namecheap`, `cloudflare`) are unchanged; only the parent field name moves.
+* New optional `cluster:` block enables multi-host federation.
+* Wire protocol bumped from 1 to 2 (`CertSummary.registrar_backend` becomes `dcv_backend`). The `rota` CLI must upgrade alongside `rotad`; older clients hit a clean version-mismatch error rather than silent misparse.
diff --git a/book/src/federation.md b/book/src/federation.md
index 272dc7d..1fd3f73 100644
--- a/book/src/federation.md
+++ b/book/src/federation.md
@@ -6,7 +6,7 @@ Multiple `rotad` instances pointing at the same SurrealDB elect a single leader

 Two operator-side use cases:

-1. **High-availability renewer.** A single `rotad` is a single point of failure: if its host goes down within `renew_threshold_days` of a cert's `notAfter`, the cert lapses. A two-node cluster with leader election keeps renewal pulled forward through host failures.
+1. **High-availability renewer.** A single `rotad` is a single point of failure. If its host goes down within `renew_threshold_days` of a cert's `notAfter`, the cert lapses. A two-node cluster with leader election keeps renewal pulled forward through host failures.
 2. **Multi-host install.** A cert that fronts multiple machines (load balancers, service mesh ingress, redundant API servers) needs to land on each host. With federation, one node renews and every node installs locally.

 ## Architecture
@@ -35,16 +35,15 @@ Two operator-side use cases:
 ↓ install (local)
 ```

-- All nodes share one SurrealDB (or a SurrealDB cluster — rota doesn't care which).
-- One node holds the lock at `cluster_lock:singleton` and runs the renewal scheduler. Others have their schedulers gated on `is_leader()` and skip silently.
-- The leader's renewer pipeline persists every successful issuance to the `issued_cert` table.
-- Every node (including the leader, but the leader's `install_sync` self-suppresses) runs an `InstallSyncTask` that polls `issued_cert` and runs the local `InstallBackend` when the audit cert is fresher than what's installed locally.
+All nodes share one SurrealDB. (rota doesn't care whether that's a single instance or a SurrealDB cluster.) One node holds the lock at `cluster_lock:singleton` and runs the renewal scheduler; the others have their schedulers gated on `is_leader()` and skip silently.
+
+The leader's renewer pipeline persists every successful issuance to the `issued_cert` table. Every node (including the leader, but the leader's `install_sync` self-suppresses) runs an `InstallSyncTask` that polls `issued_cert` and runs the local `InstallBackend` when the audit cert is fresher than what's installed locally.

 ## Trust model

-- The audit store carries cert PEM + chain PEM. **Private keys are never written to the audit store.**
-- Each cluster member's `key_path` private key is provisioned out-of-band — config-management, secrets manager, manual scp, whatever the operator already uses for sensitive material.
-- The shared SurrealDB is in the trust boundary for cert metadata + renewal history but not for key material. If the database is compromised, an attacker can read which certs exist and when they were renewed; they cannot forge requests against the CA or impersonate any host.
+The audit store carries cert PEM and chain PEM. Private keys are never written to the audit store.
+
+Each cluster member's `key_path` private key is provisioned out-of-band: config-management, secrets manager, manual scp, whatever the operator already uses for sensitive material. The shared SurrealDB is in the trust boundary for cert metadata and renewal history but not for key material. If the database is compromised, an attacker can read which certs exist and when they were renewed; they cannot forge requests against the CA or impersonate any host.

 ## Setup

@@ -56,7 +55,7 @@ Operators who already run SurrealDB skip ahead. Otherwise the simplest is one `s
 surreal start --user root --pass file:///var/lib/surrealdb
 ```

-Then create the namespace + database for rota:
+Then create the namespace and database for rota:

 ```bash
 surreal sql --user root --pass --ns rota --db prod
@@ -74,11 +73,11 @@ install -d -m 0700 /var/lib/rota/keys
 install -m 0600 example.com.key /var/lib/rota/keys/example.com.key
 ```

-The keys must be byte-identical across nodes; rota uses one key per cert (no per-node keys) so the cert validates against any node's TLS handshake.
+The keys must be byte-identical across nodes. rota uses one key per cert (no per-node keys) so the cert validates against any node's TLS handshake.

 ### 3. Configure each node

-Each `rota.yaml` is the same except for the `cluster.node_id`:
+Each `rota.yaml` is the same except for `cluster.node_id`:

 ```yaml
 daemon:
@@ -131,8 +130,8 @@ rota status
 Each node shows the same cert table (it's pulled from the shared audit). `rotad`'s logs differentiate:

 ```text
-INFO cluster: acquired leader lock ← leader
-INFO cluster: still follower ← followers
+INFO cluster: acquired leader lock # leader
+INFO cluster: still follower # followers
 ```

 A direct query against SurrealDB:
@@ -157,14 +156,16 @@ shows the fresh cert blob. Each follower's `install_sync` task picks it up on it

 When the leader dies (host crash, kernel oom, network partition), its lease lapses after `lease_seconds` (default 60s). The next polling follower acquires the lock and becomes the new leader; renewals pick back up automatically. No operator intervention needed.

-If a leader recovers from a transient failure and re-acquires the lock, no harm: the `record_issued_cert` writes are append-only, and `latest_issued_cert` is monotonic by `issued_at`.
+If a leader recovers from a transient failure and re-acquires the lock, no harm done: the `record_issued_cert` writes are append-only, and `latest_issued_cert` is monotonic by `issued_at`.

 ## Failure modes worth knowing

-- **SurrealDB unreachable from the leader.** The lease loop logs lock-check failures and demotes defensively. Followers see no leader; on their next sweep one of them tries to acquire and may succeed (if their network sees SurrealDB) or also fail. Renewals pause until SurrealDB is reachable from at least one node.
-- **Private key drift across nodes.** If the per-node `key_path` differs, follower installs will succeed locally but the served cert won't match any other node's chain. Audit this with a cross-node `openssl x509 -in` + `openssl rsa -in` modulus comparison.
-- **Cert distribution lag.** Followers poll on `check_interval_seconds`. With the default 1h, a follower can be up to 1h behind the leader's renewal. Tune the interval down if you need tighter sync (the cost is more SurrealDB traffic, but it's a single SELECT per cert per tick).
+**SurrealDB unreachable from the leader.** The lease loop logs lock-check failures and demotes defensively. Followers see no leader; on their next sweep one of them tries to acquire and may succeed (if their network sees SurrealDB) or also fail. Renewals pause until SurrealDB is reachable from at least one node.
+
+**Private key drift across nodes.** If the per-node `key_path` differs, follower installs will succeed locally but the served cert won't match any other node's chain. Audit this with a cross-node `openssl x509 -in` and `openssl rsa -in` modulus comparison.
+
+**Cert distribution lag.** Followers poll on `check_interval_seconds`. With the default 1h, a follower can be up to 1h behind the leader's renewal. Tune the interval down if you need tighter sync (the cost is more SurrealDB traffic, but it's a single SELECT per cert per tick).

 ## Rolling back to single-node

-Set `cluster.enabled: false` (or remove the block entirely) on the surviving node and restart it. The leader lock will lapse; no other node tries to acquire. The audit store retains its history; just point the surviving node at SQLite instead of SurrealDB if you want to fully decouple.
+Set `cluster.enabled: false` (or remove the block entirely) on the surviving node and restart it. The leader lock will lapse; no other node tries to acquire. The audit store retains its history. Point the surviving node at SQLite instead of SurrealDB if you want to fully decouple.
diff --git a/book/src/getting-started.md b/book/src/getting-started.md
index 7864dd7..d7dc999 100644
--- a/book/src/getting-started.md
+++ b/book/src/getting-started.md
@@ -1,6 +1,6 @@
 # Getting started

-This page walks through standing up a single-node rota that renews one cert against Let's Encrypt via DNS-01, with the cert installed to the local filesystem.
+Stand up a single-node rota that renews one cert against Let's Encrypt via DNS-01, with the cert installed to the local filesystem.

 ## Install

@@ -8,11 +8,11 @@ This page walks through standing up a single-node rota that renews one cert agai
 cargo install rota
 ```

-This builds two binaries: `rotad` (the daemon) and `rota` (the CLI client that talks to the daemon over a UNIX socket).
+That builds two binaries: `rotad` (the daemon) and `rota` (the CLI client that talks to the daemon over a UNIX socket).

 ## Minimal config

-Create `/etc/rota/rota.yaml`:
+Drop this at `/etc/rota/rota.yaml`:

 ```yaml
 daemon:
@@ -48,7 +48,7 @@ Three things to provision before starting `rotad`:

 1. **Cloudflare API token** at `/etc/rota/secrets/cloudflare.token`. Scope: `Zone.DNS:Edit` on every zone rota will publish DCV records in. rota only supports tokens, not the legacy Global API Key.
 2. **ACME account credentials file**. The first run creates this automatically; just make sure the parent directory is writable.
-3. **Private key directory** (`/var/lib/rota/keys/`) with `0700` mode. The first run also generates the per-cert key automatically; rota reuses the same key on every renewal so cert-pinning operators don't break.
+3. **Private key directory** (`/var/lib/rota/keys/`) with `0700` mode. The first run also generates the per-cert key automatically. rota reuses the same key on every renewal so cert-pinning operators don't break.

 ## First run

@@ -56,12 +56,7 @@ Three things to provision before starting `rotad`:
 rotad --config /etc/rota/rota.yaml
 ```

-The daemon will:
-
-1. Open the audit DB (SQLite by default).
-2. Connect to the configured CAs, registrars / DCV solvers, and install backends.
-3. Sweep every cert on the configured `check_interval_seconds`. The first sweep happens after one full interval, not immediately, so the daemon doesn't hammer the CA on startup.
-4. Renew any cert whose installed copy is `< renew_threshold_days` from `notAfter`.
+The daemon opens the audit DB (SQLite by default), connects to the configured CAs, registrars / DCV solvers, and install backends, then sweeps every cert on the configured `check_interval_seconds`. The first sweep happens after one full interval, not immediately, so the daemon doesn't hammer the CA on startup. Any cert whose installed copy is closer to `notAfter` than `renew_threshold_days` gets renewed.

 ## Talking to the daemon

@@ -75,7 +70,7 @@ Prints a one-line summary per cert: id, domains, days until expiry, last renewal
 rota renew example-public
 ```

-Force a renewal regardless of expiry. Useful when you've just rotated DNS and want to confirm the pipeline end-to-end.
+Force a renewal regardless of expiry. Useful when you've just rotated DNS and want to confirm the pipeline end to end.

 ```bash
 rota log example-public
@@ -85,7 +80,7 @@ Print the most recent renewal's audit trail (CSR generated, CA submitted, DCV pu

 ## Where to go next

-- [Architecture overview](./architecture.md) — the four trait surfaces and how they compose.
-- [Configuration reference](./configuration.md) — every field of `rota.yaml`.
-- [Backends](./backends.md) — what ships today and what's coming.
-- [Federation runbook](./federation.md) — running multiple `rotad` instances with shared state.
+* [Architecture overview](./architecture.md). Four trait surfaces and how they compose.
+* [Configuration reference](./configuration.md). Every field of `rota.yaml`.
+* [Backends](./backends.md). What ships today and what's coming.
+* [Federation runbook](./federation.md). Running multiple `rotad` instances with shared state.
diff --git a/book/src/introduction.md b/book/src/introduction.md
index a1d6406..917ff90 100644
--- a/book/src/introduction.md
+++ b/book/src/introduction.md
@@ -27,24 +27,15 @@ I run my own stuff and I'd like to keep doing that. So I'm writing the tool I'd

 ## What it does

-- Watches your CA-issued certs and knows when they're close to expiry.
-- Generates fresh CSRs against persistent private keys you control.
-- Submits reissue or renewal requests to the CA over that CA's API.
-- Completes domain-control validation by writing TXT records at your registrar (DNS-01) or dropping a token under `.well-known/acme-challenge/` for an existing webserver to serve (HTTP-01).
-- Installs issued certs where they need to land. Today: Synology DSM, plain filesystem, nginx reload, HAProxy runtime API hot-swap, Kubernetes Secret.
-- Logs every step. Surfaces a real-time dashboard. Alerts before failures, not after.
-- Optionally federates across multiple `rotad` instances: one node renews, peers pick up the cert from a shared SurrealDB and install locally.
+Watches your CA-issued certs and knows when they're close to expiry. Generates fresh CSRs against persistent private keys you control. Submits reissue or renewal requests to the CA over that CA's API. Completes domain-control validation by writing TXT records at your registrar (DNS-01) or dropping a token under `.well-known/acme-challenge/` for an existing webserver to serve (HTTP-01). Installs issued certs where they need to land: Synology DSM, plain filesystem, nginx reload, HAProxy runtime API hot-swap, Kubernetes Secret. Logs every step. Surfaces a real-time dashboard. Alerts before failures, not after.

-## Who this is for
+Optionally federates across multiple `rotad` instances: one node renews, peers pick up the cert from a shared SurrealDB and install locally.

-Operators who:
+## Who this is for

-- Run their own webservers, mail servers, dashboards, hobby boxes, homelabs.
-- Want renewal automation without surrendering DNS or HTTPS termination to a managed proxy.
-- Are comfortable with a single Rust binary and a YAML config.
-- Don't want a sprawling Python toolchain just to keep certs fresh.
+Operators who run their own webservers, mail servers, dashboards, hobby boxes, homelabs. People who want renewal automation without surrendering DNS or HTTPS termination to a managed proxy. Anyone who'd rather drop a single Rust binary on a box than wrangle a Python toolchain just to keep certs fresh.

-If you're already happy with `certbot`, this isn't a replacement; it's a different tradeoff for the operator who wants pluggable backends and built-in operational surface (audit log, dashboard, alerts, metrics, federation) in one process.
+If you're already happy with `certbot`, this isn't a replacement. It's a different tradeoff for the operator who wants pluggable backends and built-in operational surface (audit log, dashboard, alerts, metrics, federation) in one process.

 ## License