F.0: Caddy federation helpers + hardware gate + rate limiter + storage translators #9

Merged
kh0pper merged 2 commits into main from
f0-caddy-federation-helpers-and-hardware-gate
Apr 12, 2026

Conversation

@kh0pper
Owner

@kh0pper kh0pper commented Apr 12, 2026

Summary

Phase 2 federation foundation — lays the groundwork for the 8 federated-app bundles planned for F.1 onward (GoToSocial, WriteFreely, Matrix-Dendrite, Funkwhale, Pixelfed, Lemmy, Mastodon, PeerTube). No federated app ships in this PR; this is pure platform infra so F.1+ can be thin wrappers. Two stacked commits.

Part 1/2: Caddy federation helpers + hardware gate

  • 4 new Caddy MCP tools:
    • `caddy_add_federation_site` — 4 profiles (matrix / activitypub / peertube / generic-ws) with canned directives for WS upgrade, 40MB/8GB body, forwarded headers, 300s/1800s timeouts
    • `caddy_set_wellknown` — 5 kinds (matrix-server, matrix-client, nodeinfo, host-meta, webfinger) with canned JSON builders
    • `caddy_add_matrix_federation_port` — emits a :8448 site block with its own LE cert. Refuses if `/.well-known/matrix/server` delegation already exists on the same domain (forces either-or, not both)
    • `caddy_cert_health` — ok/warning/error per domain; surfaces near-expiry, failed renewals, staging-issuer use that would otherwise stay silent
  • Caddy post-install hook creates `crow-federation` external docker network; compose joins it so federated apps are reachable by docker service name with no host port publish
  • `servers/gateway/hardware-gate.js` — refuses installs when effective RAM (MemAvailable + half-weight SSD-backed swap + half-weight zram − committed `recommended_ram_mb` from installed bundles − 512 MB host reserve) < `min_ram_mb`. SD-card swap explicitly excluded. CLI-only `force_install` bypass
  • Caddy skill doc updated for all 4 tools
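
The effective-RAM rule in the hardware-gate bullet can be sketched as follows. This is a minimal illustration, not the code in `servers/gateway/hardware-gate.js`: the function name and input field names are assumptions, and only the half-weights, the SD-card exclusion, and the 512 MB reserve come from the description above. The numbers in the usage lines are made up and are not the PR's test fixtures.

```javascript
// Hypothetical sketch of the effective-RAM formula described above.
// Field names are illustrative assumptions.
const HOST_RESERVE_MB = 512;

function effectiveRamMb({
  memAvailableMb,
  ssdSwapFreeMb = 0,
  sdSwapFreeMb = 0, // accepted but deliberately ignored: SD swap is not headroom
  zramFreeMb = 0,
  committedMb = 0, // sum of recommended_ram_mb across installed bundles
}) {
  const headroom = memAvailableMb + 0.5 * ssdSwapFreeMb + 0.5 * zramFreeMb;
  return Math.floor(headroom - committedMb - HOST_RESERVE_MB);
}

// Same host, three swap setups (illustrative numbers):
const base = { memAvailableMb: 3000, committedMb: 1024 };
console.log(effectiveRamMb({ ...base, sdSwapFreeMb: 2048 }));  // 1464
console.log(effectiveRamMb({ ...base, ssdSwapFreeMb: 2048 })); // 2488
console.log(effectiveRamMb({ ...base, zramFreeMb: 1024 }));    // 1976
```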

Part 2/2: rate limiter, storage translators, cert-health panel

  • `servers/shared/rate-limiter.js`: SQLite-backed token bucket. State persists across bundle restarts (bypass-by-restart was a round-2 review flag). The bucket key resolves, in order, to the conversation_id, else a hash of the transport identity, else a per-tool global. Defaults are keyed by tool-name pattern: `*_post` 10/hr, `*_follow` 30/hr, `*_search` 60/hr, `*_block_*` / `*_defederate` 5/hr, `*_import_blocklist` 2/hr. Overrides via `~/.crow/rate-limits.json` with fs.watch hot-reload. Exposed as `wrapRateLimited(db)` returning `limiter(toolId, handler)`, so each bundle needs only one import
  • `servers/gateway/storage-translators.js` — canonical Crow S3 credentials → per-app env-var schemas for Mastodon (`S3_`), PeerTube (`PEERTUBE_OBJECT_STORAGE_` env vars, not YAML patching), Pixelfed (`AWS_` + `FILESYSTEM_CLOUD`), Funkwhale (`AWS_` + `AWS_S3_*`)
  • `rate_limit_buckets` table added to `scripts/init-db.js`
  • Caddy panel: new cert-health card with Overall status badge + per-domain rows (dot + issuer + expiry + problems). XSS-safe (textContent + createElement only)
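
The token-bucket behaviour in the rate-limiter bullet can be sketched in memory as below. This is an illustration under stated assumptions, not the PR's implementation: a `Map` stands in for the `rate_limit_buckets` table, and `createBucketLimiter` and its options are hypothetical names. The clock is injectable so the refill logic can be exercised without waiting.

```javascript
// In-memory token bucket sketch. The PR persists bucket state in SQLite;
// a Map stands in for that table here. All names are illustrative.
function createBucketLimiter({ capacity, windowSeconds, now = () => Date.now() / 1000 }) {
  const buckets = new Map(); // key -> { tokens, refilledAt }
  const secondsPerToken = windowSeconds / capacity;

  return function take(key) {
    const t = now();
    const bucket = buckets.get(key) ?? { tokens: capacity, refilledAt: t };
    // Refill in proportion to elapsed time, capped at capacity.
    const refill = (t - bucket.refilledAt) / secondsPerToken;
    bucket.tokens = Math.min(capacity, bucket.tokens + refill);
    bucket.refilledAt = t;
    buckets.set(key, bucket);

    if (bucket.tokens >= 1) {
      bucket.tokens -= 1;
      return { allowed: true };
    }
    // Time until one whole token is available again.
    const retryAfter = Math.ceil((1 - bucket.tokens) * secondsPerToken);
    return { allowed: false, retry_after_seconds: retryAfter };
  };
}
```

With `capacity: 10` and `windowSeconds: 3600`, ten back-to-back calls are allowed and the eleventh is denied with `retry_after_seconds` of 360, which matches the figure quoted in the test plan.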

Design notes

  • Caddyfile remains the source of truth on disk. All new tools route through `upsertRawSite()` — hand-edits outside managed blocks survive, re-running with the same domain replaces instead of duplicating
  • Effective RAM is deliberately pessimistic. SD-card swap doesn't count as headroom even though the kernel reports it in SwapFree. zram counts at half-weight (compressed RAM, not true extra capacity)
  • Hardware gate is refuse-by-default with CLI override only — the web UI never surfaces `--force-install`
  • Matrix port 8448 enforced as either/or. The round-2 review flagged that 8448 can't be handwaved; the tool refuses to run when `.well-known/matrix/server` delegation is already in place
  • Rate limiter persisted in SQLite so restarting a bundle doesn't reset the window
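
The replace-instead-of-duplicate behaviour attributed to `upsertRawSite()` can be illustrated on a simplified model. The real helper operates on Caddyfile text; the sketch below assumes a hypothetical representation of the file as an ordered array of `{ address, body }` objects, and `upsertSite` / `removeSite` are illustrative names only.

```javascript
// Hypothetical model: a Caddyfile as an ordered list of site blocks.
// Replacing keeps the block's position; unknown addresses are appended.
function upsertSite(sites, address, body) {
  const index = sites.findIndex((site) => site.address === address);
  if (index === -1) return [...sites, { address, body }];
  return sites.map((site, i) => (i === index ? { address, body } : site));
}

function removeSite(sites, address) {
  return sites.filter((site) => site.address !== address);
}

// Round-trip in the spirit of the test plan: add, replace, add :8448, remove.
let sites = [{ address: "example.com", body: "respond 200" }];
sites = upsertSite(sites, "social.example.com", "reverse_proxy app:8080");
sites = upsertSite(sites, "social.example.com", "reverse_proxy app:9090");
sites = upsertSite(sites, "social.example.com:8448", "reverse_proxy app:8448");
sites = removeSite(sites, "social.example.com:8448");
console.log(sites.map((site) => site.address)); // [ 'example.com', 'social.example.com' ]
```

Because untouched entries are passed through unchanged, anything outside the block being upserted survives the round trip in this model.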

Test plan

  • `node --check` on all changed files passes
  • `upsertRawSite` round-trip: add federation site → replace it → add :8448 site → remove. Site count and ordering correct
  • `federation-profiles` smoke test: matrix / activitypub / peertube profiles render with upstream substitution; well-known builders produce valid JSON
  • `computeEffectiveRam` unit tests: Pi 4 GB + SD swap → 1200 MB; + SSD swap → 2224 MB; + zram → 1712 MB
  • Rate-limit token bucket: 10 allowed / 11th denied with `retry_after_seconds=360`
  • Storage translators: mastodon / peertube / pixelfed / funkwhale all produce valid env; unknown app + missing-creds throw
  • `npm run init-db` creates `rate_limit_buckets` cleanly
  • `npm run check` passes
  • Live: install updated Caddy bundle on grackle, verify `crow-federation` network created, `caddy_add_federation_site` against throwaway domain validates via `/load` before writing
  • Live: attempt install with fake `min_ram_mb: 999999` manifest → gate refuses with actual numbers in error
  • Live: `caddy_cert_health` on an existing site returns structured JSON; staging issuer surfaces as `warning`
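
For the cert-health check, a per-domain classifier might look like the sketch below. Only the three statuses and the fact that a staging issuer surfaces as `warning` come from this PR; the input fields, the 14-day near-expiry threshold, and the severity given to expiry and failed renewals are assumptions made for illustration.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical classifier. Input fields, the 14-day threshold, and the
// severity assigned to each problem are assumptions for illustration.
function classifyCert(cert, now = new Date(), warnDays = 14) {
  const { issuer = "", notAfter, renewalFailed = false } = cert;
  const problems = [];
  const daysLeft = (new Date(notAfter).getTime() - now.getTime()) / DAY_MS;

  if (daysLeft <= 0) {
    problems.push({ level: "error", text: "certificate expired" });
  } else if (daysLeft < warnDays) {
    problems.push({ level: "warning", text: `expires in ${Math.floor(daysLeft)} days` });
  }
  if (renewalFailed) problems.push({ level: "error", text: "last renewal failed" });
  if (/staging/i.test(issuer)) problems.push({ level: "warning", text: "issued by a staging CA" });

  // Worst problem wins; no problems means ok.
  let status = "ok";
  if (problems.some((p) => p.level === "error")) status = "error";
  else if (problems.length > 0) status = "warning";
  return { status, problems: problems.map((p) => p.text) };
}
```

An overall badge could then be derived by taking the worst status across all domains.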

Live verification is deferred to F.1 (GoToSocial pilot), where the shared network and cert-health path will be exercised in anger.

Kevin Hopper added 2 commits April 12, 2026 13:31
Lays the groundwork for the 8 federated-app bundles enumerated in the
Phase 2 plan (Matrix-Dendrite, Mastodon, GoToSocial, Pixelfed, PeerTube,
Funkwhale, Lemmy, WriteFreely). No federated app ships in this PR — this
is pure platform infra so the apps in F.1+ are thin wrappers.

Caddy side:

- bundles/caddy/server/federation-profiles.js — 4 canned profiles
  (matrix, activitypub, peertube, generic-ws) with the directives each
  app family needs (websocket upgrade, 40MB/8GB body, forwarded headers,
  300s/1800s timeouts) plus builders for the standard /.well-known/
  JSON payloads (matrix-server, matrix-client, nodeinfo)
- bundles/caddy/server/caddyfile.js — upsertRawSite() helper for
  idempotent site-block replacement. Parser already handled :8448 and
  inner blocks; round-trip verified via smoke test
- bundles/caddy/server/server.js — 4 new MCP tools:
    caddy_add_federation_site       — profile-aware site block
    caddy_set_wellknown             — standalone /.well-known/<path>
                                       handler (e.g. matrix-server
                                       delegation on an apex domain)
    caddy_add_matrix_federation_port — :8448 site block with its own
                                       LE cert (refuses if the same
                                       domain already has matrix-server
                                       delegation — enforces "one or
                                       the other, not both")
    caddy_cert_health               — ok/warning/error per domain,
                                       surfaces staging-cert use and
                                       near-expiry that would otherwise
                                       stay silent until outage
- bundles/caddy/scripts/post-install.sh — creates the crow-federation
  external docker network (idempotent)
- bundles/caddy/docker-compose.yml — Caddy now joins crow-federation so
  federated apps in F.1+ become reachable by docker service name with
  no host port publish
- bundles/caddy/skills/caddy.md — full doc for the 4 new tools,
  covering the matrix-8448-vs-well-known either/or and the shared-
  network model

Gateway side:

- servers/gateway/hardware-gate.js — checkInstall() refuses installs
  whose min_ram_mb exceeds effective RAM (MemAvailable + 0.5 ×
  SSD-backed swap + 0.5 × zram; SD-card swap explicitly excluded)
  minus already-committed RAM (sum of recommended_ram_mb across
  installed bundles) minus a flat 512 MB host reserve. Warns but
  allows when under the recommended threshold. Unit-tested against
  Pi-with-SD-swap, Pi-with-SSD-swap, and Pi-with-zram fixtures
- servers/gateway/routes/bundles.js — hardware gate wired into
  POST /bundles/api/install before the consent-token check. CLI-only
  force_install bypass (request body, never UI)
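
The refuse / warn / allow behaviour described for the gate can be sketched as a small decision function. This is an assumption-laden illustration: `decideInstall`, its arguments, and its return shape are hypothetical and do not reflect the actual checkInstall() signature; only the three outcomes and the force_install bypass come from the description above.

```javascript
// Hypothetical decision helper; names and return shape are assumptions.
function decideInstall(manifest, effectiveMb, { forceInstall = false } = {}) {
  const min = manifest.min_ram_mb ?? 0;
  const recommended = manifest.recommended_ram_mb ?? min;

  if (effectiveMb < min) {
    // Put the actual numbers in the message so the refusal is actionable.
    const message = `needs ${min} MB, effective RAM is ${effectiveMb} MB`;
    return forceInstall
      ? { allow: true, level: "forced", message }
      : { allow: false, level: "refuse", message };
  }
  if (effectiveMb < recommended) {
    const message = `below recommended ${recommended} MB (effective ${effectiveMb} MB)`;
    return { allow: true, level: "warn", message };
  }
  return { allow: true, level: "ok", message: "" };
}

console.log(decideInstall({ min_ram_mb: 999999 }, 1200).message);
// needs 999999 MB, effective RAM is 1200 MB
```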

Design notes:

- Caddyfile on-disk remains the source of truth. All new tools go
  through upsertRawSite() so hand-edits outside the managed blocks
  survive round-trips, and re-running a tool with the same domain
  replaces instead of duplicating
- Effective RAM is deliberately pessimistic: SD-card swap does not
  count as headroom even though /proc/meminfo reports it in SwapFree.
  zram counts at half-weight because it's compressed RAM, not true
  extra capacity. Host-reserve keeps the base OS + gateway responsive
- Hardware gate is a refuse-by-default mechanism with a CLI override;
  the web UI never surfaces --force-install

Part 2/2 (follow-up): shared rate-limiter (SQLite-backed token bucket),
storage-translators (per-app S3 env-var schema mapping), init-db entry
for rate_limit_buckets, end-to-end verification on grackle with the
cert-health panel card.
Closes out F.0 so the federated-app bundles in F.1+ can plug in without
rebuilding shared infra. Builds on part 1/2 (222e175).

- servers/shared/rate-limiter.js — SQLite-backed token-bucket helper.
  Persistence in the rate_limit_buckets table survives bundle restart
  (round-2 reviewer flagged bypass-by-restart). Bucket key resolves to
  conversation_id when MCP supplies one, else a hash of transport
  identity, else a per-tool global. Defaults gated by tool-name suffix:
  *_post 10/hr, *_follow 30/hr, *_search 60/hr, *_block_* / *_defederate
  5/hr, *_import_blocklist 2/hr. Overrides via ~/.crow/rate-limits.json
  with fs.watch hot-reload. Exposed as wrapRateLimited(db) returning
  limiter(toolId, handler) so bundles wrap their MCP handlers with one
  import. Smoke-tested: 11th call in the window is denied with
  retry_after_seconds.

- servers/gateway/storage-translators.js — per-app S3 env-var schema
  mapping. Canonical Crow S3 credentials
  { endpoint, bucket, accessKey, secretKey, region?, forcePathStyle? }
  translate into Mastodon's S3_*, PeerTube's PEERTUBE_OBJECT_STORAGE_*
  (env-var path, not YAML patching), Pixelfed's AWS_*, and Funkwhale's
  AWS_*/AWS_S3_* schemas. One function per app, validated fixtures.

- scripts/init-db.js — rate_limit_buckets table
  (tool_id, bucket_key, tokens, refilled_at) with PRIMARY KEY
  (tool_id, bucket_key) and a refilled_at index for GC later. Verified
  via `npm run init-db && npm run check`.

- bundles/caddy/panel/caddy.js + panel/routes.js — cert-health card on
  the Caddy panel. New GET /api/caddy/cert-health endpoint surfaces
  ok/warning/error per domain with issuer, expiry, and staging-cert
  detection. Panel shows an Overall badge + per-domain rows with colored
  status dots; textContent + createElement only (XSS-safe pattern).
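
To make the rate-limiter bullet's bucket-key fallback and pattern-based defaults concrete, here is a minimal sketch. The limits and the fallback order come from the commit message; the function names, the context field names, the key prefixes, the choice of SHA-256, and the example tool ids are all assumptions.

```javascript
const crypto = require("node:crypto");

// Defaults from the commit message as [pattern, calls per hour].
// More specific patterns come first so they win.
const DEFAULTS = [
  [/_import_blocklist$/, 2],
  [/_defederate$/, 5],
  [/_block_/, 5],
  [/_post$/, 10],
  [/_follow$/, 30],
  [/_search$/, 60],
];

function defaultLimitPerHour(toolId) {
  const hit = DEFAULTS.find(([pattern]) => pattern.test(toolId));
  return hit ? hit[1] : null; // null: no default applies to this tool
}

// Fallback order: conversation id, else transport hash, else per-tool global.
function resolveBucketKey(toolId, { conversationId, transportIdentity } = {}) {
  if (conversationId) return `conv:${conversationId}`;
  if (transportIdentity) {
    const hash = crypto.createHash("sha256").update(transportIdentity).digest("hex");
    return `transport:${hash.slice(0, 16)}`;
  }
  return `global:${toolId}`;
}
```

For example, a hypothetical tool id `gts_post` would get the 10/hr default, and a call with no conversation or transport identity would share the `global:gts_post` bucket.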

Verified (this PR):

- node --check on all changed files: OK
- Rate-limit token bucket: 10 calls through, 11th denied with
  retry_after_seconds=360 on a 10/3600 bucket
- Storage translator: mastodon/peertube/pixelfed/funkwhale all produce
  valid env vars with credentials present; unknown app and missing
  credentials both throw
- `npm run init-db` creates rate_limit_buckets cleanly
- `npm run check` passes
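
For the storage translators, a single per-app function might look like the sketch below, shown for Mastodon. It is a guess at the shape, not the PR's code: `toMastodonEnv` is a hypothetical name, the variable names are those believed to be used by Mastodon's object-storage settings and should be checked against its documentation, and the `us-east-1` fallback region is an assumption.

```javascript
// Hypothetical translator from canonical Crow S3 credentials to a
// Mastodon-style env map. The exact variable set emitted by the PR may differ.
function toMastodonEnv(creds) {
  for (const field of ["endpoint", "bucket", "accessKey", "secretKey"]) {
    if (!creds || !creds[field]) {
      throw new Error(`missing S3 credential: ${field}`);
    }
  }
  const url = new URL(creds.endpoint);
  return {
    S3_ENABLED: "true",
    S3_BUCKET: creds.bucket,
    S3_ENDPOINT: creds.endpoint,
    S3_HOSTNAME: url.host,
    S3_PROTOCOL: url.protocol.replace(":", ""),
    S3_REGION: creds.region ?? "us-east-1", // assumed fallback
    AWS_ACCESS_KEY_ID: creds.accessKey,
    AWS_SECRET_ACCESS_KEY: creds.secretKey,
  };
}
```

Throwing on missing fields mirrors the verified behaviour listed above, where missing credentials are rejected rather than silently producing a partial env.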

Deferred to F.1:

- Live end-to-end install of the updated Caddy bundle on grackle
  (requires the uninstall + reinstall flow — will exercise as part of
  F.1 GoToSocial pilot where the shared network + cert-health path is
  actually used in anger)
- Panel cert-health card live-verification with a real issued cert
  (waits on F.1 for a federated site to exist)
@kh0pper kh0pper merged commit ad7f349 into main Apr 12, 2026
2 checks passed
