From c366287289bd74e50635409b99e98ff3649a4f54 Mon Sep 17 00:00:00 2001
From: Jon Langevin
Date: Fri, 5 Jun 2026 02:26:51 -0400
Subject: [PATCH] docs(dns): complete scope-lock + retro for delegation portfolio

Status Locked->Complete (live-validated: 30/30 snapshots both layers;
import run imported 30 infra.dns_delegation). Retro: adversarial gates
caught 3 pre-merge bugs; live validation caught 3 runtime-only (429, the
404 delegation-read-never-live-tested endpoint, the verify-capabilities
version gate).

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 .../2026-06-02-dns-delegation-portfolio.md    |  2 +-
 ...-02-dns-delegation-portfolio.md.scope-lock |  1 -
 ...26-06-05-dns-delegation-portfolio-retro.md | 36 +++++++++++++++++++
 3 files changed, 37 insertions(+), 2 deletions(-)
 delete mode 100644 docs/plans/2026-06-02-dns-delegation-portfolio.md.scope-lock
 create mode 100644 docs/retros/2026-06-05-dns-delegation-portfolio-retro.md

diff --git a/docs/plans/2026-06-02-dns-delegation-portfolio.md b/docs/plans/2026-06-02-dns-delegation-portfolio.md
index 43fdb8da0..efa778a2a 100644
--- a/docs/plans/2026-06-02-dns-delegation-portfolio.md
+++ b/docs/plans/2026-06-02-dns-delegation-portfolio.md
@@ -36,7 +36,7 @@

 **Deploy ordering (load-bearing):** PR1 merge → workflow/wfctl release (minor — behavioral change to `FromResourceStates` + import-all state IDs); PR2 merge → hover v0.5.1 release; THEN PR3 (bumps both pins, re-runs import). PR3 is independently revertible (revert pins) but NOT independently deployable.

-**Status:** Locked 2026-06-02T15:51:47Z
+**Status:** Complete 2026-06-05T06:25:38Z

 ---

diff --git a/docs/plans/2026-06-02-dns-delegation-portfolio.md.scope-lock b/docs/plans/2026-06-02-dns-delegation-portfolio.md.scope-lock
deleted file mode 100644
index 7610f3340..000000000
--- a/docs/plans/2026-06-02-dns-delegation-portfolio.md.scope-lock
+++ /dev/null
@@ -1 +0,0 @@
-0aeb1aef093bf8d6b9f6330d3d44e31b72aa8149a00701f18a1aa6da9275d91f
diff --git a/docs/retros/2026-06-05-dns-delegation-portfolio-retro.md b/docs/retros/2026-06-05-dns-delegation-portfolio-retro.md
new file mode 100644
index 000000000..468e0afde
--- /dev/null
+++ b/docs/retros/2026-06-05-dns-delegation-portfolio-retro.md
@@ -0,0 +1,36 @@
+# Retro — DNS delegation in portfolio (both layers)
+
+**Date:** 2026-06-05
+**Scope:** workflow v0.71.0 + workflow-plugin-hover v0.5.2→v0.5.4 + gocodealone-dns import-dns delegation
+**Artifacts:** design `docs/plans/2026-06-02-dns-delegation-portfolio-design.md` (3 revs) · plan `...portfolio.md` (3 revs, scope-locked→complete) · ADR 0047
+
+## Outcome
+
+The DNS catalog now captures BOTH layers per domain: registrar NS delegation (`Snapshot.Authority{registrar_nameservers, live_nameservers}`) + hosted records. Live-proven: import-dns.yml imported 30 `infra.dns_delegation`; 30/30 portfolio snapshots carry both layers; 15 domains flagged delegated-away (NS ≠ hover → Hover records are placeholder/staging). Catalog data PR gocodealone-dns#18 open.
+
+## What worked
+
+- **Adversarial gates caught 3 real bugs PRE-merge** that would have silently shipped broken: (1) the state-store overwrite — both import types keyed the same `.json` file by domain, so the merge would have seen only delegation → empty `records` (the entire point, defeated). (2) source-of-truth — the delegation `Read` returns live-DNS-first, so a naive import would capture the *stale* live NS during a cutover, not the registrar intent. (3) Outputs key-shape — emitting only `registrar_nameservers` would have broken `Diff`/`parseDelegationSpec` (spurious perpetual drift). None were visible from the design text alone; the reviewer had to read the code.
+- **Type-namespacing the import state ID** (`resourceType + "/" + zone`) — a small, general engine fix that makes any two resource types for one domain coexist on disk.
+
+## What live validation caught (that nothing else did)
+
+The unit tests + 2 design + 2 plan adversarial cycles were all green, yet the **first bulk live import** surfaced three runtime-only failures in sequence — a textbook case for `runtime-launch-validation`:
+
+1. **Imperva 429** on the per-domain NS read burst → fixed with retry-with-backoff (v0.5.3).
+2. **HTTP 404 on every domain** → the delegation read used `GET /api/control_panel/domains/domain-`, which is **PUT-only** (Hover field-update endpoint). The read path had **never been live-tested** — the test account has 0 domains, and `SetNameservers`/delegation were never exercised against real Hover. Root-caused with the operator's live API captures: `GET /api/domains` (the list `ListDomains` already calls) returns `nameservers` for every domain in ONE call. Fix (v0.5.4): parse + cache NS from the list → the delegation import is ~1 Hover call (also dissolving the 429 fan-out), with `GET /api/domains/` as the per-domain fallback.
+3. **Release `verify-capabilities` gate** failed twice on `plugin.json` version ≠ git tag.
+
+## Lessons
+
+- **A read/write path with zero live coverage is a latent wrong-endpoint bug.** The delegation `Read` shipped (months earlier) against a guessed endpoint and passed all stub tests; only the first bulk live call exposed it. Treat "never live-tested" as a release risk, not a footnote — the design's own Assumption #1 flagged this, and it was right.
+- **Bulk per-resource fan-out is rate-limit-fragile against bot-protected APIs.** Prefer the list endpoint when it already carries the field (it did). The fix turned 30 calls into 1.
+- **Release-carrying PRs must bump `plugin.json` (root + cmd/) to match the intended tag** — the `verify-capabilities` gate enforces tag==manifest version. Fold the bump into the feature PR, not a follow-up.
+- **Operator API captures are the fastest root-cause for a closed third-party API.** Two `curl`s against the real endpoints settled days of guessing.
+
+## Follow-ups
+
+- Merge catalog PR gocodealone-dns#18 (real portfolio data).
+- Live-test the in-browser WRITE path (`SetNameservers`) against a disposable domain before any migration relies on it (still httptest-only).
+- DO delegation not captured (registrar Hover holds it) — future-optional.
+- Carryover (hover#31): UA/platform/version derivation; setup-go Node-20 bump.