feat(wfctl): rotate-and-prune --prune-first flag for at-quota safety (v0.27.2)#592
Merged
Conversation
…t-quota safety Adds --prune-first to runInfraRotateAndPrune that flips step ordering: pre-prune orphans (name != canonical, not in --preserve-names) BEFORE rotation, then rotate, then defensive post-prune sweep. Default is true; --prune-first=false preserves the v0.27.1 ordering for callers that need byte-exact legacy behavior. Closes the at-quota chicken-and-egg: when DO Spaces hits its 200-key cap, Step 1 (rotate = mint new key) used to fail with HTTP 403 "key quota exceeded" before Step 2 (prune) ever got a chance to free quota. Operators had to hand-prune via the cloud console first — exactly the workflow rotate-and-prune was built to eliminate. Pre-prune uses a NAME-based filter (skips the canonical --name + the --preserve-names regex). The TIME + ACCESS_KEY filter that runInfraPrune uses is not applicable here because no rotation result exists yet at this point. A failed pre-prune delete aborts BEFORE rotation: better to leave a partially- pruned account than to mint a new key on a half-cleaned one. The pre-flight EnumerateAll probe + ErrProviderMethodUnimplemented handling (v0.27.1) is unchanged. WFCTL_CONFIRM_PRUNE=1 + --confirm gate the entire flow including the pre-prune step (the helper defensively re-checks).
Four new tests covering the --prune-first flag: - TestRotateAndPrune_PruneFirst_HappyPath_AtQuota: orphans + canonical-old loaded; --prune-first default true deletes orphans pre-rotation, rotates cleanly, post-prune cleans canonical-old. Asserts Step 0 + defensive sweep banners in output. - TestRotateAndPrune_PruneFirst_DefaultTrue: regression sentinel for the ADR 0023 default flip. With an orphan whose created_at is NEWER than the rotation timestamp, default behavior must delete it (would survive the post-rotation time filter under legacy ordering). - TestRotateAndPrune_PruneFirst_False_LegacyOrder: opt-out path. Same newer-than-cutoff orphan must NOT be deleted under --prune-first=false. Output must NOT mention Step 0 / defensive sweep. - TestRotateAndPrune_PruneFirst_PreservesCanonicalName: canonical --name + --preserve-names regex match must both be skipped during pre-prune. Asserted via the "Pre-rotation dry-run: 1 orphan" output line. Adds quotaCappedFakeProvider helper that mutates its EnumerateAll output as DeleteResource calls land — closer to the real cloud-API semantics than the existing static-list fakeProviderEnumerableDriver.
Documents the default flip from prune-after to prune-first. Cites the at-quota chicken-and-egg as motivation: the very condition operators need the cleanup tool for makes the legacy step-1-then-step-2 order unusable. References workspace memory feedback_proper_fixes_over_workarounds — the default IS the strict behavior; the legacy ordering is opt-out via --prune-first=false for byte-exact regressions. Lists four rejected alternatives (opt-in flag, auto-detect quota, split into two commands, enum-valued flag) with rationale for each.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR updates wfctl infra rotate-and-prune to be safer when cloud accounts are at credential quota by adding a --prune-first flag (default true) that performs a pre-rotation orphan prune, then rotates, then performs a post-rotation defensive prune sweep. It also adds an ADR documenting the default flip and new unit tests covering the new ordering.
Changes:
- Add
--prune-first(defaulttrue) and a newrunPreRotationPrunehelper to prune orphan credentials before attempting rotation. - Adjust user-visible step/banner output to reflect Step 0/1/2 when pre-prune is enabled and label the post-prune as a “defensive sweep”.
- Add tests for default behavior, legacy opt-out behavior, quota safety behavior, and preserve/canonical-name protection; add ADR 0023 documenting the rationale.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| decisions/0023-rotate-and-prune-prune-first-default.md | Adds ADR documenting why --prune-first defaults to true and what behavior changes. |
| cmd/wfctl/infra_rotate_and_prune.go | Implements --prune-first flow and runPreRotationPrune helper; updates step ordering and output. |
| cmd/wfctl/infra_rotate_and_prune_test.go | Adds new tests and a new fake provider to validate pre-prune behavior and legacy ordering opt-out. |
Comment on lines
+291
to
+297
| if pruneFirst { | ||
| fmt.Fprintf(w, "Step 0 (--prune-first): pruning orphan %s resources before rotation...\n", resourceType) | ||
| if code := runPreRotationPrune(ctx, provider, resourceType, name, preserveNames, nonInteractive, w); code != 0 { | ||
| fmt.Fprintf(w, "\nrotate-and-prune: pre-rotation prune failed (code=%d). No rotation attempted; no state mutated.\n", code) | ||
| return code | ||
| } | ||
| } |
Comment on lines
+502
to
+508
| ref := interfaces.ResourceRef{Type: o.Type, Name: o.Name, ProviderID: o.ProviderID} | ||
| if delErr := provider.DeleteResource(ctx, ref); delErr != nil { | ||
| fmt.Fprintf(w, "rotate-and-prune: pre-rotation delete %s: %v\n", o.Name, delErr) | ||
| failed++ | ||
| continue | ||
| } | ||
| fmt.Fprintf(w, " ✓ deleted %s\n", o.Name) |
Comment on lines
+228
to
+238
| // quotaCappedFakeProvider simulates a cloud account at the per-resource-type | ||
| // quota (e.g., DO Spaces 200-key limit). EnumerateAll returns the current | ||
| // `outputs` slice; DeleteResource shrinks `outputs` (removes the matching | ||
| // ProviderID) and records the delete in `deleted`. | ||
| // | ||
| // `createOnRotate` is the metadata the bootstrapSecrets stub will return for | ||
| // the new key. The test driver invokes the stub which appends this to | ||
| // `outputs`; if `len(outputs) >= quota` at append time the stub errors with | ||
| // quotaErr to simulate the at-quota Create failure. Tests assert that with | ||
| // --prune-first=true the orphan deletes happen FIRST, freeing room before | ||
| // the stub appends, so the rotation succeeds. |
⏱ Benchmark Results✅ No significant performance regressions detected. benchstat comparison (baseline → PR)
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds
--prune-firstflag towfctl infra rotate-and-prune(defaulttrue)that flips step ordering: pre-prune orphans BEFORE rotation, then rotate,
then defensive post-prune sweep. Closes the at-quota chicken-and-egg.
Why
Before this PR,
rotate-and-pruneran rotate then prune in fixed order. Whenthe cloud account is at quota (DO Spaces enforces 200 keys per project),
Step 1's
Createcall fails withHTTP 403: key quota exceededbeforeStep 2 (prune) can free quota. Operators had to hand-prune via the cloud
console first — exactly the manual workflow
rotate-and-prunewas built toeliminate.
What changes
cmd/wfctl/infra_rotate_and_prune.go:--prune-firstboolean flag, defaulttrue(per ADR 0023).--prune-first=true, runsrunPreRotationPrune(NEW helper) BEFOREStep 1. The pre-prune deletes resources where
Name != --nameANDNamedoes not match--preserve-namesregex.--prune-first=true— should be a no-op if Step 0 was complete, butcovers replacement of canonical-name's old value.
--prune-first=false, the v0.27.1 ordering is preserved exactly.WFCTL_CONFIRM_PRUNE=1. A failedpre-prune delete aborts BEFORE rotation (no minting on a half-cleaned
account).
EnumerateAllprobe +ErrProviderMethodUnimplementedhandling unchanged.
cmd/wfctl/infra_rotate_and_prune_test.go:quotaCappedFakeProviderthat mutatesoutputsasDeleteResourcecalls land — closer to real cloud-API semantics.decisions/0023-rotate-and-prune-prune-first-default.md:Filter shape
Pre-prune uses a NAME-based filter (canonical
--name+--preserve-namesskipped). The TIME + ACCESS_KEY filter that
runInfraPruneuses is notapplicable here because no rotation result exists yet at this point in the
flow. Post-prune retains its TIME + ACCESS_KEY filter unchanged.
Test plan
GOWORK=off go build ./...— cleanGOWORK=off go test ./cmd/wfctl/... -count=1 -run RotateAndPrune -v—8/8 pass (4 existing + 4 new)
GOWORK=off go test ./...— full suite passesgolangci-lint run ./cmd/wfctl/...— 0 issuesNew test functions:
TestRotateAndPrune_PruneFirst_HappyPath_AtQuota— orphans + canonical-oldloaded; default
--prune-first=truedeletes orphans pre-rotation, rotatescleanly, post-prune cleans canonical-old. Asserts Step 0 + defensive sweep
banners in output.
TestRotateAndPrune_PruneFirst_DefaultTrue— regression sentinel for theADR 0023 default flip. Orphan with
created_atnewer than rotationtimestamp is deleted by default (would survive legacy ordering).
TestRotateAndPrune_PruneFirst_False_LegacyOrder— opt-out path. Samenewer-than-cutoff orphan must NOT be deleted under
--prune-first=false.Output must NOT mention Step 0 / defensive sweep.
TestRotateAndPrune_PruneFirst_PreservesCanonicalName— canonical--name+--preserve-namesregex match must both be skipped duringpre-prune. Asserted via "Pre-rotation dry-run: 1 orphan" output line.
Behavior change call-out
Default
--prune-first=truechanges observable behavior on thedefault invocation: orphan keys (Name != canonical, not in preserve regex)
created AFTER the rotation timestamp are now deleted, where previously
they survived the post-rotation time filter. This is intentional per
ADR 0023 — orphans are orphans regardless of when they were created.
Callers needing legacy semantics use
--prune-first=false.DO NOT
ADR
decisions/0023-rotate-and-prune-prune-first-default.mdReferences ADR 0012 (provider credential rotation), ADR 0017 (prune two-key
opt-in), ADR 0020 (storage filter sidecar metadata).