Skip to content

Commit 23dbd8f

Browse files
intel352claude
andauthored
fix(secrets): prevent duplicate DO Spaces key creation (#732)
* fix(secrets): prevent duplicate DO Spaces key creation DO does not enforce name uniqueness on Spaces keys. The provider_credential bootstrap path POSTed blindly every run; if the bootstrap layer failed to persist the access_key+secret_key pair to the configured store (or skipped existsInProvider for any reason), the next run created another key with the same name. Repeated runs accreted orphaned keys until the account hit DO's 200-key quota. Each orphan is unrecoverable — DO returns secret_key once at creation time, and List returns access_key only. Changes: - generateDOSpacesKey() now does a list-then-create. Pre-create call to https://api.digitalocean.com/v2/spaces/keys checks for a matching name; if found, returns an error naming the existing access_key with explicit recovery guidance (delete via console OR --force-rotate). The POST is never attempted in the conflict case. - New lookupExistingSpacesKey() helper paginates through Spaces keys (10 × 100 cap) matching by exact name. Used by the pre-check AND by the 403 error branch (account-quota vs name-conflict disambiguation hint). - Improved error wrapping: every "DO spaces key create" error now includes name=%q so the operator can correlate logs with DO console entries. - stderr trace line on every create attempt: name + bucket + grant permission, so the failure mode is visible without re-running with extra flags. Tests: - TestGenerateDOSpacesKey_RejectsDuplicateNameBeforeCreate verifies the pre-check short-circuits and POST is never attempted when a duplicate exists. (postCalls == 0 asserted.) - TestGenerateDOSpacesKey_403QuotaWithoutNameConflict covers the fallback hint when no name match but DO still returns 403. - TestLookupExistingSpacesKey_PaginatedHit + _Miss cover the pager helper. - Existing TestGenerateSecret_ProviderCredential_DOSpaces and _NoBucket / _WithBucket tests pass without changes — their GET handlers return 404 (no existing key) so the create path proceeds. - TestGenerateDOSpacesKey_IncludesCreatedAt updated to stub both the new GET and the existing POST. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(secrets): require explicit name for DO Spaces key + fullaccess warning Two follow-ups to the duplicate-creation fix: 1. Refuse the default name fallback. The prior default \`workflow-spaces-key\` was project-shared; multiple workflow- managed projects in one DO account would all try to create or adopt the same key. Force the caller to set a project-unique slug (e.g. \`multisite-deploy-key\`, \`wfcompute-deploy-key\`). 2. Emit a stderr WARN when permission=fullaccess is granted (i.e. no \`bucket:\` configured). Bootstrap necessarily uses fullaccess because the IaC state bucket does not exist yet, but once it does the operator should rotate to a bucket- scoped key via \`wfctl secrets rotate --target SPACES --bucket <state-bucket>\` to limit blast radius. The warning includes the exact rotate command. Test updates: - WithBucket / IncludesCreatedAt / NoBucket tests now pass an explicit name (was relying on the default). - New TestGenerateDOSpacesKey_RequiresName covers the refuse- default-name path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(wfctl): transactional rollback + list-orphans for provider_credential Round 2 of the DO Spaces orphan fix (workflow#732 follow-up). Round 1 (already in this PR) added a list-before-create pre-check so the generator refuses to POST a duplicate name. That prevents NEW orphans whose root cause is a save-failure-after-mint loop. This round closes the other half: the orphan-creation event itself. cmd/wfctl/infra_bootstrap.go — bootstrapSecrets provider_credential path now: 1. Extracts access_key + created_at BEFORE the Set loop (was inside the loop, so a first-iteration Set failure left the rollback path blind to the just-minted credential). 2. On ANY Set failure inside the loop, invokes credRevoker. RevokeProviderCredential with the just-minted access_key. If no revoker is registered, emits a loud ORPHANED-CREDENTIAL warning to stderr with the wfctl list-orphans recovery command pre-filled. 3. Rollback is best-effort: failure does not mask the original Set error. cmd/wfctl/secrets_orphans.go — new \`wfctl secrets list-orphans\` subcommand: wfctl secrets list-orphans --source digitalocean.spaces \ --name workflow-spaces-key # dry-run wfctl secrets list-orphans --source digitalocean.spaces \ --name workflow-spaces-key --delete # delete all matches Lists every upstream credential whose name field equals --name (DO Spaces does not enforce name uniqueness, so a single name can map to many access_keys after the orphan loop). Bounded pager (100 × 100 = 10 000 keys). Per-orphan delete continues on failure to surface every recoverable orphan. Currently supports digitalocean.spaces; extending to other sources adds one switch arm. Tests: - TestBootstrapSecrets_ProviderCredential_RollbackOnSetFailure — Set(SPACES_secret_key) returns error; assert revoker invoked with AK_ORPHAN; assert original error surfaces. - TestBootstrapSecrets_ProviderCredential_RollbackOnFirstSetFailure — Set(SPACES_access_key) (the very first call) returns error; asserts the access_key extraction reorder is correct (revoker must receive AK_FIRST even though we never even reached the access_key iteration). These two tests reproduce the pre-fix orphan-creation bug (generator succeeds + Set fails → DO key remains, no rollback attempted). Without the rollback wiring they fail with "expected 1 rollback-revoke call; got 0". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(lint): annotate ghAPICmd #nosec G204 Pre-existing G204 false positive on a fixed-binary + url-escaped endpoint subprocess. Documented inline; no behaviour change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(secrets): follow DO pagination + raise per_page to 200 list-orphans was using local page-counter increments. Earlier runs truncated at 100 because per_page was 100 + the locally-bounded loop returned after page 1. DO's Spaces Keys API allows per_page up to 200 and returns an absolute next-page URL in links.pages.next; we now follow that URL rather than incrementing locally. Also emits a stderr scan summary so debugging is observable from the CI log without re-running with extra flags. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 631345f commit 23dbd8f

7 files changed

Lines changed: 622 additions & 26 deletions

File tree

cmd/wfctl/infra_bootstrap.go

Lines changed: 37 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -713,19 +713,43 @@ var bootstrapSecrets = func(ctx context.Context, provider secrets.Provider, cfg
713713
allowed[k] = true
714714
}
715715

716-
// Capture access_key + created_at independently of storage so
717-
// RotationResult always reports them on a force-rotate, even
718-
// though access_key IS in the allowed set and created_at is
719-
// NOT (per review #2 C5: stderr emission alone would skip
720-
// access_key).
721-
var newAccessKey, newCreatedAt string
722-
for subKey, subVal := range subKeyMap {
723-
if subKey == "access_key" {
724-
newAccessKey = subVal
716+
// Capture access_key + created_at BEFORE any Set call so the
717+
// transactional rollback path has the upstream credential ID
718+
// available even when the very first Set fails. Bug fix:
719+
// previously these were extracted DURING iteration, so a
720+
// Set("foo_access_key") failure left newAccessKey unset and
721+
// the orphaned DO key invisible to the rollback path.
722+
newAccessKey := subKeyMap["access_key"]
723+
newCreatedAt := subKeyMap["created_at"]
724+
725+
// Transactional rollback: if ANY Set() inside the loop fails,
726+
// revoke the just-created upstream credential before
727+
// returning. The DO Spaces Keys API does NOT enforce name
728+
// uniqueness, so a partially-applied generation without
729+
// rollback leaves an orphaned key whose secret_key is
730+
// permanently irrecoverable (see workflow#732 / SPEC V?).
731+
//
732+
// Rollback is best-effort — failure is logged but does not
733+
// mask the original Set error. Operator can still find the
734+
// orphan via the wfctl secrets list-orphans helper.
735+
revokeOrphan := func(reason error) {
736+
if newAccessKey == "" {
737+
return
738+
}
739+
if credRevoker == nil {
740+
fmt.Fprintf(os.Stderr, "warn: provider_credential %q minted access_key=%s but no revoker available; the upstream credential is now ORPHANED and unrecoverable. Run `wfctl secrets list-orphans --source %s --name %s` to clean up. Original error: %v\n",
741+
gen.Key, newAccessKey, gen.Source, gen.Key, reason)
742+
return
725743
}
726-
if subKey == "created_at" {
727-
newCreatedAt = subVal
744+
if revokeErr := credRevoker.RevokeProviderCredential(ctx, gen.Source, newAccessKey); revokeErr != nil {
745+
fmt.Fprintf(os.Stderr, "warn: rollback-revoke just-minted credential %q (access_key=%s) failed: %v — manual cleanup required via `wfctl secrets list-orphans --source %s`\n",
746+
gen.Key, newAccessKey, revokeErr, gen.Source)
747+
return
728748
}
749+
fmt.Fprintf(os.Stderr, "wfctl: rolled back orphaned credential %q (access_key=%s) after Set failure\n", gen.Key, newAccessKey)
750+
}
751+
752+
for subKey, subVal := range subKeyMap {
729753
if !allowed[subKey] {
730754
// Sidecar field — emit to stderr in machine-parseable form
731755
// for operator observability + audit logs.
@@ -734,6 +758,7 @@ var bootstrapSecrets = func(ctx context.Context, provider secrets.Provider, cfg
734758
}
735759
fullKey := gen.Key + "_" + subKey
736760
if setErr := provider.Set(ctx, fullKey, subVal); setErr != nil {
761+
revokeOrphan(setErr)
737762
return generated, rotations, fmt.Errorf("store secret %q: %w", fullKey, setErr)
738763
}
739764
generated[fullKey] = subVal
@@ -743,6 +768,7 @@ var bootstrapSecrets = func(ctx context.Context, provider secrets.Provider, cfg
743768
fmt.Printf(" secret %q: created\n", fullKey)
744769
}
745770
}
771+
_ = newCreatedAt // captured into RotationResult below
746772
if forceRotate[gen.Key] {
747773
fmt.Fprintf(os.Stderr, "wfctl: rotated provider_credential %s (replaced existing value at %s)\n", gen.Key, time.Now().UTC().Format(time.RFC3339))
748774
// Record the rotation in-process so callers (rotate-and-prune)

cmd/wfctl/infra_bootstrap_secrets_test.go

Lines changed: 122 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -265,3 +265,125 @@ func TestBootstrapSecrets_ProviderCredentialProbeIgnoresBareKey(t *testing.T) {
265265
t.Fatalf("generator called %d times, want 1", generateCalls)
266266
}
267267
}
268+
269+
// failOnSetProvider triggers a Set() failure on the first matching key
270+
// — used to exercise the transactional rollback path when a
271+
// provider_credential creation succeeds upstream but the store-write
272+
// half fails (e.g. GH PAT lost permissions mid-bootstrap).
273+
type failOnSetProvider struct {
274+
failKey string
275+
setCalls []string
276+
}
277+
278+
func (p *failOnSetProvider) Name() string { return "fail-on-set" }
279+
func (p *failOnSetProvider) Get(_ context.Context, _ string) (string, error) {
280+
return "", secrets.ErrUnsupported
281+
}
282+
func (p *failOnSetProvider) Set(_ context.Context, key, _ string) error {
283+
p.setCalls = append(p.setCalls, key)
284+
if key == p.failKey {
285+
return errFakeStoreUnavailable
286+
}
287+
return nil
288+
}
289+
func (p *failOnSetProvider) Delete(_ context.Context, _ string) error { return nil }
290+
func (p *failOnSetProvider) List(_ context.Context) ([]string, error) {
291+
return nil, secrets.ErrUnsupported
292+
}
293+
294+
// recordingRevoker captures RevokeProviderCredential calls so the test
295+
// can assert rollback occurred. Implements interfaces.ProviderCredentialRevoker.
296+
type recordingRevoker struct {
297+
calls []revokeCall
298+
}
299+
type revokeCall struct {
300+
source string
301+
credentialID string
302+
}
303+
304+
func (r *recordingRevoker) RevokeProviderCredential(_ context.Context, source, credentialID string) error {
305+
r.calls = append(r.calls, revokeCall{source: source, credentialID: credentialID})
306+
return nil
307+
}
308+
309+
var errFakeStoreUnavailable = errFakeStore("store unavailable (simulated)")
310+
311+
type errFakeStore string
312+
313+
func (e errFakeStore) Error() string { return string(e) }
314+
315+
// TestBootstrapSecrets_ProviderCredential_RollbackOnSetFailure is the
316+
// regression test for the orphan-key bug: when generateSecret returns a
317+
// fresh DO Spaces key but provider.Set fails to persist it, bootstrap
318+
// MUST revoke the just-minted upstream credential.
319+
//
320+
// Pre-fix behaviour: Set fails → return error → DO key remains in the
321+
// account with an unrecoverable secret_key. Every subsequent run mints
322+
// another orphan with the same name.
323+
//
324+
// Post-fix: Set failure triggers credRevoker.RevokeProviderCredential
325+
// with the access_key from the just-generated subKeyMap.
326+
func TestBootstrapSecrets_ProviderCredential_RollbackOnSetFailure(t *testing.T) {
327+
withStubGenerator(t, func(_ context.Context, _ string, _ map[string]any) (string, error) {
328+
out, _ := json.Marshal(map[string]string{
329+
"access_key": "AK_ORPHAN",
330+
"secret_key": "SK_DOOMED",
331+
})
332+
return string(out), nil
333+
})
334+
335+
p := &failOnSetProvider{failKey: "SPACES_secret_key"}
336+
rev := &recordingRevoker{}
337+
cfg := &SecretsConfig{
338+
Generate: []SecretGen{
339+
{Key: "SPACES", Type: "provider_credential", Source: "digitalocean.spaces"},
340+
},
341+
}
342+
343+
_, _, err := bootstrapSecrets(context.Background(), p, cfg, nil, rev)
344+
if err == nil {
345+
t.Fatal("expected Set failure to surface as error")
346+
}
347+
348+
if len(rev.calls) != 1 {
349+
t.Fatalf("expected 1 rollback-revoke call; got %d", len(rev.calls))
350+
}
351+
if rev.calls[0].credentialID != "AK_ORPHAN" {
352+
t.Errorf("rollback called with credentialID=%q want AK_ORPHAN", rev.calls[0].credentialID)
353+
}
354+
if rev.calls[0].source != "digitalocean.spaces" {
355+
t.Errorf("rollback source=%q want digitalocean.spaces", rev.calls[0].source)
356+
}
357+
}
358+
359+
// TestBootstrapSecrets_ProviderCredential_RollbackOnFirstSetFailure
360+
// guards the most insidious shape of the bug: the very first Set call
361+
// fails (e.g. access_key write). The pre-fix code extracted
362+
// newAccessKey *during* the for-range loop, so a first-iteration
363+
// failure left newAccessKey empty even though the upstream key exists.
364+
// The fix extracts access_key BEFORE the loop.
365+
func TestBootstrapSecrets_ProviderCredential_RollbackOnFirstSetFailure(t *testing.T) {
366+
withStubGenerator(t, func(_ context.Context, _ string, _ map[string]any) (string, error) {
367+
out, _ := json.Marshal(map[string]string{
368+
"access_key": "AK_FIRST",
369+
"secret_key": "SK_FIRST",
370+
})
371+
return string(out), nil
372+
})
373+
374+
p := &failOnSetProvider{failKey: "SPACES_access_key"}
375+
rev := &recordingRevoker{}
376+
cfg := &SecretsConfig{
377+
Generate: []SecretGen{
378+
{Key: "SPACES", Type: "provider_credential", Source: "digitalocean.spaces"},
379+
},
380+
}
381+
382+
_, _, err := bootstrapSecrets(context.Background(), p, cfg, nil, rev)
383+
if err == nil {
384+
t.Fatal("expected Set failure")
385+
}
386+
if len(rev.calls) != 1 || rev.calls[0].credentialID != "AK_FIRST" {
387+
t.Errorf("rollback calls = %v; want one revoke of AK_FIRST", rev.calls)
388+
}
389+
}

cmd/wfctl/plugin_marketplace_verify.go

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -85,6 +85,11 @@ Options:
8585

8686
// ghAPICmd is the indirection point so tests can inject a fake gh binary.
8787
// Default is the real `gh api` CLI.
88+
//
89+
// #nosec G204 -- the binary is the fixed string "gh" and `endpoint` is
90+
// constructed from a literal-prefix + URL-escaped query inside this
91+
// package (urlQueryEscape sanitises). No user-controlled shell metachars
92+
// flow into the subprocess.
8893
var ghAPICmd = func(ctx context.Context, endpoint string) ([]byte, error) {
8994
cmd := exec.CommandContext(ctx, "gh", "api", endpoint)
9095
return cmd.Output()

cmd/wfctl/secrets.go

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,8 @@ func runSecrets(args []string) error {
3232
return runSecretsSync(args[1:])
3333
case "setup":
3434
return runSecretsSetup(args[1:])
35+
case "list-orphans":
36+
return runSecretsListOrphans(args[1:])
3537
default:
3638
return secretsUsage()
3739
}
@@ -54,6 +56,7 @@ Actions:
5456
rotate Trigger rotation of a secret
5557
sync Copy secret structure between environments
5658
setup Interactively set all secrets for an environment
59+
list-orphans List/delete duplicate-named upstream credentials (e.g. DO Spaces keys)
5760
5861
Ad-hoc provider flags (set, get, list, delete, export):
5962
--provider Override provider: keychain, env, aws
@@ -75,6 +78,8 @@ Examples:
7578
wfctl secrets sync --from staging --to production
7679
wfctl secrets setup --env local
7780
wfctl secrets setup --env production --auto-gen-keys
81+
wfctl secrets list-orphans --source digitalocean.spaces --name workflow-spaces-key
82+
wfctl secrets list-orphans --source digitalocean.spaces --name workflow-spaces-key --delete
7883
`)
7984
return fmt.Errorf("missing or unknown action")
8085
}

0 commit comments

Comments
 (0)