Cloud-SDK extraction — PR 6: add ctx context.Context to module.IaCStateStore#671
Conversation
…tract plugins Design for removing aws-sdk-go-v2, Azure/azure-sdk-for-go, and cloud.google.com/go + google.golang.org/api direct deps from workflow core's module/ package. Architecture: 3 extension surfaces, 3 strategies: - IaC state backends → new IaCStateBackend strict proto contract; iac.state stays core, config.backend dispatches to plugin gRPC client. - platform.* provisioners → new PlatformBackend strict proto contract; module types + provider: key stay core, kind backend stays in-core, cloud backends (eks/gke/ecs/route53/ec2/autoscaling) extract. - standalone modules/steps (apigateway, codebuild, dynamodb, s3_upload, storage.s3, storage.gcs) → plugin-native module/step types via the existing ModuleFactories/StepFactories SDK — no new contract. Credentials (Option 1): each plugin-native module carries its own credentials: block + builds aws.Config in-process; optional in-plugin credentials_ref for DRY. cloud_account_aws*.go deleted; azure/gcp cloud_account files have no SDK import and stay. 4 phases: A azure (validates IaCStateBackend), B aws (largest), C gcp, D digitalocean (spaces backend, minor bump + migration doc). Includes Assumptions + Rollback sections + self-challenge top-3 doubts (PlatformBackend over-generality, provider-separability fragility, benchmark-could-invalidate-unary-default — all with mitigations deferred to writing-plans). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… revisions Addresses 2 Critical + 5 Important findings from adversarial-design-review: Critical: - iac_state_spaces.go (core file importing aws-sdk s3) now has an explicit home: deleted by Phase B's core PR; Phase D reframed from soft-compat to a real clean-break for the `spaces` backend. Goal "core drops aws-sdk-go-v2 entirely" is now actually achieved by the phases as written + enforced by a go list -deps CI gate. - kinesis: added Non-Goals entry explaining it's a transitive dep of modular/modules/eventbus/v2, not a direct workflow import — out of scope, with the go mod why chain documented so the literal ask is fully answered. Important: - Full grep-verified 13-file AWS inventory table in Phase B with per-file destinations; reconciled aws_api_gateway.go (route-sync module) vs platform_apigateway.go (provisioner) as two distinct files. - aksBackend assigned to Phase A (Azure gets the PlatformBackend half too); platform_kubernetes_kind.go split now spans 3 phases (aks/eks/gke) with explicit always-compiles coordination. - Proto contracts fold into existing plugin/external/proto/iac.proto (8 services already) instead of new files — matches precedent. - New Security section: secret-redaction in config-version-store/tracing + gRPC interceptor logging are blocking writing-plans tasks; credentials_ref blast radius documented as strictly narrower than today's cloud.account. Minor: - IaCStateBackend RPC set now maps 1:1 to the real module.IaCStateStore interface (GetState/SaveState/ListStates/DeleteState/Lock/Unlock) — no speculative surface. - Phase D rollback restated as a matched pair (Phase B core PR + DO plugin PR). - IaCProviderRequired/ResourceDriver reuse promoted to a first-class Alternatives Considered entry with accept/reject rationale + retained as the gated fallback for PlatformBackend. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… revisions Addresses 2 Critical + 3 Important from cycle-2 review: Critical: - platform_kubernetes_kind.go handling reworked. Added Phase 0: a pure mechanical precursor file-split (kind/eks/gke/aks → 4 files, each with its own import block). The "always compiles across phases" property is now structural, not asserted. Added a verified per-file import-ownership table. - Corrected the false Phase A rationale: aksBackend uses raw net/http REST, NOT the Azure SDK (verified — no azure-sdk symbol in the aksBackend region). The Azure go.mod drop comes entirely from iac_state_azure.go deletion + iac_module.go edit; aksBackend extraction is code-organisation, not a dependency change. - Documented the eksBackend → cloud_account_aws.go call-graph edge as a hard same-commit atomicity constraint (verified: eksBackend calls awsProviderFrom + AWSConfig at platform_kubernetes_kind.go:96,105,138). Important: - Phase B core-PR bullet now explicitly lists "strip the spaces case from iac_module.go" (was only obliquely referenced). - New §Failure modes section: orphaned-lock-on-plugin-crash → lease_ttl_seconds contract field; SaveState lost-response retry → documented idempotent (full-state replace, last-writer-wins); plugin-unreachable → abort before mutation; PlatformBackend mid-Apply crash → identical to today's in-process risk, no new mitigation. - §Security gRPC-logging bullet concretized: VERIFIED plugin SDK adds no body-logging interceptor (grpc.NewServer(opts...) passthrough; only callback_server.go logs, never module config). Writing-plans adds a guard test instead of a conditional interceptor. Minor: file-count table footnoted (count = importers, not deletions); shared s3compat module added as Alternatives Considered #3 (deferred, not rejected); self-challenge doubt numbering tidied (2 mitigations cover 3 doubts, intentionally). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sed in the cycle-2 commit ran from the wrong cwd — Status line still said "cycle 1" and two interface-audit-spike references still said "Phase A/B" instead of "Phase 0/A". Pure text cleanup, no design change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… revisions Addresses 2 Critical + 2 Important from cycle-3 review: Critical (same root — symbol-level coupling the import-block audit missed): - parseStringSlice (in cloud_account_aws.go, which Phase B deletes) and safeIntToInt32 (in core-staying platform_kubernetes.go) are pure helpers the plugin-bound backend files call. An import-block audit is symbol-blind. Fix: Phase 0 now does TWO moves — the file split AND relocating both helpers into a new SDK-free core module/cloud_helpers.go. Per-file table gains a "cross-file symbol deps (the trap)" column listing every helper edge per backend. Phase 0 acceptance criteria now include a grep that no core file references the helpers from their old homes. - §Phase 0 corrected: platform_kubernetes.go is a SEPARATE existing file (module shell + kubernetesBackend interface + safeIntToInt32) — NOT touched by the split; only platform_kubernetes_kind.go (holds all 4 backends) is split. Earlier draft conflated the two files. Important: - Per-file ownership table relabelled "intended post-split — verified by the Phase 0 build gate" (was asserted-as-verified against an unsplit file — same hand-waving class cycle-2 flagged for "always compiles"). - lease_ttl_seconds DROPPED from the Phase A proto. It was a contract field with no enforced semantics and no implementing backend in scope — YAGNI. §Failure-modes orphaned-lock reworked: documented limitation + operator-side lock-object delete for recovery; TTL is a planned ADDITIVE follow-up paired with a conformance test, shipped with the first backend that honors expiry. Added explicit Lock-contention behavior (immediate error, matches today's in-process IaCStateStore.Lock — no new waiting state). Minor: Phase 0 rollback sentence added; garbled §Assumptions 2 sentence fixed; §Assumptions 2 notes Phase 0 de-risks it structurally. Also: removed a stray stale cycle-1 copy of this doc that was sitting untracked in the main workflow checkout (the canonical doc is here in the feat/cloud-sdk-extraction worktree). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… revisions Addresses 2 Critical + 2 Important from cycle-4 review: Critical 1 — the per-file symbol-ownership table was wrong AGAIN (3rd cycle running): claimed gkeBackend depends on safeIntToInt32 (it doesn't — that's eksBackend) and aksBackend has no cross-file deps (it does — CloudCredentials/CloudCredentialProvider from cloud_account.go, same as gke). STRUCTURAL FIX: deleted the hand-maintained table entirely. The symbol-ownership map is now a Phase 0 build artifact — scripts/audit-cloud-symbols.sh, committed + re-run in CI — not a design-doc claim that rots on every edit. The design commits to the *method* + the *known shape* (cloud_account.go stays core; all 3 cloud backends bind to it via k.provider.GetCredentials; eksBackend additionally binds to the Phase-B-deleted cloud_account_aws.go; aksBackend imports no cloud SDK). Critical 2 — Phase 0's "split into four, zero logic change" silently dropped the single func init() that registers kind/k3s/eks/gke/aks. Splitting REQUIRES partitioning init() per-file (a distribution, not zero-change). Phase 0 now has an explicit step 2 for the init() partition; relabelled "behavior-equivalent" not "zero logic change"; k3s documented as reusing kindBackend (both stay core). Important 1 — platform.* cloud credential flow across PlatformBackend was unspecified (aksBackend needs CloudCredentials — how does it reach the plugin?). Added: PlatformBackend requests carry a CloudCredentials proto message; engine resolves k.provider.GetCredentials() in-core (config-map parsing, no SDK) and serialises it. Unified with the Architecture-3 credentials story — ONE CloudCredentials proto shape for both surfaces, so secret-redaction has one shape to redact. Important 2 — core actually imports FOUR cloud SDK trees, not three: godo is still in cloud_account_do.go + 5 platform_do_*.go files. §Problem now acknowledges godo as a 4th tree, explicitly scopes it OUT (user's ask was 3 trees), and the go list -deps gate is reworded to assert "zero packages from the three in-scope trees" not "zero cloud SDKs". All "zero cloud SDKs" phrasing reconciled throughout. Minor: ListStates filter + remaining-proto-messages notes folded in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… revisions Addresses 1 Critical + 2 Important from cycle-5 review: Critical — the init()-partition fix (cycle-4) was kubernetes-only, but the SAME defect class exists in platform_dns.go / platform_ecs.go / platform_networking.go / platform_autoscaling.go: each has a single func init() registering BOTH a core-staying `mock` backend AND a plugin-bound `aws` backend. The old Phase B inventory moved those files wholesale → would exile the mock backends + dangle the route53 registration. FIX: Phase 0 generalized from "split platform_kubernetes_kind.go" to a repo-wide uniform `_core.go` / `_<provider>.go` convention across the WHOLE platform.* family. Every mixed init() is partitioned; the audit script flags any init() registering a mix of core-staying + plugin-bound factories as a CI failure. Phase B inventory rewritten to delete only `_aws.go`/`_eks.go` files, never a mixed file. Important 1 — the cycle-4 "Known shape" prose reintroduced hand-maintained cross-file symbol claims (one already incomplete: parseStringSlice consumers). FIX: cut all per-file symbol enumerations; the section now states only invariants the script VERIFIES (not discovers) + the method. No transcribed symbol lists remain. Important 2 + own finding — cycle-4 said the engine resolves credentials in-core "no SDK needed." VERIFIED FALSE: cloud_account_aws_creds.go's awsProfileResolver calls config.LoadDefaultConfig(WithSharedConfigProfile) and awsRoleARNResolver calls sts.AssumeRole — both need the AWS SDK. FIX: §Architecture-2 corrected — engine passes the DECLARED credential config (plain strings) in the CloudCredentials proto; the PLUGIN resolves (incl. the SDK-bearing profile/role_arn paths). Both cloud_account_aws.go AND cloud_account_aws_creds.go deleted by Phase B, no core replacement — all AWS cred resolution moves plugin-side. azure/gcp resolver files stay (their resolvers are genuinely SDK-free). Minor — backend-name collision: core-reserved names (memory/filesystem/ postgres/kind/k3s/mock) cause a load-time error if a plugin collides, not silent shadowing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… revisions Addresses 1 Critical + 2 Important from cycle-6 review: Critical — cycle-5's credential-flow fix replaced one false claim with another: it said the CloudCredentials struct already holds "declared config (plain strings incl. profile)". VERIFIED FALSE — the struct (cloud_account.go:18) has no Profile field (profile lives in Extra map) and the resolvers mutate it in-place with RESOLVED values. FIX, cleaner than the struct change the reviewer proposed: the struct needs NO change (Extra map already carries markers, RoleARN field exists). Instead, cloud_account_aws_creds.go is EDITED not deleted — the SDK-bearing tails of awsProfileResolver/awsRoleARNResolver (config.LoadDefaultConfig, sts.AssumeRole) are removed; they keep their SDK-free heads (record declared inputs + an Extra["credential_source"] marker, exactly as awsStaticResolver already does). After the edit the file is SDK-free and stays in core alongside the azure/gcp resolver files. Only cloud_account_aws.go (the pure-SDK AWSConfig() builder + AWSConfigProvider + awsProviderFrom) is deleted; its profile-chain/STS logic moves into the plugin's buildAWSConfig. Every in-core resolver becomes uniformly "declare, don't resolve"; the plugin honors the markers. No unregistered- resolver failure mode — the resolver init() registrations stay. Important 1 — §Phase-0 misidentified the DNS file with the mixed init(). VERIFIED: platform_dns.go:66 has the init() (+ interface + factory registry); platform_dns_backends.go has both impls + the route53 SDK import, NO init(). DNS is a TWO-file split, unlike single-file ecs/networking/autoscaling. §Phase-0 now states the per-family layout explicitly (kubernetes one-file, dns two-file, ecs/networking/autoscaling one-file) and notes the audit script determines it. Important 2 — azure/gcp resolvers (and now aws profile/role_arn) emit deferred-resolution markers for env/CLI/managed-identity/workload-identity/ profile/role_arn — NOT plain-string passthrough. §Architecture-3 + Assumption 5 now state the plugin MUST implement marker handling for every deferred type, not just AWS profile/role_arn. Minor — safeIntToInt32 relocation rationale clarified (it's a clean copy-source for the plugin-bound files, not a hard core necessity); parseStringSlice IS a hard necessity (its file is deleted). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… revisions Addresses 2 Critical from cycle-7 review (architecture confirmed sound; these are the last two extraction-mechanic precision gaps): C1 — "remove the SDK tail, file becomes SDK-free" mischaracterized the awsRoleARNResolver edit. VERIFIED: awsProfileResolver's SDK calls ARE a clean contiguous tail, but awsRoleARNResolver's SDK block (base-config build + sts.AssumeRole, ~45 lines) is the larger half of the method, after the declared-input recording. FIX: §Architecture-2 re-characterizes the edit as a deliberate Resolve() body REWRITE (not a one-line snip) — explicitly per-resolver. Added a Phase B CI invariant: an import-block grep (folded into audit-cloud-symbols.sh) asserts cloud_account_aws_creds.go has zero aws-sdk-go-v2 imports post-rewrite — mechanically enforced, not prose-asserted. C2 — cloud_account_aws.go defines FOUR symbols, not one; the symbol-ownership invariant named only parseStringSlice. VERIFIED + fixed: - AWSConfigProvider interface signature names aws.Config → CANNOT stay in core, deleted with the file. - awsProviderFrom → deleted with the interface. - ValidateCredentials → verified NO real caller (only a comment ref in cmd/wfctl/deploy.go:866) → deletes cleanly. - The 8 awsProviderFrom consumers are all verified plugin-bound — but each currently does awsProviderFrom(k.provider).AWSConfig(ctx); in the plugin there's no cloud.account to type-assert. §Cross-file-coupling invariant 3 now states Phase B must REWRITE all 8 consumers to obtain creds from the CloudCredentials proto + buildAWSConfig — explicit Phase B scope, not a footnote. Phase B table atomicity column updated. Minor (M1) — platform_dns_backends.go renamed → platform_dns_core.go in Phase 0 so the dns family conforms to the uniform _core.go/_aws.go naming; no special-case three-file layout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t post-#653 main Cycle-8 adversarial review caught the design's file/symbol inventory as stale: it predated issue #653 (closed 2026-05-13), which already removed the AWS IaC modules, platform/providers/aws/, and stubbed the codebuild + EKS backends. Re-baselined every file/symbol claim against origin/main HEAD (worktree confirmed 0 commits behind origin/main): - Added "Relationship to issue #653" section — this design is #653's named successor, extracting the AWS surface #653 scoped out ("RBAC/secrets/artifact stay") plus the untouched Azure/GCP surfaces. - Problem table corrected: AWS 6 real-import files (not 13), Azure 3, GCP 3. storage_artifact_s3.go is comment-only — stays in core. - cloud_account_aws.go is dead code — zero non-test consumers verified; deleted outright, no 8-consumer rewrite (awsProviderFrom + consumers removed by #653). - Phase 0 shrunk to a single-file split (platform_kubernetes_kind.go); parseStringSlice + safeIntToInt32 no longer exist — helper-relocation task deleted. - PlatformBackend now serves only aks + gke (eks already a #653 SDK-free stub); interface-audit spike audits one interface, not five. - Phase B inventory rewritten; Phase A/C file lists corrected. - Self-challenge doubt #4 + Assumption 7 added: inventory staleness is the cycle-8 defect class; audit script makes it CI-enforced. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t script Cycle-9 adversarial review caught aksBackend mis-classified as an azure-sdk importer: platform_kubernetes_kind.go's azure-sdk-for-go match is a stale doc comment (line 332) — aksBackend.azureToken is a plain net/http OAuth2 client. An import-block-disciplined re-survey found a second comment-only false positive: nosql_dynamodb.go. Structural fix for the recurring "grep matched a comment" defect class: added scripts/audit-cloud-symbols.sh, which parses Go import(...) blocks (never comments) and emits the comment-immune real-import map. Its output now populates every file table in the design — prose claims replaced by a build artifact. Formalized + CI-wired in Phase 0. Corrected inventory (audit-script output): AWS 5 real-import files (not 6), Azure 2 (not 3), GCP 3. nosql_dynamodb.go + storage_artifact_s3.go are comment-only stubs — out of scope, stay in core. Design consequences of aksBackend being SDK-free: - Only gkeBackend carries a cloud platform SDK. kind/k3s/eks/aks all stay in core. - Architecture §2 no longer proposes a new PlatformBackend contract. The gke cross-process mechanism is gated on an interface-audit spike whose preferred outcome is folding into the existing ResourceDriver contract — a dedicated contract for one backend is YAGNI. - Phase A (Azure) is now pure IaCStateBackend — touches no platform file. - Phase 0 splits platform_kubernetes_kind.go into _core.go (kind/k3s/ eks/aks — all SDK-free) + _gke.go (the lone SDK-bearing backend), and fixes the stale line-332 comment. - The gke platform extraction + its contract decision move to Phase C. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…scope boundary explicit
Cycle-10 adversarial review caught Assumption 6 as false: the cycle-9
audit script scanned only module/, missing five aws-sdk-go-v2 importers
under provider/aws/, plugin/rbac/, iam/, artifact/. The design's Goal
("go.mod drops aws-sdk-go-v2 entirely") was therefore unachievable by
the four phases as written.
Structural fix — third defect-class variant closed:
- audit-cloud-symbols.sh now scans the WHOLE REPO (not just module/) and
splits results module/ vs. elsewhere. Comment-immune (cycle 9) +
scope-complete (cycle 10) + CI-enforced (Phase 0).
Whole-repo inventory result:
- Azure + GCP SDK usage is entirely module/-resident → Phases A and C
drop those trees from go.mod ENTIRELY (whole-graph go list -deps gate).
- aws-sdk-go-v2 is split: 5 module/ files (in scope, Phase B) + 6 files
in provider/aws/, plugin/rbac/aws.go, iam/aws.go, artifact/s3.go.
Scope decision: the out-of-module/ AWS surface is exactly #653's
deliberately-retained "RBAC/secrets/artifact stay" scope (plus the
provider/aws deploy provider). This design does NOT unilaterally
override #653's recent documented decision — it scopes that surface OUT
(new Non-Goal, parallel to godo) and logs a recommended successor issue.
Consequences threaded through the doc:
- Goals section is now asymmetric: Azure/GCP full go.mod removal; AWS is
module/-scoped removal (aws-sdk-go-v2 stays in go.mod for the
out-of-scope surface).
- Phase C CI gate is asymmetric: whole-graph zero for Azure/GCP,
module/-scoped zero for AWS.
- Assumption 6 rewritten to the verified truth; Assumption 7 notes #653's
scope decision is respected, not contested.
- Minors: I2 (awsRoleARNResolver rewrite — non-SDK required-check +
sessionName extraction sit between declared-input recording and the
SDK block; spelled out), M1 (Phase A also fixes iac_module.go's stale
line-18 backend-list comment), M2 (internal/legacyaws noted).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adversarial review cycle 11: PASS (zero Critical, zero Important). Two Minor nits applied: - audit-cloud-symbols.sh: real_import now also matches single-line `import "..."` form, not just parenthesized blocks — closes the one latent parser false-negative the reviewer flagged. - §Goals: clarified that the module/-scoped AWS-zero `--check` assertion is deferred-implementation added in Phase C (the committed script only enforces the cloud_account_aws_creds.go post-Phase-B invariant today), parallel to the Phase 0 init()-partition deferral. Design phase complete — proceeding to writing-plans. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e pipe The cycle-11 single-line-import hardening added an inner `grep -E '^import "'` whose no-match exit 1 poisoned the `| grep -q` pipe under `set -o pipefail`, making real_import() return false for every file lacking a single-line import. Added `|| true` on the inner grep. Verified: full report restored, all REAL/comment-only classifications correct. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e A) Bite-sized TDD plan for the first executable increment: Phase 0 (split platform_kubernetes_kind.go, fix the stale comment, wire the audit script into CI) + Phase A (IaCStateBackend proto + benchmark-gated proto-lock, host-side gRPC resolution, secret-redaction, gRPC-logging guard, workflow-plugin-azure implementation, core deletion dropping azure-sdk from go.mod). 14 tasks across 5 PRs. Phases B/C/D are explicitly scoped to a follow-on plan — their concrete tasks depend on Phase A's outputs (the benchmark-validated proto shape, the host-resolution pattern, the plugin-side serve path), so planning them now would be fiction. The design doc remains the authoritative B/C/D spec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…al review Plan-phase adversarial review FAIL (1 Critical + 4 Important + 4 Minor). All addressed: - C1 (Critical): Task 4's proto used google.protobuf.Struct, which iac.proto:6-10 explicitly bans. Rewrote IaCState to carry Outputs/Config as `bytes outputs_json`/`bytes config_json` (the established ResourceState pattern); Tasks 5/7/11 now convert via encoding/json, not structpb. Removed the bogus struct.proto import step. - I1: Task 4 `buf generate` now runs from worktree root (buf.yaml lives there), not `cd plugin/external/proto`. - I2: Task 6 acknowledges the existing benchmark.yml (-bench=. picks up the new benchmarks automatically) — no redundant harness; clarified the task is a one-time decision gate. - I3: Task 8's embedded research spike resolved at plan time — engine.go was read; integration is the design-sanctioned package-level module.iacStateBackendRegistry populated by StdEngine.loadPluginInternal. Tasks 8/13/14 now have concrete file sets. - I4: Scope Manifest now declares PR 4 a human-action gate (cross-repo, workflow-plugin-azure) with the PR4->PR5 dependency stated explicitly. - M1: Task 5's benchmark file is now genuinely self-contained (local benchStateToProto + benchStateBackendServer; no forward references). - M2: Task 3 names ci.yml directly, places the audit job beside the existing godo-banned/aws-sdk-banned grep-gate jobs. - M3: Task 6 pins benchstat (go install + bare invocation). - M4: Task 9 states the redaction gap is verified against step_output_redactor.go:7-19, not a live deduction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Plan-phase adversarial review cycle 2: all 9 cycle-1 findings confirmed resolved; 2 new Important + 2 Minor surfaced by the new-defect scan. All addressed: - I-A (Important): Task 9's redaction test was inconsistent with the actual redactMap behavior — a key named `credentials` matches the existing `credential` pattern and is wholesale-replaced with the placeholder STRING before any recursion, so the test's `.(map[string]any)` assertion panicked. Reworked Task 9: the `credentials:` block is ALREADY redacted wholesale (regression-tested); the real gap is `credentials_ref` being over-redacted (it's a module name, not a secret) — fix is a narrow `*_ref`-suffix exemption in isSensitiveField, not camelCase leaf patterns (which would be dead code given wholesale redaction happens first). - I-B (Important): Task 14's engine.go integration seam was under-specified and would fight loadPluginInternal's no-concrete-types precedent. Resolved at plan time (engine.go:305-327 read): Task 14 now defines an `IaCStateBackendProvider` optional interface and type-asserts it in loadPluginInternal exactly like the existing stepRegistrySetter/slogLoggerSetter pattern; ExternalPluginAdapter implements it. Concrete file set + code sketch added. - M-i: Task 6's benchmark.yml description corrected (runs `go test -bench=.` inline, not `make bench-baseline`). - M-ii: Task 4 notes the proto README's plugin.proto-specific wording is stale; trust root buf.yaml. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…inor cleanups Plan-phase adversarial review cycle 3: PASS (zero Critical, zero Important). Two Minor doc-tightening fixes applied: - Task 9 Step 4 now names bearer_token_ref explicitly and explains why the *_ref exemption is safe for it (SecretRef is a reference struct, not a raw secret) rather than claiming no *_ref field exists. - engine.go line citations corrected to 311-326. Plan phase complete — proceeding to alignment-check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 0 precursor for cloud-SDK extraction. kindBackend/eksErrorBackend/ aksBackend (all SDK-free) move to platform_kubernetes_core.go with a core init(); gkeBackend (the only SDK-bearing k8s backend) moves to platform_kubernetes_gke.go with its own init(). Behavior-equivalent: same five backend names registered. Isolates the lone SDK-bearing platform file for a later clean deletion.
Code-review Minor: makes the Phase 0 SDK-free/SDK-bearing partition self-documenting for readers without the commit message.
aksBackend.azureToken is a net/http OAuth2 client, not an azure-sdk consumer. The stale comment is what fooled an earlier inventory pass into mis-counting platform_kubernetes_kind.go as an azure-sdk importer.
Extends audit-cloud-symbols.sh --check with an init()-partition assertion (platform_kubernetes_core.go registers only kind/k3s/eks/aks; _gke.go only gke) and adds a cloud-sdk-audit job to ci.yml beside godo-banned / aws-sdk-banned, so the cloud-SDK inventory becomes a build-enforced artifact rather than a prose claim.
…nding Task 6 measurement: gRPC cycle 6.511ms ±1% vs in-process 179ns, for a worst-case 1MB synthetic state. Exceeds the plan's <5ms acceptance bar. Root-cause analysis: the cost is json.Marshal/Unmarshal of the ~1MB map[string]any (inherent to the bytes outputs_json wire format the iac.proto invariant mandates) — NOT gRPC transport buffering or the 4MB message cap. The plan's contingency remedy (streaming redesign) addresses message-size-cap + memory-buffering, neither of which the benchmark hits; streaming would not move the number. Recommendation: retain unary (6.5ms is still negligible vs real cloud backend I/O — the design's own bar-rationale). Deviation from the literal 5ms estimate-bar is surfaced to the operator, not absorbed silently. Scope lock intact: Task 6 run + recorded, no task added/dropped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-confirmed) Operator reviewed the 6.51ms benchmark + root-cause analysis and confirmed: "I'm not concerned about 6.51ms, that's acceptable." Task 6's gate resolves Unary LOCKED — the Task 4 proto stands, no streaming redesign, PR 2/3 proceed unchanged. Operator additionally raised a long-term architectural item: IaC state is persisted at-rest as JSON; a typed/compact binary format (pb/msgpack/CBOR) with JSON-export + content-detection-on-read would be better for processing/type-correctness/large-state scaling. Logged as a post-extraction follow-up in both the benchmark decision record and the design doc's Open items — distinct from the wire contract, cross-cutting across all IaCStateStore impls, needs its own brainstorming pass. Not actioned in this locked plan. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This reverts commit 6186e3d100e427807b9fd122e20df589d6bb6954.
Operator-approved scope amendment to the (reverted-to-Draft) plan: - ADR 0033: add ctx context.Context to module.IaCStateStore — new PR 6 / Task 15. Task 7 had to hardcode context.Background() in grpcIaCStateStore; the operator directed widening the interface now while we're at that boundary, so Phase B/C/D plugin backends inherit it ctx-ful. Bounded blast radius (~9 files, all in module/); interfaces.IaCStateStore already had ctx and is untouched. - ADR 0034: de-gate PR 4 from "HUMAN-GATE" to autonomous cross-repo. Operator: agents should operate in plugin repos directly; the real requirement is prompt clarity (absolute repo path stated up front), not a human hand-off. Plan's PR 4 row, Cross-repo note, and executor notes updated accordingly. - Manifest: 5 PRs/14 tasks -> 6 PRs/15 tasks. Execution order documented (PR 6 stacks on PR 3, runs before PR 4). Benchmark-gate executor note updated to RESOLVED (unary locked). Next: re-run alignment-check on the amended plan, then re-lock. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Strict 6-method contract mirroring module.IaCStateStore 1:1, with an IaCState message mirroring module.IaCState. Free-form Outputs/Config maps cross the wire as bytes outputs_json/config_json per the iac.proto hard invariant (NO google.protobuf.Struct) — same pattern as ResourceState.outputs_json. Unary RPCs. No TTL field. Regenerated bindings via buf.
Drives a ~1 MB synthetic IaCState through Lock/GetState/SaveState/Unlock both in-process (baseline) and over a real bufconn gRPC boundary (post-extraction path). Self-contained (local benchStateToProto + benchStateBackendServer; Task 7 promotes production versions). Feeds the unary-vs-streaming proto-transport decision in the next task.
Task 4 added the IaCStateBackend service to iac.proto but missed the corresponding iacServiceChecks row in wftest/bdd/strict_iac.go. TestIaCServiceChecks_CoversEveryProtoService enforces parity between iac.proto's services and that table — it was failing on the missing entry. Belongs with PR 2 (the proto PR).
…apter grpcIaCStateStore implements module.IaCStateStore over an IaCStateBackendClient — the host-side half of the new contract. iacStateToProto/iacStateFromProto convert the free-form Outputs/Config maps via encoding/json (no structpb — iac.proto hard invariant). iacStateBackendServer is the production server type. Promotes these out of the benchmark file so one canonical copy is shared.
Code-review Minor: the spec asked for the hardcoded context.Background() to be acknowledged as a known follow-up (IaCStateStore has no ctx param) rather than silently used.
A package-level iacStateBackendRegistry maps a backend name to a pb.IaCStateBackendClient; the engine populates it at plugin-load time (Task 14). IaCModule.Init()'s switch gains a default arm that resolves non-core backend names from the registry, constructing a grpcIaCStateStore. Reserved core names (memory/filesystem/postgres) are rejected at registration. The existing in-process backend cases (incl. azure_blob) are untouched here — the plumbing exists and is tested; PR 5 flips azure_blob onto it.
…redaction Option-1 credentials move raw cloud secrets inline into plugin-native module config under a credentials: key — already redacted wholesale by the existing 'credential' pattern (regression test added). But that same pattern over-redacts credentials_ref:, which holds a module NAME, not a secret. Adds a narrow *_ref-suffix exemption to isSensitiveField so reference keys are preserved for trace debuggability.
Code-review Minor: refFieldSuffix const for consistency with the existing safeFieldSuffix (_display) exemption.
CreateModule requests carry inline credentials: blocks (Option-1 credentials model). This guard fails CI if any plugin/external/ file gains a gRPC interceptor option, forcing a reviewer to confirm it cannot log request bodies. Implements the cloud-sdk-extraction design's Security guard-test requirement.
Code-review catch: the guard regex covered Unary only. CreateModule is unary today, but a future streaming RPC carrying credentials must not slip a stream interceptor past the guard. Now matches (Unary|Stream).
…ndment) Widens module.IaCStateStore's 6 methods with a leading ctx parameter so grpcIaCStateStore plumbs the caller's real context (was context.Background()) and iacStateBackendServer forwards its gRPC ctx into the store. The 6 in-process backends accept ctx; postgres/spaces/ gcs/azure use it for their SDK/DB calls. pipeline_step_iac.go callers pass the step context. Operator-approved scope amendment — see decisions/0033. The separate interfaces.IaCStateStore already had ctx and is untouched. Caller inventory note: cmd/wfctl/infra_state_store.go (the wfctl wrapper around the concrete Spaces/Postgres stores) was also updated — a mechanical consequence of the widening, beyond the plan's Files list; its wrapper methods already carried a ctx. Rollback: revert this commit — mechanical signature-only widening. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tion-iacstore-ctx # Conflicts: # module/benchmark_iac_state_backend_test.go # module/iac_state_grpc_client.go # module/iac_state_grpc_client_test.go # module/iac_state_plugin_registry.go # module/iac_state_plugin_registry_test.go # module/step_output_redactor.go # module/step_output_redactor_test.go # plugin/external/grpc_logging_guard_test.go # scripts/audit-cloud-symbols.sh
There was a problem hiding this comment.
Pull request overview
This PR is part of the Cloud-SDK extraction series and updates the in-module module.IaCStateStore interface to accept ctx context.Context as the first argument for all six methods, enabling proper context propagation (cancellation/deadlines) through in-process backends and the gRPC-backed adapter/server.
Changes:
- Widened
module.IaCStateStoremethod signatures to(... ctx context.Context, ...)and updated in-process implementations to accept/plumbctx. - Updated pipeline IaC steps, gRPC adapter/server, wfctl wrappers, and benchmarks/tests to pass
ctxthrough to the store calls. - Updated cloud/DB backends (Spaces/S3, GCS, Azure Blob, Postgres) to use the provided
ctxfor SDK/DB operations instead ofcontext.Background().
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| module/pipeline_step_iac.go | Passes step ctx through to IaC state store operations in IaC pipeline steps. |
| module/pipeline_step_iac_test.go | Updates tests to call state store methods with an explicit context. |
| module/iac_state.go | Updates module.IaCStateStore interface to accept context.Context on all methods. |
| module/iac_state_test.go | Updates shared IaCStateStore test suite to pass context. |
| module/iac_state_spaces.go | Plumbs ctx into S3/Spaces SDK calls for state operations. |
| module/iac_state_spaces_test.go | Updates Spaces state store tests to pass context. |
| module/iac_state_postgres.go | Plumbs ctx into Postgres connection operations for state CRUD/locking. |
| module/iac_state_postgres_test.go | Updates Postgres state store tests to pass context. |
| module/iac_state_memory.go | Updates in-memory store to accept ctx in method signatures. |
| module/iac_state_grpc_client.go | Updates grpc-backed IaCStateStore adapter and backend server to forward caller/gRPC ctx. |
| module/iac_state_grpc_client_test.go | Updates grpc store round-trip test to pass context. |
| module/iac_state_gcs.go | Plumbs ctx into GCS client operations for state CRUD/locking. |
| module/iac_state_gcs_test.go | Updates GCS state store tests to pass context. |
| module/iac_state_fs.go | Updates filesystem store to accept ctx in method signatures. |
| module/iac_state_azure.go | Plumbs ctx into Azure Blob client operations for state CRUD/locking. |
| module/iac_state_azure_test.go | Updates Azure Blob state store tests to pass context. |
| module/benchmark_iac_state_backend_test.go | Updates benchmark to use context-aware IaCStateStore methods. |
| cmd/wfctl/infra_state_store.go | Updates wfctl infra state store wrappers to forward ctx into module backends. |
Comments suppressed due to low confidence (3)
module/pipeline_step_iac.go:206
- IaCApplyStep swallows errors from GetState and from the intermediate SaveState calls that mark provisioning/error. With ctx now plumbed through the store, context cancellation/timeouts (and backend failures) can be silently ignored, causing Apply to run while state transitions are not persisted. Consider propagating these errors (or at minimum returning when ctx is canceled) so callers get a consistent result.
// Transition state to provisioning before calling Apply.
existing, _ := store.GetState(ctx, s.resourceID)
if existing != nil {
existing.Status = "provisioning"
existing.UpdatedAt = now
_ = store.SaveState(ctx, existing)
}
module/pipeline_step_iac.go:399
- IaCDestroyStep swallows errors from GetState and from the intermediate SaveState calls that mark destroying/error. With ctx now plumbed through the store, cancellations/timeouts can be silently ignored and the step may report success even though state transitions failed to persist. Consider handling/propagating these errors (especially ctx cancellation) instead of ignoring them.
now := nowUTC()
existing, _ := store.GetState(ctx, s.resourceID)
if existing != nil {
existing.Status = "destroying"
existing.UpdatedAt = now
_ = store.SaveState(ctx, existing)
}
module/iac_state_spaces.go:204
- ListStates now uses the passed ctx for GetObject, but errors are unconditionally skipped. If ctx is canceled or its deadline is exceeded mid-listing, this will return partial results instead of failing fast with the context error, which undermines the purpose of adding ctx. Consider detecting context cancellation/deadline errors and returning them, while still skipping truly unreadable/corrupt objects.
getOut, err := s.client.GetObject(ctx, &s3.GetObjectInput{
Bucket: &s.bucket,
Key: obj.Key,
})
if err != nil {
continue // skip unreadable objects
}
…lowing Addresses Copilot review on PR #671 — now that IaCStateStore carries the caller's ctx, silently dropping its errors lets a canceled step proceed against a stale state view: - pipeline_step_iac.go: iac_plan/iac_apply/iac_destroy now go through a lookupExistingState helper that returns any GetState error (including ctx cancellation/deadline) so the step aborts. - iac_state_gcs.go / iac_state_azure.go: ListStates aborts on a context-cancellation error mid-iteration rather than returning partial results; genuinely unreadable objects/blobs are still skipped. - iac_state.go: ListStates doc clarifies nil filter == "no filter" to match actual call sites. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
⏱ Benchmark Results✅ No significant performance regressions detected. benchstat comparison (baseline → PR)
|
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
Summary
Operator-approved scope amendment (
decisions/0033). Widens themodule.IaCStateStoreinterface's 6 methods to takectx context.Contextas the first parameter — sogrpcIaCStateStoreplumbs the caller's real context (wascontext.Background()),iacStateBackendServerforwards its gRPC-received ctx, and the deferred Phase B/C/D plugin backends inherit a ctx-ful interface from the start.memory/fs/postgres/spaces/gcs/azure— the SDK-bearing ones now use the passed ctx for their SDK/DB calls) +grpcIaCStateStore/iacStateBackendServer+ thepipeline_step_iac.gocallers +cmd/wfctl/infra_state_store.go(the one out-of-module/caller — its wrapper methods already carried a discarded_ ctx) + test files. No logic changes. The separateinterfaces.IaCStateStorealready had ctx and is untouched.Scope Manifest
PR Count: 6 · Tasks: 15 · Status: Locked 2026-05-14T10:37:04Z
This is PR 6 —
Task 15— branchfeat/cloud-sdk-extraction-iacstore-ctx. Added by the amendment recorded indecisions/0033(manifest re-aligned + re-locked).Stacked PR
Base:
feat/cloud-sdk-extraction-pa-host(PR #670). Merge last of the four. Retarget after #670 merges.Test status
Full
go test ./...green except the two pre-existing unrelated failures (TestFallbackRuns,TestInfraMultiEnv_E2E) — see PR #668.🤖 Generated with Claude Code