Cloud-SDK extraction — Phase A: IaCStateBackend proto + benchmark + proto-lock#669
Conversation
…tract plugins Design for removing aws-sdk-go-v2, Azure/azure-sdk-for-go, and cloud.google.com/go + google.golang.org/api direct deps from workflow core's module/ package. Architecture: 3 extension surfaces, 3 strategies: - IaC state backends → new IaCStateBackend strict proto contract; iac.state stays core, config.backend dispatches to plugin gRPC client. - platform.* provisioners → new PlatformBackend strict proto contract; module types + provider: key stay core, kind backend stays in-core, cloud backends (eks/gke/ecs/route53/ec2/autoscaling) extract. - standalone modules/steps (apigateway, codebuild, dynamodb, s3_upload, storage.s3, storage.gcs) → plugin-native module/step types via the existing ModuleFactories/StepFactories SDK — no new contract. Credentials (Option 1): each plugin-native module carries its own credentials: block + builds aws.Config in-process; optional in-plugin credentials_ref for DRY. cloud_account_aws*.go deleted; azure/gcp cloud_account files have no SDK import and stay. 4 phases: A azure (validates IaCStateBackend), B aws (largest), C gcp, D digitalocean (spaces backend, minor bump + migration doc). Includes Assumptions + Rollback sections + self-challenge top-3 doubts (PlatformBackend over-generality, provider-separability fragility, benchmark-could-invalidate-unary-default — all with mitigations deferred to writing-plans). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… revisions Addresses 2 Critical + 5 Important findings from adversarial-design-review: Critical: - iac_state_spaces.go (core file importing aws-sdk s3) now has an explicit home: deleted by Phase B's core PR; Phase D reframed from soft-compat to a real clean-break for the `spaces` backend. Goal "core drops aws-sdk-go-v2 entirely" is now actually achieved by the phases as written + enforced by a go list -deps CI gate. - kinesis: added Non-Goals entry explaining it's a transitive dep of modular/modules/eventbus/v2, not a direct workflow import — out of scope, with the go mod why chain documented so the literal ask is fully answered. Important: - Full grep-verified 13-file AWS inventory table in Phase B with per-file destinations; reconciled aws_api_gateway.go (route-sync module) vs platform_apigateway.go (provisioner) as two distinct files. - aksBackend assigned to Phase A (Azure gets the PlatformBackend half too); platform_kubernetes_kind.go split now spans 3 phases (aks/eks/gke) with explicit always-compiles coordination. - Proto contracts fold into existing plugin/external/proto/iac.proto (8 services already) instead of new files — matches precedent. - New Security section: secret-redaction in config-version-store/tracing + gRPC interceptor logging are blocking writing-plans tasks; credentials_ref blast radius documented as strictly narrower than today's cloud.account. Minor: - IaCStateBackend RPC set now maps 1:1 to the real module.IaCStateStore interface (GetState/SaveState/ListStates/DeleteState/Lock/Unlock) — no speculative surface. - Phase D rollback restated as a matched pair (Phase B core PR + DO plugin PR). - IaCProviderRequired/ResourceDriver reuse promoted to a first-class Alternatives Considered entry with accept/reject rationale + retained as the gated fallback for PlatformBackend. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… revisions Addresses 2 Critical + 3 Important from cycle-2 review: Critical: - platform_kubernetes_kind.go handling reworked. Added Phase 0: a pure mechanical precursor file-split (kind/eks/gke/aks → 4 files, each with its own import block). The "always compiles across phases" property is now structural, not asserted. Added a verified per-file import-ownership table. - Corrected the false Phase A rationale: aksBackend uses raw net/http REST, NOT the Azure SDK (verified — no azure-sdk symbol in the aksBackend region). The Azure go.mod drop comes entirely from iac_state_azure.go deletion + iac_module.go edit; aksBackend extraction is code-organisation, not a dependency change. - Documented the eksBackend → cloud_account_aws.go call-graph edge as a hard same-commit atomicity constraint (verified: eksBackend calls awsProviderFrom + AWSConfig at platform_kubernetes_kind.go:96,105,138). Important: - Phase B core-PR bullet now explicitly lists "strip the spaces case from iac_module.go" (was only obliquely referenced). - New §Failure modes section: orphaned-lock-on-plugin-crash → lease_ttl_seconds contract field; SaveState lost-response retry → documented idempotent (full-state replace, last-writer-wins); plugin-unreachable → abort before mutation; PlatformBackend mid-Apply crash → identical to today's in-process risk, no new mitigation. - §Security gRPC-logging bullet concretized: VERIFIED plugin SDK adds no body-logging interceptor (grpc.NewServer(opts...) passthrough; only callback_server.go logs, never module config). Writing-plans adds a guard test instead of a conditional interceptor. Minor: file-count table footnoted (count = importers, not deletions); shared s3compat module added as Alternatives Considered #3 (deferred, not rejected); self-challenge doubt numbering tidied (2 mitigations cover 3 doubts, intentionally). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sed in the cycle-2 commit ran from the wrong cwd — Status line still said "cycle 1" and two interface-audit-spike references still said "Phase A/B" instead of "Phase 0/A". Pure text cleanup, no design change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… revisions Addresses 2 Critical + 2 Important from cycle-3 review: Critical (same root — symbol-level coupling the import-block audit missed): - parseStringSlice (in cloud_account_aws.go, which Phase B deletes) and safeIntToInt32 (in core-staying platform_kubernetes.go) are pure helpers the plugin-bound backend files call. An import-block audit is symbol-blind. Fix: Phase 0 now does TWO moves — the file split AND relocating both helpers into a new SDK-free core module/cloud_helpers.go. Per-file table gains a "cross-file symbol deps (the trap)" column listing every helper edge per backend. Phase 0 acceptance criteria now include a grep that no core file references the helpers from their old homes. - §Phase 0 corrected: platform_kubernetes.go is a SEPARATE existing file (module shell + kubernetesBackend interface + safeIntToInt32) — NOT touched by the split; only platform_kubernetes_kind.go (holds all 4 backends) is split. Earlier draft conflated the two files. Important: - Per-file ownership table relabelled "intended post-split — verified by the Phase 0 build gate" (was asserted-as-verified against an unsplit file — same hand-waving class cycle-2 flagged for "always compiles"). - lease_ttl_seconds DROPPED from the Phase A proto. It was a contract field with no enforced semantics and no implementing backend in scope — YAGNI. §Failure-modes orphaned-lock reworked: documented limitation + operator-side lock-object delete for recovery; TTL is a planned ADDITIVE follow-up paired with a conformance test, shipped with the first backend that honors expiry. Added explicit Lock-contention behavior (immediate error, matches today's in-process IaCStateStore.Lock — no new waiting state). Minor: Phase 0 rollback sentence added; garbled §Assumptions 2 sentence fixed; §Assumptions 2 notes Phase 0 de-risks it structurally. Also: removed a stray stale cycle-1 copy of this doc that was sitting untracked in the main workflow checkout (the canonical doc is here in the feat/cloud-sdk-extraction worktree). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… revisions Addresses 2 Critical + 2 Important from cycle-4 review: Critical 1 — the per-file symbol-ownership table was wrong AGAIN (3rd cycle running): claimed gkeBackend depends on safeIntToInt32 (it doesn't — that's eksBackend) and aksBackend has no cross-file deps (it does — CloudCredentials/CloudCredentialProvider from cloud_account.go, same as gke). STRUCTURAL FIX: deleted the hand-maintained table entirely. The symbol-ownership map is now a Phase 0 build artifact — scripts/audit-cloud-symbols.sh, committed + re-run in CI — not a design-doc claim that rots on every edit. The design commits to the *method* + the *known shape* (cloud_account.go stays core; all 3 cloud backends bind to it via k.provider.GetCredentials; eksBackend additionally binds to the Phase-B-deleted cloud_account_aws.go; aksBackend imports no cloud SDK). Critical 2 — Phase 0's "split into four, zero logic change" silently dropped the single func init() that registers kind/k3s/eks/gke/aks. Splitting REQUIRES partitioning init() per-file (a distribution, not zero-change). Phase 0 now has an explicit step 2 for the init() partition; relabelled "behavior-equivalent" not "zero logic change"; k3s documented as reusing kindBackend (both stay core). Important 1 — platform.* cloud credential flow across PlatformBackend was unspecified (aksBackend needs CloudCredentials — how does it reach the plugin?). Added: PlatformBackend requests carry a CloudCredentials proto message; engine resolves k.provider.GetCredentials() in-core (config-map parsing, no SDK) and serialises it. Unified with the Architecture-3 credentials story — ONE CloudCredentials proto shape for both surfaces, so secret-redaction has one shape to redact. Important 2 — core actually imports FOUR cloud SDK trees, not three: godo is still in cloud_account_do.go + 5 platform_do_*.go files. §Problem now acknowledges godo as a 4th tree, explicitly scopes it OUT (user's ask was 3 trees), and the go list -deps gate is reworded to assert "zero packages from the three in-scope trees" not "zero cloud SDKs". All "zero cloud SDKs" phrasing reconciled throughout. Minor: ListStates filter + remaining-proto-messages notes folded in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… revisions Addresses 1 Critical + 2 Important from cycle-5 review: Critical — the init()-partition fix (cycle-4) was kubernetes-only, but the SAME defect class exists in platform_dns.go / platform_ecs.go / platform_networking.go / platform_autoscaling.go: each has a single func init() registering BOTH a core-staying `mock` backend AND a plugin-bound `aws` backend. The old Phase B inventory moved those files wholesale → would exile the mock backends + dangle the route53 registration. FIX: Phase 0 generalized from "split platform_kubernetes_kind.go" to a repo-wide uniform `_core.go` / `_<provider>.go` convention across the WHOLE platform.* family. Every mixed init() is partitioned; the audit script flags any init() registering a mix of core-staying + plugin-bound factories as a CI failure. Phase B inventory rewritten to delete only `_aws.go`/`_eks.go` files, never a mixed file. Important 1 — the cycle-4 "Known shape" prose reintroduced hand-maintained cross-file symbol claims (one already incomplete: parseStringSlice consumers). FIX: cut all per-file symbol enumerations; the section now states only invariants the script VERIFIES (not discovers) + the method. No transcribed symbol lists remain. Important 2 + own finding — cycle-4 said the engine resolves credentials in-core "no SDK needed." VERIFIED FALSE: cloud_account_aws_creds.go's awsProfileResolver calls config.LoadDefaultConfig(WithSharedConfigProfile) and awsRoleARNResolver calls sts.AssumeRole — both need the AWS SDK. FIX: §Architecture-2 corrected — engine passes the DECLARED credential config (plain strings) in the CloudCredentials proto; the PLUGIN resolves (incl. the SDK-bearing profile/role_arn paths). Both cloud_account_aws.go AND cloud_account_aws_creds.go deleted by Phase B, no core replacement — all AWS cred resolution moves plugin-side. azure/gcp resolver files stay (their resolvers are genuinely SDK-free). Minor — backend-name collision: core-reserved names (memory/filesystem/ postgres/kind/k3s/mock) cause a load-time error if a plugin collides, not silent shadowing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… revisions Addresses 1 Critical + 2 Important from cycle-6 review: Critical — cycle-5's credential-flow fix replaced one false claim with another: it said the CloudCredentials struct already holds "declared config (plain strings incl. profile)". VERIFIED FALSE — the struct (cloud_account.go:18) has no Profile field (profile lives in Extra map) and the resolvers mutate it in-place with RESOLVED values. FIX, cleaner than the struct change the reviewer proposed: the struct needs NO change (Extra map already carries markers, RoleARN field exists). Instead, cloud_account_aws_creds.go is EDITED not deleted — the SDK-bearing tails of awsProfileResolver/awsRoleARNResolver (config.LoadDefaultConfig, sts.AssumeRole) are removed; they keep their SDK-free heads (record declared inputs + an Extra["credential_source"] marker, exactly as awsStaticResolver already does). After the edit the file is SDK-free and stays in core alongside the azure/gcp resolver files. Only cloud_account_aws.go (the pure-SDK AWSConfig() builder + AWSConfigProvider + awsProviderFrom) is deleted; its profile-chain/STS logic moves into the plugin's buildAWSConfig. Every in-core resolver becomes uniformly "declare, don't resolve"; the plugin honors the markers. No unregistered- resolver failure mode — the resolver init() registrations stay. Important 1 — §Phase-0 misidentified the DNS file with the mixed init(). VERIFIED: platform_dns.go:66 has the init() (+ interface + factory registry); platform_dns_backends.go has both impls + the route53 SDK import, NO init(). DNS is a TWO-file split, unlike single-file ecs/networking/autoscaling. §Phase-0 now states the per-family layout explicitly (kubernetes one-file, dns two-file, ecs/networking/autoscaling one-file) and notes the audit script determines it. Important 2 — azure/gcp resolvers (and now aws profile/role_arn) emit deferred-resolution markers for env/CLI/managed-identity/workload-identity/ profile/role_arn — NOT plain-string passthrough. §Architecture-3 + Assumption 5 now state the plugin MUST implement marker handling for every deferred type, not just AWS profile/role_arn. Minor — safeIntToInt32 relocation rationale clarified (it's a clean copy-source for the plugin-bound files, not a hard core necessity); parseStringSlice IS a hard necessity (its file is deleted). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… revisions Addresses 2 Critical from cycle-7 review (architecture confirmed sound; these are the last two extraction-mechanic precision gaps): C1 — "remove the SDK tail, file becomes SDK-free" mischaracterized the awsRoleARNResolver edit. VERIFIED: awsProfileResolver's SDK calls ARE a clean contiguous tail, but awsRoleARNResolver's SDK block (base-config build + sts.AssumeRole, ~45 lines) is the larger half of the method, after the declared-input recording. FIX: §Architecture-2 re-characterizes the edit as a deliberate Resolve() body REWRITE (not a one-line snip) — explicitly per-resolver. Added a Phase B CI invariant: an import-block grep (folded into audit-cloud-symbols.sh) asserts cloud_account_aws_creds.go has zero aws-sdk-go-v2 imports post-rewrite — mechanically enforced, not prose-asserted. C2 — cloud_account_aws.go defines FOUR symbols, not one; the symbol-ownership invariant named only parseStringSlice. VERIFIED + fixed: - AWSConfigProvider interface signature names aws.Config → CANNOT stay in core, deleted with the file. - awsProviderFrom → deleted with the interface. - ValidateCredentials → verified NO real caller (only a comment ref in cmd/wfctl/deploy.go:866) → deletes cleanly. - The 8 awsProviderFrom consumers are all verified plugin-bound — but each currently does awsProviderFrom(k.provider).AWSConfig(ctx); in the plugin there's no cloud.account to type-assert. §Cross-file-coupling invariant 3 now states Phase B must REWRITE all 8 consumers to obtain creds from the CloudCredentials proto + buildAWSConfig — explicit Phase B scope, not a footnote. Phase B table atomicity column updated. Minor (M1) — platform_dns_backends.go renamed → platform_dns_core.go in Phase 0 so the dns family conforms to the uniform _core.go/_aws.go naming; no special-case three-file layout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t post-#653 main Cycle-8 adversarial review caught the design's file/symbol inventory as stale: it predated issue #653 (closed 2026-05-13), which already removed the AWS IaC modules, platform/providers/aws/, and stubbed the codebuild + EKS backends. Re-baselined every file/symbol claim against origin/main HEAD (worktree confirmed 0 commits behind origin/main): - Added "Relationship to issue #653" section — this design is #653's named successor, extracting the AWS surface #653 scoped out ("RBAC/secrets/artifact stay") plus the untouched Azure/GCP surfaces. - Problem table corrected: AWS 6 real-import files (not 13), Azure 3, GCP 3. storage_artifact_s3.go is comment-only — stays in core. - cloud_account_aws.go is dead code — zero non-test consumers verified; deleted outright, no 8-consumer rewrite (awsProviderFrom + consumers removed by #653). - Phase 0 shrunk to a single-file split (platform_kubernetes_kind.go); parseStringSlice + safeIntToInt32 no longer exist — helper-relocation task deleted. - PlatformBackend now serves only aks + gke (eks already a #653 SDK-free stub); interface-audit spike audits one interface, not five. - Phase B inventory rewritten; Phase A/C file lists corrected. - Self-challenge doubt #4 + Assumption 7 added: inventory staleness is the cycle-8 defect class; audit script makes it CI-enforced. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t script Cycle-9 adversarial review caught aksBackend mis-classified as an azure-sdk importer: platform_kubernetes_kind.go's azure-sdk-for-go match is a stale doc comment (line 332) — aksBackend.azureToken is a plain net/http OAuth2 client. An import-block-disciplined re-survey found a second comment-only false positive: nosql_dynamodb.go. Structural fix for the recurring "grep matched a comment" defect class: added scripts/audit-cloud-symbols.sh, which parses Go import(...) blocks (never comments) and emits the comment-immune real-import map. Its output now populates every file table in the design — prose claims replaced by a build artifact. Formalized + CI-wired in Phase 0. Corrected inventory (audit-script output): AWS 5 real-import files (not 6), Azure 2 (not 3), GCP 3. nosql_dynamodb.go + storage_artifact_s3.go are comment-only stubs — out of scope, stay in core. Design consequences of aksBackend being SDK-free: - Only gkeBackend carries a cloud platform SDK. kind/k3s/eks/aks all stay in core. - Architecture §2 no longer proposes a new PlatformBackend contract. The gke cross-process mechanism is gated on an interface-audit spike whose preferred outcome is folding into the existing ResourceDriver contract — a dedicated contract for one backend is YAGNI. - Phase A (Azure) is now pure IaCStateBackend — touches no platform file. - Phase 0 splits platform_kubernetes_kind.go into _core.go (kind/k3s/ eks/aks — all SDK-free) + _gke.go (the lone SDK-bearing backend), and fixes the stale line-332 comment. - The gke platform extraction + its contract decision move to Phase C. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…scope boundary explicit
Cycle-10 adversarial review caught Assumption 6 as false: the cycle-9
audit script scanned only module/, missing five aws-sdk-go-v2 importers
under provider/aws/, plugin/rbac/, iam/, artifact/. The design's Goal
("go.mod drops aws-sdk-go-v2 entirely") was therefore unachievable by
the four phases as written.
Structural fix — third defect-class variant closed:
- audit-cloud-symbols.sh now scans the WHOLE REPO (not just module/) and
splits results module/ vs. elsewhere. Comment-immune (cycle 9) +
scope-complete (cycle 10) + CI-enforced (Phase 0).
Whole-repo inventory result:
- Azure + GCP SDK usage is entirely module/-resident → Phases A and C
drop those trees from go.mod ENTIRELY (whole-graph go list -deps gate).
- aws-sdk-go-v2 is split: 5 module/ files (in scope, Phase B) + 6 files
in provider/aws/, plugin/rbac/aws.go, iam/aws.go, artifact/s3.go.
Scope decision: the out-of-module/ AWS surface is exactly #653's
deliberately-retained "RBAC/secrets/artifact stay" scope (plus the
provider/aws deploy provider). This design does NOT unilaterally
override #653's recent documented decision — it scopes that surface OUT
(new Non-Goal, parallel to godo) and logs a recommended successor issue.
Consequences threaded through the doc:
- Goals section is now asymmetric: Azure/GCP full go.mod removal; AWS is
module/-scoped removal (aws-sdk-go-v2 stays in go.mod for the
out-of-scope surface).
- Phase C CI gate is asymmetric: whole-graph zero for Azure/GCP,
module/-scoped zero for AWS.
- Assumption 6 rewritten to the verified truth; Assumption 7 notes #653's
scope decision is respected, not contested.
- Minors: I2 (awsRoleARNResolver rewrite — non-SDK required-check +
sessionName extraction sit between declared-input recording and the
SDK block; spelled out), M1 (Phase A also fixes iac_module.go's stale
line-18 backend-list comment), M2 (internal/legacyaws noted).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adversarial review cycle 11: PASS (zero Critical, zero Important). Two Minor nits applied: - audit-cloud-symbols.sh: real_import now also matches single-line `import "..."` form, not just parenthesized blocks — closes the one latent parser false-negative the reviewer flagged. - §Goals: clarified that the module/-scoped AWS-zero `--check` assertion is deferred-implementation added in Phase C (the committed script only enforces the cloud_account_aws_creds.go post-Phase-B invariant today), parallel to the Phase 0 init()-partition deferral. Design phase complete — proceeding to writing-plans. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e pipe The cycle-11 single-line-import hardening added an inner `grep -E '^import "'` whose no-match exit 1 poisoned the `| grep -q` pipe under `set -o pipefail`, making real_import() return false for every file lacking a single-line import. Added `|| true` on the inner grep. Verified: full report restored, all REAL/comment-only classifications correct. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e A) Bite-sized TDD plan for the first executable increment: Phase 0 (split platform_kubernetes_kind.go, fix the stale comment, wire the audit script into CI) + Phase A (IaCStateBackend proto + benchmark-gated proto-lock, host-side gRPC resolution, secret-redaction, gRPC-logging guard, workflow-plugin-azure implementation, core deletion dropping azure-sdk from go.mod). 14 tasks across 5 PRs. Phases B/C/D are explicitly scoped to a follow-on plan — their concrete tasks depend on Phase A's outputs (the benchmark-validated proto shape, the host-resolution pattern, the plugin-side serve path), so planning them now would be fiction. The design doc remains the authoritative B/C/D spec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…al review Plan-phase adversarial review FAIL (1 Critical + 4 Important + 4 Minor). All addressed: - C1 (Critical): Task 4's proto used google.protobuf.Struct, which iac.proto:6-10 explicitly bans. Rewrote IaCState to carry Outputs/Config as `bytes outputs_json`/`bytes config_json` (the established ResourceState pattern); Tasks 5/7/11 now convert via encoding/json, not structpb. Removed the bogus struct.proto import step. - I1: Task 4 `buf generate` now runs from worktree root (buf.yaml lives there), not `cd plugin/external/proto`. - I2: Task 6 acknowledges the existing benchmark.yml (-bench=. picks up the new benchmarks automatically) — no redundant harness; clarified the task is a one-time decision gate. - I3: Task 8's embedded research spike resolved at plan time — engine.go was read; integration is the design-sanctioned package-level module.iacStateBackendRegistry populated by StdEngine.loadPluginInternal. Tasks 8/13/14 now have concrete file sets. - I4: Scope Manifest now declares PR 4 a human-action gate (cross-repo, workflow-plugin-azure) with the PR4->PR5 dependency stated explicitly. - M1: Task 5's benchmark file is now genuinely self-contained (local benchStateToProto + benchStateBackendServer; no forward references). - M2: Task 3 names ci.yml directly, places the audit job beside the existing godo-banned/aws-sdk-banned grep-gate jobs. - M3: Task 6 pins benchstat (go install + bare invocation). - M4: Task 9 states the redaction gap is verified against step_output_redactor.go:7-19, not a live deduction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Plan-phase adversarial review cycle 2: all 9 cycle-1 findings confirmed resolved; 2 new Important + 2 Minor surfaced by the new-defect scan. All addressed: - I-A (Important): Task 9's redaction test was inconsistent with the actual redactMap behavior — a key named `credentials` matches the existing `credential` pattern and is wholesale-replaced with the placeholder STRING before any recursion, so the test's `.(map[string]any)` assertion panicked. Reworked Task 9: the `credentials:` block is ALREADY redacted wholesale (regression-tested); the real gap is `credentials_ref` being over-redacted (it's a module name, not a secret) — fix is a narrow `*_ref`-suffix exemption in isSensitiveField, not camelCase leaf patterns (which would be dead code given wholesale redaction happens first). - I-B (Important): Task 14's engine.go integration seam was under-specified and would fight loadPluginInternal's no-concrete-types precedent. Resolved at plan time (engine.go:305-327 read): Task 14 now defines an `IaCStateBackendProvider` optional interface and type-asserts it in loadPluginInternal exactly like the existing stepRegistrySetter/slogLoggerSetter pattern; ExternalPluginAdapter implements it. Concrete file set + code sketch added. - M-i: Task 6's benchmark.yml description corrected (runs `go test -bench=.` inline, not `make bench-baseline`). - M-ii: Task 4 notes the proto README's plugin.proto-specific wording is stale; trust root buf.yaml. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…inor cleanups Plan-phase adversarial review cycle 3: PASS (zero Critical, zero Important). Two Minor doc-tightening fixes applied: - Task 9 Step 4 now names bearer_token_ref explicitly and explains why the *_ref exemption is safe for it (SecretRef is a reference struct, not a raw secret) rather than claiming no *_ref field exists. - engine.go line citations corrected to 311-326. Plan phase complete — proceeding to alignment-check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 0 precursor for cloud-SDK extraction. kindBackend/eksErrorBackend/ aksBackend (all SDK-free) move to platform_kubernetes_core.go with a core init(); gkeBackend (the only SDK-bearing k8s backend) moves to platform_kubernetes_gke.go with its own init(). Behavior-equivalent: same five backend names registered. Isolates the lone SDK-bearing platform file for a later clean deletion.
Code-review Minor: makes the Phase 0 SDK-free/SDK-bearing partition self-documenting for readers without the commit message.
aksBackend.azureToken is a net/http OAuth2 client, not an azure-sdk consumer. The stale comment is what fooled an earlier inventory pass into mis-counting platform_kubernetes_kind.go as an azure-sdk importer.
Extends audit-cloud-symbols.sh --check with an init()-partition assertion (platform_kubernetes_core.go registers only kind/k3s/eks/aks; _gke.go only gke) and adds a cloud-sdk-audit job to ci.yml beside godo-banned / aws-sdk-banned, so the cloud-SDK inventory becomes a build-enforced artifact rather than a prose claim.
…nding Task 6 measurement: gRPC cycle 6.511ms ±1% vs in-process 179ns, for a worst-case 1MB synthetic state. Exceeds the plan's <5ms acceptance bar. Root-cause analysis: the cost is json.Marshal/Unmarshal of the ~1MB map[string]any (inherent to the bytes outputs_json wire format the iac.proto invariant mandates) — NOT gRPC transport buffering or the 4MB message cap. The plan's contingency remedy (streaming redesign) addresses message-size-cap + memory-buffering, neither of which the benchmark hits; streaming would not move the number. Recommendation: retain unary (6.5ms is still negligible vs real cloud backend I/O — the design's own bar-rationale). Deviation from the literal 5ms estimate-bar is surfaced to the operator, not absorbed silently. Scope lock intact: Task 6 run + recorded, no task added/dropped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-confirmed) Operator reviewed the 6.51ms benchmark + root-cause analysis and confirmed: "I'm not concerned about 6.51ms, that's acceptable." Task 6's gate resolves Unary LOCKED — the Task 4 proto stands, no streaming redesign, PR 2/3 proceed unchanged. Operator additionally raised a long-term architectural item: IaC state is persisted at-rest as JSON; a typed/compact binary format (pb/msgpack/CBOR) with JSON-export + content-detection-on-read would be better for processing/type-correctness/large-state scaling. Logged as a post-extraction follow-up in both the benchmark decision record and the design doc's Open items — distinct from the wire contract, cross-cutting across all IaCStateStore impls, needs its own brainstorming pass. Not actioned in this locked plan. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This reverts commit 6186e3d100e427807b9fd122e20df589d6bb6954.
Operator-approved scope amendment to the (reverted-to-Draft) plan: - ADR 0033: add ctx context.Context to module.IaCStateStore — new PR 6 / Task 15. Task 7 had to hardcode context.Background() in grpcIaCStateStore; the operator directed widening the interface now while we're at that boundary, so Phase B/C/D plugin backends inherit it ctx-ful. Bounded blast radius (~9 files, all in module/); interfaces.IaCStateStore already had ctx and is untouched. - ADR 0034: de-gate PR 4 from "HUMAN-GATE" to autonomous cross-repo. Operator: agents should operate in plugin repos directly; the real requirement is prompt clarity (absolute repo path stated up front), not a human hand-off. Plan's PR 4 row, Cross-repo note, and executor notes updated accordingly. - Manifest: 5 PRs/14 tasks -> 6 PRs/15 tasks. Execution order documented (PR 6 stacks on PR 3, runs before PR 4). Benchmark-gate executor note updated to RESOLVED (unary locked). Next: re-run alignment-check on the amended plan, then re-lock. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Strict 6-method contract mirroring module.IaCStateStore 1:1, with an IaCState message mirroring module.IaCState. Free-form Outputs/Config maps cross the wire as bytes outputs_json/config_json per the iac.proto hard invariant (NO google.protobuf.Struct) — same pattern as ResourceState.outputs_json. Unary RPCs. No TTL field. Regenerated bindings via buf.
Drives a ~1 MB synthetic IaCState through Lock/GetState/SaveState/Unlock both in-process (baseline) and over a real bufconn gRPC boundary (post-extraction path). Self-contained (local benchStateToProto + benchStateBackendServer; Task 7 promotes production versions). Feeds the unary-vs-streaming proto-transport decision in the next task.
Task 4 added the IaCStateBackend service to iac.proto but missed the corresponding iacServiceChecks row in wftest/bdd/strict_iac.go. TestIaCServiceChecks_CoversEveryProtoService enforces parity between iac.proto's services and that table — it was failing on the missing entry. Belongs with PR 2 (the proto PR).
There was a problem hiding this comment.
Pull request overview
Introduces the new strict IaCStateBackend gRPC contract (proto + generated bindings) and adds an in-repo benchmark harness to validate unary RPC transport cost for ~1 MB IaC state blobs, as part of Phase A of the cloud-SDK extraction effort. It also formalizes a repo-wide cloud-SDK import audit script and isolates the GKE Kubernetes backend into its own SDK-bearing file.
Changes:
- Add
IaCStateBackendservice +IaCStatemessage toplugin/external/proto/iac.protoand regenerate Go bindings; add a compile-level guard test. - Add benchmark comparing in-process
IaCStateStorecalls vs bufconn gRPC round-trips for the new service; record benchmark decision docs. - Add CI-enforced cloud-SDK audit script and split Kubernetes backends into SDK-free core vs SDK-bearing GKE file.
Reviewed changes
Copilot reviewed 14 out of 16 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| wftest/bdd/strict_iac.go | Extends strict-contract BDD capability checks to include IaCStateBackend. |
| scripts/audit-cloud-symbols.sh | Adds/updates a repo-wide cloud-SDK import audit tool with optional invariant checks. |
| plugin/external/proto/iac.proto | Defines the IaCStateBackend service and related request/response messages. |
| plugin/external/proto/iac.pb.go | Regenerated protobuf message/types for the updated iac.proto. |
| plugin/external/proto/iac_grpc.pb.go | Regenerated gRPC client/server stubs for the updated iac.proto (adds IaCStateBackend). |
| plugin/external/proto/iac_statebackend_test.go | Compile-level test ensuring the generated IaCStateBackend types exist and have expected shape. |
| module/platform_kubernetes_gke.go | Isolates the SDK-bearing GKE backend (GCP Container API) into its own file and registers gke. |
| module/platform_kubernetes_core.go | Removes GKE backend from core file; keeps SDK-free backends and updates AKS comment. |
| module/benchmark_iac_state_backend_test.go | Adds benchmark harness for in-process vs gRPC IaCStateBackend cycles with ~1 MB payload. |
| docs/plans/2026-05-14-iac-state-backend-benchmark.md | Records benchmark results and the unary-transport decision rationale. |
| docs/plans/2026-05-14-cloud-sdk-extraction.md.scope-lock | Adds scope-lock hash for the extraction plan. |
| docs/plans/2026-05-14-cloud-sdk-extraction.md | Adds the full implementation plan (Phase 0 + Phase A and sequencing). |
| docs/plans/2026-05-14-cloud-sdk-extraction-design.md | Adds the design doc describing the extraction architecture and constraints. |
| decisions/0034-cross-repo-agent-operation-for-plugin-prs.md | Records the decision to treat plugin-repo PRs as autonomous cross-repo agent work. |
| decisions/0033-add-ctx-to-module-iac-state-store.md | Records the decision to add context.Context to module.IaCStateStore later in the plan. |
| .github/workflows/ci.yml | Adds a CI job to run scripts/audit-cloud-symbols.sh --check. |
Files not reviewed (2)
- plugin/external/proto/iac.pb.go: Language not supported
- plugin/external/proto/iac_grpc.pb.go: Language not supported
…rt audit regex Addresses Copilot review on PR #669: - benchStateBackendServer.SaveState now rejects nil State and propagates JSON unmarshal failures as InvalidArgument instead of silently writing corrupted/empty data. - BenchmarkIaCStateBackend_GRPC closes the bufconn listener; comment no longer implies bufconn size sets the gRPC message cap. - audit-cloud-symbols.sh real_import() single-line regex now matches aliased/dot/blank imports (import foo "pkg" / . / _), not just plain. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tion-pa-proto # Conflicts: # scripts/audit-cloud-symbols.sh
⏱ Benchmark Results✅ No significant performance regressions detected. benchstat comparison (baseline → PR)
|
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Summary
Phase A, part 1 — defines the new strict
IaCStateBackendgRPC contract and validates the unary-transport decision with a real benchmark.service IaCStateBackend(6 RPCs, 1:1 withmodule.IaCStateStore) + theIaCStatemessage toplugin/external/proto/iac.proto; regenerate bindings viabuf. Honors theiac.protohard invariant (nogoogle.protobuf.Struct— free-form maps asbytes outputs_json/config_json). AddsIaCStateBackendtowftest/bdd'siacServiceCheckscoverage table.module/benchmark_iac_state_backend_test.go: drives a ~1 MB syntheticIaCStatethrough a fullLock/Get/Save/Unlockcycle, in-process vs. over a real bufconn gRPC boundary.bytes *_jsonwire format), not gRPC transport — so the streaming-redesign contingency was mis-targeted. Operator confirmed unary is acceptable; unary LOCKED. Decision recorded indocs/plans/2026-05-14-iac-state-backend-benchmark.md(carried in PR 1).Scope Manifest
PR Count: 6 · Tasks: 15 · Status: Locked 2026-05-14T10:37:04Z
This is PR 2 —
Tasks 4, 5, 6— branchfeat/cloud-sdk-extraction-pa-proto.Stacked PR
Base:
feat/cloud-sdk-extraction-p0(PR #668). Retarget tomainafter PR #668 merges.Test status
go test ./module/ ./plugin/external/... ./wftest/bdd/green. The two repo-wide pre-existing failures (TestFallbackRuns,TestInfraMultiEnv_E2E) are unrelated to this branch — see PR #668.🤖 Generated with Claude Code