Cloud-SDK extraction — Phase 0: split platform_kubernetes + audit-script CI gate (+ design & plan)#668
Merged
Merged
Conversation
…tract plugins Design for removing aws-sdk-go-v2, Azure/azure-sdk-for-go, and cloud.google.com/go + google.golang.org/api direct deps from workflow core's module/ package. Architecture: 3 extension surfaces, 3 strategies: - IaC state backends → new IaCStateBackend strict proto contract; iac.state stays core, config.backend dispatches to plugin gRPC client. - platform.* provisioners → new PlatformBackend strict proto contract; module types + provider: key stay core, kind backend stays in-core, cloud backends (eks/gke/ecs/route53/ec2/autoscaling) extract. - standalone modules/steps (apigateway, codebuild, dynamodb, s3_upload, storage.s3, storage.gcs) → plugin-native module/step types via the existing ModuleFactories/StepFactories SDK — no new contract. Credentials (Option 1): each plugin-native module carries its own credentials: block + builds aws.Config in-process; optional in-plugin credentials_ref for DRY. cloud_account_aws*.go deleted; azure/gcp cloud_account files have no SDK import and stay. 4 phases: A azure (validates IaCStateBackend), B aws (largest), C gcp, D digitalocean (spaces backend, minor bump + migration doc). Includes Assumptions + Rollback sections + self-challenge top-3 doubts (PlatformBackend over-generality, provider-separability fragility, benchmark-could-invalidate-unary-default — all with mitigations deferred to writing-plans). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… revisions Addresses 2 Critical + 5 Important findings from adversarial-design-review: Critical: - iac_state_spaces.go (core file importing aws-sdk s3) now has an explicit home: deleted by Phase B's core PR; Phase D reframed from soft-compat to a real clean-break for the `spaces` backend. Goal "core drops aws-sdk-go-v2 entirely" is now actually achieved by the phases as written + enforced by a go list -deps CI gate. - kinesis: added Non-Goals entry explaining it's a transitive dep of modular/modules/eventbus/v2, not a direct workflow import — out of scope, with the go mod why chain documented so the literal ask is fully answered. Important: - Full grep-verified 13-file AWS inventory table in Phase B with per-file destinations; reconciled aws_api_gateway.go (route-sync module) vs platform_apigateway.go (provisioner) as two distinct files. - aksBackend assigned to Phase A (Azure gets the PlatformBackend half too); platform_kubernetes_kind.go split now spans 3 phases (aks/eks/gke) with explicit always-compiles coordination. - Proto contracts fold into existing plugin/external/proto/iac.proto (8 services already) instead of new files — matches precedent. - New Security section: secret-redaction in config-version-store/tracing + gRPC interceptor logging are blocking writing-plans tasks; credentials_ref blast radius documented as strictly narrower than today's cloud.account. Minor: - IaCStateBackend RPC set now maps 1:1 to the real module.IaCStateStore interface (GetState/SaveState/ListStates/DeleteState/Lock/Unlock) — no speculative surface. - Phase D rollback restated as a matched pair (Phase B core PR + DO plugin PR). - IaCProviderRequired/ResourceDriver reuse promoted to a first-class Alternatives Considered entry with accept/reject rationale + retained as the gated fallback for PlatformBackend. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… revisions Addresses 2 Critical + 3 Important from cycle-2 review: Critical: - platform_kubernetes_kind.go handling reworked. Added Phase 0: a pure mechanical precursor file-split (kind/eks/gke/aks → 4 files, each with its own import block). The "always compiles across phases" property is now structural, not asserted. Added a verified per-file import-ownership table. - Corrected the false Phase A rationale: aksBackend uses raw net/http REST, NOT the Azure SDK (verified — no azure-sdk symbol in the aksBackend region). The Azure go.mod drop comes entirely from iac_state_azure.go deletion + iac_module.go edit; aksBackend extraction is code-organisation, not a dependency change. - Documented the eksBackend → cloud_account_aws.go call-graph edge as a hard same-commit atomicity constraint (verified: eksBackend calls awsProviderFrom + AWSConfig at platform_kubernetes_kind.go:96,105,138). Important: - Phase B core-PR bullet now explicitly lists "strip the spaces case from iac_module.go" (was only obliquely referenced). - New §Failure modes section: orphaned-lock-on-plugin-crash → lease_ttl_seconds contract field; SaveState lost-response retry → documented idempotent (full-state replace, last-writer-wins); plugin-unreachable → abort before mutation; PlatformBackend mid-Apply crash → identical to today's in-process risk, no new mitigation. - §Security gRPC-logging bullet concretized: VERIFIED plugin SDK adds no body-logging interceptor (grpc.NewServer(opts...) passthrough; only callback_server.go logs, never module config). Writing-plans adds a guard test instead of a conditional interceptor. Minor: file-count table footnoted (count = importers, not deletions); shared s3compat module added as Alternatives Considered #3 (deferred, not rejected); self-challenge doubt numbering tidied (2 mitigations cover 3 doubts, intentionally). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sed in the cycle-2 commit ran from the wrong cwd — Status line still said "cycle 1" and two interface-audit-spike references still said "Phase A/B" instead of "Phase 0/A". Pure text cleanup, no design change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… revisions Addresses 2 Critical + 2 Important from cycle-3 review: Critical (same root — symbol-level coupling the import-block audit missed): - parseStringSlice (in cloud_account_aws.go, which Phase B deletes) and safeIntToInt32 (in core-staying platform_kubernetes.go) are pure helpers the plugin-bound backend files call. An import-block audit is symbol-blind. Fix: Phase 0 now does TWO moves — the file split AND relocating both helpers into a new SDK-free core module/cloud_helpers.go. Per-file table gains a "cross-file symbol deps (the trap)" column listing every helper edge per backend. Phase 0 acceptance criteria now include a grep that no core file references the helpers from their old homes. - §Phase 0 corrected: platform_kubernetes.go is a SEPARATE existing file (module shell + kubernetesBackend interface + safeIntToInt32) — NOT touched by the split; only platform_kubernetes_kind.go (holds all 4 backends) is split. Earlier draft conflated the two files. Important: - Per-file ownership table relabelled "intended post-split — verified by the Phase 0 build gate" (was asserted-as-verified against an unsplit file — same hand-waving class cycle-2 flagged for "always compiles"). - lease_ttl_seconds DROPPED from the Phase A proto. It was a contract field with no enforced semantics and no implementing backend in scope — YAGNI. §Failure-modes orphaned-lock reworked: documented limitation + operator-side lock-object delete for recovery; TTL is a planned ADDITIVE follow-up paired with a conformance test, shipped with the first backend that honors expiry. Added explicit Lock-contention behavior (immediate error, matches today's in-process IaCStateStore.Lock — no new waiting state). Minor: Phase 0 rollback sentence added; garbled §Assumptions 2 sentence fixed; §Assumptions 2 notes Phase 0 de-risks it structurally. Also: removed a stray stale cycle-1 copy of this doc that was sitting untracked in the main workflow checkout (the canonical doc is here in the feat/cloud-sdk-extraction worktree). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… revisions Addresses 2 Critical + 2 Important from cycle-4 review: Critical 1 — the per-file symbol-ownership table was wrong AGAIN (3rd cycle running): claimed gkeBackend depends on safeIntToInt32 (it doesn't — that's eksBackend) and aksBackend has no cross-file deps (it does — CloudCredentials/CloudCredentialProvider from cloud_account.go, same as gke). STRUCTURAL FIX: deleted the hand-maintained table entirely. The symbol-ownership map is now a Phase 0 build artifact — scripts/audit-cloud-symbols.sh, committed + re-run in CI — not a design-doc claim that rots on every edit. The design commits to the *method* + the *known shape* (cloud_account.go stays core; all 3 cloud backends bind to it via k.provider.GetCredentials; eksBackend additionally binds to the Phase-B-deleted cloud_account_aws.go; aksBackend imports no cloud SDK). Critical 2 — Phase 0's "split into four, zero logic change" silently dropped the single func init() that registers kind/k3s/eks/gke/aks. Splitting REQUIRES partitioning init() per-file (a distribution, not zero-change). Phase 0 now has an explicit step 2 for the init() partition; relabelled "behavior-equivalent" not "zero logic change"; k3s documented as reusing kindBackend (both stay core). Important 1 — platform.* cloud credential flow across PlatformBackend was unspecified (aksBackend needs CloudCredentials — how does it reach the plugin?). Added: PlatformBackend requests carry a CloudCredentials proto message; engine resolves k.provider.GetCredentials() in-core (config-map parsing, no SDK) and serialises it. Unified with the Architecture-3 credentials story — ONE CloudCredentials proto shape for both surfaces, so secret-redaction has one shape to redact. Important 2 — core actually imports FOUR cloud SDK trees, not three: godo is still in cloud_account_do.go + 5 platform_do_*.go files. §Problem now acknowledges godo as a 4th tree, explicitly scopes it OUT (user's ask was 3 trees), and the go list -deps gate is reworded to assert "zero packages from the three in-scope trees" not "zero cloud SDKs". All "zero cloud SDKs" phrasing reconciled throughout. Minor: ListStates filter + remaining-proto-messages notes folded in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… revisions Addresses 1 Critical + 2 Important from cycle-5 review: Critical — the init()-partition fix (cycle-4) was kubernetes-only, but the SAME defect class exists in platform_dns.go / platform_ecs.go / platform_networking.go / platform_autoscaling.go: each has a single func init() registering BOTH a core-staying `mock` backend AND a plugin-bound `aws` backend. The old Phase B inventory moved those files wholesale → would exile the mock backends + dangle the route53 registration. FIX: Phase 0 generalized from "split platform_kubernetes_kind.go" to a repo-wide uniform `_core.go` / `_<provider>.go` convention across the WHOLE platform.* family. Every mixed init() is partitioned; the audit script flags any init() registering a mix of core-staying + plugin-bound factories as a CI failure. Phase B inventory rewritten to delete only `_aws.go`/`_eks.go` files, never a mixed file. Important 1 — the cycle-4 "Known shape" prose reintroduced hand-maintained cross-file symbol claims (one already incomplete: parseStringSlice consumers). FIX: cut all per-file symbol enumerations; the section now states only invariants the script VERIFIES (not discovers) + the method. No transcribed symbol lists remain. Important 2 + own finding — cycle-4 said the engine resolves credentials in-core "no SDK needed." VERIFIED FALSE: cloud_account_aws_creds.go's awsProfileResolver calls config.LoadDefaultConfig(WithSharedConfigProfile) and awsRoleARNResolver calls sts.AssumeRole — both need the AWS SDK. FIX: §Architecture-2 corrected — engine passes the DECLARED credential config (plain strings) in the CloudCredentials proto; the PLUGIN resolves (incl. the SDK-bearing profile/role_arn paths). Both cloud_account_aws.go AND cloud_account_aws_creds.go deleted by Phase B, no core replacement — all AWS cred resolution moves plugin-side. azure/gcp resolver files stay (their resolvers are genuinely SDK-free). Minor — backend-name collision: core-reserved names (memory/filesystem/ postgres/kind/k3s/mock) cause a load-time error if a plugin collides, not silent shadowing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… revisions Addresses 1 Critical + 2 Important from cycle-6 review: Critical — cycle-5's credential-flow fix replaced one false claim with another: it said the CloudCredentials struct already holds "declared config (plain strings incl. profile)". VERIFIED FALSE — the struct (cloud_account.go:18) has no Profile field (profile lives in Extra map) and the resolvers mutate it in-place with RESOLVED values. FIX, cleaner than the struct change the reviewer proposed: the struct needs NO change (Extra map already carries markers, RoleARN field exists). Instead, cloud_account_aws_creds.go is EDITED not deleted — the SDK-bearing tails of awsProfileResolver/awsRoleARNResolver (config.LoadDefaultConfig, sts.AssumeRole) are removed; they keep their SDK-free heads (record declared inputs + an Extra["credential_source"] marker, exactly as awsStaticResolver already does). After the edit the file is SDK-free and stays in core alongside the azure/gcp resolver files. Only cloud_account_aws.go (the pure-SDK AWSConfig() builder + AWSConfigProvider + awsProviderFrom) is deleted; its profile-chain/STS logic moves into the plugin's buildAWSConfig. Every in-core resolver becomes uniformly "declare, don't resolve"; the plugin honors the markers. No unregistered- resolver failure mode — the resolver init() registrations stay. Important 1 — §Phase-0 misidentified the DNS file with the mixed init(). VERIFIED: platform_dns.go:66 has the init() (+ interface + factory registry); platform_dns_backends.go has both impls + the route53 SDK import, NO init(). DNS is a TWO-file split, unlike single-file ecs/networking/autoscaling. §Phase-0 now states the per-family layout explicitly (kubernetes one-file, dns two-file, ecs/networking/autoscaling one-file) and notes the audit script determines it. Important 2 — azure/gcp resolvers (and now aws profile/role_arn) emit deferred-resolution markers for env/CLI/managed-identity/workload-identity/ profile/role_arn — NOT plain-string passthrough. §Architecture-3 + Assumption 5 now state the plugin MUST implement marker handling for every deferred type, not just AWS profile/role_arn. Minor — safeIntToInt32 relocation rationale clarified (it's a clean copy-source for the plugin-bound files, not a hard core necessity); parseStringSlice IS a hard necessity (its file is deleted). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… revisions Addresses 2 Critical from cycle-7 review (architecture confirmed sound; these are the last two extraction-mechanic precision gaps): C1 — "remove the SDK tail, file becomes SDK-free" mischaracterized the awsRoleARNResolver edit. VERIFIED: awsProfileResolver's SDK calls ARE a clean contiguous tail, but awsRoleARNResolver's SDK block (base-config build + sts.AssumeRole, ~45 lines) is the larger half of the method, after the declared-input recording. FIX: §Architecture-2 re-characterizes the edit as a deliberate Resolve() body REWRITE (not a one-line snip) — explicitly per-resolver. Added a Phase B CI invariant: an import-block grep (folded into audit-cloud-symbols.sh) asserts cloud_account_aws_creds.go has zero aws-sdk-go-v2 imports post-rewrite — mechanically enforced, not prose-asserted. C2 — cloud_account_aws.go defines FOUR symbols, not one; the symbol-ownership invariant named only parseStringSlice. VERIFIED + fixed: - AWSConfigProvider interface signature names aws.Config → CANNOT stay in core, deleted with the file. - awsProviderFrom → deleted with the interface. - ValidateCredentials → verified NO real caller (only a comment ref in cmd/wfctl/deploy.go:866) → deletes cleanly. - The 8 awsProviderFrom consumers are all verified plugin-bound — but each currently does awsProviderFrom(k.provider).AWSConfig(ctx); in the plugin there's no cloud.account to type-assert. §Cross-file-coupling invariant 3 now states Phase B must REWRITE all 8 consumers to obtain creds from the CloudCredentials proto + buildAWSConfig — explicit Phase B scope, not a footnote. Phase B table atomicity column updated. Minor (M1) — platform_dns_backends.go renamed → platform_dns_core.go in Phase 0 so the dns family conforms to the uniform _core.go/_aws.go naming; no special-case three-file layout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t post-#653 main Cycle-8 adversarial review caught the design's file/symbol inventory as stale: it predated issue #653 (closed 2026-05-13), which already removed the AWS IaC modules, platform/providers/aws/, and stubbed the codebuild + EKS backends. Re-baselined every file/symbol claim against origin/main HEAD (worktree confirmed 0 commits behind origin/main): - Added "Relationship to issue #653" section — this design is #653's named successor, extracting the AWS surface #653 scoped out ("RBAC/secrets/artifact stay") plus the untouched Azure/GCP surfaces. - Problem table corrected: AWS 6 real-import files (not 13), Azure 3, GCP 3. storage_artifact_s3.go is comment-only — stays in core. - cloud_account_aws.go is dead code — zero non-test consumers verified; deleted outright, no 8-consumer rewrite (awsProviderFrom + consumers removed by #653). - Phase 0 shrunk to a single-file split (platform_kubernetes_kind.go); parseStringSlice + safeIntToInt32 no longer exist — helper-relocation task deleted. - PlatformBackend now serves only aks + gke (eks already a #653 SDK-free stub); interface-audit spike audits one interface, not five. - Phase B inventory rewritten; Phase A/C file lists corrected. - Self-challenge doubt #4 + Assumption 7 added: inventory staleness is the cycle-8 defect class; audit script makes it CI-enforced. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t script Cycle-9 adversarial review caught aksBackend mis-classified as an azure-sdk importer: platform_kubernetes_kind.go's azure-sdk-for-go match is a stale doc comment (line 332) — aksBackend.azureToken is a plain net/http OAuth2 client. An import-block-disciplined re-survey found a second comment-only false positive: nosql_dynamodb.go. Structural fix for the recurring "grep matched a comment" defect class: added scripts/audit-cloud-symbols.sh, which parses Go import(...) blocks (never comments) and emits the comment-immune real-import map. Its output now populates every file table in the design — prose claims replaced by a build artifact. Formalized + CI-wired in Phase 0. Corrected inventory (audit-script output): AWS 5 real-import files (not 6), Azure 2 (not 3), GCP 3. nosql_dynamodb.go + storage_artifact_s3.go are comment-only stubs — out of scope, stay in core. Design consequences of aksBackend being SDK-free: - Only gkeBackend carries a cloud platform SDK. kind/k3s/eks/aks all stay in core. - Architecture §2 no longer proposes a new PlatformBackend contract. The gke cross-process mechanism is gated on an interface-audit spike whose preferred outcome is folding into the existing ResourceDriver contract — a dedicated contract for one backend is YAGNI. - Phase A (Azure) is now pure IaCStateBackend — touches no platform file. - Phase 0 splits platform_kubernetes_kind.go into _core.go (kind/k3s/ eks/aks — all SDK-free) + _gke.go (the lone SDK-bearing backend), and fixes the stale line-332 comment. - The gke platform extraction + its contract decision move to Phase C. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…scope boundary explicit
Cycle-10 adversarial review caught Assumption 6 as false: the cycle-9
audit script scanned only module/, missing five aws-sdk-go-v2 importers
under provider/aws/, plugin/rbac/, iam/, artifact/. The design's Goal
("go.mod drops aws-sdk-go-v2 entirely") was therefore unachievable by
the four phases as written.
Structural fix — third defect-class variant closed:
- audit-cloud-symbols.sh now scans the WHOLE REPO (not just module/) and
splits results module/ vs. elsewhere. Comment-immune (cycle 9) +
scope-complete (cycle 10) + CI-enforced (Phase 0).
Whole-repo inventory result:
- Azure + GCP SDK usage is entirely module/-resident → Phases A and C
drop those trees from go.mod ENTIRELY (whole-graph go list -deps gate).
- aws-sdk-go-v2 is split: 5 module/ files (in scope, Phase B) + 6 files
in provider/aws/, plugin/rbac/aws.go, iam/aws.go, artifact/s3.go.
Scope decision: the out-of-module/ AWS surface is exactly #653's
deliberately-retained "RBAC/secrets/artifact stay" scope (plus the
provider/aws deploy provider). This design does NOT unilaterally
override #653's recent documented decision — it scopes that surface OUT
(new Non-Goal, parallel to godo) and logs a recommended successor issue.
Consequences threaded through the doc:
- Goals section is now asymmetric: Azure/GCP full go.mod removal; AWS is
module/-scoped removal (aws-sdk-go-v2 stays in go.mod for the
out-of-scope surface).
- Phase C CI gate is asymmetric: whole-graph zero for Azure/GCP,
module/-scoped zero for AWS.
- Assumption 6 rewritten to the verified truth; Assumption 7 notes #653's
scope decision is respected, not contested.
- Minors: I2 (awsRoleARNResolver rewrite — non-SDK required-check +
sessionName extraction sit between declared-input recording and the
SDK block; spelled out), M1 (Phase A also fixes iac_module.go's stale
line-18 backend-list comment), M2 (internal/legacyaws noted).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adversarial review cycle 11: PASS (zero Critical, zero Important). Two Minor nits applied: - audit-cloud-symbols.sh: real_import now also matches single-line `import "..."` form, not just parenthesized blocks — closes the one latent parser false-negative the reviewer flagged. - §Goals: clarified that the module/-scoped AWS-zero `--check` assertion is deferred-implementation added in Phase C (the committed script only enforces the cloud_account_aws_creds.go post-Phase-B invariant today), parallel to the Phase 0 init()-partition deferral. Design phase complete — proceeding to writing-plans. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e pipe The cycle-11 single-line-import hardening added an inner `grep -E '^import "'` whose no-match exit 1 poisoned the `| grep -q` pipe under `set -o pipefail`, making real_import() return false for every file lacking a single-line import. Added `|| true` on the inner grep. Verified: full report restored, all REAL/comment-only classifications correct. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e A) Bite-sized TDD plan for the first executable increment: Phase 0 (split platform_kubernetes_kind.go, fix the stale comment, wire the audit script into CI) + Phase A (IaCStateBackend proto + benchmark-gated proto-lock, host-side gRPC resolution, secret-redaction, gRPC-logging guard, workflow-plugin-azure implementation, core deletion dropping azure-sdk from go.mod). 14 tasks across 5 PRs. Phases B/C/D are explicitly scoped to a follow-on plan — their concrete tasks depend on Phase A's outputs (the benchmark-validated proto shape, the host-resolution pattern, the plugin-side serve path), so planning them now would be fiction. The design doc remains the authoritative B/C/D spec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…al review Plan-phase adversarial review FAIL (1 Critical + 4 Important + 4 Minor). All addressed: - C1 (Critical): Task 4's proto used google.protobuf.Struct, which iac.proto:6-10 explicitly bans. Rewrote IaCState to carry Outputs/Config as `bytes outputs_json`/`bytes config_json` (the established ResourceState pattern); Tasks 5/7/11 now convert via encoding/json, not structpb. Removed the bogus struct.proto import step. - I1: Task 4 `buf generate` now runs from worktree root (buf.yaml lives there), not `cd plugin/external/proto`. - I2: Task 6 acknowledges the existing benchmark.yml (-bench=. picks up the new benchmarks automatically) — no redundant harness; clarified the task is a one-time decision gate. - I3: Task 8's embedded research spike resolved at plan time — engine.go was read; integration is the design-sanctioned package-level module.iacStateBackendRegistry populated by StdEngine.loadPluginInternal. Tasks 8/13/14 now have concrete file sets. - I4: Scope Manifest now declares PR 4 a human-action gate (cross-repo, workflow-plugin-azure) with the PR4->PR5 dependency stated explicitly. - M1: Task 5's benchmark file is now genuinely self-contained (local benchStateToProto + benchStateBackendServer; no forward references). - M2: Task 3 names ci.yml directly, places the audit job beside the existing godo-banned/aws-sdk-banned grep-gate jobs. - M3: Task 6 pins benchstat (go install + bare invocation). - M4: Task 9 states the redaction gap is verified against step_output_redactor.go:7-19, not a live deduction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Plan-phase adversarial review cycle 2: all 9 cycle-1 findings confirmed resolved; 2 new Important + 2 Minor surfaced by the new-defect scan. All addressed: - I-A (Important): Task 9's redaction test was inconsistent with the actual redactMap behavior — a key named `credentials` matches the existing `credential` pattern and is wholesale-replaced with the placeholder STRING before any recursion, so the test's `.(map[string]any)` assertion panicked. Reworked Task 9: the `credentials:` block is ALREADY redacted wholesale (regression-tested); the real gap is `credentials_ref` being over-redacted (it's a module name, not a secret) — fix is a narrow `*_ref`-suffix exemption in isSensitiveField, not camelCase leaf patterns (which would be dead code given wholesale redaction happens first). - I-B (Important): Task 14's engine.go integration seam was under-specified and would fight loadPluginInternal's no-concrete-types precedent. Resolved at plan time (engine.go:305-327 read): Task 14 now defines an `IaCStateBackendProvider` optional interface and type-asserts it in loadPluginInternal exactly like the existing stepRegistrySetter/slogLoggerSetter pattern; ExternalPluginAdapter implements it. Concrete file set + code sketch added. - M-i: Task 6's benchmark.yml description corrected (runs `go test -bench=.` inline, not `make bench-baseline`). - M-ii: Task 4 notes the proto README's plugin.proto-specific wording is stale; trust root buf.yaml. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…inor cleanups Plan-phase adversarial review cycle 3: PASS (zero Critical, zero Important). Two Minor doc-tightening fixes applied: - Task 9 Step 4 now names bearer_token_ref explicitly and explains why the *_ref exemption is safe for it (SecretRef is a reference struct, not a raw secret) rather than claiming no *_ref field exists. - engine.go line citations corrected to 311-326. Plan phase complete — proceeding to alignment-check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 0 precursor for cloud-SDK extraction. kindBackend/eksErrorBackend/ aksBackend (all SDK-free) move to platform_kubernetes_core.go with a core init(); gkeBackend (the only SDK-bearing k8s backend) moves to platform_kubernetes_gke.go with its own init(). Behavior-equivalent: same five backend names registered. Isolates the lone SDK-bearing platform file for a later clean deletion.
Code-review Minor: makes the Phase 0 SDK-free/SDK-bearing partition self-documenting for readers without the commit message.
aksBackend.azureToken is a net/http OAuth2 client, not an azure-sdk consumer. The stale comment is what fooled an earlier inventory pass into mis-counting platform_kubernetes_kind.go as an azure-sdk importer.
Extends audit-cloud-symbols.sh --check with an init()-partition assertion (platform_kubernetes_core.go registers only kind/k3s/eks/aks; _gke.go only gke) and adds a cloud-sdk-audit job to ci.yml beside godo-banned / aws-sdk-banned, so the cloud-SDK inventory becomes a build-enforced artifact rather than a prose claim.
…nding Task 6 measurement: gRPC cycle 6.511ms ±1% vs in-process 179ns, for a worst-case 1MB synthetic state. Exceeds the plan's <5ms acceptance bar. Root-cause analysis: the cost is json.Marshal/Unmarshal of the ~1MB map[string]any (inherent to the bytes outputs_json wire format the iac.proto invariant mandates) — NOT gRPC transport buffering or the 4MB message cap. The plan's contingency remedy (streaming redesign) addresses message-size-cap + memory-buffering, neither of which the benchmark hits; streaming would not move the number. Recommendation: retain unary (6.5ms is still negligible vs real cloud backend I/O — the design's own bar-rationale). Deviation from the literal 5ms estimate-bar is surfaced to the operator, not absorbed silently. Scope lock intact: Task 6 run + recorded, no task added/dropped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-confirmed) Operator reviewed the 6.51ms benchmark + root-cause analysis and confirmed: "I'm not concerned about 6.51ms, that's acceptable." Task 6's gate resolves Unary LOCKED — the Task 4 proto stands, no streaming redesign, PR 2/3 proceed unchanged. Operator additionally raised a long-term architectural item: IaC state is persisted at-rest as JSON; a typed/compact binary format (pb/msgpack/CBOR) with JSON-export + content-detection-on-read would be better for processing/type-correctness/large-state scaling. Logged as a post-extraction follow-up in both the benchmark decision record and the design doc's Open items — distinct from the wire contract, cross-cutting across all IaCStateStore impls, needs its own brainstorming pass. Not actioned in this locked plan. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This reverts commit 6186e3d100e427807b9fd122e20df589d6bb6954.
Operator-approved scope amendment to the (reverted-to-Draft) plan: - ADR 0033: add ctx context.Context to module.IaCStateStore — new PR 6 / Task 15. Task 7 had to hardcode context.Background() in grpcIaCStateStore; the operator directed widening the interface now while we're at that boundary, so Phase B/C/D plugin backends inherit it ctx-ful. Bounded blast radius (~9 files, all in module/); interfaces.IaCStateStore already had ctx and is untouched. - ADR 0034: de-gate PR 4 from "HUMAN-GATE" to autonomous cross-repo. Operator: agents should operate in plugin repos directly; the real requirement is prompt clarity (absolute repo path stated up front), not a human hand-off. Plan's PR 4 row, Cross-repo note, and executor notes updated accordingly. - Manifest: 5 PRs/14 tasks -> 6 PRs/15 tasks. Execution order documented (PR 6 stacks on PR 3, runs before PR 4). Benchmark-gate executor note updated to RESOLVED (unary locked). Next: re-run alignment-check on the amended plan, then re-lock. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 14, 2026
Contributor
There was a problem hiding this comment.
Pull request overview
This PR starts the cloud-SDK extraction effort by mechanically isolating the GKE SDK-backed Kubernetes backend, adding an audit gate for cloud SDK ownership, and documenting the multi-phase extraction plan and ADRs.
Changes:
- Split Kubernetes backend implementations into SDK-free core backends and the SDK-bearing GKE backend.
- Added a cloud SDK audit script and CI job to enforce Kubernetes backend registration partitioning.
- Added design, implementation plan, benchmark decision record, and ADRs for the broader extraction effort.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
module/platform_kubernetes_core.go |
Holds SDK-free kind/k3s/eks/aks backends and core registrations. |
module/platform_kubernetes_gke.go |
Isolates the GKE backend and Google SDK imports. |
scripts/audit-cloud-symbols.sh |
Adds repo-wide SDK import reporting and Phase 0 partition checks. |
.github/workflows/ci.yml |
Adds the cloud SDK audit CI job. |
docs/plans/2026-05-14-cloud-sdk-extraction-design.md |
Adds the extraction design. |
docs/plans/2026-05-14-cloud-sdk-extraction.md |
Adds the locked implementation plan. |
docs/plans/2026-05-14-cloud-sdk-extraction.md.scope-lock |
Adds the plan lock hash. |
docs/plans/2026-05-14-iac-state-backend-benchmark.md |
Records the IaC state backend transport benchmark decision. |
decisions/0033-add-ctx-to-module-iac-state-store.md |
Adds ADR for adding context to module.IaCStateStore. |
decisions/0034-cross-repo-agent-operation-for-plugin-prs.md |
Adds ADR for autonomous cross-repo plugin PR work. |
Comments suppressed due to low confidence (1)
module/platform_kubernetes_core.go:123
- This comment still contains the literal
azure-sdk-for-go, soscripts/audit-cloud-symbols.shwill continue to listmodule/platform_kubernetes_core.goas a comment-only Azure SDK match. That contradicts the Phase 0 plan's verification step that this file should no longer appear in the Azure section; reword the comment without the SDK import path (for example, say it uses the REST API and no Azure SDK) to keep the audit output clean.
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
⏱ Benchmark Results✅ No significant performance regressions detected. benchstat comparison (baseline → PR)
|
This was referenced May 14, 2026
Merged
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
First PR of the cloud-SDK-extraction effort (extracting AWS/Azure/GCP SDKs from
module/into strict-contract gRPC plugins). This PR carries Phase 0 (the mechanical precursor) plus the full design + plan + ADRs for the whole effort.module/platform_kubernetes_kind.gointo_core.go(SDK-freekind/k3s/eks/aksbackends) +_gke.go(the lone SDK-bearing backend), partitioning the sharedinit(). Behavior-equivalent.aksBackend(it's anet/httpOAuth2 client, not an azure-sdk consumer).scripts/audit-cloud-symbols.shwith aninit()-partition assertion + wire it intoci.ymlas thecloud-sdk-auditjob, beside the existinggodo-banned/aws-sdk-bannedgrep gates.Design & Plan
docs/plans/2026-05-14-cloud-sdk-extraction-design.md(adversarial review PASS, cycle 11)docs/plans/2026-05-14-cloud-sdk-extraction.md(adversarial review --phase=plan PASS, cycle 3; alignment-check zero drift; scope-locked)decisions/0033(ctx amendment),decisions/0034(cross-repo agent operation)docs/plans/2026-05-14-iac-state-backend-benchmark.mdScope Manifest
PR Count: 6 · Tasks: 15 · Status: Locked 2026-05-14T10:37:04Z
This is PR 1 —
Tasks 1, 2, 3— branchfeat/cloud-sdk-extraction-p0.Stacked PR
Base:
main. PRs 2/3/6 stack on this one (PR 2 ← this, PR 3 ← PR 2, PR 6 ← PR 3). Merge in order.Test status
go test ./...green for everything this PR touches. Two repo-wide failures are pre-existing and unrelated to this branch:TestFallbackRuns(inexample.test/wfctl-ci/pkg— a separate Go module this branch touches zero files in) andTestInfraMultiEnv_E2E(cmd/wfctl— a strict-contracts/missing-plugin fixture issue; reproduces on the parent commit).🤖 Generated with Claude Code