diff --git a/docs/plans/2026-05-16-v2-lifecycle-phase2-design.md b/docs/plans/2026-05-16-v2-lifecycle-phase2-design.md new file mode 100644 index 00000000..0b3f1a1b --- /dev/null +++ b/docs/plans/2026-05-16-v2-lifecycle-phase2-design.md @@ -0,0 +1,379 @@ +# V2 Action Lifecycle — Phase 2 gRPC Contract Extension + 5-Repo Hard-Cutover — Design + +**Status:** Draft +**Date:** 2026-05-16 +**Operator:** Jon (autonomous-mode mandate 2026-05-16) +**Tracking issue:** GoCodeAlone/workflow#640 +**Phase 1 (already shipped):** PR #691 at f5ec18ef — inventory + ADR 0040 + Deprecated marker + 4 callers migrated +**ADR (will land in this design):** decisions/0041-v2-applyresponse-actions-contract.md +**Related ADRs (BINDING):** decisions/0024-iac-typed-force-cutover.md (no compat shim), decisions/0040-v2-action-lifecycle-provider-compatibility.md (5 provider expectations) + +## Goal + +Phase 2 of #640. Extend the `IaCProviderRequiredServer.Apply` gRPC contract with per-action outcome evidence so wfctl-side `OnResourceApplied` + `OnResourceDeleted` hooks fire correctly. Ship as a coordinated HARD-CUTOVER PR cascade across 5 repos per ADR 0024 (no compat shim, no graceful proto fallback). + +**Phase 2 scope (REVISED per cycle-1 adversarial review: Pattern B collapsed — 8 deliverables, not 10):** + +The cycle-1 review correctly identified that "Pattern B (custom-loop)" is architecturally broken: aws/gcp/azure don't declare `compute_plan_version="v2"` in their CapabilitiesResponse, so wfctl takes the v1 dispatch path (`provider.Apply(ctx, &plan)` direct call). Even if those plugins populate the new `Actions` wire field, wfctl never reads it on the v1 path. The intended OnResourceApplied / OnResourceDeleted hook firing CANNOT happen via Pattern B. + +**Correct architecture**: have ALL 4 plugins DECLARE `compute_plan_version="v2"` in their CapabilitiesResponse. wfctl then routes through `wfctlhelpers.ApplyPlanWithHooks` for all 4 — engine-side population handles Actions UNIFORMLY (Pattern A becomes the ONLY pattern). The plugins' existing `IaCProvider.Apply` impls become unused dead code (engine dispatches via `provider.ResourceDriver(action.Resource.Type)` per action instead). + +**8 deliverables:** + +1. Extend `plugin/external/proto/iac.proto` `ApplyResult` message with `repeated ActionResult actions` field + new `enum ActionStatus` (workflow PR) +2. Regenerate `iac.pb.go` from updated proto (workflow PR) +3. Extend `interfaces.ApplyResult` Go struct with `Actions []ActionOutcome` field (workflow PR) +4. Update `cmd/wfctl/iac_typed_adapter.go::applyResultFromPB` to decode the new field. **REVISED per cycle-1 C-2**: function signature must thread plan-action-count for length validation. Either (a) add `expectedActionCount uint32` parameter to applyResultFromPB OR (b) move length validation to caller after applyResultFromPB returns. +5. Update `iac/wfctlhelpers/apply.go::applyPlanWithEnvProviderAndHooks` to populate `result.Actions` during dispatch (workflow PR) — the engine ALREADY iterates plan.Actions via `dispatchAction`; Phase 2 just appends ActionOutcome per iteration +6. **REVISED per cycle-1 C-3**: manifest validation gate scope. Two options: (a) implement new `plugin/external/manifest` package with `ValidateBytes(bytes []byte) error` + create `schema/plugin_manifest.json` (hidden scope, ~200 LOC). (b) Use existing `schema/` package's JSON schema infrastructure if it covers plugin.json (probe required at writing-plans). (c) DEFER manifest validation to Phase 2.1 separate PR. Decision needed at writing-plans phase. Phase 2 design now LISTS this explicitly as deliverable #6 with scope-decision sub-task. +7. **All 4 plugins (aws/gcp/azure/DO): declare `compute_plan_version="v2"` in CapabilitiesResponse** + bump workflow pin + bump minEngineVersion. ~5-line change per plugin. PRs in parallel. Plugins' existing `IaCProvider.Apply` impls become dead-code; they could be deleted in a Phase 2.5 cleanup but kept for Phase 2 to avoid blast radius. +8. Cross-plugin smoke: install each plugin into wfctl + run sample apply + verify ActionResults populated AND that v2 dispatch path was taken (not v1 fallback). + +**Coordinated cutover:** workflow PR + 4 plugin PRs land in same release window. Workflow tag (v0.54.0 candidate) carries the engine change. 4 plugin tags (aws v1.2.0 / gcp v1.2.0 / azure v1.2.0 / DO v1.2.0) carry the per-plugin Apply impls. All 5 ship in lockstep. + +**Out of scope for Phase 2** (deferred to Phase 3 / 5): +- Phase 3: codemod-driven canonical-form bump for plugins that delegate to wfctlhelpers.ApplyPlan (currently only DO) +- Phase 5: remove `wfctlhelpers.ApplyPlan` entirely + +## Architecture + +**1-design-doc + 1-ADR + 5-coordinated-PR cascade**. Per-repo single PR; coordinated release tags. + +### Proto extension (definitive spec) + +```protobuf +// plugin/external/proto/iac.proto (additions) + +// ActionStatus categorizes per-action outcomes for wfctl-side hook dispatch. +// Per ADR 0040 invariants 1-2. (Compensation invariants 3 deferred to Phase +// 2.3 when engine-side compensation logic actually exists — cycle-2 review +// correctly flagged that defining COMPENSATED + COMPENSATION_FAILED enum +// values in Phase 2 with no reachable emission path creates dead values + +// linter exhaustiveness violations. Phase 2 scope ships only the 3 statuses +// that have engine-side emission paths today.) +enum ActionStatus { + // Default; must NEVER be emitted by plugins. wfctl rejects ApplyResponse + // with any action_index whose status == UNSPECIFIED (catches plugin bugs + // where action result was forgotten). + ACTION_STATUS_UNSPECIFIED = 0; + + // Action succeeded; wfctl fires OnResourceApplied (or OnResourceDeleted + // for successful delete actions per ADR 0040 invariant 1). + ACTION_STATUS_SUCCESS = 1; + + // Action failed (non-delete: create/update/replace). For create/update, no + // resource exists; wfctl skips OnResourceApplied. ApplyResult.errors carries + // the human-readable error (existing field). + ACTION_STATUS_ERROR = 2; + + // Delete action failed (resource still exists in cloud); wfctl MUST NOT + // fire OnResourceDeleted (state preserved). Distinct from ACTION_STATUS_ERROR + // because wfctl-side downstream behavior differs per ADR 0040 invariant 2. + ACTION_STATUS_DELETE_FAILED = 3; + + // FUTURE: ACTION_STATUS_COMPENSATED + ACTION_STATUS_COMPENSATION_FAILED + // reserve tags 4 + 5 for Phase 2.3 when engine-side compensation lands. + // NOT defined in Phase 2; defining-without-emitting creates dead enum + // values per cycle-2 review. +} + +// ActionResult is the per-action outcome surfacing for Phase 2 v2 hooks. +// Per ADR 0040 invariant 1. +// +// REVISED per cycle-2 review: output_keys field DROPPED. Hook firing +// (OnResourceApplied / OnResourceDeleted) requires only action_index + +// status. Per-resource outputs already live in ApplyResult.resources +// (existing aggregated field; engine-side populated). Adding a parallel +// map with map[string]any→string conversion ambiguity +// expands proto surface without delivering new capability for hooks. +// Phase 2.2 may add a per-action outputs field if a future caller actually +// needs per-action output correlation; today no such caller exists. +message ActionResult { + // 0-indexed position in the input plan.actions array. + uint32 action_index = 1; + + // Per ActionStatus enum semantics above. + ActionStatus status = 2; + + // Per-action error message. Empty unless status is ERROR / DELETE_FAILED. + // Mirrors interfaces.ActionError.Error for backwards compat with the + // existing ApplyResult.errors aggregation path (engine-side reconciles + // when populating both fields). + string error = 3; +} + +// Extend existing ApplyResult message with the new field at tag 7 (non-conflicting +// with current tags 1-6). +message ApplyResult { + string plan_id = 1; + repeated ResourceOutput resources = 2; + repeated ActionError errors = 3; + map initial_input_snapshot = 4; + repeated DriftEntry input_drift_report = 5; + map replace_id_map = 6; + // NEW for Phase 2 (workflow#640 ADR 0041): per-action outcomes for wfctl-side + // hook dispatch. Per ADR 0024 + 0040, plugins MUST populate this field; wfctl + // MUST validate that len(actions) == len(plan.actions) on receipt. Absent or + // partial actions field → wfctl rejects ApplyResponse with structured error. + repeated ActionResult actions = 7; +} +``` + +### Go interfaces extension (workflow side) + +```go +// interfaces/iac.go additions + +// ActionStatus mirrors pb.ActionStatus. Type-safe enum at the Go boundary. +type ActionStatus uint8 + +const ( + ActionStatusUnspecified ActionStatus = iota + ActionStatusSuccess + ActionStatusError + ActionStatusDeleteFailed + ActionStatusCompensated + ActionStatusCompensationFailed +) + +// ActionOutcome mirrors pb.ActionResult. +// REVISED per cycle-2: Outputs field dropped (see ActionResult proto comment). +type ActionOutcome struct { + ActionIndex uint32 + Status ActionStatus + Error string +} + +// ApplyResult — add Actions field; preserve existing fields. +type ApplyResult struct { + PlanID string + Resources []ResourceOutput + Errors []ActionError + InitialInputSnapshot map[string]string + InputDriftReport []DriftEntry + ReplaceIDMap map[string]string + Actions []ActionOutcome // NEW Phase 2 +} +``` + +### wfctl-side decoder (REVISED per cycle-1 C-2) + +**Cycle-1 finding:** `applyResultFromPB(r *pb.ApplyResult)` does NOT receive `plan`; the design's pseudo-code `plan.GetActions()` was a compile error. + +**Fix:** length validation lives at the CALLER (which has both plan + result). applyResultFromPB only decodes + rejects UNSPECIFIED. The two-phase responsibility split: + +```go +// applyResultFromPB (no signature change): just decode the new field +func applyResultFromPB(r *pb.ApplyResult) (*interfaces.ApplyResult, error) { + // ... existing decoding ... + actions := make([]interfaces.ActionOutcome, 0, len(r.GetActions())) + for _, a := range r.GetActions() { + if a.GetStatus() == pb.ActionStatus_ACTION_STATUS_UNSPECIFIED { + return nil, fmt.Errorf("plugin returned ActionResult with UNSPECIFIED status at action_index=%d (Phase 2 contract violation per ADR 0041)", a.GetActionIndex()) + } + actions = append(actions, interfaces.ActionOutcome{ + ActionIndex: a.GetActionIndex(), + Status: mapPBStatusToInterface(a.GetStatus()), + Outputs: a.GetOutputKeys(), + Error: a.GetError(), + }) + } + result.Actions = actions + return result, nil +} + +// At the caller (typedIaCAdapter.Apply, iac_typed_adapter.go:350) — V1 PATH ONLY: +result, err := applyResultFromPB(resp.GetResult()) +if err != nil { return nil, err } +// NOTE: Phase 2 length validation does NOT live here. typedIaCAdapter.Apply +// is reached only on the V1 DISPATCH PATH (provider.Apply direct). On the +// v2 path (wfctlhelpers.ApplyPlanWithHooks), this function is never called; +// engine populates Actions per dispatch loop iteration. +// +// V1-plugin compatibility: future plugins that stay on v1 (declare empty +// computePlanVersion) will return ApplyResult with len(actions) == 0. The +// Phase 2 contract length validation MUST happen on the V2 ENGINE PATH +// only — see next subsection. +``` + +### Engine-side length validation (cycle-2 fix for I-2) + +```go +// In iac/wfctlhelpers/apply.go::applyPlanWithEnvProviderAndHooks AFTER the +// dispatchAction loop completes: +if len(result.Actions) != len(plan.Actions) { + // ASSERT, not return: this is an engine-internal invariant violation + // (engine just iterated len(plan.Actions) times; if Actions count + // differs, engine has a bug). Log + return error rather than panic. + return result, fmt.Errorf("internal: ApplyPlanWithHooks produced %d ActionOutcomes for %d plan actions (engine invariant violation)", len(result.Actions), len(plan.Actions)) +} +``` + +This places the validation where the engine actually iterates the plan + can guarantee 1:1 correspondence. V1-plugin path (`typedIaCAdapter.Apply` → `provider.Apply`) does NOT enforce this check because v1 plugins legitimately return empty Actions field; that's the v1 dispatch contract. + +### Engine-side population (iac/wfctlhelpers/apply.go::applyPlanWithEnvProviderAndHooks) + +```go +// In the dispatchAction loop, after each action (success OR error): +result.Actions = append(result.Actions, interfaces.ActionOutcome{ + ActionIndex: uint32(i), + Status: mapErrToStatus(dispatchErr, action.Type), + Error: errOrEmpty(dispatchErr), +}) +``` + +`mapErrToStatus(err, actionType)` returns: +- `ActionStatusSuccess` if `err == nil` +- `ActionStatusDeleteFailed` if `err != nil && actionType == "delete"` +- `ActionStatusError` otherwise + +Compensation status values (COMPENSATED + COMPENSATION_FAILED) are RESERVED for Phase 2.3 — cycle-2 review correctly identified that no engine path currently emits them (doCreate's upsert recovery is name-conflict workaround, not cloud-side compensation; defaultReplaceWithHooks documents no rollback attempted). Phase 2 ships only the 3 reachable statuses. + +### Per-plugin implementation pattern (REVISED — single uniform pattern) + +**All 4 plugins (aws/gcp/azure/DO): declare `compute_plan_version="v2"` in CapabilitiesResponse.** wfctl then routes through `wfctlhelpers.ApplyPlanWithHooks` for all 4. Engine-side dispatch via `provider.ResourceDriver(action.Resource.Type)` per action; engine-side population of `result.Actions` per dispatch. + +Plugin-side change template (~5 LOC per plugin): + +```go +// internal/server.go OR equivalent: where Capabilities RPC handler lives +func (s *Server) Capabilities(ctx context.Context, _ *emptypb.Empty) (*pb.CapabilitiesResponse, error) { + return &pb.CapabilitiesResponse{ + // ... existing CapabilityDeclarations field ... + ComputePlanVersion: "v2", // NEW Phase 2: declare v2 dispatch + }, nil +} +``` + +This is the IaCCapabilityDeclaration array stays unchanged; the addition is the `compute_plan_version` STRING field on CapabilitiesResponse (proto path `plugin/external/proto/iac.proto:382`). + +**Existing `IaCProvider.Apply` impls on aws/gcp/azure become dead code post-cutover** — wfctl no longer calls them on the v2 path. Phase 2 keeps them in-tree to minimize blast radius; Phase 2.5 follow-up may delete. + +### Manifest validation gate (REVISED per cycle-1 C-3) + +**Cycle-1 finding:** `pluginmanifest.ValidateBytes` doesn't exist; design referenced a non-existent package. ~200 LOC hidden scope. + +**Resolution:** scope-decision at writing-plans phase. Three options recorded for writing-plans evaluation: + +(a) **Implement `plugin/external/manifest` package** with `ValidateBytes(bytes []byte) error` + create `schema/plugin_manifest.json` schema file. ~200 LOC of new code; adds a real-time JSON-schema-validation dependency (e.g., `github.com/santhosh-tekuri/jsonschema`). + +(b) **Reuse existing `schema/` package's JSON schema infrastructure** IF it covers plugin.json — writing-plans probes this; if `schema/` already has a plugin manifest schema generator, validation hook can ride on top. + +(c) **DEFER manifest validation to a Phase 2.1 follow-up PR.** Phase 2 ships without the gate; Phase 1 Assumption 8 risk remains (silent v1 fallback on manifest typos) until 2.1 closes it. Acceptable trade-off if the schema-validation library + schema-file scope is too big to fold into Phase 2's 5-repo cutover. + +**Phase 2 design RECOMMENDS option (c)** — defer manifest validation to Phase 2.1. Reason: the 5-repo HARD-CUTOVER is already substantial; adding net-new package + dependency expands blast radius. Phase 2.1 is a single workflow-side PR that adds the validation gate WITHOUT cross-repo cutover risk. Phase 2.1 can also handle the typo-silent-fallback risk via a simpler check (verify computePlanVersion field value is one of {"v1", "v2"} OR empty) without requiring full JSON-schema infrastructure. + +This shifts deliverable #6 from "implement manifest validation gate" to "file workflow#640-followup tracking issue for Phase 2.1 manifest validation gate." + +### ResourceReplacer manifest declaration (ADR 0040 invariant 5) + +Plugins declaring `ResourceReplacer` interface usage at the resource-type level via `plugin.json`: + +```json +{ + "iacProvider": { + "computePlanVersion": "v2", + "resourceTypes": [ + {"name": "infra.compute", "drivers": ["compute"], "resourceReplacer": true}, + {"name": "infra.network", "drivers": ["network"]} + ] + } +} +``` + +wfctl-side `deploy_providers.go` reads `resourceReplacer: true` PER resource type; pre-mutation gating in `preflightProviderOwnedReplaceWithDeleteHooks` (`iac/wfctlhelpers/apply.go:166`) uses this declaration instead of runtime ResourceReplacer detection. + +## Coordinated cutover sequence (REVISED per cycle-1 timing critique) + +**Realistic 2-3 hour release window** — cycle-1 reviewer correctly flagged that the original "30-min atomic window" claim ignored CI + Copilot review cycles + GoReleaser windows + draft-edit defensive fixes (azure-pattern recurring). Empirical baseline from this session's plugin sweep: 15-20 min per plugin PR (CI ~10min + Copilot settle ~10min + admin-merge + tag-push + GoReleaser ~5-15min + defensive edit ~30s). 5 repos parallel = bounded by slowest path ≈ 30-45 min IF all goes clean, 1-2 hours with any per-repo retry/Copilot finding, 2-3 hours with multi-round Copilot iteration. + +``` +T-0: workflow PR opened (proto + decoder + engine + Phase 2.1 followup-issue-file) +T+10-20m: workflow PR CI + Copilot settle; admin-merge; v0.54.0 tag pushed +T+20-30m: v0.54.0 GoReleaser verifies (defensive draft=false if needed) +T+30m: All 4 plugin PRs opened in parallel (each with workflow pin bumped to v0.54.0 + Capabilities computePlanVersion="v2" + minEng bumped to "0.54.0") +T+45-90m: Plugin PRs CI + Copilot review cycles (Copilot may surface findings; per-plugin iterate) +T+90-120m: All 4 plugin PRs merged; tags pushed (aws v1.2.0 + gcp v1.2.0 + azure v1.2.0 + DO v1.2.0) +T+120-150m: All 4 plugin GoReleaser runs verified; defensive draft=false per plugin +T+150-180m: Cross-plugin smoke: install each plugin + run sample apply + verify v2 dispatch + Actions populated +T+180m+: Memory update + close Phase 2 + flag Phase 2.1 + Phase 3 followups +``` + +**Plugin v1.2.0 semver decision (cycle-1 reviewer flagged):** plugins move from v1.1.x to v1.2.0 (MINOR bump). Decision rationale: per ADR 0040 + 0024, old plugin tags become permanently incompatible with workflow v0.54.0+ (hard-cutover). Semver convention treats this as "compatibility ratcheting via minEngineVersion" rather than "API break of the plugin's own API" — the PLUGIN's exported API (Go module + plugin.json schema) is unchanged; what changes is which workflow version the plugin requires. MINOR is the established convention for "minEngineVersion floor moved" per prior sweep (v1.1.0 from v0.51.x → v0.53.x pin was also a minor bump). Operator who wants MAJOR (v2.0.0) should override before Phase 2 ships. + +**Failure modes + rollback:** + +| Failure | Detection | Recovery | +|---------|-----------|----------| +| 1 of 4 plugin PRs fails CI | Pre-merge | Pause cutover; fix plugin; resume | +| 1 of 4 plugin tags fails GoReleaser | Post-tag | Defensive draft-edit; if assets missing, debug per-plugin | +| Cross-plugin smoke fails on plugin X | Post-tag | Cut plugin X v1.2.1 hotfix; OR revert workflow v0.54.0 → v0.54.1 reverting proto change (LAST RESORT — undoes hard-cutover) | +| All 4 plugins ship clean, smoke green | — | DONE; proceed to memory updates | + +**ADR 0024 binding constraint:** NO graceful fallback. If workflow v0.54.0 ships but a plugin can't update for any reason, that plugin is permanently incompatible with v0.54.0+. Operator must pin workflow to v0.53.x until plugin catches up. + +## Approaches considered + +**Approach A (CHOSEN — REVISED PER CYCLE-1): additive proto field + ALL 4 PLUGINS DECLARE compute_plan_version="v2" → uniform engine-side population.** wfctl routes via wfctlhelpers.ApplyPlanWithHooks for all 4; engine populates ApplyResult.Actions per dispatch loop iteration. Pro: single dispatch path; engine has 1:1 plan→outcome iteration so length-validation invariant is natural; no per-plugin custom-loop code. Con: aws/gcp/azure's existing IaCProvider.Apply method becomes unreachable post-cutover (dead code; kept in-tree for Phase 2 minimization, deletable in Phase 2.5). + +**Approach B (REJECTED — was previously "custom-loop population for non-canonical plugins"): plugins' own Apply loops populate Actions field via gRPC wire.** Cycle-1 review correctly killed this: hooks fire only on v2 dispatch path; custom-loop plugins take v1 path; wfctl never reads Actions even if populated. Pattern was architecturally broken. + +**Approach C (REJECTED): inject per-action hooks via callback bridge over gRPC streaming.** Pro: real-time hook firing during apply. Con: massively more complex; streaming RPC requires bidirectional client/server state; out of scope; violates ADR 0024 (new dispatch path = compat shim by another name). + +## Assumptions + +1. **Proto wire format is forwards-compatible AT THE FIELD LEVEL** — new `actions` field at tag 7 doesn't conflict with existing fields. Verified by checking iac.proto:295 (ApplyResult tags 1-6 used). New tag 7 + reserved-tags discipline = safe. +2. **All 4 IaC plugins use the same `iac.proto` definition** — verified plan-2 + plan-1 sweep work. Plugins re-import workflow/plugin/external/proto via go module path. +3. **`wfctlhelpers.ApplyPlanWithHooks` is the engine-side path that populates `result.Actions`** — direct path: dispatchAction → action result evidence → append to result.Actions. Verified during Phase 1: ApplyPlanWithHooks is the ONLY engine entrypoint after Phase 1+4 migration. +4. **`pluginmanifest.ValidateBytes` (or equivalent) exists or can be added** — verified existence of `schema/` package with JSON schema generation. Phase 2 adds the runtime validator if not present. +5. **`extractOutputs` helper composing per-resource outputs from interfaces.ResourceOutput is straightforward** — verify during writing-plans. +6. **No plugin currently relies on a "DEFER actions field to engine" behavior** — i.e., custom-loop plugins (aws/gcp/azure) don't expect engine to fill in Actions retroactively. Verified: today they return ApplyResult without Actions; the absence is interpreted as "all aggregated in errors[] field" — Phase 2 forces them to populate explicitly. +7. **Plugins are released with semver MINOR bump (v1.1.0 → v1.2.0).** Phase 2 IS a breaking-compat change (per ADR 0024 the old plugin can't talk to new workflow), but the SEMVER convention treats this as breaking-via-minimum-engine-version, not as a major bump. Verify with operator if minor bump is too aggressive. +8. **wfctl v0.54.0 minEngineVersion enforcement happens via existing `*-engine-range` conformance CI gates per `feedback_check_versions_actively`.** Plugins ship with minEngineVersion: "0.54.0" — old workflow pin (v0.53.x) won't load v1.2.0 plugins. +9. **Test plan: every plugin PR's test suite has a NEW unit test asserting `ApplyResult.Actions` populated with len == len(plan.Actions) + correct ActionStatus per action.** Mandatory; Phase 2 ADR 0041 records this. + +## Self-challenge round (top doubts) + +1. **`ACTION_STATUS_UNSPECIFIED` rejection at decoder layer is strict; could break in-flight upgrade window.** If wfctl is upgraded to v0.54.0 while a stale-tagged plugin (pre-v1.2.0) is still loaded, the plugin emits no actions field → decoder reads len(actions)==0 vs plan having N actions → REJECTS. This is intentional per ADR 0024 (no graceful fallback). But it means operators MUST upgrade plugins immediately after upgrading workflow. + +2. **Per-action `output_keys` semantic ambiguity** — for delete actions, what goes in output_keys? Spec says "empty for delete actions." But ResourceReplacer (delete-then-create) may surface new outputs from the recreate phase. Decision needed: which action is the replace recorded under — the delete or the create-after-delete? + +3. **Manifest validation gate is a Phase 2 scope addition (not in original ADR 0040 invariants).** Including it expands Phase 2. Counter: Phase 1 Assumption 8 explicitly flagged it as Phase 2 scope; the cycle-1 reviewer accepted this. + +## Tech Stack + +- Protobuf (`plugin/external/proto/iac.proto` + `iac.pb.go` regen) +- Go modules (workflow + 4 plugin repos) +- GoReleaser (per-repo release tags) +- JSON schema (manifest validation) +- ALL workflow-side Go cmds need `GOWORK=off` + +## Base branch + +- workflow: `feat/v2-phase2-grpc-contract` off origin/main +- workflow-plugin-aws: `feat/v2-applyresult-actions` off origin/main +- workflow-plugin-gcp: same +- workflow-plugin-azure: same +- workflow-plugin-digitalocean: same + +## Rollback + +Per-PR rollback per Phase 1 precedent. If hard-cutover fails mid-flight: + +1. **workflow v0.54.0 revertible** — cut v0.54.1 reverting iac.proto + iac.pb.go + applyResultFromPB + engine populate + manifest gate. Removes the Phase 2 hard-cutover but preserves Phase 1 (ADR 0040, godoc Deprecated marker, 4 caller migrations). +2. **Plugin tags permanent in Go proxy** — old v1.2.0 tags can't be deleted, but `latest` resolves to subsequent rollback tag (v1.2.1) re-pinning workflow → v0.53.x. +3. **Matched-pair rollback** — workflow v0.54.0 + 4 plugin v1.2.0 tags ALL roll back together if Phase 2 needs full revert. Matches plan-2 cloud-SDK matched-pair pattern. + +## Decisions to record + +ADR 0041 — "V2 ApplyResponse Actions contract": +- Status: Accepted +- Context: Phase 1 (ADR 0040) declared 5 provider compatibility expectations; Phase 2 implements +- Decision: extend ApplyResult proto with `repeated ActionResult actions` + new `enum ActionStatus`; wfctl decoder rejects ApplyResponse with len(actions) != len(plan.actions) OR any UNSPECIFIED status; coordinated 5-repo cutover per ADR 0024 +- Consequences: workflow v0.54.0 + 4 plugin v1.2.0 release window; plugins MUST upgrade simultaneously; old plugin tags permanently incompatible with v0.54.0+; manifest validation DEFERRED to Phase 2.1 (separate workflow-side PR); ALL 4 plugins uniformly declare compute_plan_version="v2" → engine-side population (single dispatch path, no Pattern A/B bifurcation per cycle-1 collapse); aws/gcp/azure existing IaCProvider.Apply impls become dead code (kept for Phase 2; deletable in Phase 2.5); ACTION_STATUS_COMPENSATED + COMPENSATION_FAILED enum values RESERVED but not defined in Phase 2 (no engine emission path; Phase 2.3 when compensation logic lands) + +## Pipeline next step + +After this design lands + adversarial-design-review --phase=design PASSES → invoke `superpowers:writing-plans`. The plan body produces a Scope Manifest with 5 PRs (1 per repo) and 10 tasks (proto + regen + interfaces + decoder + engine populate + manifest gate + aws Apply + gcp Apply + azure Apply + DO pin bump + 4 plugin tag releases + cross-plugin smoke + memory update). + +**Note on autonomous execution:** Phase 2's 5-repo coordinated cutover is substantively heavier than this session's accumulated context budget can sustain in a single autonomous run. After scope-lock, the plan is ready for fresh-session execution per the user mandate to recreate-team-per-task. Fresh session picks up the locked plan + dispatches subagent-driven-development. diff --git a/docs/plans/2026-05-16-v2-lifecycle-phase2.md b/docs/plans/2026-05-16-v2-lifecycle-phase2.md new file mode 100644 index 00000000..713c88f2 --- /dev/null +++ b/docs/plans/2026-05-16-v2-lifecycle-phase2.md @@ -0,0 +1,696 @@ +# V2 Action Lifecycle Phase 2 Implementation Plan + +> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task. + +**Goal:** Extend the IaC plugin gRPC contract with per-action outcome evidence (`ApplyResult.actions` + `ActionStatus` enum) so wfctl-side v2 hooks fire correctly, shipped as a HARD-CUTOVER coordinated 5-repo cascade per ADR 0024 + 0040. + +**Architecture:** workflow PR adds proto extension + Go interfaces + decoder + engine-side population. 4 plugin PRs declare `compute_plan_version="v2"` in CapabilitiesResponse so wfctl routes through v2 dispatch (`wfctlhelpers.ApplyPlanWithHooks`) for all 4. Engine-side dispatch populates Actions uniformly; no per-plugin custom Apply loops needed (Pattern B collapsed per cycle-1 review). Plugins' existing `IaCProvider.Apply` impls become dead code post-cutover (kept in-tree for Phase 2 minimization). + +**Tech Stack:** Protobuf (iac.proto + regenerated iac.pb.go), Go modules (workflow + 4 plugin repos), GoReleaser, ADR 0024 + 0040 binding constraints (NO compat shim, NO graceful proto fallback). + +**Base branch:** `main` per repo (workflow + 4 plugins each) + +--- + +## Scope Manifest + +**PR Count:** 5 +**Tasks:** 9 +**Estimated Lines of Change:** ~500 (workflow ~400 incl proto+pb.go regen; per-plugin ~5-10 LOC × 4 plugins; smoke + memory ~50) + +**Out of scope:** +- Phase 2.1: manifest validation gate at `cmd/wfctl/deploy_providers.go::findIaCPluginDir` (deferred per design — separate workflow-side PR; either implements new `plugin/external/manifest` package OR ships lightweight `computePlanVersion ∈ {v1,v2}` check) +- Phase 2.3: engine-side compensation logic + `ACTION_STATUS_COMPENSATED` / `COMPENSATION_FAILED` enum value emission (tags 4+5 reserved in proto comment; defined-without-emitting is dead code per cycle-2 review) +- Phase 2.5: delete dead-code `IaCProvider.Apply` impls on aws/gcp/azure (unreachable post-Phase-2 cutover; kept in Phase 2 to minimize blast radius) +- Phase 3: codemod-driven canonical-form bump for DO (was relevant pre-cycle-1; collapsed in cycle-1 — DO now also routes via v2 dispatch) +- Phase 5: remove `wfctlhelpers.ApplyPlan` (Phase 4 already shipped via #691; Phase 5 gates on Phase 2+3 being clean — now only Phase 2 dependency since Phase 3 collapsed) +- Per-action `output_keys` field on ActionResult (dropped per cycle-2 reviewer's Option — hook firing requires only action_index + status; per-resource outputs already in ApplyResult.resources) + +**PR Grouping:** + +| PR # | Title | Tasks | Branch | +|------|-------|-------|--------| +| 1 | feat: workflow v0.54.0 — ApplyResult.Actions + ActionStatus enum + engine population | Task 1, Task 2, Task 3, Task 4, Task 5 | `feat/v2-phase2-grpc-contract` (in workflow) | +| 2 | feat: declare ComputePlanVersion v2 + bump workflow v0.54.0 pin; release v1.2.0 | Task 6 | `feat/v2-capabilities-v2` (in workflow-plugin-aws) | +| 3 | feat: declare ComputePlanVersion v2 + bump workflow v0.54.0 pin; release v1.2.0 | Task 7 | `feat/v2-capabilities-v2` (in workflow-plugin-gcp) | +| 4 | feat: declare ComputePlanVersion v2 + bump workflow v0.54.0 pin; release v1.2.0 | Task 8 | `feat/v2-capabilities-v2` (in workflow-plugin-azure) | +| 5 | feat: declare ComputePlanVersion v2 + bump workflow v0.54.0 pin; release v1.2.0 | Task 9 | `feat/v2-capabilities-v2` (in workflow-plugin-digitalocean) | + +**Post-cascade closeout** (cross-plugin smoke verification + memory update) is a team-lead operational step described after the task list — NOT a separate plan-task; it doesn't create a PR. + +**Status:** Locked 2026-05-16T12:23:06Z + +--- + +## Pre-dispatch setup (team-lead, ONCE before any task starts) + +Per the cloud-SDK plugin sweep precedent — team-lead actions BEFORE dispatching implementers: + +1. Verify ADR 0024 + 0040 are still binding (read decisions/0024 + decisions/0040; confirm no override in flight). +2. **File `workflow#640-phase-2.1` follow-up tracking issue** (moved from Task 6 per cycle-1 plan-review I-2 — this is a team-lead action, not an implementer task; the tracking issue body references the Phase 2 PR which doesn't exist yet at implementer-time). Issue body: "Phase 2.1 follow-up to #640 Phase 2 (PRs land via Phase 2 cascade) — add manifest validation gate at cmd/wfctl/deploy_providers.go::findIaCPluginDir per Phase 1 Assumption 8. Three implementation options recorded in Phase 2 design doc (full pluginmanifest package; reuse existing schema/; lightweight computePlanVersion enum check). Pick at design time." + +--- + +## Universal per-plugin pattern (PR 2-5 — applies to Tasks 6, 7, 8, 9) + +For tasks 6-9, each follows the same 5-step pattern. Repo-specific details inline per task. + +**Files per plugin PR:** +- Modify: `go.mod` (workflow pin v0.53.1 → v0.54.0) +- Modify: `go.sum` (auto) +- Modify: `plugin.json` (minEngineVersion → "0.54.0") +- Modify: `internal/server.go` OR equivalent gRPC handler file (add `ComputePlanVersion: "v2"` to CapabilitiesResponse) +- Tag: `v1.2.0` + +**Step 1: Branch + ff-pull** + +```bash +cd /Users/jon/workspace/ +git fetch origin +git checkout -b feat/v2-capabilities-v2 origin/main +git pull --ff-only origin main +``` + +**Step 2: Bump workflow pin + declare ComputePlanVersion v2** + +Edit `go.mod`: +``` +require ( + github.com/GoCodeAlone/workflow v0.54.0 # was v0.53.1 + ... +) +``` + +Edit **`internal/iacserver.go`** (verified file location across all 4 plugins via cycle-1 plan-review). The Capabilities method signature is identical across all 4 plugins: + +```go +// Existing pattern (verified per-plugin — receiver varies: awsIaCServer / gcpIaCServer / azureIaCServer / doIaCServer): +func (s *IaCServer) Capabilities(_ context.Context, _ *pb.CapabilitiesRequest) (*pb.CapabilitiesResponse, error) { + // ... existing IaCCapabilityDeclaration build into `out` ... + return &pb.CapabilitiesResponse{Capabilities: out}, nil // CURRENT +} +``` + +**Change ONLY the return-statement struct literal** to add ComputePlanVersion field: + +```go +return &pb.CapabilitiesResponse{ + Capabilities: out, + ComputePlanVersion: "v2", // NEW Phase 2: declare v2 dispatch per ADR 0040 +}, nil +``` + +Do NOT rewrite the function body, receiver type, parameter types, or parameter names. + +**Step 3: Tidy + build + test** + +```bash +go mod tidy +go build ./... +go test ./... -race +``` + +Expected: all green. + +**Step 4: Update plugin.json** + +```json +{ + ... + "minEngineVersion": "0.54.0", + ... +} +``` + +**Step 5: Commit + PR + admin-merge + tag v1.2.0 + verify release** + +```bash +git add go.mod go.sum plugin.json internal/iacserver.go +git commit -m "feat: declare ComputePlanVersion v2 + bump workflow v0.53.1 → v0.54.0; release v1.2.0" +git push -u origin feat/v2-capabilities-v2 +gh pr create --base main --head feat/v2-capabilities-v2 \ + --title "feat: declare ComputePlanVersion v2 + bump workflow v0.54.0 pin; release v1.2.0" \ + --body "Phase 2 of workflow#640 hard-cutover per ADR 0024 + 0040. Plugin declares ComputePlanVersion=\"v2\" in CapabilitiesResponse → wfctl routes through wfctlhelpers.ApplyPlanWithHooks (v2 dispatch path) → engine populates ApplyResult.Actions per dispatch. Existing IaCProvider.Apply impl becomes dead code post-cutover (kept for Phase 2 minimization; Phase 2.5 cleanup may delete). + +Coordinated with workflow PR (workflow v0.54.0 must be tagged + released BEFORE this plugin PR's go.mod bump resolves). Cascading rollback: if this plugin's v1.2.0 fails downstream consumer, cut v1.2.1 reverting workflow pin + Capabilities declaration." +``` + +After CI green + Copilot settle (~10 min) + admin-merge: + +```bash +gh pr merge --squash --admin --delete-branch +git checkout main && git pull +git tag v1.2.0 +git push origin v1.2.0 +gh release list --repo GoCodeAlone/ --json tagName,isDraft,isLatest --limit 1 +``` + +Defensive draft-edit if drafted: `gh release edit v1.2.0 --repo GoCodeAlone/ --draft=false --latest`. + +**Rollback (per-plugin):** if downstream consumer breaks, cut `v1.2.1` re-pinning workflow → v0.53.1 + reverting ComputePlanVersion="v2" → "". Per ADR 0040, plugins on v0.53.x pins permanently incompatible with workflow v0.54.0+; v1.2.1 rollback re-establishes v0.53.x compat. + +--- + +## Tasks + +### Task 1: workflow — extend iac.proto + REGENERATE iac.pb.go in same commit (cycle-1 I-1 bundled) + +**Files:** +- Modify: `plugin/external/proto/iac.proto` (add ActionStatus enum + ActionResult message; add `repeated ActionResult actions = 7;` to ApplyResult) +- Modify: `plugin/external/proto/iac.pb.go` (regenerated; bundled in same commit per cycle-1 plan-review I-1 — splitting proto+regen creates broken intermediate commit that fails CI) + +**Step 1: Edit iac.proto — add ActionStatus enum + ActionResult message** + +Add the following BEFORE `message ApplyResult` (line 295): + +```protobuf +// ActionStatus categorizes per-action outcomes for wfctl-side hook dispatch. +// Per workflow#640 Phase 2 + ADR 0040 invariants 1-2. Tags 4+5 reserved +// for Phase 2.3 ACTION_STATUS_COMPENSATED + ACTION_STATUS_COMPENSATION_FAILED +// when engine-side compensation lands. +enum ActionStatus { + ACTION_STATUS_UNSPECIFIED = 0; // wfctl REJECTS this on receipt + ACTION_STATUS_SUCCESS = 1; + ACTION_STATUS_ERROR = 2; + ACTION_STATUS_DELETE_FAILED = 3; + // 4 + 5 reserved (Phase 2.3 compensation) +} + +// ActionResult is the per-action outcome surfacing for Phase 2 v2 hooks. +// Per ADR 0040 invariant 1. output_keys field DROPPED per cycle-2 review +// (hook firing only needs action_index + status; per-resource outputs +// already in ApplyResult.resources). +message ActionResult { + uint32 action_index = 1; + ActionStatus status = 2; + string error = 3; +} +``` + +Modify ApplyResult (existing at line 295) to add the new field at tag 7: + +```protobuf +message ApplyResult { + string plan_id = 1; + repeated ResourceOutput resources = 2; + repeated ActionError errors = 3; + map initial_input_snapshot = 4; + repeated DriftEntry input_drift_report = 5; + map replace_id_map = 6; + repeated ActionResult actions = 7; // NEW Phase 2 (workflow#640) +} +``` + +**Step 2: Verify proto syntax** + +Run: `protoc --proto_path=plugin/external/proto --descriptor_set_out=/dev/null plugin/external/proto/iac.proto` +Expected: exit 0, no syntax errors. + +**Step 3: Regenerate iac.pb.go (bundled in same commit per cycle-1 I-1)** + +```bash +grep -rn 'protoc.*iac.proto\|//go:generate.*proto' Makefile scripts/ plugin/external/proto/ 2>&1 | head -5 +# Run whichever regen command the repo uses (make proto / go generate / etc.) +make proto # OR go generate ./plugin/external/proto/... +``` + +**Step 4: Verify generated symbols + build** + +```bash +grep -n 'ActionStatus\|ActionResult\|GetActions' plugin/external/proto/iac.pb.go | head -10 +GOWORK=off go build ./plugin/external/proto/... +``` + +Expected: lines matching `type ActionResult struct`, `type ApplyResult struct { ... Actions []*ActionResult ...`, `func (x *ApplyResult) GetActions() []*ActionResult`, enum constants for ActionStatus; build exit 0. + +**Step 5: Commit (BUNDLED — proto + regen atomic)** + +```bash +git add plugin/external/proto/iac.proto plugin/external/proto/iac.pb.go +git commit -m "feat(proto): extend ApplyResult with Actions field + ActionStatus enum; regenerate iac.pb.go (Phase 2)" +``` + +**Verification (internal-logic refactor + proto regen):** protoc clean; build clean; grep confirms new symbol presence. + +**Rollback:** revert commit (regen + proto edit roll back together). + +--- + +### Task 2: workflow — extend interfaces.ApplyResult + ActionOutcome + ActionStatus Go types + +**Files:** +- Modify: `interfaces/iac.go` (add ActionStatus enum + ActionOutcome struct; extend ApplyResult.Actions field) + +**Step 1: Edit interfaces/iac.go — add types** + +Find the existing `type ApplyResult struct` and add Actions field. Add ActionStatus + ActionOutcome above: + +```go +// ActionStatus mirrors pb.ActionStatus for type-safe Go boundary use. +// Per workflow#640 Phase 2 + ADR 0040. +type ActionStatus uint8 + +const ( + ActionStatusUnspecified ActionStatus = iota + ActionStatusSuccess + ActionStatusError + ActionStatusDeleteFailed + // Future Phase 2.3: ActionStatusCompensated, ActionStatusCompensationFailed +) + +// ActionOutcome mirrors pb.ActionResult. +type ActionOutcome struct { + ActionIndex uint32 + Status ActionStatus + Error string +} + +// ApplyResult — Actions field added Phase 2. +type ApplyResult struct { + PlanID string + Resources []ResourceOutput + Errors []ActionError + InitialInputSnapshot map[string]string + InputDriftReport []DriftEntry + ReplaceIDMap map[string]string + Actions []ActionOutcome // NEW Phase 2 +} +``` + +**Step 2: Build verify** + +```bash +GOWORK=off go build ./interfaces/... +``` + +Expected: exit 0. + +**Step 3: Commit** + +```bash +git add interfaces/iac.go +git commit -m "feat(interfaces): add ActionStatus + ActionOutcome; extend ApplyResult.Actions (Phase 2)" +``` + +**Verification (internal-logic refactor class):** build green. No callers break (Actions field is new; existing callers ignore it until Task 5 populates). + +**Rollback:** revert commit; remove Actions field + types. + +--- + +### Task 3: workflow — extend applyResultFromPB to decode + reject UNSPECIFIED + +**Files:** +- Modify: `cmd/wfctl/iac_typed_adapter.go:1177` (extend applyResultFromPB) + +**Step 1: Write failing test** + +Add to `cmd/wfctl/iac_typed_adapter_test.go`: + +```go +func TestApplyResultFromPB_DecodesActions(t *testing.T) { + pbResult := &pb.ApplyResult{ + PlanId: "plan-1", + Actions: []*pb.ActionResult{ + {ActionIndex: 0, Status: pb.ActionStatus_ACTION_STATUS_SUCCESS}, + {ActionIndex: 1, Status: pb.ActionStatus_ACTION_STATUS_DELETE_FAILED, Error: "AWS API error"}, + }, + } + got, err := applyResultFromPB(pbResult) + if err != nil { t.Fatalf("err: %v", err) } + if len(got.Actions) != 2 { t.Fatalf("expected 2 actions, got %d", len(got.Actions)) } + if got.Actions[0].Status != interfaces.ActionStatusSuccess { t.Errorf("action 0: want Success, got %v", got.Actions[0].Status) } + if got.Actions[1].Status != interfaces.ActionStatusDeleteFailed { t.Errorf("action 1: want DeleteFailed, got %v", got.Actions[1].Status) } + if got.Actions[1].Error != "AWS API error" { t.Errorf("action 1 error mismatch") } +} + +func TestApplyResultFromPB_RejectsUNSPECIFIED(t *testing.T) { + pbResult := &pb.ApplyResult{ + Actions: []*pb.ActionResult{ + {ActionIndex: 0, Status: pb.ActionStatus_ACTION_STATUS_UNSPECIFIED}, + }, + } + _, err := applyResultFromPB(pbResult) + if err == nil { t.Fatal("expected error on UNSPECIFIED status, got nil") } + if !strings.Contains(err.Error(), "UNSPECIFIED") { t.Errorf("error should mention UNSPECIFIED: %v", err) } +} +``` + +**Step 2: Run test → expected FAIL** + +```bash +GOWORK=off go test ./cmd/wfctl/ -run 'TestApplyResultFromPB_DecodesActions|TestApplyResultFromPB_RejectsUNSPECIFIED' -v +``` + +Expected: FAIL — `got.Actions` is empty / nil. + +**Step 3: Implement** + +In `cmd/wfctl/iac_typed_adapter.go`, find `func applyResultFromPB(r *pb.ApplyResult) (*interfaces.ApplyResult, error)` (line 1177). Add after existing field decoding, before `return result, nil`: + +```go +// Phase 2: decode per-action outcomes (workflow#640). +actions := make([]interfaces.ActionOutcome, 0, len(r.GetActions())) +for _, a := range r.GetActions() { + if a.GetStatus() == pb.ActionStatus_ACTION_STATUS_UNSPECIFIED { + return nil, fmt.Errorf("plugin returned ActionResult with UNSPECIFIED status at action_index=%d (Phase 2 contract violation per ADR 0041)", a.GetActionIndex()) + } + actions = append(actions, interfaces.ActionOutcome{ + ActionIndex: a.GetActionIndex(), + Status: mapPBActionStatusToInterface(a.GetStatus()), + Error: a.GetError(), + }) +} +result.Actions = actions +``` + +Add helper at file end: + +```go +func mapPBActionStatusToInterface(s pb.ActionStatus) interfaces.ActionStatus { + switch s { + case pb.ActionStatus_ACTION_STATUS_SUCCESS: + return interfaces.ActionStatusSuccess + case pb.ActionStatus_ACTION_STATUS_ERROR: + return interfaces.ActionStatusError + case pb.ActionStatus_ACTION_STATUS_DELETE_FAILED: + return interfaces.ActionStatusDeleteFailed + default: + return interfaces.ActionStatusUnspecified + } +} +``` + +**Step 4: Run tests → expected PASS** + +```bash +GOWORK=off go test ./cmd/wfctl/ -run 'TestApplyResultFromPB_DecodesActions|TestApplyResultFromPB_RejectsUNSPECIFIED' -v +``` + +Expected: PASS. + +**Step 5: Commit** + +```bash +git add cmd/wfctl/iac_typed_adapter.go cmd/wfctl/iac_typed_adapter_test.go +git commit -m "feat(wfctl): applyResultFromPB decodes ActionResult; rejects UNSPECIFIED (Phase 2)" +``` + +**Verification (internal-logic refactor):** tests PASS; existing applyResultFromPB tests still PASS (no regression on existing fields). + +**Rollback:** revert commit. + +--- + +### Task 4: workflow — engine-side ApplyPlanWithHooks populates result.Actions + length-validation assert + +**Files:** +- Modify: `iac/wfctlhelpers/apply.go:118` (applyPlanWithEnvProviderAndHooks — add Actions append per dispatch + post-loop length assert) +- Modify: `iac/wfctlhelpers/apply_hooks_test.go` (add test verifying Actions populated) + +**Step 1: Write failing tests** + +Add to `iac/wfctlhelpers/apply_hooks_test.go` — use the REAL fakeProvider API (verified via grep in cycle-1 plan-review C-2: `type fakeProvider struct { driver *fakeDriver; driverErr error }`; `newFakeProvider() *fakeProvider`; single driver, NOT drivers map): + +```go +func TestApplyPlanWithHooks_PopulatesActions_CleanSuccess(t *testing.T) { + p := newFakeProvider() // single-driver fakeProvider from apply_test.go + plan := &interfaces.IaCPlan{ + ID: "plan-1", + Actions: []interfaces.PlanAction{ + {Resource: interfaces.ResourceRef{Type: "test.resource", Name: "r1"}, Action: "create"}, + {Resource: interfaces.ResourceRef{Type: "test.resource", Name: "r2"}, Action: "create"}, + }, + } + result, err := ApplyPlanWithHooks(context.Background(), p, plan, ApplyPlanHooks{}) + if err != nil { t.Fatalf("err: %v", err) } + if len(result.Actions) != 2 { t.Fatalf("expected 2 ActionOutcomes, got %d", len(result.Actions)) } + for i, a := range result.Actions { + if a.ActionIndex != uint32(i) { t.Errorf("action %d index mismatch: %d", i, a.ActionIndex) } + if a.Status != interfaces.ActionStatusSuccess { t.Errorf("action %d status: %v", i, a.Status) } + } +} + +// CRITICAL test per cycle-1 plan-review C-1: pre-dispatch continue paths +// (driver-resolve error at apply.go:234) MUST still produce ActionOutcome +// entries so the post-loop length assert doesn't false-fail. +func TestApplyPlanWithHooks_PopulatesActions_PreDispatchDriverError(t *testing.T) { + p := &fakeProvider{driverErr: errors.New("driver resolution failed")} + plan := &interfaces.IaCPlan{ + ID: "plan-1", + Actions: []interfaces.PlanAction{ + {Resource: interfaces.ResourceRef{Type: "unknown.resource", Name: "r1"}, Action: "create"}, + }, + } + result, err := ApplyPlanWithHooks(context.Background(), p, plan, ApplyPlanHooks{}) + // Expect best-effort: no top-level error; result.Actions has 1 entry with + // Status==Error (since driver resolution failed pre-dispatch). + if err != nil { t.Fatalf("expected no top-level err on driver-resolve failure, got: %v", err) } + if len(result.Actions) != 1 { t.Fatalf("expected 1 ActionOutcome (length-assert invariant), got %d", len(result.Actions)) } + if result.Actions[0].Status != interfaces.ActionStatusError { + t.Errorf("driver-resolve-error action status: want Error, got %v", result.Actions[0].Status) + } +} +``` + +**Step 2: Run → expected FAIL** + +```bash +GOWORK=off go test ./iac/wfctlhelpers/ -run 'TestApplyPlanWithHooks_PopulatesActions' -v +``` + +Expected: FAIL — `result.Actions` empty (both tests fail). + +**Step 3: Implement engine-side population in applyPlanWithEnvProviderAndHooks (covers ALL continue paths per cycle-1 plan-review C-1)** + +In `iac/wfctlhelpers/apply.go`, the dispatch loop has multiple `continue` exits (verified: lines 224 jit-error, 234 driver-resolve-error, 261 dispatchAction-error, 287, 313). **EVERY continue path must append an ActionOutcome** OR the post-loop length-assert false-fails on legitimate plans with errors. + +Cleanest implementation: deferred closure inside the loop body that records the ActionOutcome on every exit path: + +```go +for i := range plan.Actions { + action := plan.Actions[i] + var dispatchErr error + var loopErr error // captures the actual error of this iteration + + // Deferred closure: runs on EVERY exit from this iteration (continue or fall-through). + // Guarantees 1-to-1 correspondence between plan.Actions and result.Actions + // regardless of which continue/branch the code took. + func() { + defer func() { + status := mapDispatchErrToStatus(loopErr, action.Action) + errStr := "" + if loopErr != nil { errStr = loopErr.Error() } + result.Actions = append(result.Actions, interfaces.ActionOutcome{ + ActionIndex: uint32(i), + Status: status, + Error: errStr, + }) + }() + + // ctx cancellation check + if err := ctx.Err(); err != nil { loopErr = err; return } + + // Existing JIT substitution at apply.go:217 + resolved, err := jitsubst.ResolveSpec(action.Resource, result.ReplaceIDMap, syncedOutputs, os.LookupEnv) + if err != nil { + // ... existing result.Errors append for JIT error ... + loopErr = fmt.Errorf("jit substitution: %w", err) + return + } + + // Existing driver resolution at apply.go:228 + d, err := p.ResourceDriver(action.Resource.Type) + if err != nil { + // ... existing result.Errors append for driver-resolve error ... + loopErr = err + return + } + + // Existing dispatchAction call at apply.go:251 + if err := dispatchAction(ctx, d, resolved, result, actionHooks, deleteHookActive); err != nil { + // ... existing result.Errors handling ... + loopErr = err + return + } + // Success path — loopErr stays nil; deferred closure records ActionStatusSuccess. + }() +} +``` + +The implementer should RESTRUCTURE the existing loop body to fit this shape — the deferred closure pattern preserves the existing best-effort continue-on-error semantics while guaranteeing the ActionOutcome append on every path. + +Add helper at file end: + +```go +// mapDispatchErrToStatus returns the ActionStatus per workflow#640 Phase 2. +// Compensation paths (Phase 2.3) reserved; today only 3 reachable statuses. +func mapDispatchErrToStatus(err error, actionType string) interfaces.ActionStatus { + if err == nil { + return interfaces.ActionStatusSuccess + } + if actionType == "delete" { + return interfaces.ActionStatusDeleteFailed + } + return interfaces.ActionStatusError +} +``` + +After the dispatch loop completes (just before `return result, nil` at function end), add the length-validation assert: + +```go +// Engine-side invariant: len(result.Actions) must equal len(plan.Actions). +// Per workflow#640 Phase 2 + ADR 0041 (length validation lives engine-side, +// not in applyResultFromPB which is on v1 dispatch path). +if len(result.Actions) != len(plan.Actions) { + return result, fmt.Errorf("internal: ApplyPlanWithHooks produced %d ActionOutcomes for %d plan actions (engine invariant violation)", len(result.Actions), len(plan.Actions)) +} +``` + +**Step 4: Run tests → expected PASS** + +```bash +GOWORK=off go test ./iac/wfctlhelpers/ -race +``` + +Expected: all PASS, including the new test + existing tests (no regression). + +**Step 5: Commit** + +```bash +git add iac/wfctlhelpers/apply.go iac/wfctlhelpers/apply_hooks_test.go +git commit -m "feat(engine): applyPlanWithEnvProviderAndHooks populates result.Actions + length assert (Phase 2)" +``` + +**Verification (internal-logic refactor):** tests PASS; race-detector clean. + +**Rollback:** revert commit. + +--- + +### Task 5: workflow — cut v0.54.0 tag from main HEAD (team-lead action post-PR1 merge) + +**Files:** none (git tag only) + +**Step 1: Verify PR 1 merged + Tasks 1-6 all complete** + +```bash +cd /Users/jon/workspace/workflow +git fetch origin +git log origin/main --oneline -8 | grep -E 'Phase 2|ApplyResult.Actions|ActionStatus' +``` + +Expected: 5+ commits from Tasks 1-5 visible (Task 6 is a GitHub issue, no commit; Task 7 is the tag). + +**Step 2: Tag v0.54.0 from main HEAD** + +```bash +git checkout main && git pull +git tag v0.54.0 +git push origin v0.54.0 +``` + +**Step 3: Verify release publishes** + +```bash +sleep 60 +gh release list --repo GoCodeAlone/workflow --json tagName,isDraft,isLatest --limit 3 +``` + +Expected: v0.54.0 in list. Defensive draft-edit if drafted: `gh release edit v0.54.0 --repo GoCodeAlone/workflow --draft=false --latest`. + +**Verification (build pipeline + version pin update class — runtime-launch-validation triggered):** +- `gh release view v0.54.0 --repo GoCodeAlone/workflow --json assets,isDraft --jq '"draft=\(.isDraft) assets=\(.assets|length)"'` → `draft=false assets≥4` +- Plugin PRs (Tasks 8-11) can now resolve workflow v0.54.0 via Go proxy (proxy reads git tags immediately) + +**Rollback:** if v0.54.0 has critical issue, cut v0.54.1 reverting whichever commit broke; old v0.54.0 tag stays in Go proxy (immutable) but `latest` resolves to v0.54.1. Per ADR 0040 matched-pair rollback: if revert needed before plugins migrate, plugins stay on v0.53.x pin. + +--- + +### Task 6: workflow-plugin-aws — declare ComputePlanVersion v2; release v1.2.0 + +**Repo:** `/Users/jon/workspace/workflow-plugin-aws` + +Apply the **Universal per-plugin pattern** at top of plan. + +**Files specifics:** +- `internal/server.go` OR equivalent (grep `func.*Capabilities` to locate) — add `ComputePlanVersion: "v2"` + +**Verification:** +- `go build ./... && go test ./... -race` PASS +- Post-release: `gh release view v1.2.0 --repo GoCodeAlone/workflow-plugin-aws --json assets,isDraft --jq '"draft=\(.isDraft) assets=\(.assets|length)"'` → `draft=false assets≥4` + +**Rollback:** cut v1.2.1 re-pinning workflow → v0.53.1 + removing ComputePlanVersion="v2". + +--- + +### Task 7: workflow-plugin-gcp — declare ComputePlanVersion v2; release v1.2.0 + +**Repo:** `/Users/jon/workspace/workflow-plugin-gcp` + +Apply the **Universal per-plugin pattern**. Same as Task 8. Tag v1.2.0. + +**Rollback:** v1.2.1 re-pin v0.53.1. + +--- + +### Task 8: workflow-plugin-azure — declare ComputePlanVersion v2; release v1.2.0 + +**Repo:** `/Users/jon/workspace/workflow-plugin-azure` + +Apply the **Universal per-plugin pattern**. Same as Task 8. Tag v1.2.0. + +**Note:** azure's release.yml uses `[self-hosted, Linux, X64]` runner (intentional infra per sweep precedent); confirm runners online before tag push: `gh api /orgs/GoCodeAlone/actions/runners --jq '.runners | map(select(.status=="online")) | length'` → ≥1. + +**Rollback:** v1.2.1 re-pin v0.53.1. + +--- + +### Task 9: workflow-plugin-digitalocean — declare ComputePlanVersion v2; release v1.2.0 + +**Repo:** `/Users/jon/workspace/workflow-plugin-digitalocean` + +Apply the **Universal per-plugin pattern**. Same as Task 8. Tag v1.2.0. + +**Note:** DO already routes through wfctlhelpers.ApplyPlan via canonical delegate pattern, but Phase 2 still requires the ComputePlanVersion="v2" Capabilities declaration to flag the v2 dispatch path explicitly. DO's existing DOProvider.Apply impl becomes equally dead post-cutover (wfctl skips provider.Apply → dispatches via provider.ResourceDriver instead). + +**Rollback:** v1.2.1 re-pin v0.53.1. + +--- + +## Post-cascade closeout (team-lead, AFTER all 5 PRs ship) + +These are team-lead operational actions — not separate plan-tasks. They run after the 5-PR cascade lands + before this plan is marked complete. + +**Cross-plugin smoke verification (operator-run; NOT a CI gate per design Testing section):** + +For each of aws/gcp/azure/DO: +```bash +wfctl plugin install github.com/GoCodeAlone/workflow-plugin-@v1.2.0 +``` + +Run a representative apply against a known IaC config; verify: +- wfctl chose v2 dispatch (calls `applyV2ApplyPlanWithHooksFn`, NOT `provider.Apply(ctx, &plan)`) +- ApplyResult.Actions populated with len == len(plan.Actions) +- For successful applies, ActionOutcomes have Status == ActionStatusSuccess +- For applies with delete actions that fail, status == ActionStatusDeleteFailed (use test fixture if real apply doesn't exercise this) + +If desired, capture transcript at `docs/runtime-validation/2026-05-16-phase2-cross-plugin-smoke.md` (optional — operator decides if audit-trail commit is wanted). + +**Rollback if smoke fails:** cut plugin X v1.2.1 hotfix OR cut workflow v0.54.1 reverting whichever commit broke. Per ADR 0040 matched-pair rollback. + +**Memory + close + followup tracking:** + +- Append to `/Users/jon/.claude/projects/-Users-jon-workspace/memory/project_cloud_sdk_extraction_complete.md` #640 entry: "Phase 2 SHIPPED — workflow v0.54.0 + aws/gcp/azure/DO v1.2.0 coordinated cascade per ADR 0024 + 0040. ApplyResult.Actions populated; v2 dispatch path. Followups: Phase 2.1 (manifest validation — tracking issue filed in Pre-dispatch setup step 2); Phase 2.3 (compensation); Phase 2.5 (dead-code cleanup); Phase 5 (remove ApplyPlan)." +- Update `MEMORY.md` index entry to include "Phase 2 ✓". + +--- + +## Out of scope (per design — separate future passes) + +- Phase 2.1: manifest validation gate at deploy_providers.go (tracked via Task 6 GitHub issue) +- Phase 2.3: engine-side compensation logic + ACTION_STATUS_COMPENSATED/COMPENSATION_FAILED enum value emission (tags 4+5 reserved in Phase 2 proto comment) +- Phase 2.5: delete dead-code IaCProvider.Apply impls on aws/gcp/azure (unreachable post-Phase-2 cutover; kept in Phase 2 to minimize blast radius) +- Phase 3: codemod-driven canonical-form bump for DO (cycle-1 collapsed this — DO now also routes via v2 dispatch) +- Phase 5: remove wfctlhelpers.ApplyPlan (gates only on Phase 2) +- Per-action `output_keys` field on ActionResult (dropped per cycle-2 — hook firing only needs action_index + status) diff --git a/docs/plans/2026-05-16-v2-lifecycle-phase2.md.scope-lock b/docs/plans/2026-05-16-v2-lifecycle-phase2.md.scope-lock new file mode 100644 index 00000000..bc03f751 --- /dev/null +++ b/docs/plans/2026-05-16-v2-lifecycle-phase2.md.scope-lock @@ -0,0 +1 @@ +46b795ec5ee95c4df0cec3ccc4697da5c77ec44fb1afa50068108c50a8f174ee