Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
134 changes: 134 additions & 0 deletions decisions/0024-iac-typed-force-cutover.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,134 @@
# 0024: IaC typed force-cutover — supersedes 2026-04-26 additive design (IaC interfaces only)

- **Date:** 2026-05-10
- **Status:** Accepted

## Context

The 2026-04-26 strict-grpc-plugin-contracts design proposed an **additive**
migration: introduce typed gRPC contracts alongside the legacy
`InvokeService` + `*structpb.Struct` dispatch path; let plugins migrate
when convenient; eventually drop the legacy surface after every plugin
opted in. Workflow shipped that design's PR #497 and the resulting
14-plugin migration tracker is in flight for Module/Step/Trigger types.

For the **IaC interfaces specifically** (`interfaces.IaCProvider` +
`interfaces.ResourceDriver`), the additive path produced a recurring
bug class through 2026-04 and 2026-05:

- **T3.9 finding** — `*structpb.Struct.NewStruct` silently dropped
`map[string]bool` entries (sensitive-key flags on `ResourceOutput`),
reaching the plugin as `args=map[]` rather than producing a typing
error. The bug shipped to production before runtime-launch validation
caught it.
- **v0.27.0 EnumeratorAll bridge gap** — the wfctl-side
`*remoteIaCProvider` proxy did not implement the optional
`EnumeratorAll` method, so type-asserts in `infra audit-keys` /
`infra prune` always failed even when the plugin process implemented
it. Required a v0.27.1 hot-fix and a workspace memory entry
(`feedback_workflow_plugin_structpb_boundary`) to prevent
recurrence.
- **InvokeService case-string typo class** — the 600+ line
`remoteIaCProvider` proxy and the per-method `switch` dispatcher in
`internal/module_instance.go` paired hand-written method-name
literals on both sides. Mismatches were caught only at runtime
dispatch time, on the first call site to exercise the affected RPC.

The bug-cycle data is unambiguous: **for IaC interfaces, the additive
approach has not paid back its own complexity**. The four bug-class
surfaces (string-keyed args, structpb encoding, hand-written client
proxy, hand-written server dispatcher) compound on each other; each
one taken individually is "small," but the failure modes cross the
boundary in ways that single-side typing cannot prevent.

The user mandate (`feedback_force_strict_contracts_no_compat`) closes
the question: hard cutover, no compat shim, no build-tag dual-path,
no codes.Unimplemented loophole. The strict-contracts force-cutover
plan (`docs/plans/2026-05-10-strict-contracts-force-cutover.md`)
implements that mandate.
Comment on lines +47 to +48

## Decision

**Hard cutover for IaC-flavored interfaces (`IaCProvider`,
`ResourceDriver`) supersedes the additive approach of 2026-04-26 for
those interfaces only.** The Module/Step/Trigger additive work
(workflow PR #497 + the 14-plugin migration tracker) remains the live
design for those interfaces.

Concretely:

- Workflow ships `v1.0.0-rc1` introducing `iac.proto` with
`service IaCProviderRequired` + 6 optional services + `service
ResourceDriver`, plus a typed SDK helper that auto-registers every
service the plugin satisfies.
- Workflow ships `v1.0.0` final after the DO plugin migrates to
`pb.IaCProviderRequiredServer`. The cutover commit DELETES the
legacy `remoteIaCProvider`, `remoteResourceDriver`, the
`InvokeService` RPC (for IaC paths), and the DO plugin's
`internal/module_instance.go` switch dispatcher.
- The DO plugin (`workflow-plugin-digitalocean`) ships `v1.0.0`
implementing `pb.IaCProviderRequiredServer` + every optional
service it needs.
- Other GoCodeAlone IaC plugins (`workflow-plugin-aws`,
`workflow-plugin-gcp`, `workflow-plugin-azure`,
`workflow-plugin-tofu`) are NOT impacted: cycle 1 I-5 verified they
do not currently expose IaC via remote dispatch. They migrate
net-new at their own cadence.
- Application consumers (`core-dump`, `BMW`, `workflow-cloud`,
`ratchet`, `ratchet-cli`, `workflow-cloud-ui`) bump their
workflow + DO plugin pins. None imports `interfaces.IaCProvider`
programmatically (cycle 1 I-5 grep verified); the change is
invisible to their code.

The cutover scope is locked at commit `e82b7e0c` per
`superpowers:scope-lock`.

## Consequences

Positive:

- Eliminates four compounding bug-class surfaces (string-keyed args,
structpb encoding, hand-written client proxy, hand-written server
dispatcher) in a single coordinated cutover, rather than one PR
at a time over months.
- Compile-time safety: a missing required method on the plugin side
fails the plugin's `go build`. A missing client-side stub fails
`wfctl`'s build. The bug class becomes unreachable, not better-
caught.
- Shrinks `cmd/wfctl/deploy_providers.go` by ~600 lines (the entire
`remoteIaCProvider` + `remoteResourceDriver` proxy is deleted, not
refactored).

Negative:

- One-shot coordination across workflow + DO plugin + 6 application
pin bumps. Mitigated by the `feat/iac-typed-rc1` rc-tag protocol
(rc1 unblocks DO plugin work; final v1.0.0 unblocks pin bumps) and
by the per-PR adversarial review cycle.
- Application repos that pin workflow as a transitive Go module need
one go.mod bump each. Acceptable per cycle 1 I-5 (none of them
import the affected interfaces directly).
- The dual-path option that 2026-04-26 contemplated is gone; future
bug-class surfaces in the IaC contract require a typed-side fix,
not a structpb-side workaround.

Wire-format: PR 2 workflow ships rc1 first (additive); operators can
test plugins against rc1 before final v1.0.0 lands. State-file format
(JSON-serialized `interfaces.ResourceState`) is invariant across the
cutover — only the wfctl <-> plugin gRPC envelope changes.

## Alternatives Rejected

**Continue the additive 2026-04-26 plan for IaC**: rejected because
the bug-cycle data shows the four bug-class surfaces compound and
single-side typing has not prevented the failures (T3.9 + v0.27.1).

**Compat shim / build-tag dual-path**: rejected by the user mandate
(`feedback_force_strict_contracts_no_compat`). A compat shim
preserves the bug-class surface; build-tag dual-path doubles the
maintenance burden without removing any of the four surfaces.

**Cycle 3 deferral until the 14-plugin Module/Step/Trigger migration
finishes**: rejected because IaC bugs are recurring on the live
production v0.27.x line; the deferral cost (more T3.9-class
incidents) exceeds the cost of the coordinated cutover.
160 changes: 160 additions & 0 deletions decisions/0025-iac-optional-method-typed-services-not-bool.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,160 @@
# 0025: IaC optional methods are typed gRPC services, not NotSupported flags or codes.Unimplemented

- **Date:** 2026-05-10
- **Status:** Accepted

## Context

`interfaces.IaCProvider` exposes 11 required methods (`Initialize`,
`Plan`, `Apply`, `Destroy`, …) and a set of OPTIONAL sub-interfaces
that a provider MAY implement: `Enumerator`, `EnumeratorAll`,
`DriftConfigDetector`, `ProviderCredentialRevoker`,
`ProviderMigrationRepairer`, `ProviderValidator`. The wfctl call
sites historically `type-assert` against the optional interface and
treat the negative case as "skip this provider."

The strict-contracts force-cutover (ADR 0024) replaces the legacy
`InvokeService` + `*structpb.Struct` dispatch with typed gRPC
services. The question this ADR settles: **how does the typed
contract surface "optional" capability?**

Three candidate designs were considered:

### Candidate A — single service + `NotSupported bool` per method

```proto
service IaCProvider {
rpc EnumerateAll(EnumerateAllRequest) returns (EnumerateAllResponse);
// …
}
message EnumerateAllResponse {
repeated ResourceOutput outputs = 1;
bool not_supported = 2; // plugin sets when it doesn't implement
}
```

The plugin author embeds an `Unimplemented` server stub for every
RPC; per-method, they choose whether to override. Untouched methods
return `NotSupported=true` from the embed.

**Cycle 1 I-1 finding**: this is the legacy bug class repackaged. The
"is this method supported?" decision shifts from a typed Go
interface (caught at compile time on the plugin side) to a runtime
boolean field (caught at the first call site that observes the
flag). The "stub-everything-and-forget-to-flip-not_supported"
failure mode is identical in shape to the legacy
"forget-to-add-a-case-string-to-the-switch" failure mode.

### Candidate B — single service + `codes.Unimplemented` for unsupported methods

```proto
service IaCProvider { /* same as A */ }
```

The plugin author embeds `Unimplemented*Server` stubs that return
`status.Error(codes.Unimplemented, ...)` from every method by
default. wfctl call sites check `status.FromError` for
`codes.Unimplemented` and treat it as "skip."

**Cycle 1 I-1 finding** (escalated): worse than A. The bug-class
shape is now identical to v0.27.1's
`isPluginMethodUnimplemented` boundary translator: a string match
on the gRPC status, with each call site repeating the boilerplate.
A new optional method ships, the plugin author forgets to override
the embed, every call site that uses it observes
`codes.Unimplemented`, and the system silently skips real work.

### Candidate C — split into REQUIRED + OPTIONAL gRPC services (chosen)

```proto
service IaCProviderRequired {
rpc Initialize(InitializeRequest) returns (InitializeResponse);
// … 11 methods — every plugin MUST implement every one.
}
service IaCProviderEnumerator {
rpc EnumerateAll(EnumerateAllRequest) returns (EnumerateAllResponse);
rpc EnumerateByTag(EnumerateByTagRequest) returns (EnumerateByTagResponse);
}
service IaCProviderDriftDetector { /* … */ }
// … 6 optional services total.
```

The plugin author implements `pb.IaCProviderRequiredServer` (compile-
fail-equivalent error if any method is missing) plus whichever
optional service interfaces match capability. The SDK helper
`RegisterAllIaCProviderServices(grpcSrv, provider)` uses Go type-
assertion to register every typed service the provider satisfies, in
one call.

Plugin authors **cannot half-implement** an optional capability and
forget to register it: if their Go type satisfies the interface, the
SDK auto-registers. wfctl detects "absent capability" via the
service NOT appearing in the `GetContractRegistry` response — the
absence of the service IS the negative signal, no flag in any
response body.

Per cycle 3 I-1 of the design.

## Decision

Adopt **Candidate C**: split the typed IaC contract into one
REQUIRED service (`IaCProviderRequired`, 11 RPCs) plus 6 OPTIONAL
services, one per capability. The plugin SDK auto-registers every
service whose Go interface the provider satisfies via a single helper
call (`sdk.RegisterAllIaCProviderServices`). wfctl detects
"capability absent" by the service not appearing in the
`ContractRegistry` payload — there is NO `NotSupported bool` field
on any response and the typed contract NEVER returns
`codes.Unimplemented` as a "supported by intent" signal.

`ResourceDriver` ships as a separate gRPC service (9 RPCs), also
auto-registered, also discovered via the same `ContractRegistry`
mechanism.

## Consequences

Positive:

- Capability decision moves to compile time on the plugin side: a
Go type that doesn't satisfy `pb.IaCProviderEnumeratorServer`
cannot be auto-registered as an enumerator. The plugin author
cannot stub-and-forget.
- Capability discovery on the wfctl side is a single grep:
`descriptor.ServiceName == "workflow.plugin.external.iac.IaCProviderEnumerator"`.
No status-code string matching, no per-method response-flag
checking.
- Adding a new optional capability later is purely additive: append
a service to `iac.proto`, add the `if v, ok := provider.(...)`
branch to `RegisterAllIaCProviderServices`, and the existing
plugins continue to work unchanged (their type doesn't satisfy
the new interface, so registration skips).

Negative:

- Six small gRPC services rather than one large one — slightly more
proto surface, slightly more generated code, slightly more
service descriptors emitted in `ContractRegistry`. Cycle 2 I-1's
belt-and-braces wftest BDD test
(`AssertProviderCapabilitiesMatchRegistration`) catches the rare
case where a plugin author ignores the auto-register helper and
manually picks a subset of services to register (advanced
use case).
- The split makes the existing Go interface naming (`Enumerator`
vs `EnumeratorAll`) non-1:1 with the proto naming
(`IaCProviderEnumerator` covers both methods). Documented in
`iac.proto` comments; no functional impact.

## Alternatives Rejected

**Candidate A (NotSupported flag)**: rejected per cycle 1 I-1 — same
bug-class shape as the legacy `InvokeService` switch dispatcher.

**Candidate B (codes.Unimplemented)**: rejected per cycle 1 I-1 +
v0.27.1 evidence — string-matching `Unimplemented` at the boundary
is the same boilerplate the cutover is trying to delete.

**One service with method-level type-assert in the SDK**: rejected
because the plugin author can register a partial server (some
methods bound, others stubbed via embed); the bug class returns at
the first call site that hits a stubbed method. The service-level
split forces the decision at registration time, not at call time.
Loading
Loading