46 changes: 44 additions & 2 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
@@ -395,12 +395,54 @@ jobs:
! grep -qH "digitalocean/godo" go.mod example/go.mod

cloud-sdk-audit:
name: Cloud-SDK inventory + k8s-backend init() partition audit
name: Cloud-SDK inventory + k8s-backend init() partition + asymmetric graph audit
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Audit cloud-SDK imports + init() partition
- uses: actions/setup-go@v5
with:
go-version-file: go.mod
cache: true
- name: Download module deps (needed for `go list -deps`)
env:
GOWORK: "off"
run: go mod download
- name: Audit cloud-SDK imports + init() partition + Phase C asymmetric gate
# The script enforces (once .phase-c-complete is armed):
# - module/ zero gcp/azure SDK real imports
# - build graph: zero Azure/azure-sdk-for-go, zero google.golang.org/api,
# zero cloud.google.com/go except compute/metadata (OAuth2 ADC helper
# legitimately pulled by provider/gcp's service-account auth).
run: bash scripts/audit-cloud-symbols.sh --check
- name: Standalone asymmetric gate — `go list -deps` (mirrors audit script)
# Independent assertion of the same invariant for clarity in CI failure logs.
# Any cloud.google.com/go path other than compute/metadata fails the build.
env:
GOWORK: "off"
run: |
set +e
DEPS=$(go list -deps ./... 2>&1)
LIST_EXIT=$?
set -e
if [ $LIST_EXIT -ne 0 ]; then
echo "FAIL: \`go list -deps ./...\` exited $LIST_EXIT (gate cannot enforce):"
echo "$DEPS" | head -10 | sed 's/^/ /'
exit 1
fi
# `|| true` on grep is fine: exit 1 from grep means "no matches" = success.
UNEXPECTED=$(echo "$DEPS" \
| grep -E '^(cloud\.google\.com/go|google\.golang\.org/api|github\.com/Azure/azure-sdk-for-go)' \
| grep -v '^cloud\.google\.com/go/compute/metadata$' \
|| true)
if [ -n "$UNEXPECTED" ]; then
echo "FAIL: unexpected gcp/azure/api transitive deps in workflow-core build graph:"
printf ' %s\n' $UNEXPECTED
echo
echo "Only cloud.google.com/go/compute/metadata is allowlisted (OAuth2 ADC helper)."
echo "Other gcp/azure SDK packages belong in a plugin, not workflow core."
echo "See docs/migrations/2026-05-15-plugin-modules-on-iac.md (Phase C)."
exit 1
fi

aws-sdk-banned:
name: Verify removed AWS SDK packages are not imported (issue #653)
Empty file added .phase-c-complete
Empty file.
3 changes: 1 addition & 2 deletions DOCUMENTATION.md
@@ -502,7 +502,7 @@ steps. See [v0.52.0 migration guide](docs/migrations/v0.52.0-godo-removal.md).
Use the generic `infra.*` module types with `provider: aws` and `step.iac_*` pipeline steps.
See [v0.53.0 migration guide](docs/migrations/v0.53.0-aws-iac-removal.md).
| `iac.provider` | Cloud provider configuration (aws, gcp, azure, digitalocean) for IaC operations | platform |
| `iac.state` | IaC state persistence (memory, filesystem, spaces, gcs, azure_blob, postgres) | platform |
| `iac.state` | IaC state persistence (memory + filesystem + postgres in-core; spaces / s3 / gcs / azure_blob via plugins) | platform |
| `infra.vpc` | Virtual Private Cloud and subnet management | platform |
| `infra.database` | Managed database instance provisioning and configuration | platform |
| `infra.cache` | In-memory cache cluster provisioning (Redis, Memcached) | platform |
@@ -536,7 +536,6 @@ See [v0.53.0 migration guide](docs/migrations/v0.53.0-aws-iac-removal.md).
### Storage
| Type | Description | Plugin |
|------|-------------|--------|
| `storage.gcs` | Google Cloud Storage | storage |
| `storage.local` | Local filesystem storage | storage |
| `storage.sqlite` | SQLite storage | storage |
| `storage.artifact` | Artifact store for build artifacts shared across pipeline steps | storage |
5 changes: 2 additions & 3 deletions cmd/wfctl/infra_state_store.go
@@ -94,9 +94,8 @@ func resolveStateStore(cfgFile, envName string) (infraStateStore, error) {
"install and load the plugin to use the S3 backend (wfctl direct-path commands no longer support in-tree s3)", backend)

case "gcs":
return nil, fmt.Errorf("gcs state store backend not yet supported by wfctl direct-path commands; " +
"create the bucket manually and reference it in iac.state.bucket. " +
"Contribute a resolveGCSStateStore helper to unblock this")
return nil, fmt.Errorf("iac.state backend %q is now plugin-served by workflow-plugin-gcp v1.1.0; "+
"install and load the plugin to use the GCS backend (wfctl direct-path commands no longer support in-tree gcs)", backend)

case "azure":
return nil, fmt.Errorf("azure state store backend not yet supported by wfctl direct-path commands; " +
182 changes: 158 additions & 24 deletions docs/migrations/2026-05-15-plugin-modules-on-iac.md
@@ -1,20 +1,25 @@
# 2026-05-15 — Plugin-modules-on-IaC: Phase B clean break
# 2026-05-15 — Plugin-modules-on-IaC: Phase B + Phase C clean break

This migration covers **Phase B** of the
[plugin-modules-on-IaC plan](../plans/2026-05-15-plugin-modules-on-iac.md):
workflow-core sheds the remaining in-core AWS/DO storage + state surfaces and
the SDK-bearing AWS credential resolvers. Each surface is now plugin-native.
This migration covers **Phase B** (AWS / DigitalOcean) and **Phase C** (GCP)
of the
[plugin-modules-on-IaC plan](../plans/2026-05-15-plugin-modules-on-iac.md).
Workflow-core sheds the remaining in-core cloud-SDK-bearing surfaces:
S3/Spaces/GCS IaC state stores, `storage.s3` + `storage.gcs` modules,
`step.s3_upload`, the in-core `gkeBackend`, and the SDK-bearing AWS
profile/role_arn credential resolvers. Each surface is now plugin-native.

The companion **Phase C** migration (GCP) follows in a separate PR; this doc is
amended in-place when that ships.
Phase B shipped in PR `feat/phase-b-core-deletion`; Phase C in
`feat/phase-c-core-deletion`. Engine + plugin versioning is covered below.

## Engine floor

Phase B requires **workflow `>= v0.53.0`** in any deployment that uses the
Both phases require **workflow `>= v0.53.0`** in any deployment that uses the
affected backends. The `>= v0.53.0` engine has the typed `IaCStateBackend`
gRPC contract (Phase A, decisions/0036), the `Configure` RPC that delivers the
`iac.state` module YAML to the plugin, and the plugin-backend registry that
`IaCModule.Init` consults in its `default:`-arm.
`IaCModule.Init` consults in its `default:`-arm. Phase C additionally relies on
the `grpcKubernetesBackend` adapter + plugin-backend registry shipped in PR
`#681` (ADR 0037) for the `platform.kubernetes type: gke` resolution path.

## What changed

@@ -25,9 +30,17 @@ gRPC contract (Phase A, decisions/0036), the `Configure` RPC that delivers the
| `storage.s3` module | in-core `module.S3Storage` (registered by `plugins/storage`) | plugin-native in `workflow-plugin-aws >= v1.1.0` |
| `step.s3_upload` pipeline step | in-core `module.S3UploadStep` (registered by `plugins/pipelinesteps`) | plugin-native in `workflow-plugin-aws >= v1.1.0` |
| `cloud.account` `provider: aws` + `credentials.type: profile` or `role_arn` | SDK-bearing resolver loaded the profile / called `sts:AssumeRole` in-core | core records a `credential_source` marker only; the aws plugin performs SDK resolution via `awscreds.BuildAWSConfig` (decisions/0036 + 0038) |
| `iac.state` `backend: gcs` | in-core `module.GCSIaCStateStore` (via `cloud.google.com/go/storage`) | plugin-served by [`workflow-plugin-gcp`](https://github.com/GoCodeAlone/workflow-plugin-gcp) `>= v1.1.0` |
| `storage.gcs` module | in-core `module.GCSStorage` (registered by `plugins/storage`) | plugin-native in `workflow-plugin-gcp >= v1.1.0` |
| `platform.kubernetes` `type: gke` | in-core `gkeBackend` (via `google.golang.org/api/container/v1`) | plugin-served by `workflow-plugin-gcp >= v1.1.0`; routed through the `grpcKubernetesBackend` adapter (ADR 0037) |

The YAML field names and `backend:` values are **unchanged**. The break is
strictly about *which binary* serves them.
The YAML field names and `backend:` / `type:` / `provider:` values are
**unchanged**. The break is strictly about *which binary* serves them.

`platform.kubernetes type: kind`, `k3s`, `eks`, and `aks` stay in core
(kind/k3s are in-memory test backends; eks is an actionable-error stub
pointing at `workflow-plugin-aws`; aks uses Azure REST + OAuth2 with no
Azure-SDK import — see Phase A's `cloud_account_azure.go` rewrite).

## Why

@@ -51,10 +64,11 @@ Without the plugin, `IaCModule.Init` fails fast:

```
iac.state "<name>": backend "spaces" is not built into workflow core
(in-core backends: 'memory', 'filesystem', 'gcs', 'postgres').
(in-core backends: 'memory', 'filesystem', 'postgres').
If "spaces" is a plugin-provided backend (e.g. 'azure_blob' via
workflow-plugin-azure, 'spaces' via workflow-plugin-digitalocean,
's3' via workflow-plugin-aws), install and load that plugin
's3' via workflow-plugin-aws, 'gcs' via workflow-plugin-gcp),
install and load that plugin
```

### `iac.state backend: s3`
@@ -104,32 +118,152 @@ warning is what tells operators which side to upgrade.
`credentials.type: static` and `credentials.type: env` are unaffected — those
paths have always been SDK-free and resolve in-core.
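
As a rough illustration of the split described above, the sketch below shows how an SDK-free resolver might handle the four credential types: `static` and `env` resolve in-core, while `profile` and `role_arn` only record a `credential_source` marker for the plugin to act on. Everything here is an assumption made for illustration (the `resolvedCredentials` type, the `resolveCredentials` function, and the `access_key` / `secret_key` config keys); it is not the actual code in `module/cloud_account_aws_creds.go`.

```go
package main

import (
	"fmt"
	"os"
)

// resolvedCredentials is a hypothetical result type. CredentialSource is
// empty when the credentials were fully resolved in-core.
type resolvedCredentials struct {
	AccessKey        string
	SecretKey        string
	CredentialSource string // marker telling the plugin what to resolve
}

// resolveCredentials never touches a cloud SDK. For profile/role_arn it
// only records which source the plugin must resolve later.
func resolveCredentials(credType string, cfg map[string]string, getenv func(string) string) (resolvedCredentials, error) {
	switch credType {
	case "static":
		// Hypothetical config keys, used here only for illustration.
		return resolvedCredentials{AccessKey: cfg["access_key"], SecretKey: cfg["secret_key"]}, nil
	case "env":
		return resolvedCredentials{
			AccessKey: getenv("AWS_ACCESS_KEY_ID"),
			SecretKey: getenv("AWS_SECRET_ACCESS_KEY"),
		}, nil
	case "profile", "role_arn":
		// SDK resolution is deferred to the plugin process.
		return resolvedCredentials{CredentialSource: credType}, nil
	default:
		return resolvedCredentials{}, fmt.Errorf("unsupported credentials.type %q", credType)
	}
}

func main() {
	for _, t := range []string{"static", "env", "profile", "role_arn"} {
		r, err := resolveCredentials(t, map[string]string{"access_key": "AKIAEXAMPLE"}, os.Getenv)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: deferred to plugin = %t\n", t, r.CredentialSource != "")
	}
}
```

Passing `getenv` as a parameter keeps the sketch testable without mutating the process environment.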

### `iac.state backend: gcs`

Load `workflow-plugin-gcp >= v1.1.0`. The YAML `backend: gcs` value and all
config keys (`bucket`, `prefix`, plus any GCP credential config) keep their
semantics. Application Default Credentials and service-account JSON resolution
still work — they just happen in the plugin process now.

Without the plugin, `IaCModule.Init` returns the same actionable error as the
spaces/s3 cases: the in-core backends list is now `'memory', 'filesystem',
'postgres'`, and the plugin examples list includes `'gcs' via workflow-plugin-gcp`.
The wfctl direct-path commands (`wfctl infra ...`) return a matching actionable error:

```
iac.state backend "gcs" is now plugin-served by workflow-plugin-gcp v1.1.0;
install and load the plugin to use the GCS backend (wfctl direct-path
commands no longer support in-tree gcs)
```
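
The resolution order described above (in-core backends first, then the plugin-backend registry, then a fail-fast error) can be sketched as follows. This is a minimal illustration under stated assumptions, not the engine's implementation: `resolvedBackend`, `pluginBackends`, and `resolveBackend` are hypothetical names, and the error text only approximates the real message.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// resolvedBackend is a hypothetical record of which binary serves a backend.
type resolvedBackend struct {
	Name     string
	ServedBy string // "core" or the name of the plugin that registered it
}

// inCoreBackends mirrors the in-core list quoted in the error messages above.
var inCoreBackends = map[string]bool{
	"memory":     true,
	"filesystem": true,
	"postgres":   true,
}

// pluginBackends is a hypothetical registry filled at plugin-load time:
// YAML `backend:` value -> plugin that serves it.
type pluginBackends map[string]string

// resolveBackend tries core first, then the registry, then fails fast.
func resolveBackend(backend string, plugins pluginBackends) (resolvedBackend, error) {
	if inCoreBackends[backend] {
		return resolvedBackend{Name: backend, ServedBy: "core"}, nil
	}
	if plugin, ok := plugins[backend]; ok {
		return resolvedBackend{Name: backend, ServedBy: plugin}, nil
	}
	names := make([]string, 0, len(inCoreBackends))
	for name := range inCoreBackends {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random; sort for a stable message
	return resolvedBackend{}, fmt.Errorf(
		"backend %q is not built into workflow core (in-core backends: %s); install and load the plugin that provides it",
		backend, strings.Join(names, ", "))
}

func main() {
	plugins := pluginBackends{"gcs": "workflow-plugin-gcp"}
	for _, b := range []string{"postgres", "gcs", "spaces"} {
		r, err := resolveBackend(b, plugins)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s is served by %s\n", r.Name, r.ServedBy)
	}
}
```

The point of the design is that the YAML `backend:` value never changes; only the lookup result decides which binary serves it.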

### `storage.gcs` module

The module moves into `workflow-plugin-gcp >= v1.1.0`. It has the same shape as
`storage.s3`: credentials are given inline or referenced via `credentials_ref:`
pointing at a `gcp.credentials` module loaded by the plugin. With no plugin
loaded, the module type is unknown at engine boot, so declare the plugin in the
deployment's plugin manifest.

### `platform.kubernetes type: gke`

The in-core `gkeBackend` (which spoke directly to
`google.golang.org/api/container/v1`) is removed. The `type: gke` dispatch
now flows through the `kubernetesBackendClientRegistry` populated at
plugin-load time by `workflow-plugin-gcp >= v1.1.0`, routed via the
`grpcKubernetesBackend` adapter shipped in PR `#681` per
[ADR 0037](../../decisions/0037-gke-cross-process-contract.md).

The YAML `type: gke` value is unchanged. All cluster-level config keys
(`project`, `location`/`zone`, `version`, `nodeGroups`, …) keep their
semantics; the plugin's `GKEDriver.Read` conforms its output to the same
status/endpoint keys the in-core `gkeBackend` produced.

Without the plugin, `PlatformKubernetes.Init` fails fast and the error
message identifies `workflow-plugin-gcp` as the missing plugin (same shape
as the iac.state error above).
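
A minimal sketch of the adapter idea described above, assuming a much-simplified contract: a registry maps `type:` values to plugin clients, and an adapter makes a plugin client satisfy the interface core expects. The names `clusterClient`, `kubernetesBackend`, `grpcAdapter`, `clientRegistry`, and `fakeGKE` are all hypothetical stand-ins; the real `grpcKubernetesBackend` and `kubernetesBackendClientRegistry` types are not shown in this document and will differ.

```go
package main

import (
	"errors"
	"fmt"
)

// clusterClient is a hypothetical stand-in for the plugin-side client that
// would normally sit behind a gRPC connection.
type clusterClient interface {
	Read(cluster string) (map[string]string, error)
}

// kubernetesBackend is a hypothetical core-side interface.
type kubernetesBackend interface {
	Status(cluster string) (string, error)
}

// grpcAdapter adapts a clusterClient to the kubernetesBackend interface.
type grpcAdapter struct{ client clusterClient }

func (a grpcAdapter) Status(cluster string) (string, error) {
	out, err := a.client.Read(cluster)
	if err != nil {
		return "", err
	}
	status, ok := out["status"]
	if !ok {
		return "", errors.New("plugin response is missing the status key")
	}
	return status, nil
}

// clientRegistry is a hypothetical registry populated at plugin-load time.
type clientRegistry map[string]clusterClient

// backendFor fails fast when no plugin has registered the requested type.
func (r clientRegistry) backendFor(typ string) (kubernetesBackend, error) {
	c, ok := r[typ]
	if !ok {
		return nil, fmt.Errorf("platform.kubernetes type %q has no registered plugin backend; install and load the plugin that serves it", typ)
	}
	return grpcAdapter{client: c}, nil
}

// fakeGKE plays the role of the plugin in this sketch.
type fakeGKE struct{}

func (fakeGKE) Read(cluster string) (map[string]string, error) {
	return map[string]string{"status": "RUNNING", "endpoint": "https://example.invalid"}, nil
}

func main() {
	registry := clientRegistry{"gke": fakeGKE{}}
	backend, err := registry.backendFor("gke")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	status, err := backend.Status("demo-cluster")
	fmt.Println(status, err)
}
```

Because the adapter reads the same status key the old backend produced, callers in core do not need to know which process answered.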

## OAuth2 ADC allowlist disclosure

Workflow core's `provider/gcp/` package retains
`golang.org/x/oauth2/google` for its service-account credential resolution
(`google.Credentials`, `FindDefaultCredentials`,
`CredentialsFromJSONWithTypeAndParams`). That import transitively pulls
**`cloud.google.com/go/compute/metadata`** — the OAuth2 Application Default
Credentials helper used to fetch tokens from the GCE/GKE metadata server.

The Phase C asymmetric audit gate (in
[`scripts/audit-cloud-symbols.sh`](../../scripts/audit-cloud-symbols.sh)) and
the mirroring `.github/workflows/ci.yml` `cloud-sdk-audit` job **allowlist
this single transitive path** and **fail CI on any other** `cloud.google.com/go/*`
dep. Any new GCP SDK package (e.g. `cloud.google.com/go/storage`,
`google.golang.org/api/*`) belongs in `workflow-plugin-gcp`, not core.

This is the GCP-side mirror of Phase B's `aws-sdk-go-v2`-retention paragraph:
`provider/aws/` legitimately uses the AWS SDK for its deploy pipeline,
`provider/gcp/` legitimately uses OAuth2 ADC for service-account auth, and
both arrangements are intentional — the audit gate just guards against
scope creep beyond those known seams.
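
The allowlist rule above can be expressed as a small filter over `go list -deps` output. The sketch below mirrors the prefix match and the exact-match allowlist used by the CI grep pipeline; the function name `unexpectedDeps` and the sample dependency list are illustrative assumptions, not part of `scripts/audit-cloud-symbols.sh`.

```go
package main

import (
	"fmt"
	"strings"
)

// bannedPrefixes are the import-path prefixes the gate rejects.
var bannedPrefixes = []string{
	"cloud.google.com/go",
	"google.golang.org/api",
	"github.com/Azure/azure-sdk-for-go",
}

// allowlisted is the single path permitted despite matching a banned prefix.
// It is compared exactly, so sub-packages of it would still be reported.
const allowlisted = "cloud.google.com/go/compute/metadata"

// unexpectedDeps returns every dependency that matches a banned prefix and
// is not the allowlisted path.
func unexpectedDeps(deps []string) []string {
	var out []string
	for _, d := range deps {
		d = strings.TrimSpace(d)
		if d == allowlisted {
			continue
		}
		for _, p := range bannedPrefixes {
			if strings.HasPrefix(d, p) {
				out = append(out, d)
				break
			}
		}
	}
	return out
}

func main() {
	// Sample input standing in for `go list -deps ./...` output.
	deps := []string{
		"fmt",
		"cloud.google.com/go/compute/metadata",
		"cloud.google.com/go/storage",
		"google.golang.org/api/container/v1",
	}
	for _, d := range unexpectedDeps(deps) {
		fmt.Println("unexpected:", d)
	}
}
```

An empty result corresponds to the gate passing; any returned path is what the CI step would print before exiting non-zero.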

## Rollback

Phase B's clean-breaks roll back only as a **matched pair** with the plugin
releases that serve them — reverting PR `feat/phase-b-core-deletion`
restores the in-core paths, but the plugin v1.1.0 tags are immutable. A
patch-level defect in either plugin port is resolved with a `v1.1.1`
release, not by re-introducing the in-core implementation.
Both Phase B and Phase C clean-breaks roll back only as a **matched pair**
with the plugin releases that serve them — reverting PR
`feat/phase-b-core-deletion` or `feat/phase-c-core-deletion` restores the
in-core paths, but the corresponding plugin v1.1.0 tags are immutable. A
patch-level defect in any plugin port is resolved with a `v1.1.1` release,
not by re-introducing the in-core implementation.

A running deployment that has already cut over to plugin-served `gcs` /
`gke` must coordinate engine + plugin versions on rollback by pinning both
sides to a pre-Phase-C state in the deploy manifest.

The `cloud_account_aws.go` deletion (164 lines of dead code that #653 had
already orphaned) is not part of the matched-pair rollback — it had zero
non-test consumers.

## Verification

Once Phase B is merged:
### Phase B (post-merge)

- `go mod tidy` against the merged tree should make no net change to AWS SDK
- `go mod tidy` against the merged tree makes no net change to AWS SDK
service modules — `aws-sdk-go-v2` stays in `go.mod` because `provider/aws/`,
`plugin/rbac/aws.go`, `iam/aws.go`, and `artifact/s3.go` still import it.
- The `.phase-b-complete` marker arms
`scripts/audit-cloud-symbols.sh --check`'s zero-`aws-sdk-go-v2` invariant on
`module/cloud_account_aws_creds.go`. Running the audit script post-merge
must report `audit-cloud-symbols: OK`.
`module/cloud_account_aws_creds.go`.

### Phase C (post-merge)

- `go mod tidy` drops `cloud.google.com/go/storage`, `google.golang.org/api`,
`cloud.google.com/go/auth*`, `cloud.google.com/go/monitoring`,
`cloud.google.com/go/iam`, and `GoogleCloudPlatform/opentelemetry-operations-go/*`
(~24 lines). `cloud.google.com/go/compute/metadata` remains as the only
`cloud.google.com/go/*` entry (allowlisted, see disclosure above).
- The `.phase-c-complete` marker arms two additional `--check` invariants:
- `module/` has **zero real imports** of `cloud.google.com/go`,
`google.golang.org/api`, or `github.com/Azure/azure-sdk-for-go`.
- The whole-repo build graph (`go list -deps ./...`) has zero
`Azure/azure-sdk-for-go`, zero `google.golang.org/api`, and zero
`cloud.google.com/go/*` **except** `compute/metadata`.

### Cross-phase invariant re-check

Run from a checkout of the merged tree:

```bash
bash scripts/audit-cloud-symbols.sh --check # → audit-cloud-symbols: OK
GOWORK=off go list -deps ./... \
| grep '^cloud\.google\.com/go' \
| grep -v '^cloud\.google\.com/go/compute/metadata$' # → empty
GOWORK=off go list -deps ./... \
| grep -E '^(google\.golang\.org/api|github\.com/Azure/azure-sdk-for-go)' # → empty
GOWORK=off go build ./... && GOWORK=off go test ./... # → all green
```

Phase A's invariants (typed `IaCStateBackend` contract, `Configure` RPC) are
re-validated by the same audit run since `module/` is the scope they protect.

## Phase recap

| Phase | What | PRs | ADRs |
|---|---|---|---|
| **A** | Typed `IaCStateBackend` gRPC contract; `Configure` RPC; plugin-backend registry; `azure_blob` → workflow-plugin-azure v1.1.0 | plan-1 PRs 1–3; locked B/C/D plan PRs 1–2 | 0035, 0036 |
| **B** | In-core `iac_state_spaces`, `s3_storage`, `pipeline_step_s3_upload` deletion; SDK-free AWS profile/role_arn resolvers with `credential_source` markers; `cloud_account_aws.go` (dead) deletion; `aws-sdk-go-v2` *retained* in `go.mod` for `provider/aws/` et al. | `feat/phase-b-core-deletion` (PR `#687`) | 0034, 0038 |
| **C** | In-core `iac_state_gcs`, `storage_gcs`, `platform_kubernetes_gke` deletion; GCP SDKs dropped from `go.mod` (one allowlisted OAuth2 ADC transitive); permanent asymmetric audit + CI gate; wfctl gcs/s3/spaces actionable errors | `feat/phase-c-core-deletion` | 0037, 0039 (TBD — captures the gate-allowlist trade-off; follow-up) |

Plan-1 and plan-2 manifests + per-task spec records live under
`docs/plans/` (`2026-05-14-cloud-sdk-extraction-bcd.md`,
`2026-05-15-plugin-modules-on-iac.md`).

**Final invariant statement:** workflow-core now imports zero cloud-provider
SDK clients in `module/`; provider-specific surfaces (`provider/aws/`,
`provider/gcp/`'s OAuth2-only path) retain only what's needed for the
out-of-scope deploy-pipeline / credential-resolution work that #653 +
decisions/0034 explicitly carve out. Every other cloud-provider integration
crosses the engine ↔ plugin gRPC boundary.

## Related design + plans

- Plan: [docs/plans/2026-05-15-plugin-modules-on-iac.md](../plans/2026-05-15-plugin-modules-on-iac.md)
- Decisions: 0034 (autonomous plugin releases), 0035 (assumed-seam grep), 0036 (Configure RPC), 0038 (credential_source marker)
- Plans: [2026-05-14 cloud-SDK extraction (B/C/D)](../plans/2026-05-14-cloud-sdk-extraction-bcd.md), [2026-05-15 plugin-modules-on-iac](../plans/2026-05-15-plugin-modules-on-iac.md)
- Decisions: 0034 (autonomous plugin releases), 0035 (assumed-seam grep / real-import audit), 0036 (Configure RPC), 0037 (GKE cross-process contract — ResourceDriver fold), 0038 (credential_source marker), 0039 (TBD — asymmetric audit gate + compute/metadata allowlist trade-off; follow-up filing)
- Predecessors: [v0.52.0 godo removal](v0.52.0-godo-removal.md), [v0.53.0 AWS IaC removal](v0.53.0-aws-iac-removal.md), [2026-05-14 azure plugin extraction](2026-05-14-cloud-sdk-extraction.md)