Single-state-per-solution + cloud-only status outputs map + gapp deploy --ref flag #32

@krisrowe

Tier 2: Single-state-per-solution + cloud-only status + gapp deploy --ref

Summary

Three coupled changes to the build/state/status path:

  1. Drop workspace-mode TF state partitioning. One TF state per solution, always.
  2. Status reads service URLs from a TF outputs map (service_urls), not a single service_url string.
  3. Add gapp deploy --ref=<ref> to override the local-repo build ref.

Background

Today's behavior

Workspace-mode TF state. When a solution declares paths: in gapp.yaml (multi-service workspace mode), each sub-service's TF state is partitioned at gs://gapp-{solution}-{project}/terraform/state/{svc-name}/. Single-service solutions write to gs://gapp-{solution}-{project}/terraform/state/. Promoting a single-service solution to multi-service requires manually moving state objects between prefixes — not a seamless transition.

Status output read. core.py status() reads service_url (string) from terraform output -json. For multi-service solutions it iterates a list built from paths: in the local manifest and reads each sub-state separately. As a result, status must load and validate the local manifest before any cloud read can complete.

Deploy ref. gapp deploy builds from git archive HEAD of the local repo. No CLI override exists for selecting a different local ref. A clean working tree is required.

Why this is wrong

The "solution" abstraction is meant to be a single deployable unit:

  • One bucket: gapp-{solution}-{project}
  • One project label: gapp_<owner>_<solution>=v-N
  • One TF state — but workspace mode partitions this ✗

If state is per-service, "solution" is reduced to a label: no resources exist at the solution level, and the bucket is the only thing the services share. That leaves the design in an inconsistent middle ground.

Best practice (Terraform community consensus): one state per "deployment unit" — the smallest thing typically applied/destroyed together. Split state only when you need:

  • Independent lifecycles (apply A without touching B)
  • Blast radius isolation (a bad apply on A doesn't risk B)
  • Permission boundaries (different humans manage different services)

For the typical user (single developer, single repo, services that ship together), none of those apply. Splitting state imposes complexity (multi-init, multi-plan, drift across multiple states, migrations between modes) for zero benefit. Per-service SAs, IAM, and secrets are already isolated at the resource level inside one state.

Proposed Changes

1. Single state per solution, always

  • TF state always lives at gs://gapp-{solution}-{project}/terraform/state/
  • Multiple services in a solution are TF resources/modules within that one state
  • The workspace-mode state-prefix branching in core.py deploy and status paths is deleted
  • The paths: field in gapp.yaml retains its meaning (multiple services) but no longer changes state location
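A minimal sketch of the state-location rule after this change, contrasted with today's branching. The function names are hypothetical and do not come from core.py; only the bucket and prefix shapes are taken from the text above.

```python
from typing import Optional


def legacy_state_prefix(solution: str, project: str, service: Optional[str] = None) -> str:
    """Today's behavior: workspace mode appends the sub-service name."""
    base = f"gs://gapp-{solution}-{project}/terraform/state/"
    # In workspace mode each sub-service gets its own prefix under the base.
    return f"{base}{service}/" if service else base


def state_prefix(solution: str, project: str) -> str:
    """Proposed behavior: one prefix per solution, regardless of service count."""
    return f"gs://gapp-{solution}-{project}/terraform/state/"
```

With the proposed rule, the prefix for a solution is the same before and after it grows from one service to several, so promotion no longer moves state objects.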

2. Status reads service URLs from outputs map

TF templates emit service_urls (map) instead of service_url (string):

```hcl
output "service_urls" {
  value = { for s in var.services : s.name => google_cloud_run_service.svc[s.name].status[0].url }
}
```

Single-service solutions emit a 1-entry map keyed by the solution name. This produces one code path in status:

  1. Resolve solution → resolve project (cloud read)
  2. terraform output -json → get service_urls map
  3. Iterate, probe /health per service

Status no longer needs the local manifest. The bucket and TF state are the source of truth for what the solution actually has deployed.

Backward compat. During the transition, status reads service_urls (map) first and falls back to service_url (string) for older deployments. Once every live deployment has been re-deployed under the new template, the fallback can be removed.
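The map-first read with the string fallback could look roughly like the sketch below. It assumes the standard shape of terraform output -json, where each output is an object with a "value" key; the function name extract_service_urls is hypothetical and is not the existing core.py API.

```python
import json


def extract_service_urls(outputs_json: str, solution: str) -> dict[str, str]:
    """Return {service name: URL} from `terraform output -json` text."""
    outputs = json.loads(outputs_json)

    # New template: a map output keyed by service name.
    if "service_urls" in outputs:
        return dict(outputs["service_urls"]["value"])

    # Legacy template: a single string output. Key it by the solution name,
    # matching the 1-entry map that single-service solutions emit.
    if "service_url" in outputs:
        return {solution: outputs["service_url"]["value"]}

    # Nothing deployed yet, or the state has no URL outputs.
    return {}
```

The caller then iterates the returned map and probes /health for each entry, so both template generations share one loop.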

3. gapp deploy --ref=<ref> flag

Single coherent semantic: --ref selects which ref of the local repo to use as the build context for any service drawing from it.

| Service shape | What --ref=v1.0.0 does |
| --- | --- |
| Singular service: (no source: block) | Build local repo at v1.0.0 instead of HEAD |
| Singular service: with remote source: | Refuses: no local code involved, flag has nothing to apply to |
| Plural services:, all local-code entries | Build local repo at v1.0.0 once; all entries share that build context |
| Plural services: mixed (some local, some remote) | Local entries built from v1.0.0; remote entries keep their pinned source.ref |
| Plural services:, all remote | Refuses: no local code involved |

Remote-sourced services have their own pinned source.ref in gapp.yaml (see the composition issue). CLI --ref does not override them. To bump a remote pin, edit gapp.yaml and commit.
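The rules in the table reduce to a small resolution step, sketched here. The Service dataclass, RefError, and resolve_build_refs are hypothetical names for illustration; the real manifest model in gapp may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Service:
    name: str
    # Pinned source.ref for a remote-sourced service; None means local code.
    remote_ref: Optional[str] = None


class RefError(ValueError):
    """Raised when --ref is given but has nothing to apply to."""


def resolve_build_refs(services: list[Service], cli_ref: Optional[str] = None) -> dict[str, str]:
    """Map each service name to the ref its build context comes from."""
    has_local = any(s.remote_ref is None for s in services)
    if cli_ref is not None and not has_local:
        raise RefError("--ref given, but no service builds from local code")

    # Without --ref, local-code services build from HEAD as they do today.
    local_ref = cli_ref if cli_ref is not None else "HEAD"

    # Remote-sourced services always keep their pinned ref.
    return {
        s.name: local_ref if s.remote_ref is None else s.remote_ref
        for s in services
    }
```

For a mixed solution, resolve_build_refs([Service("api"), Service("worker", remote_ref="v0.9.0")], "v1.0.0") maps api to v1.0.0 and leaves worker at v0.9.0.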

Dirty-tree gate.

| Scenario | Refuse on dirty tree? |
| --- | --- |
| gapp deploy (any local-code service) | Yes: accidental risk |
| gapp deploy --ref=v1.0.0 | No: explicit ref means local working state irrelevant |
| gapp deploy for a solution with all remote sources | N/A: local tree doesn't feed any build |
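The gate can be expressed as a single predicate. This is a sketch under the assumption that the caller already knows whether the tree is dirty and whether any service builds from local code; the function name is hypothetical.

```python
from typing import Optional


def refuse_dirty_tree(tree_dirty: bool, has_local_code: bool, cli_ref: Optional[str]) -> bool:
    """Return True when deploy should stop because of uncommitted changes."""
    # All-remote solutions never read the local tree, so its state is moot.
    if not has_local_code:
        return False
    # An explicit --ref builds from a commit, not from the working tree.
    if cli_ref is not None:
        return False
    # Plain `gapp deploy` with local code: guard against accidental deploys.
    return tree_dirty
```

The all-remote case combined with --ref is still rejected, but by the --ref rules above rather than by this gate.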

v-3 Contract Preservation

None of these changes intrinsically requires a contract bump:

  • Label shape unchanged
  • Bucket name shape unchanged
  • gapp-env semantics unchanged
  • Reserved env names unchanged

The TF outputs map is internal — outputs are not part of the contract. Workspace state collapse changes state-layout semantics, but the workspace-mode prefix layout is documented as deprecated rather than treated as a contract change.

If the new TF template emits different Terraform resource addresses for an equivalent single-service deploy, the existing TF state no longer matches the configuration, and terraform apply would plan to destroy and recreate those resources. Mitigation, in priority order:

  1. Preserve addresses. Engineer the template so single-service resource addresses are byte-identical to the v-3 template. CI runs terraform plan against a snapshotted v-3 state and asserts zero changes.
  2. moved {} blocks (Terraform 1.1+). Declare address renames inside the template; TF auto-migrates state references on next apply, no manual state mv, no contract bump. This is the strongest escape hatch and almost always sufficient.
  3. Targeted terraform state mv. For solutions already deployed, surgically rename resources in state to match new addresses. Tractable for small N.
  4. Template carve-out. Keep the existing v-3 template path active for solutions already deployed under it; new composed solutions use the new template. Two paths, single contract.
  5. Defer or scope-cut the part of the refactor that would change addresses.
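The zero-change assertion in mitigation (1) could be implemented by inspecting the machine-readable plan, for example the output of terraform show -json on a saved plan file. The sketch below assumes that JSON shape (a resource_changes list whose entries carry address and change.actions); the function name and the CI wiring are hypothetical.

```python
import json


def changed_addresses(plan_json: str) -> list[str]:
    """Return addresses of resources the plan would create, update, or delete."""
    plan = json.loads(plan_json)
    changed = []
    for rc in plan.get("resource_changes", []):
        # Terraform reports untouched resources with actions == ["no-op"].
        # Anything else is treated as a diff here, including data source reads.
        if rc["change"]["actions"] != ["no-op"]:
            changed.append(rc["address"])
    return changed
```

A CI step would fail the build when the returned list is non-empty. This strict version only covers resource changes; output-only differences would need a separate check.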

v-4 is not on the table without a separate design discussion. Only after exhausting (1)–(5) does a contract bump enter the conversation, and only with explicit owner sign-off.

Work Breakdown

  • Drop workspace-mode state-prefix branching in core.py deploy and status paths
  • Update TF templates to emit service_urls map output (single-entry for single-service solutions)
  • Status reads service_urls map; falls back to service_url string for legacy state
  • Add --ref flag to gapp deploy CLI with the rules table above
  • Refuse --ref when no service uses local code; clear error message
  • Loosen the dirty-tree gate when --ref is given
  • Use moved {} blocks in the new TF template if any resource addresses change
  • CI: terraform plan against a v-3-shape state must show zero diff for an unchanged single-service solution
  • Tests: state-shape-collapse for single-service deploys, multi-service via TF resources within one state, status reads outputs map, --ref flag honored for local-code services, --ref refused for all-remote solutions, dirty-tree gate relaxed under --ref
  • Update CONTRIBUTING.md: document one-state-per-solution as the contract, document workspace-mode prefixes as deprecated, capture the v-4 escalation rules above as the standing policy
  • Update the deploy skill: describe --ref flag, dirty-tree relaxation, and the single-state-per-solution contract
  • Capture the terraform moved {} mitigation pattern in CONTRIBUTING.md as the standard approach when a template needs to rename resource addresses
  • Capture state-collapse migration policy in CONTRIBUTING.md: workspace-prefix-layout deployments cannot be applied under the new template without a one-time terraform state mv; document the exact commands

Out of Scope

  • Per-service --ref syntax (e.g., --ref worker=v2.0.0). Multi-service builds with mixed refs require editing gapp.yaml.
  • Composition (services: plural + remote source:). See the composition issue.
  • Changing the contract major to v-4.
