From b390e5cd89ce65f381fe512984eaac0d7f083382 Mon Sep 17 00:00:00 2001 From: sooperD00 Date: Mon, 4 May 2026 16:23:25 -0700 Subject: [PATCH 01/12] handoff snapshot 2026-05-04_16:23 --- README.md | 62 +++++--- docs/completed_sprints.md | 7 + docs/paved-road.md | 306 ++++++++++++++++++++++++++++++++++++++ docs/remaining_sprints.md | 77 ++++++++++ 4 files changed, 430 insertions(+), 22 deletions(-) create mode 100644 docs/completed_sprints.md create mode 100644 docs/paved-road.md diff --git a/README.md b/README.md index c91fda4..58ce082 100644 --- a/README.md +++ b/README.md @@ -90,21 +90,32 @@ Wet-bulb fail-open interlocks, critical-asset tagging, and human-in-the-loop gat ## 5. Local Development -> **Status:** Sprint 1 in progress. Commands below are the *target* state and will be live as sprints land. +> **Status:** Sprint 1 complete — paved road shipped, no traffic on it yet. The stub service exists to prove the chart and reusable workflow function end-to-end. Sprint 2 brings the producer/consumer that actually exercises the road. ### Prerequisites -- Python 3.11+ (managed via `uv` or `poetry`) -- Docker & Docker Compose -- Kind or Minikube +- Python 3.11+ via [`uv`](https://docs.astral.sh/uv/) +- Docker +- [Kind](https://kind.sigs.k8s.io/) (Kubernetes-in-Docker) - Helm 3.0+ -### Quickstart (post-Sprint 1) +### Quickstart + +```bash +make setup # uv sync + install pre-commit hooks +make infra-up # Start the local Kind cluster +make deploy-local # Build the stub container, kind-load it, helm-install +``` + +Then: + ```bash -make setup # Install deps and pre-commit hooks -make infra-up # Start Kafka, Schema Registry, Prometheus/Grafana via Docker Compose -make deploy-local # Helm-install the reference service into Kind +kubectl get pods -l app.kubernetes.io/instance=standard-service-stub +kubectl port-forward svc/standard-service-stub 8000:80 +curl http://localhost:8000/healthz ``` +`make help` lists every target. 
For the full adoption walkthrough, see [`docs/paved-road.md`](./docs/paved-road.md). + --- ## 6. Repository Structure @@ -112,27 +123,34 @@ make deploy-local # Helm-install the reference service into Kind ``` . ├── .github/ -│ ├── workflows/ # Reusable CI workflow templates (Sprint 1) -│ ├── CODEOWNERS -│ ├── PULL_REQUEST_TEMPLATE.md -│ └── ISSUE_TEMPLATE/ -├── charts/ # Paved-road Helm chart (Sprint 1) +│ └── workflows/ +│ ├── standard-python-service.yml # Reusable – paved road's CI +│ └── ci.yml # GridStream's caller +├── charts/ +│ └── standard-service/ # Paved-road Helm chart ├── docs/ │ ├── ARCHITECTURE.md │ ├── CONTEXT.md │ ├── CONTRIBUTING.md │ ├── DISCOVERY.md -│ ├── adr/ # Architectural Decision Records +│ ├── paved-road.md # 10-min adoption tutorial +│ ├── adr/ # ADRs │ └── remaining_sprints.md +├── packages/ # uv workspace members (ADR-0010) +│ ├── standard-service-stub/ # Sprint 1 — the paved road's first traveler +│ ├── producer/ # Sprint 2 +│ ├── consumer/ # Sprint 2 +│ └── models/ # Sprint 2 — shared Pydantic + Avro models +├── kind/ +│ └── cluster.yaml # Local cluster config ├── infra/ -│ └── terraform/ # AWS modules (Sprint 5, stubbed) -├── schemas/ # Avro contracts (Sprint 2) -├── src/ -│ ├── producer/ # (Sprint 2) -│ └── consumer/ # (Sprint 2) -├── data/ # Sample energy CSVs -├── scripts/ # scaffold-a-service, etc. (Sprint 4) -└── Makefile +│ └── terraform/ # AWS modules (Sprint 5, stubbed) +├── schemas/ # Avro contracts (Sprint 2) +├── data/ # Sample energy CSVs (Sprint 2) +├── scripts/ # scaffold-a-service, etc. (Sprint 4) +├── Makefile +├── pyproject.toml # Workspace root (virtual; ADR-0010) +└── .pre-commit-config.yaml ``` --- diff --git a/docs/completed_sprints.md b/docs/completed_sprints.md new file mode 100644 index 0000000..b366d8d --- /dev/null +++ b/docs/completed_sprints.md @@ -0,0 +1,7 @@ +# GridStream Completed Sprints + +In reverse chronological order. 
+ +--- + +# Sprint 0: Discovery, Requirements, Architecture, Planning - DONE 4/29/2026 11:15a diff --git a/docs/paved-road.md b/docs/paved-road.md new file mode 100644 index 0000000..72779b6 --- /dev/null +++ b/docs/paved-road.md @@ -0,0 +1,306 @@ +# Adopting the GridStream Paved Road + +> *A ten-minute tutorial. Read this; come out the other side with a service +> that builds in CI, deploys via Helm, and inherits GridStream's defaults +> for security, probes, and labels.* + +The paved road is what GridStream's platform team offers to any application +team that wants standardized deployment without inventing it themselves. It +is **opt-in** and **incremental** — the four stages from +[ADR-0006](./adr/0006-gitops-adoption-path.md) are designed so each one +delivers value on its own, and a team can pause between any two of them +without losing what they already gained. + +This tutorial walks Stages 1 through 3. Stage 4 (ArgoCD) lands in Sprint 3. +[SPRINT-3-CLEANUP] + +--- + +## What you get + +| Stage | What you adopt | Time | What you gain | +| --- | --- | --- | --- | +| 1 | A standardized `Makefile` | ~2 hrs | One command to build/test/deploy. Deploy commands stop being tribal knowledge. | +| 2 | The reusable CI workflow | ~3 hrs | Lint, type-check, test, container build — all standardized, all updated centrally. | +| 3 | The `standard-service` Helm chart | ~1 day | K8s deployment with paved-road defaults. Probes, security context, labels: solved. | +| 4 | ArgoCD `Application` manifest | ~1 day | Pull-based GitOps. Cluster state matches Git. *Sprint 3.* [SPRINT-3-CLEANUP] | + +--- + +## The 10-minute version + +```bash +# 0. Prerequisites +# Python 3.11+, Docker, kind, helm 3, uv (https://docs.astral.sh/uv/) + +# 1. Clone, set up, run +git clone https://github.com/sooperD00/gridstream.git +cd gridstream +make setup +make infra-up +make deploy-local + +# 2. 
Hit the deployed stub +kubectl port-forward svc/standard-service-stub 8000:80 & +curl http://localhost:8000/healthz +curl -X POST http://localhost:8000/echo \ + -H 'content-type: application/json' \ + -d '{"message": "first traveler"}' +``` + +If those curls return 200, the paved road works. Now read the rest of this +document to understand *what* you're inheriting and *how to adopt it for +your own service*. + +--- + +## Stage 1 — The Makefile + +Every paved-road repo gets the same target names. Adopters copy the +`Makefile` from this repo, adjust the `STUB_*` variables to their service +name, and they're done. The named targets: + +| Target | What it does | +| --- | --- | +| `make setup` | `uv sync` + install pre-commit hooks | +| `make lint` | Ruff check + format check | +| `make typecheck` | Mypy strict | +| `make test` | Pytest with the 80% coverage gate | +| `make build` | Distroless production container ([ADR-0009](./adr/0009-container-base-image.md)) | +| `make build-dev` | Slim local-dev container — **not for production** | +| `make infra-up` / `make infra-down` | Local Kind cluster lifecycle | +| `make deploy-local` | Build, kind-load, helm-install | +| `make help` | The list you're reading | + +The point isn't the targets themselves — it's that *every paved-road repo +has the same ones*. A platform engineer onboarding a new team doesn't have +to learn that team's bespoke deploy script. The friction of cross-team +context-switching collapses. + +--- + +## Stage 2 — The reusable CI workflow + +The shared workflow lives at +[`.github/workflows/standard-python-service.yml`](../.github/workflows/standard-python-service.yml) +in this repo. 
Adopters call it from their own CI: + +```yaml +# .github/workflows/ci.yml in your repo +name: CI +on: [push, pull_request] + +jobs: + ci: + # requires v1 tag — coming once Sprint 1 ships post-verification [SPRINT-1-CLEANUP] + uses: sooperD00/gridstream/.github/workflows/standard-python-service.yml@v1 + with: + image-name: my-team-service +``` + +That's it. The workflow does: + +1. `uv sync` — install workspace deps (or your single-project deps). +2. `ruff check` and `ruff format --check`. +3. `mypy src`. +4. `pytest tests` with `--cov-fail-under=80`. +5. `docker buildx build` — container build (no push yet; see the registry-push + TODO in the workflow file and [ADR-0006](./adr/0006-gitops-adoption-path.md) + for why push lands in Sprint 4 or 5). + +Inputs you can override: + +| Input | Default | When to set | +| --- | --- | --- | +| `python-version` | `"3.11"` | If you've moved to 3.12+. | +| `coverage-threshold` | `80` | Almost never — argue for the change at platform review. | +| `image-name` | (required) | Always. | +| `working-directory` | `"."` | If your service isn't at repo root. | +| `dockerfile-context` / `dockerfile-path` | (default to working-directory) | Workspace-aware builds. Most adopters don't need these. | + +When the platform team upgrades the workflow (adds a new lint, tightens a +config, fixes a bug), every adopter's next CI run picks it up. The whole +point of `uses:` versus copy-paste. + +--- + +## Stage 3 — The `standard-service` chart + +The chart at [`charts/standard-service/`](../charts/standard-service/) +parameterizes deployment of a Python HTTP service. 
Read the chart's +[README](../charts/standard-service/README.md) for the values contract; +the short version: + +```yaml +# values-myservice.yaml +app: + name: my-service +image: + repository: ghcr.io/myorg/my-service + tag: "1.4.2" +config: + LOG_LEVEL: INFO +deployment: + replicas: 3 +resources: + requests: { cpu: 200m, memory: 256Mi } + limits: { cpu: 1000m, memory: 1Gi } +``` + +```bash +helm upgrade --install my-service oci://ghcr.io/sooperD00/charts/standard-service \ + --version 0.1.0 \ + -f values-myservice.yaml +``` + +What you inherit without configuring: + +- Liveness `/healthz` and readiness `/readyz` probes — your service implements + these endpoints; the chart wires the probes. +- Pod-level security context (`runAsNonRoot`, `nonroot` UID, `RuntimeDefault` + seccomp). +- Container-level security context (`readOnlyRootFilesystem`, all caps + dropped, no privilege escalation). +- `app.kubernetes.io/*` labels including `part-of: gridstream`. Observability + selectors and ArgoCD selectors will rely on these in Sprint 3. + +What you don't get yet, with sprint pointers: + +- HPA — Sprint 3, lag-based per [ADR-0002](./adr/0002-consumer-lag-based-autoscaling.md). +- Ingress — Sprint 3. +- ServiceAccount with IRSA — Sprint 5. +- Job/CronJob template — added in Sprint 2 if needed. + +--- + +## Debugging without a shell + +The production container is distroless ([ADR-0009](./adr/0009-container-base-image.md)). +That means **`kubectl exec -it -- sh` does not work** — there is no +shell in the image. This is deliberate: the image is hardened against an +entire class of attack, and the absence of an exec path forces investment +in observability instead of `tail -f`-driven debugging. + +Here are the debugging affordances that *do* work, in roughly the order +you'll reach for them. + +### 1. 
Logs — your primary debugging surface + +```bash +# Live logs from the pod's running container: +kubectl logs -f deploy/my-service + +# Logs from the pod that just crashed and got replaced: +kubectl logs deploy/my-service --previous + +# Logs from a specific container in a multi-container pod: +kubectl logs -c +``` + +Logs follow [ADR-0004](./adr/0004-logging-and-stub-standards.md) — every line +is structured Python `logging` output with operational context (`device_id`, +`schema_version`, etc.). No `print()`, no unstructured stdout, every level +filterable. + +### 2. `kubectl describe` — the pod's autobiography + +```bash +kubectl describe pod +``` + +Shows the events leading up to the current state: image pulls, probe +failures, OOM kills, scheduling waits. When a pod is in `CrashLoopBackOff`, +this is where you find the *why*. The events surface failures the logs +might not — e.g. the container's process never started because the image +couldn't pull. + +### 3. `kubectl debug` — ephemeral debug containers + +When you need shell-shaped tools against a running pod (without baking +them into the production image), attach an ephemeral debug container: + +```bash +# Attach a busybox ephemeral container sharing the target container's process namespace: +kubectl debug -it --image=busybox --target= + +# Or use a debug image that mirrors your app's environment: +kubectl debug -it --image=python:3.11-slim --target= -- bash +``` + +The ephemeral container stops when you exit but remains listed on the pod +(terminated) until the pod is replaced; the production image stays clean. This is the right tool for "I need to inspect the network +namespace from inside the pod" or "I need to verify the mounted ConfigMap +contents from the container's perspective." + +### 4. Sprint 3 — observability stack + +When Sprint 3 lands, the debugging surface widens: + +- **Jaeger** — distributed traces from producer → Kafka → consumer. Track a + message through the whole pipeline by trace ID.
+- **Grafana** — golden-signals dashboard (latency, traffic, errors, + saturation) per service. +- **Prometheus** — raw metrics, ad-hoc queries, alert rule evaluation. +- **OpenTelemetry collector** — sidecar in the chart, ships traces and + metrics without per-team code changes. + +The pattern is: *logs tell you what happened, traces tell you what's slow, +metrics tell you what's broken.* No shell needed for any of them. + +### Migration tip + +If your team is currently doing exec-based debugging, the transition isn't +a flag-day rewrite. The local-dev image (`Dockerfile.dev`) keeps the slim +base with shell — use it locally during the transition while you build +fluency with `kubectl logs` and `kubectl debug`. The production image +goes distroless from the start. + +--- + +## This repo's layout vs. yours + +[ADR-0010](./adr/0010-multi-package-layout-with-uv-workspaces.md) chose +**uv workspaces** for the GridStream platform repo because it ships +multiple services (stub, producer, consumer, models) that need divergent +dependencies and per-service container builds. + +**Your repo is one workspace member's worth of files.** Specifically: + +``` +your-service-repo/ +├── Makefile # copy from gridstream, adjust STUB_* vars +├── pyproject.toml # single-project (no [tool.uv.workspace] block) +├── Dockerfile # distroless final stage; no `--package` flag +├── .github/workflows/ci.yml # 5-line caller of the reusable workflow +├── src/ +│ └── your_service/ +└── tests/ +``` + +The workspace shape in the GridStream repo is for the platform repo's +benefit — it doesn't propagate to adopters. Your `Dockerfile` is simpler +than GridStream's stub Dockerfile because there's no workspace root to +reach for. Your `pyproject.toml` is one file at root, not a root + member +pair. + +If you ever grow into running multiple services from one repo, you can +adopt the workspace pattern then — it's additive. 
+ +--- + +## What to read next + +- [`charts/standard-service/README.md`](../charts/standard-service/README.md) — + the chart's full values contract. +- [`docs/ARCHITECTURE.md`](./ARCHITECTURE.md) — system-level design context. +- [`docs/adr/`](./adr/) — every decision the road encodes, with the + reasoning preserved. +- [`docs/remaining_sprints.md`](./remaining_sprints.md) — what's coming + next; in particular, when Stage 4 (ArgoCD) and the observability stack + arrive. + +--- + +*Questions, gaps, or "this should work differently for my service" — open +a chart PR or drop into #platform-standards. The road improves by +adopters telling the platform team where it bumps.* diff --git a/docs/remaining_sprints.md b/docs/remaining_sprints.md index ff4b3dd..b34d86e 100644 --- a/docs/remaining_sprints.md +++ b/docs/remaining_sprints.md @@ -261,6 +261,41 @@ Stopping here yields the complete platform story. The migration narrative is wha --- - [ ] Add a recommendation to the Sprint 4 adoption playbook that adopting teams configure CODEOWNERS on their `values-*.yaml` files, requiring platform-team review for any change that touches `podSecurityContext` or `containerSecurityContext`. Social enforcement complementing the schema (ADR-0011); puts a human in the loop on the override path the schema can't pin without breaking legitimate use cases. +### CI workflow follow-ups deferred from Sprint 1 + +When registry push lands (per ADR-0006, Stage 4): + +- [ ] Wire registry push in `.github/workflows/standard-python-service.yml` + build job — replace the `[SPRINT-4-CLEANUP]` comment block with the + actual push step. Tag with both `${{ github.sha }}` (immutable, what + ArgoCD pins to) and a moving tag (`:main` or `:latest`) for human + convenience. 
+- [ ] Add job-level `permissions:` to the build job: +```yaml + build: + permissions: + contents: read # checkout still needs this + packages: write # registry push +``` + **Footgun:** job-level permissions *replace* workflow-level, they do + not merge. Dropping `contents: read` here will break `actions/checkout` + with a permissions error. The workflow-level `contents: read` does + not flow down once a job declares its own `permissions:` block. +- [ ] Add a smoke-test step after build: `docker run --rm --version` + (or equivalent health check). Catches "image built but won't start" + before it ships to the registry. +- [ ] Revisit the `coverage-threshold` input. Has any adopter exercised + the override? If yes, are the cases legitimate, and should the + threshold be tuned? If no, the input is dead weight on the public + contract and worth removing at v2. **Trigger:** post-first-adopter, + same window as the version-pin audit. +- [ ] Audit adopter version pins: run the GitHub code-search query for + `uses: sooperD00/gridstream/.github/workflows/standard-python-service.yml` + and confirm no adopter is still on a pre-push version that would + break when push lands. Coordinate cutover timing in #platform-standards. + + + ## ⚪ Sprint 5: AWS Deployment (Deferred / Stretch) **Goal (Cloud Substrate Validation):** Migrate the local Kind deployment to a real AWS EKS cluster. This sprint is deferred — it does not need to be complete to validate the platform's design. @@ -325,3 +360,45 @@ This sprint is **explicitly deferred**. It does not block any Sprint 1–4 deliv - Feature flags (LaunchDarkly or Unleash) for production deploy/release decoupling - SOC 2 / NERC-CIP compliance documentation pass - A second reference service (different language? different domain?) to prove the paved road generalizes + + +--- + +## Housekeeping + +Cross-cutting items not tied to a sprint. Promote to a sprint commit when +convenient. + +- [ ] CHANGELOG.md scaffolded with `[Unreleased]` section. 
Pre-@v1. +- [ ] Semver policy documented in paved-road.md (patch/minor/major + contract for adopters pinning @v1). Pre-@v1. + +## Tech debt + +Things we'd do differently if we were starting over, or know we'll have +to revisit. Triggers usually internal — pain accumulating in CI, refactor +opportunities, deferred SPRINT-N-CLEANUP markers coming due. + +*(empty for now)* + +## Post-adoption + +Items waiting on external triggers — adopters arriving, scale crossing +a threshold, a second cloud entering scope. Each item names its trigger. +Promote when the trigger fires; delete from here when promoted. + +- [ ] Document the adopter-version code-search query in paved-road.md + under a new "Platform team operations" section. + **Trigger:** first external adopter merges a `uses:` line. +- [ ] Scripted adopter audit (GitHub API → weekly CSV → manager report). + **Trigger:** ≥2 external adopters. Worth a dedicated ADR at build + time — "how the platform team monitors adoption" is architectural. + Backstage's service catalog is the prebuilt alternative to revisit + per ADR-0008 at this point. +- [ ] Regex-validate `image-name` input in standard-python-service.yml + to reject registry-prefixed or tagged values (currently caught + only by build-step failure). Cheap to add — a single shell step + with a regex check before the build job runs. + **Trigger:** first adopter who hits the double-tag failure and + asks "why didn't you just check this?" If nobody hits it, the + documentation in the input description is sufficient. 
From fdab8e319a3a43da9b57ab098d23b75f5f9b474c Mon Sep 17 00:00:00 2001 From: sooperD00 Date: Tue, 5 May 2026 16:12:12 -0700 Subject: [PATCH 02/12] chore(gitignore): ignore .DS_Store and tee'd test result files --- .gitignore | 8 ++++++++ scripts/check-chart-schema.sh | 0 2 files changed, 8 insertions(+) mode change 100644 => 100755 scripts/check-chart-schema.sh diff --git a/.gitignore b/.gitignore index 599205b..a20789b 100644 --- a/.gitignore +++ b/.gitignore @@ -10,6 +10,14 @@ DEVLOG/ # .python-version # comment out is correct here - Python version is a paved-road policy decision +# macOS Finder metadata +.DS_Store + +# Tee'd test result files (per Sprint1.txt §7) +packages/*/tests/results_*.txt + + + # Byte-compiled / optimized / DLL files diff --git a/scripts/check-chart-schema.sh b/scripts/check-chart-schema.sh old mode 100644 new mode 100755 From de45f26e4b0e2b024bf531b44bb82649ea12f7bf Mon Sep 17 00:00:00 2001 From: sooperD00 Date: Tue, 5 May 2026 16:24:11 -0700 Subject: [PATCH 03/12] fix(tests): correct field name + add mypy-strict regression test --- .../tests/test_mypy_strict.py | 35 +++++++++++++++++++ .../tests/test_standard_service_stub.py | 2 +- pyproject.toml | 5 ++- 3 files changed, 40 insertions(+), 2 deletions(-) create mode 100644 packages/standard-service-stub/tests/test_mypy_strict.py diff --git a/packages/standard-service-stub/tests/test_mypy_strict.py b/packages/standard-service-stub/tests/test_mypy_strict.py new file mode 100644 index 0000000..209fb79 --- /dev/null +++ b/packages/standard-service-stub/tests/test_mypy_strict.py @@ -0,0 +1,35 @@ +""" +Regression test for Sprint-1 §6 bug: mypy must run in strict mode when +invoked with this package's pyproject.toml. If this test fails, someone +removed or weakened the [tool.mypy] block in +packages/standard-service-stub/pyproject.toml. Restore it. 
+""" + +import subprocess +import sys +from pathlib import Path + +UNTYPED_CANARY = "def explode(x): return x.upper()\n" + + +def test_mypy_strict_rejects_untyped_def(tmp_path: Path) -> None: + canary = tmp_path / "canary.py" + canary.write_text(UNTYPED_CANARY) + + package_dir = Path(__file__).resolve().parent.parent + result = subprocess.run( + [ + sys.executable, + "-m", + "mypy", + "--config-file", + str(package_dir / "pyproject.toml"), + str(canary), + ], + capture_output=True, + text=True, + ) + assert result.returncode != 0, ( + f"mypy accepted untyped def — strict mode is NOT active.\n" + f"stdout:\n{result.stdout}\nstderr:\n{result.stderr}" + ) diff --git a/packages/standard-service-stub/tests/test_standard_service_stub.py b/packages/standard-service-stub/tests/test_standard_service_stub.py index a44030a..5e61d5a 100644 --- a/packages/standard-service-stub/tests/test_standard_service_stub.py +++ b/packages/standard-service-stub/tests/test_standard_service_stub.py @@ -46,7 +46,7 @@ def test_echo_round_trips_message_with_metadata() -> None: body = response.json() assert body["message"] == "ping" assert body["service"] == "standard-service-stub" - assert "received_at" in body # server-stamped; we don't pin its value + assert "server_received_at" in body # server-stamped; we don't pin its value def test_echo_rejects_empty_message_with_422() -> None: diff --git a/pyproject.toml b/pyproject.toml index e2a7eb9..1c1551c 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -50,7 +50,10 @@ ignore = [ ] [tool.ruff.lint.per-file-ignores] -"**/tests/**" = ["S101"] # `assert` is the whole point of a test +"**/tests/**" = [ + "S101", # `assert` is the whole point of a test + "S603", # subprocess.run inputs are controlled by the test author by definition +] [tool.mypy] python_version = "3.11" # mypy target-version From 2c94ce8827c732c55c72d9c5e2ebb4d666eafb15 Mon Sep 17 00:00:00 2001 From: sooperD00 Date: Tue, 5 May 2026 16:34:10 -0700 Subject: [PATCH 04/12] fix(docker): 
centralize Python version + fix distroless venv + include README --- Makefile | 16 +++++++++++++++- packages/standard-service-stub/Dockerfile | 15 ++++++++++++--- packages/standard-service-stub/Dockerfile.dev | 6 +++++- 3 files changed, 32 insertions(+), 5 deletions(-) diff --git a/Makefile b/Makefile index e096e50..758cb92 100644 --- a/Makefile +++ b/Makefile @@ -8,6 +8,9 @@ # ─── Configuration ────────────────────────────────────────────────────────── +# single-source-of-truth +PYTHON_VERSION := $(shell cut -d. -f1,2 .python-version) + # Local Kind cluster name. Override with `make CLUSTER=foo infra-up`. CLUSTER ?= gridstream KIND_CONFIG ?= kind/cluster.yaml @@ -64,11 +67,21 @@ test: ## Run pytest with the 80% coverage gate. check-schema: ## Smoke-test charts/standard-service/values.schema.json (ADR-0011). @scripts/check-chart-schema.sh +# [SPRINT-1-CLEANUP] Behavior companion to check-schema (ADR-0011). +# check-schema validates the schema FILE is well-formed; this validates +# the schema's BEHAVIOR — that helm actually rejects const-pinned and +# required-field violations at template/install time. Add the +# corresponding step to .github/workflows/ci.yml during CI review. +.PHONY: check-chart +check-chart: ## Verify helm enforces the values schema (ADR-0011 phase 2). + @scripts/check-chart-behavior.sh + ##@ Build (ADR-0009) .PHONY: build build: ## Build the stub's distroless production image. docker build \ + --build-arg PYTHON_VERSION=$(PYTHON_VERSION) \ -f $(STUB_DIR)/Dockerfile \ -t $(STUB_IMAGE) \ . @@ -76,6 +89,7 @@ build: ## Build the stub's distroless production image. .PHONY: build-dev build-dev: ## Build the stub's slim local-dev image (NOT FOR PRODUCTION). docker build \ + --build-arg PYTHON_VERSION=$(PYTHON_VERSION) \ -f $(STUB_DIR)/Dockerfile.dev \ -t $(STUB_IMAGE)-dev \ . @@ -95,7 +109,7 @@ infra-down: ## Tear down the local Kind cluster. 
kind delete cluster --name $(CLUSTER) .PHONY: deploy-local -deploy-local: build check-schema ## Build, kind-load, and helm-install the stub. +deploy-local: build check-schema check-chart ## Build, kind-load, and helm-install the stub. kind load docker-image $(STUB_IMAGE) --name $(CLUSTER) helm upgrade --install $(STUB_NAME) charts/standard-service \ --set app.name=$(STUB_NAME) \ diff --git a/packages/standard-service-stub/Dockerfile b/packages/standard-service-stub/Dockerfile index c820fc7..782490d 100644 --- a/packages/standard-service-stub/Dockerfile +++ b/packages/standard-service-stub/Dockerfile @@ -8,9 +8,13 @@ # (`make build` cmd runs from gridstream repo root where pyproject.toml and uv.lock must exist): # docker build -f packages/standard-service-stub/Dockerfile -t … . +# Default for standalone `docker build`. `make build` overrides from .python-version +# (the canonical source). Must stay in sync with the distroless image's Python +# version below — a venv built on 3.X won't run on a 3.Y runtime. +ARG PYTHON_VERSION=3.11 # ─── Build stage ───────────────────────────────────────────────────────────── -FROM python:3.11-slim-bookworm AS builder +FROM python:${PYTHON_VERSION}-slim-bookworm AS builder # uv via the official distroless installer image (pinned by tag). COPY --from=ghcr.io/astral-sh/uv:0.5 /uv /uvx /usr/local/bin/ @@ -27,7 +31,9 @@ WORKDIR /workspace # Workspace metadata — copy first so dep resolution caches by manifest, not source. COPY pyproject.toml uv.lock ./ -COPY packages/standard-service-stub/pyproject.toml packages/standard-service-stub/ +COPY packages/standard-service-stub/pyproject.toml \ + packages/standard-service-stub/README.md \ + packages/standard-service-stub/ # Resolve and install this workspace member's deps. 
Two-step pattern: deps # first, project second, so the (slow) deps layer stays Docker-cached when @@ -49,6 +55,9 @@ RUN --mount=type=cache,target=/root/.cache/uv \ # ─── Final stage: distroless ───────────────────────────────────────────────── FROM gcr.io/distroless/python3-debian12 +# Re-declare to inherit the global ARG inside this stage. +ARG PYTHON_VERSION + WORKDIR /app # Copy the resolved venv and source from the builder. The venv path here @@ -59,7 +68,7 @@ COPY --from=builder --chown=nonroot:nonroot \ /workspace/packages/standard-service-stub/src /app/src ENV PATH="/app/.venv/bin:$PATH" \ - PYTHONPATH="/app/src" \ + PYTHONPATH="/app/.venv/lib/python${PYTHON_VERSION}/site-packages:/app/src" \ PYTHONUNBUFFERED=1 \ PORT=8000 diff --git a/packages/standard-service-stub/Dockerfile.dev b/packages/standard-service-stub/Dockerfile.dev index a2198e2..f667e6d 100644 --- a/packages/standard-service-stub/Dockerfile.dev +++ b/packages/standard-service-stub/Dockerfile.dev @@ -13,7 +13,11 @@ # ║ See ADR-0009 for the rationale. ║ # ╚══════════════════════════════════════════════════════════════════════════╝ -FROM python:3.11-slim-bookworm +# Global build arg — see Dockerfile for rationale. Same default keeps standalone +# `docker build -f Dockerfile.dev` working; `make build-dev` overrides from .python-version. 
+ARG PYTHON_VERSION=3.11 + +FROM python:${PYTHON_VERSION}-slim-bookworm COPY --from=ghcr.io/astral-sh/uv:0.5 /uv /uvx /usr/local/bin/ From dc951590db0627229bbc5ba6beb0a2ff531103ad Mon Sep 17 00:00:00 2001 From: sooperD00 Date: Tue, 5 May 2026 16:35:54 -0700 Subject: [PATCH 05/12] feat(scripts): smoke-test script + chart-behavior lint fix --- Makefile | 4 ++ scripts/check-chart-behavior.sh | 29 +++++++++ scripts/smoke-test.sh | 108 ++++++++++++++++++++++++++++++++ 3 files changed, 141 insertions(+) create mode 100755 scripts/check-chart-behavior.sh create mode 100755 scripts/smoke-test.sh diff --git a/Makefile b/Makefile index 758cb92..2f9e348 100644 --- a/Makefile +++ b/Makefile @@ -121,3 +121,7 @@ deploy-local: build check-schema check-chart ## Build, kind-load, and helm-insta @echo " kubectl get pods -l app.kubernetes.io/instance=$(STUB_NAME)" @echo " kubectl port-forward svc/$(STUB_NAME) 8000:80" @echo " curl http://localhost:8000/healthz" + +.PHONY: smoke-test +smoke-test: ## Smoke-test the deployed standard-service-stub. + @scripts/smoke-test.sh diff --git a/scripts/check-chart-behavior.sh b/scripts/check-chart-behavior.sh new file mode 100755 index 0000000..5a3ebce --- /dev/null +++ b/scripts/check-chart-behavior.sh @@ -0,0 +1,29 @@ +#!/usr/bin/env bash +# Verifies helm enforces values.schema.json when rendering the chart. +# Two cases should fail (schema rejection); two should succeed (lint, valid render). + +set -uo pipefail # NOT -e — we *want* to capture expected failures +CHART="charts/standard-service" +FAIL=0 + +# Minimum-valid placeholders that satisfy the schema. Rejection cases build +# on them so each case trips exactly one schema rule.
+VALID=(--set app.name=demo --set image.repository=nginx) + +helm lint "$CHART" "${VALID[@]}" >/dev/null 2>&1 \ + && echo "✓ helm lint passes" \ + || { echo "✗ helm lint FAILED — schema may be malformed"; FAIL=$((FAIL+1)); } + +helm template test "$CHART" "${VALID[@]}" --set podSecurityContext.runAsNonRoot=false >/dev/null 2>&1 \ + && { echo "✗ ACCEPTED runAsNonRoot=false — tier-1 const not enforced!"; FAIL=$((FAIL+1)); } \ + || echo "✓ rejects runAsNonRoot=false" + +helm template test "$CHART" --set image.repository=nginx >/dev/null 2>&1 \ + && { echo "✗ ACCEPTED empty app.name — required-field check broken"; FAIL=$((FAIL+1)); } \ + || echo "✓ rejects empty app.name" + +helm template test "$CHART" "${VALID[@]}" >/dev/null 2>&1 \ + && echo "✓ valid values render" \ + || { echo "✗ valid values REJECTED — schema over-restrictive"; FAIL=$((FAIL+1)); } + +[ "$FAIL" -eq 0 ] || exit 1 diff --git a/scripts/smoke-test.sh b/scripts/smoke-test.sh new file mode 100755 index 0000000..0b39dc2 --- /dev/null +++ b/scripts/smoke-test.sh @@ -0,0 +1,108 @@ +#!/usr/bin/env bash +# Smoke-test the deployed standard-service-stub. +# +# Runs after `make deploy-local`. Wired up via `make smoke-test`. +# Verifies the paved road's first traveler answers correctly through the +# Helm-deployed Service: probe endpoints, /docs renders, /echo round-trips +# valid input and rejects empty input, and ADR-0004 structured logging +# fires with the expected service-context shape. +# +# Exits non-zero on any failure so CI (eventually) can gate on it.
+ +set -euo pipefail + +SERVICE=standard-service-stub +LOCAL_PORT=8000 +SVC_PORT=80 +BASE_URL="http://localhost:${LOCAL_PORT}" + +# ─── tiny output helpers ──────────────────────────────────────────────────── + +GREEN='\033[32m'; RED='\033[31m'; CYAN='\033[36m'; RESET='\033[0m' +PASS=0; FAIL=0 +ok() { printf " ${GREEN}✓${RESET} %s\n" "$1"; PASS=$((PASS+1)); } +ko() { printf " ${RED}✗${RESET} %s\n" "$1"; FAIL=$((FAIL+1)); } +section() { printf "\n${CYAN}%s${RESET}\n" "$1"; } + +# ─── port-forward setup (cleaned up on any exit path) ─────────────────────── + +section "Port-forward svc/${SERVICE} ${LOCAL_PORT}→${SVC_PORT}" +kubectl port-forward "svc/${SERVICE}" "${LOCAL_PORT}:${SVC_PORT}" >/dev/null 2>&1 & +PF_PID=$! +trap 'kill ${PF_PID} 2>/dev/null || true' EXIT + +# Poll /healthz until the forward is live (max ~10s). +for i in $(seq 1 20); do + if curl -sf "${BASE_URL}/healthz" >/dev/null 2>&1; then + ok "port-forward live (${i} attempt(s))" + break + fi + sleep 0.5 + if [ "$i" -eq 20 ]; then + ko "port-forward never came up — is the pod Ready?" 
+ exit 1 + fi +done + +# ─── HTTP checks ──────────────────────────────────────────────────────────── + +section "Probes" + +healthz_body=$(curl -sf "${BASE_URL}/healthz") +echo "$healthz_body" | grep -q '"status":"alive"' \ + && ok "/healthz status=alive" \ + || ko "/healthz body: $healthz_body" +echo "$healthz_body" | grep -q '"service":"standard-service-stub"' \ + && ok "/healthz service=standard-service-stub" \ + || ko "/healthz service field wrong" + +readyz_body=$(curl -sf "${BASE_URL}/readyz") +echo "$readyz_body" | grep -q '"status":"ready"' \ + && ok "/readyz status=ready" \ + || ko "/readyz body: $readyz_body" + +section "Swagger UI" +docs_code=$(curl -s -o /dev/null -w "%{http_code}" "${BASE_URL}/docs") +[ "$docs_code" = "200" ] \ + && ok "/docs returns 200 (Swagger renders)" \ + || ko "/docs returned ${docs_code}" + +section "Echo — Pydantic round-trip" +echo_body=$(curl -sf -X POST "${BASE_URL}/echo" \ + -H 'content-type: application/json' \ + -d '{"message":"hello"}') +echo "$echo_body" | grep -q '"message":"hello"' \ + && ok "POST /echo mirrors message" \ + || ko "/echo body: $echo_body" +echo "$echo_body" | grep -q '"server_received_at"' \ + && ok "/echo includes server_received_at (default_factory ran server-side)" \ + || ko "/echo missing server_received_at" +echo "$echo_body" | grep -q '"service":"standard-service-stub"' \ + && ok "/echo stamps service name" \ + || ko "/echo service field wrong" + +section "Echo — Pydantic rejects empty (proves validator is real, not theatrical)" +empty_code=$(curl -s -o /dev/null -w "%{http_code}" -X POST "${BASE_URL}/echo" \ + -H 'content-type: application/json' \ + -d '{"message":""}') +[ "$empty_code" = "422" ] \ + && ok "POST /echo {message:\"\"} → 422" \ + || ko "empty-message POST returned ${empty_code} (expected 422)" + +# ─── ADR-0004 structured-logging shape check ──────────────────────────────── + +section "Structured logging (ADR-0004)" +# Brief flush window after the echo POST. 
+sleep 1 +if kubectl logs "deploy/${SERVICE}" --tail=50 \ + | grep -q "echo request received service=standard-service-stub message_length=5"; then + ok "ADR-0004 log line present in pod logs" +else + ko "expected log line not found — see: kubectl logs deploy/${SERVICE}" +fi + +# ─── verdict ──────────────────────────────────────────────────────────────── + +section "Result" +printf " %d passed, %d failed\n\n" "$PASS" "$FAIL" +[ "$FAIL" -eq 0 ] From 9293314fe356cee25d5560918d7f72f67d087950 Mon Sep 17 00:00:00 2001 From: sooperD00 Date: Tue, 5 May 2026 16:39:08 -0700 Subject: [PATCH 06/12] docs: paved-road adoption tutorial + README quickstart for Sprint 1 --- docs/paved-road.md | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/docs/paved-road.md b/docs/paved-road.md index 72779b6..09c548a 100644 --- a/docs/paved-road.md +++ b/docs/paved-road.md @@ -48,9 +48,14 @@ curl -X POST http://localhost:8000/echo \ -d '{"message": "first traveler"}' ``` -If those curls return 200, the paved road works. Now read the rest of this -document to understand *what* you're inheriting and *how to adopt it for -your own service*. +If those curls return 200, the paved road works. + +Or skip the manual curls and run `make smoke-test` — same endpoints, plus +an ADR-0004 log-shape check, exits non-zero on any failure (so CI can gate +on it later). + +Now read the rest of this document to understand *what* you're inheriting +and *how to adopt it for your own service*. --- From bddd688f56752821673101be23d66c094c3c383a Mon Sep 17 00:00:00 2001 From: sooperD00 Date: Tue, 5 May 2026 16:59:31 -0700 Subject: [PATCH 07/12] docs: housekeeping items found during sprint-1 closeout testing MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Four items surfaced during today's deploy testing that don't block the v1 tag but want to be findable next time: - §Housekeeping: script-mode discipline. 
New scripts/*.sh need chmod +x committed as mode 100755. Today's permission-denied error on check-chart-behavior.sh burned debug time on a fresh Mac checkout. - §Housekeeping: Python upgrade coordination. .python-version, the Dockerfile's builder FROM, and distroless's bundled Python must agree. The ARG pattern centralizes the *string* but not the underlying coupling. - §Sprint 3 Housekeeping: image-tag SHA discipline. Today's :dev tag mutation required helm uninstall + make deploy-local because helm saw an unchanged manifest. ArgoCD reconciliation + content-addressed image refs fix this naturally. - §Sprint 3 Housekeeping: wire make smoke-test into post-deploy verification. Script already exits non-zero on failure; it's gate-ready when CI grows a deployed cluster (ArgoCD post-sync, kind-in-CI). Plus a chart README callout (§MUST override): the empty-string defaults aren't placeholders to fill in; they're the schema's rejection trigger. Tooling that runs against bare defaults must pass placeholders; scripts/check-chart-behavior.sh is the canonical pattern. --- charts/standard-service/README.md | 6 ++++++ docs/remaining_sprints.md | 17 +++++++++++++++++ 2 files changed, 23 insertions(+) diff --git a/charts/standard-service/README.md b/charts/standard-service/README.md index f1d6c13..302d109 100644 --- a/charts/standard-service/README.md +++ b/charts/standard-service/README.md @@ -76,6 +76,12 @@ These have no sensible default. Helm fails fast if missing. | `app.name` | Container name and identification surface. Logs and dashboards rely on this being explicit per service. | | `image.repository` | The chart can't guess where your image lives. | +> The empty strings in `values.yaml` defaults aren't placeholders to fill +> in — they're the schema's rejection trigger. Tooling that runs against +> bare defaults (e.g. plain `helm lint charts/standard-service`) will +> fail; pass placeholder values to exercise chart behavior.
See +> `scripts/check-chart-behavior.sh` for the canonical pattern. + ### SHOULD configure These have safe defaults, but a production-shaped service usually wants its diff --git a/docs/remaining_sprints.md b/docs/remaining_sprints.md index b34d86e..a7c4525 100644 --- a/docs/remaining_sprints.md +++ b/docs/remaining_sprints.md @@ -201,6 +201,15 @@ Stopping here yields the full technical narrative — paved road, reference serv ### Housekeeping - [ ] Install an admission-policy engine (Kyverno or OPA Gatekeeper — pick during sprint) and ship a baseline policy that rejects pods without `runAsNonRoot: true`, `readOnlyRootFilesystem: true`, and `capabilities.drop` containing `ALL`. Server-side defense-in-depth complementing ADR-0011's `values.schema.json`: the schema catches misconfiguration at `helm install`; admission policy catches anything that bypasses the chart. Pairs with the Sprint 3 observability stack — policy violations should surface in Grafana alongside the golden-signals dashboards. +- [ ] Image-tag SHA discipline. Today's `:dev` tag mutation required + `helm uninstall + make deploy-local` to force a rollout because + helm sees the manifest as unchanged. ArgoCD reconciliation + + content-addressed image refs (or imagePullPolicy: Always with + digests) makes this a non-issue. +- [ ] Wire `make smoke-test` into the post-deploy verification path. + Can't run in plain CI without a deployed cluster; ArgoCD post- + sync hooks or a kind-in-CI job is the right home. The script + already exits non-zero on failure, so it's gate-ready. --- @@ -372,6 +381,14 @@ convenient. - [ ] CHANGELOG.md scaffolded with `[Unreleased]` section. Pre-@v1. - [ ] Semver policy documented in paved-road.md (patch/minor/major contract for adopters pinning @v1). Pre-@v1. +- [ ] CONTRIBUTING.md note on script-mode discipline: new scripts in + scripts/ need `chmod +x` so the executable bit is committed + (mode 100755). 
Today's debugging burned 10 minutes on a permission-denied error for check-chart-behavior.sh that was committed as 100644. +- [ ] CONTRIBUTING.md or paved-road.md note on Python upgrade + coordination: .python-version, the Dockerfile's builder FROM, and + the distroless image's bundled Python must all agree. The ARG + pattern centralizes the *string* but not the underlying coupling. ## Tech debt From 8293c918947aac9f4de1f96581694004247a154f Mon Sep 17 00:00:00 2001 From: sooperD00 Date: Wed, 6 May 2026 09:47:34 -0700 Subject: [PATCH 08/12] feat(ci): wire ADR-0011 chart enforcement (schema + behavior) into CI MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Both halves of ADR-0011's contract enforcement now run in CI: - chart-schema (uv, jsonschema): validates values.schema.json is well-formed and exercises known-good / known-bad inputs through the schema validator. - chart-behavior (helm): runs helm lint + 3× helm template (1 known-good, 2 known-bad) to verify helm itself rejects const-pinned and required-field violations at template/install time. Closes both [SPRINT-1-CLEANUP] markers in the Makefile that were waiting for the corresponding ci.yml steps. azure/setup-helm pinned by SHA per ADR-0012 (v5.0.0). Dependabot tracks it automatically via the existing github-actions package-ecosystem config (counter bumped 4 → 5 in the comment). Chart enforcement now matches the local make deploy-local gate: any chart change that breaks the schema or its enforcement fails CI before landing on main.
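
A minimal sketch of the known-good / known-bad pattern both jobs rely on, assuming bash. `expect_ok` and `expect_reject` are hypothetical helpers, not functions in scripts/check-chart-behavior.sh; they only show the shape of the check.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the known-good / known-bad pattern.
# Not taken from scripts/check-chart-behavior.sh.
set -uo pipefail  # no -e: expected failures must not abort the run

FAIL=0

# Run a command that SHOULD succeed; count a failure if it doesn't.
expect_ok() {
  local label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "✓ $label"
  else
    echo "✗ $label (unexpectedly rejected)"
    FAIL=$((FAIL + 1))
  fi
}

# Run a command that SHOULD fail; count a failure if it succeeds.
expect_reject() {
  local label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "✗ $label (unexpectedly accepted)"
    FAIL=$((FAIL + 1))
  else
    echo "✓ $label"
  fi
}
```

Usage would look like `expect_reject "rejects runAsNonRoot=false" helm template test "$CHART" "${VALID[@]}" --set podSecurityContext.runAsNonRoot=false`. The explicit if/else also sidesteps the `A && B || C` gotcha, where C runs whenever B itself fails.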
--- .github/dependabot.yml | 2 +- .github/workflows/ci.yml | 40 ++++++++++++++++++++++++++++++++++++++++ Makefile | 11 +++-------- 3 files changed, 44 insertions(+), 9 deletions(-) diff --git a/.github/dependabot.yml b/.github/dependabot.yml index 5101a67..e131199 100644 --- a/.github/dependabot.yml +++ b/.github/dependabot.yml @@ -19,7 +19,7 @@ updates: # PRs land Monday morning; reviewing during the week; merge before next batch interval: weekly day: monday - # generous for 4 actions + # generous for 5 actions open-pull-requests-limit: 5 commit-message: # commit convention: "chore(github-actions):" diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index added12..70ad16e 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -50,3 +50,43 @@ jobs: dockerfile-context: . dockerfile-path: packages/standard-service-stub/Dockerfile image-name: gridstream-standard-service-stub + chart-schema: + name: Chart values schema (ADR-0011) + runs-on: ubuntu-latest + steps: + - name: Checkout + uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 + + - name: Install uv + uses: astral-sh/setup-uv@caf0cab7a618c569241d31dcd442f54681755d39 # v3.2.4 + with: + enable-cache: true + + - name: Sync dev deps + # jsonschema is a root-pyproject dev dep for the schema smoke test; + # --frozen so a stale lock fails CI loudly, matching the reusable + # workflow's lockfile discipline. 
+ run: uv sync --all-extras --dev --frozen + + - name: Validate values.schema.json (ADR-0011) + run: make check-schema + + # No working-directory set: this job runs from the repo root by default. + chart-behavior: + name: Chart behavior (ADR-0011) + runs-on: ubuntu-latest + steps: + - name: Checkout + uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 + + - name: Install Helm + # Helm doesn't have its own helm/setup-helm first-party action -- + # Microsoft's K8s team filled the gap and the community settled there + # (it's the de facto standard for Helm-in-CI pipelines now). Hence `azure/`. + # The prefix only names the publisher; the action installs plain Helm, no Azure account needed. + uses: azure/setup-helm@dda3372f752e03dde6b3237bc9431cdc2f7a02a2 # v5.0.0 + + - name: Validate chart behavior (ADR-0011) + # Runs helm lint + 3× helm template (1 known-good, 2 known-bad). + # No uv/Python needed — pure helm. + run: make check-chart diff --git a/Makefile b/Makefile index 2f9e348..48f3b78 100644 --- a/Makefile +++ b/Makefile @@ -60,19 +60,14 @@ test: ## Run pytest with the 80% coverage gate. --cov-report=term-missing \ --cov-fail-under=80 -# [SPRINT-1-CLEANUP] Wire-up verification: confirm `jsonschema` is in dev deps, -# helm is installed locally, then run this target end-to-end. Add the -# corresponding step to .github/workflows/ci.yml during CI review. .PHONY: check-schema +# Validates the schema FILE is well-formed check-schema: ## Smoke-test charts/standard-service/values.schema.json (ADR-0011). @scripts/check-chart-schema.sh -# [SPRINT-1-CLEANUP] Behavior companion to check-schema (ADR-0011). -# check-schema validates the schema FILE is well-formed; this validates -# the schema's BEHAVIOR — that helm actually rejects const-pinned and -# required-field violations at template/install time. Add the -# corresponding step to .github/workflows/ci.yml during CI review.
.PHONY: check-chart +# Validates schema's BEHAVIOR — that helm actually rejects const-pinned and +# required-field violations at template/install time check-chart: ## Verify helm enforces the values schema (ADR-0011 phase 2). @scripts/check-chart-behavior.sh From 19a6bf7f0e1475c3a1a9b274a40ba20c4307a202 Mon Sep 17 00:00:00 2001 From: sooperD00 Date: Wed, 6 May 2026 12:20:13 -0700 Subject: [PATCH 09/12] chore(ci): pin uv version floor across workflow + Dockerfiles MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit CI's setup-uv step had no version: input, defaulting to whatever the action shipped with at run time. Dockerfiles pinned at :0.5 (stale — local install is 0.11.8). Picked one floor — 0.11 — and applied it to all four sites: both workflow files (reusable + GridStream caller's chart-schema job) and both Dockerfiles (prod + dev). Convention documented in CONTRIBUTING.md 'Tool version pinning': major.minor floor allows automatic patch updates while keeping breaking bumps explicit. Same string resolves the same way in GHA version: inputs and Docker :tag references — drift between the two is the failure mode this prevents. Closes the CI/local drift gap that would otherwise produce subtle 'works on my machine' diffs as uv evolves. 
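
A sketch of what the major.minor floor means in practice, assuming bash. `in_floor` is a hypothetical helper (not part of this patch) that checks a full version string against a floor such as 0.11. It illustrates the convention only; it is not how setup-uv or Docker resolve a tag. The `uv --version` output format ("uv 0.11.8", optionally followed by build info) is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of this patch: does a full version string
# belong to a major.minor floor such as "0.11"?
set -uo pipefail

# Matches the floor itself ("0.11") or any patch release ("0.11.8").
# The literal dot after the quoted floor keeps "0.110.0" from matching "0.11".
in_floor() {
  local version=$1 floor=$2
  case "$version" in
    "$floor" | "$floor".*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: compare a locally installed uv against the pinned floor.
# Assumes `uv --version` prints the version as its second field.
if command -v uv >/dev/null 2>&1; then
  local_uv=$(uv --version | awk '{print $2}')
  if in_floor "$local_uv" "0.11"; then
    echo "uv $local_uv matches the 0.11 floor"
  else
    echo "uv $local_uv drifts from the 0.11 floor"
  fi
fi
```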
--- .github/workflows/ci.yml | 2 ++ .github/workflows/standard-python-service.yml | 2 ++ docs/CONTRIBUTING.md | 14 ++++++++++++++ packages/standard-service-stub/Dockerfile | 2 +- packages/standard-service-stub/Dockerfile.dev | 2 +- 5 files changed, 20 insertions(+), 2 deletions(-) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 70ad16e..d54e425 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -60,6 +60,8 @@ jobs: - name: Install uv uses: astral-sh/setup-uv@caf0cab7a618c569241d31dcd442f54681755d39 # v3.2.4 with: + # major.minor floor — see CONTRIBUTING.md "Tool version pinning" + version: "0.11" enable-cache: true - name: Sync dev deps diff --git a/.github/workflows/standard-python-service.yml b/.github/workflows/standard-python-service.yml index 6e1a735..9cf3dc9 100644 --- a/.github/workflows/standard-python-service.yml +++ b/.github/workflows/standard-python-service.yml @@ -89,6 +89,8 @@ jobs: - name: Install uv uses: astral-sh/setup-uv@caf0cab7a618c569241d31dcd442f54681755d39 # v3.2.4 with: + # major.minor floor — see CONTRIBUTING.md "Tool version pinning" + version: "0.11" enable-cache: true - name: Set up Python ${{ inputs.python-version }} diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md index 22892d4..8ca79f7 100644 --- a/docs/CONTRIBUTING.md +++ b/docs/CONTRIBUTING.md @@ -24,6 +24,20 @@ If you are modifying a `.avsc` file: 2. Run the compatibility checker: `make schema-check`. 3. Notify the #platform-standards Slack channel. +### Tool version pinning + +Tool versions shared between CI and Dockerfiles are pinned to a `major.minor` +floor — e.g. `uv@0.11`, not `uv@0.11.8`. This allows automatic patch updates +(security and bugfix releases) without manual bumps, while breaking-version +moves remain explicit PRs that touch this floor and the lockfile in the same +diff. + +The same floor string appears in both `version:` inputs to GitHub Actions +(`astral-sh/setup-uv@... 
{ version: "0.11" }`) and `:major.minor` Docker +image tags (`COPY --from=ghcr.io/astral-sh/uv:0.11 ...`), so CI and local +builds resolve to the same family on every run. Drift between the two is +the failure mode this convention is designed to prevent. + --- ## References diff --git a/packages/standard-service-stub/Dockerfile b/packages/standard-service-stub/Dockerfile index 782490d..da7bad4 100644 --- a/packages/standard-service-stub/Dockerfile +++ b/packages/standard-service-stub/Dockerfile @@ -17,7 +17,7 @@ ARG PYTHON_VERSION=3.11 FROM python:${PYTHON_VERSION}-slim-bookworm AS builder # uv via the official distroless installer image (pinned by tag). -COPY --from=ghcr.io/astral-sh/uv:0.5 /uv /uvx /usr/local/bin/ +COPY --from=ghcr.io/astral-sh/uv:0.11 /uv /uvx /usr/local/bin/ # UV_COMPILE_BYTECODE=1 precompile .py→.pyc at build time so cold starts don't pay it # UV_LINK_MODE=copy cache mount lives on a different filesystem; uv's default diff --git a/packages/standard-service-stub/Dockerfile.dev b/packages/standard-service-stub/Dockerfile.dev index f667e6d..73704c2 100644 --- a/packages/standard-service-stub/Dockerfile.dev +++ b/packages/standard-service-stub/Dockerfile.dev @@ -19,7 +19,7 @@ ARG PYTHON_VERSION=3.11 FROM python:${PYTHON_VERSION}-slim-bookworm -COPY --from=ghcr.io/astral-sh/uv:0.5 /uv /uvx /usr/local/bin/ +COPY --from=ghcr.io/astral-sh/uv:0.11 /uv /uvx /usr/local/bin/ ENV UV_COMPILE_BYTECODE=1 \ UV_LINK_MODE=copy \ From 571b7e553730137f8f70f0340c1bed865d1d4873 Mon Sep 17 00:00:00 2001 From: sooperD00 Date: Wed, 6 May 2026 13:00:18 -0700 Subject: [PATCH 10/12] fix(ci): remove literal expression syntax from input description GHA's expression parser scans the entire workflow file for ${{ ... }} syntax and validates it regardless of whether the expression is meant as documentation or evaluation. 
The image-name input description used ':${{ github.sha }}' to illustrate what the build step concatenates, but inside workflow_call.inputs the github context isn't available, so the expression failed parsing and invalidated the whole file. Replaced with ':' placeholder. The actual github.sha reference in the build step is unchanged (step-level context is fine). --- .github/workflows/standard-python-service.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/standard-python-service.yml b/.github/workflows/standard-python-service.yml index 9cf3dc9..15fdda8 100644 --- a/.github/workflows/standard-python-service.yml +++ b/.github/workflows/standard-python-service.yml @@ -40,7 +40,7 @@ on: description: | Image name without registry prefix or tag (e.g. "my-team-service", not "ghcr.io/myorg/my-team-service:latest"). The build step - concatenates this with ":${{ github.sha }}" to produce the final + concatenates this with ":" to produce the final tag, so passing a registry-prefixed or tagged value yields an invalid double-tag like "ghcr.io/foo:latest:abc123" that fails at build time. 
Not currently enforced — see remaining_sprints.md From fa7698e2523547b16bf9dbb686823b60921996d2 Mon Sep 17 00:00:00 2001 From: sooperD00 Date: Wed, 6 May 2026 13:16:11 -0700 Subject: [PATCH 11/12] docs: resolve [SPRINT-1-CLEANUP] now that v1 is tagged --- docs/paved-road.md | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/paved-road.md b/docs/paved-road.md index 09c548a..80bc5a3 100644 --- a/docs/paved-road.md +++ b/docs/paved-road.md @@ -97,7 +97,6 @@ on: [push, pull_request] jobs: ci: - # requires v1 tag — coming once Sprint 1 ships post-verification [SPRINT-1-CLEANUP] uses: sooperD00/gridstream/.github/workflows/standard-python-service.yml@v1 with: image-name: my-team-service From ad8de83bba3d8f391f139be3502109b3b057656d Mon Sep 17 00:00:00 2001 From: sooperD00 Date: Wed, 6 May 2026 13:30:39 -0700 Subject: [PATCH 12/12] =?UTF-8?q?docs(housekeeping):=20track=20Node.js=202?= =?UTF-8?q?0=20=E2=86=92=2024=20deadline=20for=20pinned=20actions?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit GHA flagged Node 20 deprecation on today's CI run: forced switch to Node 24 on June 2, 2026; full removal Sept 16. Several SHA-pinned actions (checkout, setup-uv, docker/*) are still on Node 20. Existing weekly Dependabot cadence already watches them, so this is a tickler for sprint-review visibility, not new work. --- docs/remaining_sprints.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/docs/remaining_sprints.md b/docs/remaining_sprints.md index a7c4525..117fc8a 100644 --- a/docs/remaining_sprints.md +++ b/docs/remaining_sprints.md @@ -389,6 +389,10 @@ convenient. coordination: .python-version, the Dockerfile's builder FROM, and the distroless image's bundled Python must all agree. The ARG pattern centralizes the *string* but not the underlying coupling. +- [ ] Watch Dependabot PRs for Node-24-compatible action bumps + (checkout, setup-uv, docker/* still on Node 20). Forced switch + June 2, 2026; removal Sept 16. 
Existing weekly cadence should + catch each action's release; merge as they arrive. ## Tech debt