You are an experienced, pragmatic software engineering AI agent. Do not over-engineer a solution when a simple one is possible. Keep edits minimal. If you want an exception to ANY rule, you MUST stop and get permission first.
coder-k8s is a Go-based Kubernetes control-plane project with two app modes: a controller-runtime operator for CoderControlPlane (coder.com/v1alpha1) and an aggregated API server for CoderWorkspace/CoderTemplate (aggregation.coder.com/v1alpha1).
Tech stack
- Go 1.25.7 (go.mod)
- Kubernetes libraries: controller-runtime, client-go, apimachinery, apiserver, code-generator
- Vendored dependencies committed under vendor/
- Tooling: make, golangci-lint/gofumpt, Bash scripts in hack/ and scripts/, GitHub Actions, GoReleaser, optional Nix dev shell (flake.nix)
- main.go: process entrypoint; initializes logging and exits on application failure.
- app_dispatch.go: --app mode dispatch between controller and aggregated-apiserver.
- main_test.go: dispatch and defensive nil-check coverage.
- internal/app/controllerapp/controllerapp.go: controller mode bootstrap (scheme, manager, health/readiness checks).
- internal/app/apiserverapp/apiserverapp.go: aggregated API server bootstrap and API group installation.
- api/v1alpha1/codercontrolplane_types.go: CRD spec/status and list types for CoderControlPlane.
- api/aggregation/v1alpha1/types.go: aggregated API types for CoderWorkspace and CoderTemplate.
- internal/controller/codercontrolplane_controller.go: reconciler and SetupWithManager logic.
- internal/aggregated/storage/workspace.go + template.go: hardcoded in-memory aggregated API storage.
- hack/update-manifests.sh: CRD/RBAC generation entrypoint.
- hack/update-codegen.sh: deepcopy codegen entrypoint.
- Makefile: canonical build/test/lint/vendor/codegen/manifests commands.
- .golangci.yml: lint and formatting rules (including gofumpt).
- .github/workflows/ci.yaml and .github/workflows/release.yaml: CI and release pipelines.
- .goreleaser.yaml and Dockerfile.goreleaser: release packaging and container build configuration.
- api/v1alpha1/: CRD API group/version types and generated deepcopy code.
- api/aggregation/v1alpha1/: aggregated API group/version types and generated deepcopy code.
- internal/app/: application-mode bootstrap packages (controllerapp, apiserverapp).
- internal/controller/: controller reconciliation logic and envtest coverage.
- internal/aggregated/: aggregated API server storage implementation.
- internal/deps/: blank imports to keep Kubernetes tool deps pinned in go.mod/vendor.
- config/: generated CRDs, RBAC, and sample manifests.
- deploy/: deployment manifests for controller and aggregated API server components.
- hack/: maintenance scripts (codegen/manifests).
- scripts/: PR workflow automation and review/check helpers.
- .github/workflows/: CI and release automation.
- vendor/: checked-in module dependencies (required by project workflow).
- .mux/skills/coder-docs/: Mux agent skill with offline coder/coder docs snapshot (update: make update-coder-docs-skill).
- Keep Terraform backend values out of committed .tf code beyond shared backend settings in terraform/versions.tf.
- Shared sandbox EKS state location:
  - S3 bucket: coder-k8s-tfstate-112158171837
  - State key: terraform-ncp3/sandbox-eks/terraform.tfstate
- Initialize Terraform against that backend with explicit config flags (example):
      nix develop -c terraform -chdir=terraform init -reconfigure -backend-config="bucket=coder-k8s-tfstate-112158171837" -backend-config="key=terraform-ncp3/sandbox-eks/terraform.tfstate"
- When AWS CLI access is needed, run commands through the Nix dev shell (nix develop -c ...).
- main delegates to run(...), which requires --app=<controller|aggregated-apiserver>.
- controller mode registers core Kubernetes + coder.com/v1alpha1 schemes, starts the controller-runtime manager, and wires health/readiness probes.
- aggregated-apiserver mode builds a generic API server for aggregation.coder.com/v1alpha1 and installs coderworkspaces/codertemplates storage.
- Defensive checks are intentional (assertion failed: ...) and used to fail fast during development.
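The dispatch described above can be sketched with the standard flag package. This is a minimal illustration, not the project's app_dispatch.go: the dispatch helper, the injected startController/startAPIServer functions, and the error wording are all assumptions made for the example. Only the --app flag and its two values come from this file.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// Supported values for --app, as listed in this file.
const (
	appController          = "controller"
	appAggregatedAPIServer = "aggregated-apiserver"
)

// dispatch picks the bootstrap function for the requested mode.
// startController and startAPIServer are hypothetical stand-ins for the
// bootstrap packages under internal/app/.
func dispatch(mode string, startController, startAPIServer func() error) error {
	// Impossible state: the caller must always wire both bootstrap functions.
	if startController == nil || startAPIServer == nil {
		return errors.New("assertion failed: bootstrap function must not be nil")
	}
	switch mode {
	case appController:
		return startController()
	case appAggregatedAPIServer:
		return startAPIServer()
	case "":
		return fmt.Errorf("--app is required (%s|%s)", appController, appAggregatedAPIServer)
	default:
		return fmt.Errorf("unsupported --app value %q", mode)
	}
}

// run parses the command line and hands off to dispatch.
func run(args []string, startController, startAPIServer func() error) error {
	fs := flag.NewFlagSet("coder-k8s", flag.ContinueOnError)
	mode := fs.String("app", "", "application mode: controller or aggregated-apiserver")
	if err := fs.Parse(args); err != nil {
		return fmt.Errorf("parse flags: %w", err)
	}
	return dispatch(*mode, startController, startAPIServer)
}

func main() {
	// Placeholder bootstrap that does nothing; a real entrypoint would pass
	// os.Args[1:] and the actual bootstrap functions.
	noop := func() error { return nil }
	if err := run([]string{"--app=controller"}, noop, noop); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Injecting the bootstrap functions keeps the dispatch logic testable without starting a manager or an API server, which is one plausible way to get the dispatch and nil-check coverage that main_test.go is described as providing.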
Run from repository root.
- Build: make build
- Test: make test
- Integration tests (controller envtest): make test-integration
- Lint + format checks: make lint
- Format (apply): GOFLAGS=-mod=vendor golangci-lint fmt
- Format (check): GOFLAGS=-mod=vendor golangci-lint fmt --diff
- Vulnerability scan: make vuln
- Lint (workflows): go run github.com/rhysd/actionlint/cmd/actionlint@v1.7.10
- Development run (controller mode): GOFLAGS=-mod=vendor go run . --app=controller (requires Kubernetes config via your env, e.g. KUBECONFIG)
- Development run (aggregated API mode): GOFLAGS=-mod=vendor go run . --app=aggregated-apiserver
- Vendor consistency: make verify-vendor
- Manifest generation: make manifests (or bash ./hack/update-manifests.sh)
- Code generation: make codegen (or bash ./hack/update-codegen.sh)
- Docs (serve): make docs-serve
- Docs (strict build): make docs-check
- Clean: go clean -cache -testcache && rm -f ./coder-k8s && rm -rf ./dist
- Shell scripts: find . -type f -name '*.sh' -not -path './vendor/*'
- Update coder-docs skill: make update-coder-docs-skill
- .mux/tool_env is sourced before every bash tool call (Mux docs: /hooks/tools).
- Use run_and_report <step_name> <command...> for multi-step validation in one bash invocation.
- The helper writes full logs to /tmp/mux-<workspace>-<step>.log, prints pass/fail markers, and tails failures.
- Do not pipe, redirect, prepend, append, or otherwise wrap run_and_report output. Invoke it directly so the helper’s step markers remain human-readable in the Mux UI.
- Example:
      run_and_report verify-vendor make verify-vendor
      run_and_report test make test
- Do preserve fail-fast assertions for impossible states (nil manager/client/scheme, mismatched fetched objects). Don’t silently ignore these paths or convert them to soft failures.
- Do keep vendoring in sync when dependencies change (go mod tidy, go mod vendor, then verify diff). Don’t submit dependency changes without updating vendor/. Don’t manually delete or edit vendor/modules.txt; refresh vendoring via go mod tidy && go mod vendor (or make vendor) instead.
- Do regenerate generated artifacts after API changes (make codegen, make manifests). Don’t hand-edit generated files like zz_generated.deepcopy.go or CRD/RBAC manifests.
- Do regenerate API reference docs (make docs-reference) after changing API structs in api/ type-definition files (for example, api/**/**/*_types.go). Don’t merge API struct changes without updated docs/reference/api/*.md output.
- Do keep controller, aggregated API server, and storage changes paired with focused tests (main_test.go, internal/controller/*_test.go, and package tests under internal/app/ and internal/aggregated/). Don’t add behavior without coverage for critical assumptions.
- Do update the docs in docs/ when you change user-facing behavior (APIs, flags, manifests, deployment). Don’t let docs drift from the implementation.
- Unpinned GitHub Action versions in workflow files (CI uses SHA-pinned actions).
- Running CI-sensitive commands without vendoring mode when behavior differs from CI.
- Removing assertion messages that start with "assertion failed:"; these are deliberate diagnostics.
- Wrapping run_and_report output with shell redirection/pipes or extra surrounding text; this obscures the helper’s built-in markers in the Mux UI.
- Follow idiomatic Go and the Uber Go Style Guide as a baseline; project-specific rules in this file take precedence.
- Keep code gofumpt-formatted (enforced via golangci-lint fmt).
- Keep comments concise and purposeful (package docs, exported type/function docs).
- Match existing error style: contextual wrapping + explicit assertion messages for impossible conditions.
- Run make test.
- Run make build.
- Run make verify-vendor.
- Run make lint (or explain why it was skipped).
- If API types changed, run make codegen and make manifests, then include generated updates.
- If .github/workflows/* changed, run go run github.com/rhysd/actionlint/cmd/actionlint@v1.7.10.
- If your change affects user-facing behavior (APIs, flags, manifests, deployment), update the documentation in docs/ and run make docs-check.
- Match repository history style: short imperative summary, optionally prefixed by type (e.g., chore: ...).
- Prefer type: message if unsure.
- Include issue/PR reference when available (examples in history use (#N)).
- Include: what changed, why, validation commands run, and any follow-up work.
- For public mux-generated PRs/commits in this environment, include the attribution footer defined in .mux/skills/pull-requests/SKILL.md.
- Before creating or updating any PR, commit, or public issue, read .mux/skills/pull-requests/SKILL.md and follow it.
- Use ./scripts/wait_pr_ready.sh <pr_number> for a one-command wait flow after requesting review.
- Prefer gh CLI for GitHub interactions over manual web/curl flows.
PR readiness is mandatory. You MUST keep iterating until the PR is fully ready. A PR is fully ready only when: (1) Codex explicitly approves, (2) all Codex review threads are resolved, and (3) all required CI checks pass. You MUST NOT report success or stop the loop before these conditions are met.
When a PR exists, you MUST remain in this loop until the PR is fully ready:
- Push your latest fixes.
- Run local validation (make verify-vendor, make test, make build, make lint).
- Request review with @codex review.
- Run ./scripts/wait_pr_ready.sh <pr_number> (it polls Codex + required checks concurrently and fails fast).
- If Codex leaves comments, address them, resolve threads with ./scripts/resolve_pr_comment.sh <thread_id>, push, and repeat.
- If checks/mergeability fail, fix issues locally, push, and repeat.
The only early-stop exception is when the reviewer is clearly misunderstanding the intended change and further churn would be counterproductive. In that case, leave a clarifying PR comment and pause for human direction.
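The three readiness conditions stated earlier in this section reduce to a simple predicate. The sketch below is only a way to make the stop condition explicit; the prStatus type and its fields are hypothetical and do not reflect how ./scripts/wait_pr_ready.sh is implemented.

```go
package main

import "fmt"

// prStatus is a hypothetical snapshot of a pull request's review state.
type prStatus struct {
	CodexApproved          bool // Codex has explicitly approved
	UnresolvedCodexThreads int  // number of open Codex review threads
	RequiredChecksPassed   bool // all required CI checks pass
}

// blockers lists the readiness conditions that are still unmet.
func blockers(s prStatus) []string {
	var out []string
	if !s.CodexApproved {
		out = append(out, "Codex has not approved")
	}
	if s.UnresolvedCodexThreads > 0 {
		out = append(out, fmt.Sprintf("%d unresolved Codex review thread(s)", s.UnresolvedCodexThreads))
	}
	if !s.RequiredChecksPassed {
		out = append(out, "required CI checks have not passed")
	}
	return out
}

// fullyReady reports whether all three conditions hold at once.
func fullyReady(s prStatus) bool {
	return len(blockers(s)) == 0
}

func main() {
	s := prStatus{CodexApproved: true, UnresolvedCodexThreads: 2, RequiredChecksPassed: true}
	fmt.Println(fullyReady(s), blockers(s))
}
```

Returning the list of blockers rather than a bare boolean makes it clear which step of the loop to repeat next.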