Foundation phase: workspace, libraries, infra, CI, governance #2

Merged
zagrosi-code merged 14 commits into main from feature/zg-1-foundation-setup
May 8, 2026
@zagrosi-code

Owner

Summary

Lands the foundation phase across 11 commits:

  • Cargo workspace with production-grade lints, clippy.toml, deny.toml, commitlint.config.mjs, rust-toolchain.toml (Rust 1.91), .editorconfig.
  • zagrosi-core foundation crate: ZagrosiError, layered CoreConfig loader, Observability guard wrapping tracing-subscriber plus optional OTLP HTTP/protobuf and Prometheus admin server with cooperative shutdown.
  • apps/api-gateway placeholder binary; reserved app directories for zagrosi-mcp, worker, web.
  • pnpm workspace with populated catalog (React 19, TypeScript 6, Vite 8, Vitest 4, Zod 4, Tailwind 4, TanStack Router and Query).
  • Local-development Compose stack at deploy/docker/compose.yaml (PostgreSQL 18, Valkey 9, NATS 2.14; loopback only) plus production-grade valkey.conf and scripts/smoke-compose.sh.
  • Helm chart skeleton at deploy/helm/ (empty by default; every component toggle disabled).
  • Five GitHub Actions workflows pinned to 40-char SHAs (rust, web, helm-lint, dco, commitlint) plus branch-protection.json documenting the modern Rulesets API payload.
  • Repo hygiene: issue forms, PR template, Code of Conduct (Contributor Covenant 2.1), security policy.
  • Governance manual (nine sections), changelog, README refresh.

Test plan

  • cargo build --workspace clean
  • cargo clippy --workspace --all-targets --all-features -- -D warnings clean
  • cargo test --workspace --all-features --no-fail-fast passes (27 tests)
  • cargo fmt --all -- --check clean
  • pnpm install --frozen-lockfile clean
  • pnpm -r run lint, typecheck, test, format all exit 0
  • bash scripts/smoke-compose.sh passes end-to-end (Postgres, Valkey, NATS healthy)
  • helm lint --strict deploy/helm clean
  • actionlint .github/workflows/*.yml clean
  • Every commit DCO-signed

Adds the workspace root manifest with explicit members list (crates/zagrosi-core, apps/api-gateway), a pinned dependency table covering serde, thiserror, anyhow, tracing, OpenTelemetry, Prometheus metrics, figment, axum, tokio, plus testing crates (proptest, tempfile, serial_test, static_assertions) and reserved deps for later work.

Workspace lints forbid unsafe_code, deny unwrap_used / dbg_macro / print_stdout / print_stderr, warn on expect_used / panic / todo / unimplemented, and turn on pedantic + nursery + cargo groups with documented allow exceptions. Release profile uses thin LTO and symbol stripping.

Adds clippy.toml (MSRV 1.91, test-context allowances), deny.toml (license allow list, denies openssl / openssl-sys / native-tls), commitlint.config.mjs (extends @commitlint/config-conventional, locks the 19 scopes from CONTRIBUTING.md), rust-toolchain.toml (channel 1.91.0, components rustfmt + clippy + rust-src), and .editorconfig (per-language indent overrides).

Signed-off-by: MicrosoftWindows96 <spam@zagrosi.com>
New library crate with three modules:

- error.rs: ZagrosiError thiserror enum and Result alias. The Config
  variant boxes figment::Error to keep the enum's size under 32 bytes;
  Io conversion via #[from]; InvalidArgument and Internal constructors.
  Boundary policy documented in module rustdoc: per-crate libraries
  define their own thiserror enums, binaries use anyhow at entry, and
  conversion through ZagrosiError happens only at OS-level surfaces.

- config.rs: CoreConfig with figment-based env and TOML layered load.
  Env values take precedence; the file fills gaps; unknown TOML fields
  tolerated. Tests use figment::Jail to isolate env without unsafe
  std::env::set_var (workspace forbids unsafe_code).

- observability.rs: Observability guard wrapping tracing-subscriber,
  optional OTel OTLP HTTP/protobuf export, and an optional Prometheus
  admin server backed by an axum router on /metrics and /healthz. The
  Prometheus path binds the listener synchronously before installing
  the metrics recorder so EADDRINUSE surfaces before recorder state
  is mutated. Drop cancels the cooperative-shutdown token, polls
  JoinHandle::is_finished in a bounded loop, and falls back to abort
  only as last resort. OTel shutdown runs on a detached thread with a
  5-second timeout. service.name is attached to the SdkTracerProvider
  resource so spans carry the operator-configured identity.
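
The bind-before-install ordering and the Drop behaviour described for observability.rs can be sketched with the standard library alone. This is a minimal illustration, not the crate's code: `AdminGuard`, the `recorder_installed` flag, and the one-second drain budget are hypothetical names and values, a plain thread stands in for the tokio task, and an `AtomicBool` stands in for the cancellation token and the metrics recorder.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

/// Hypothetical guard: owns the admin listener thread and stops it on drop.
pub struct AdminGuard {
    cancel: Arc<AtomicBool>,
    handle: Option<JoinHandle<()>>,
    local_addr: SocketAddr,
}

impl AdminGuard {
    /// `recorder_installed` stands in for global metrics-recorder state.
    pub fn start(addr: &str, recorder_installed: &AtomicBool) -> io::Result<AdminGuard> {
        // Step 1: bind synchronously. An address clash returns here,
        // before any global state has been touched.
        let listener = TcpListener::bind(addr)?;
        listener.set_nonblocking(true)?;
        let local_addr = listener.local_addr()?;

        // Step 2: only after a successful bind, mutate the "recorder" state.
        recorder_installed.store(true, Ordering::SeqCst);

        let cancel = Arc::new(AtomicBool::new(false));
        let token = Arc::clone(&cancel);
        let handle = thread::spawn(move || {
            while !token.load(Ordering::SeqCst) {
                // A real server would answer /metrics and /healthz here.
                let _ = listener.accept();
                thread::sleep(Duration::from_millis(5));
            }
        });
        Ok(AdminGuard { cancel: cancel, handle: Some(handle), local_addr: local_addr })
    }

    pub fn local_addr(&self) -> SocketAddr {
        self.local_addr
    }
}

impl Drop for AdminGuard {
    fn drop(&mut self) {
        // Ask the worker to stop, then poll for completion in a bounded loop.
        self.cancel.store(true, Ordering::SeqCst);
        if let Some(handle) = self.handle.take() {
            let deadline = Instant::now() + Duration::from_secs(1);
            while !handle.is_finished() && Instant::now() < deadline {
                thread::sleep(Duration::from_millis(5));
            }
            if handle.is_finished() {
                let _ = handle.join();
            }
            // std threads cannot be aborted; an async task handle could
            // fall back to an abort call at this point.
        }
    }
}

fn main() {
    let installed = AtomicBool::new(false);
    let guard = AdminGuard::start("127.0.0.1:0", &installed).expect("bind loopback");
    println!("admin listener on {}", guard.local_addr());

    // A second start on the same port fails before its flag is touched.
    let second = AtomicBool::new(false);
    let clash = AdminGuard::start(&guard.local_addr().to_string(), &second);
    println!("second bind failed: {}", clash.is_err());
    println!("second recorder installed: {}", second.load(Ordering::SeqCst));
}
```

Binding first keeps the failure path free of side effects, which is the property the commit message calls out for EADDRINUSE.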

Workspace members temporarily trimmed to crates/zagrosi-core; the
api-gateway entry returns when the gateway crate lands.

Cargo.lock generated against the pinned toolchain.

cargo build, cargo clippy --all-targets --all-features -- -D warnings,
and cargo test -p zagrosi-core --all-features --no-fail-fast all clean
(23 unit tests plus 1 doc-test).
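
The boxing rationale given for error.rs earlier in this message can be checked with a std-only sketch. Everything here is an assumption made for illustration: `LargeConfigError` is a hypothetical stand-in for `figment::Error`, the Display and From impls are written by hand where the crate reportedly uses thiserror, and the alias is named `ZagrosiResult` only to avoid shadowing `std::result::Result` in a single-file example.

```rust
use std::error::Error;
use std::fmt;
use std::io;
use std::mem::size_of;

/// Hypothetical stand-in for a large third-party error type.
#[derive(Debug)]
pub struct LargeConfigError {
    pub detail: [u8; 128],
}

impl fmt::Display for LargeConfigError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "config payload of {} bytes", self.detail.len())
    }
}

impl Error for LargeConfigError {}

#[derive(Debug)]
pub enum ZagrosiError {
    /// Boxed so the large payload lives on the heap, not inline in the enum.
    Config(Box<LargeConfigError>),
    Io(io::Error),
    InvalidArgument(String),
    Internal(String),
}

/// Alias in the spirit of the crate's `Result` alias (renamed for this sketch).
pub type ZagrosiResult<T> = std::result::Result<T, ZagrosiError>;

impl ZagrosiError {
    pub fn invalid_argument(msg: impl Into<String>) -> Self {
        ZagrosiError::InvalidArgument(msg.into())
    }

    pub fn internal(msg: impl Into<String>) -> Self {
        ZagrosiError::Internal(msg.into())
    }
}

impl fmt::Display for ZagrosiError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            ZagrosiError::Config(e) => write!(f, "configuration error: {}", e),
            ZagrosiError::Io(e) => write!(f, "I/O error: {}", e),
            ZagrosiError::InvalidArgument(m) => write!(f, "invalid argument: {}", m),
            ZagrosiError::Internal(m) => write!(f, "internal error: {}", m),
        }
    }
}

impl Error for ZagrosiError {}

// Hand-written equivalents of what `#[from]` would generate.
impl From<io::Error> for ZagrosiError {
    fn from(e: io::Error) -> Self {
        ZagrosiError::Io(e)
    }
}

impl From<LargeConfigError> for ZagrosiError {
    fn from(e: LargeConfigError) -> Self {
        ZagrosiError::Config(Box::new(e))
    }
}

fn read_missing() -> ZagrosiResult<String> {
    // `?` converts io::Error into ZagrosiError through the From impl.
    let text = std::fs::read_to_string("/definitely/not/here.toml")?;
    Ok(text)
}

fn main() {
    println!("payload size: {} bytes", size_of::<LargeConfigError>());
    println!("enum size:    {} bytes", size_of::<ZagrosiError>());
    if let Err(e) = read_missing() {
        println!("{}", e);
    }
}
```

Without the Box, the enum would have to be at least as large as its biggest variant, so every `Result` carrying it would grow with the configuration error's payload.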

.gitignore picks up project-local CLAUDE.md.

Signed-off-by: MicrosoftWindows96 <spam@zagrosi.com>
Adds the apps/api-gateway crate: a small Rust binary that loads
CoreConfig from the ZAGROSI_-prefixed environment, calls
Observability::init(&cfg), emits a single tracing::info line
("zagrosi: placeholder"), and exits zero. No HTTP server, no router,
no middleware. Its purpose at this stage is to verify that workspace
dependency wiring against zagrosi-core is correct and that the
production-grade lint set passes against a real binary under
-D warnings.
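
The precedence rules behind the CoreConfig load (ZAGROSI_-prefixed environment values win, the file fills gaps, unknown file keys are tolerated) can be sketched without figment. This is a std-only approximation under stated assumptions: the two fields, the default values, and the helper names `env_layer`, `pick`, and `load` are hypothetical, and plain maps stand in for the parsed TOML file and the process environment.

```rust
use std::collections::HashMap;

const ENV_PREFIX: &str = "ZAGROSI_";

/// Hypothetical subset of the real configuration struct.
#[derive(Debug, Clone, PartialEq)]
pub struct CoreConfig {
    pub service_name: String,
    pub log_format: String,
}

/// Keep only prefixed variables, strip the prefix, and lowercase the key,
/// so `ZAGROSI_SERVICE_NAME` becomes `service_name`.
pub fn env_layer<I>(vars: I) -> HashMap<String, String>
where
    I: IntoIterator<Item = (String, String)>,
{
    vars.into_iter()
        .filter_map(|(key, value)| {
            key.strip_prefix(ENV_PREFIX)
                .map(|rest| (rest.to_lowercase(), value))
        })
        .collect()
}

/// Environment first, then file, then a built-in default.
fn pick(
    key: &str,
    env: &HashMap<String, String>,
    file: &HashMap<String, String>,
    default: &str,
) -> String {
    env.get(key)
        .or_else(|| file.get(key))
        .cloned()
        .unwrap_or_else(|| default.to_string())
}

/// Only known keys are read, so unknown file keys are ignored, not rejected.
pub fn load(env: &HashMap<String, String>, file: &HashMap<String, String>) -> CoreConfig {
    CoreConfig {
        service_name: pick("service_name", env, file, "zagrosi"), // assumed default
        log_format: pick("log_format", env, file, "json"),        // assumed default
    }
}

fn main() {
    let env = env_layer(vec![
        ("ZAGROSI_SERVICE_NAME".to_string(), "api-gateway".to_string()),
        ("UNRELATED".to_string(), "ignored".to_string()),
    ]);

    let mut file = HashMap::new();
    file.insert("service_name".to_string(), "from-file".to_string());
    file.insert("log_format".to_string(), "pretty".to_string());
    file.insert("not_a_field".to_string(), "tolerated".to_string());

    println!("{:?}", load(&env, &file));
}
```

Passing the environment in as data, instead of reading it inside `load`, also keeps tests free of `std::env::set_var`, which matches the motivation given for figment::Jail.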

Workspace members restored to ["crates/zagrosi-core", "apps/api-gateway"].

Crate metadata inherits from [workspace.package] (edition, rust-version,
license, repository, homepage, authors, readme). publish = false. Two
runtime dependencies: tokio (macros + rt-multi-thread features) and
tracing. zagrosi-core via path dep.

The integration test at apps/api-gateway/tests/binary.rs spawns the
compiled binary via std::process::Command with a hermetic environment
(env_clear plus PATH, ZAGROSI_SERVICE_NAME, RUST_LOG passthrough),
asserts exit zero, and confirms the marker substring lands in
combined stdout+stderr.
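
The hermetic-environment setup for that integration test might look roughly like the following. This sketch only builds the `Command` and inspects its environment; it does not spawn anything, because the binary path depends on the build. The helper name `hermetic_command` is hypothetical, and setting ZAGROSI_SERVICE_NAME explicitly (instead of passing it through from the parent) is an assumption.

```rust
use std::env;
use std::process::Command;

/// Build a command whose environment contains only what the test allows.
pub fn hermetic_command(program: &str, service_name: &str) -> Command {
    let mut cmd = Command::new(program);

    // Start from an empty environment so the host shell cannot leak in.
    cmd.env_clear();

    // Pass through the few variables the child legitimately needs, if set.
    for key in ["PATH", "RUST_LOG"].iter() {
        if let Some(value) = env::var_os(key) {
            cmd.env(key, value);
        }
    }

    // Assumption: the test pins the service name instead of inheriting it.
    cmd.env("ZAGROSI_SERVICE_NAME", service_name);
    cmd
}

fn main() {
    let cmd = hermetic_command("api-gateway", "gateway-under-test");
    // `get_envs` lists only the variables explicitly set on the command.
    for (key, value) in cmd.get_envs() {
        println!("{:?} = {:?}", key, value);
    }
}
```

Inside a Cargo integration test, the program path would normally come from the `CARGO_BIN_EXE_<name>` variable that Cargo sets at compile time; the test would then call `output()` and assert on the exit status and the marker substring.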

cargo fmt sweep on crates/zagrosi-core/src/observability.rs test code:
mechanical assert! macro reflow, no semantic change.

cargo build, cargo clippy --all-targets --all-features -- -D warnings,
and cargo test -p api-gateway --all-features --no-fail-fast all clean.

Signed-off-by: MicrosoftWindows96 <spam@zagrosi.com>
Reserves three app slots that later splits will fill:

- apps/zagrosi-mcp (rmcp-based MCP server)
- apps/worker (background-job worker binary)
- apps/web (React + TypeScript frontend)

Each directory contains exactly one file (.gitkeep) and nothing else.
No Cargo.toml (would force workspace-member treatment), no
package.json (the pnpm workspace from a later split tolerates absence).

apps/admin is intentionally NOT created. The MVP admin surface ships
inside apps/web until a later split decides otherwise.

A regression test at apps/api-gateway/tests/reservations.rs guards the
workspace-glob hazard (R25): it enumerates each reserved directory's
entries and asserts the entry list equals exactly [".gitkeep"], walks
crates/ and asserts every directory contains a Cargo.toml (no stray
.gitkeep-only crate reservations), and asserts apps/admin does not
exist. Errors during filesystem traversal panic loudly.
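
The core check of that regression test, a reserved directory containing exactly one entry named .gitkeep, can be sketched as below. The helper names `entry_names` and `is_reserved_slot` are hypothetical, and the demo builds a throwaway directory under the system temp dir instead of walking the real repository.

```rust
use std::env;
use std::fs;
use std::io;
use std::path::Path;
use std::process;

/// Sorted file names of every entry directly inside `dir`.
pub fn entry_names(dir: &Path) -> io::Result<Vec<String>> {
    let mut names = Vec::new();
    for entry in fs::read_dir(dir)? {
        // Traversal errors propagate instead of being skipped silently.
        names.push(entry?.file_name().to_string_lossy().into_owned());
    }
    names.sort();
    Ok(names)
}

/// True only when the directory holds `.gitkeep` and nothing else.
pub fn is_reserved_slot(dir: &Path) -> io::Result<bool> {
    Ok(entry_names(dir)? == [".gitkeep"])
}

fn main() -> io::Result<()> {
    let slot = env::temp_dir().join(format!("zg-reserve-demo-{}", process::id()));
    fs::create_dir_all(&slot)?;
    fs::write(slot.join(".gitkeep"), b"")?;
    println!("reserved: {}", is_reserved_slot(&slot)?);

    // A stray manifest would turn the slot into a workspace-glob hazard.
    fs::write(slot.join("Cargo.toml"), b"[package]\n")?;
    println!("reserved after adding Cargo.toml: {}", is_reserved_slot(&slot)?);

    fs::remove_dir_all(&slot)
}
```

Comparing the whole entry list, instead of only checking that .gitkeep exists, is what catches an accidentally added manifest.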

The test compiles into the api-gateway crate's integration test
binary because that is the natural home: api-gateway is the only
populated app directory and the reservations are part of the same
logical commit group.

cargo test -p api-gateway --all-features --no-fail-fast: 4/4 pass
(1 binary integration test from the previous commit + 3 reservation
tests added here).

Signed-off-by: MicrosoftWindows96 <spam@zagrosi.com>
Adds the JavaScript-side workspace foundation. No real packages yet;
the recursive scripts match zero packages and exit zero, but the
manifest, lockfile, and config land now so future splits adding the
web shell, MCP server, or browser plugins all reference the same
canonical version pins.

pnpm-workspace.yaml
  Globs: apps/*, packages/*, plugins/*. The default catalog covers
  React 19 plus types, TypeScript 6, Vite 8, Vitest 4, Zod 4,
  Tailwind 4, TanStack Router/Query, prettier, and @types/node 24.
  Catalog entries stay inert until a real workspace package
  references them. The named "testing" catalog is intentionally absent at
  this stage; it lands when the web shell test infrastructure
  arrives in a later split.

package.json
  Private root manifest. packageManager pinned to pnpm@11.0.8 so
  corepack reproduces the exact patch on every contributor machine.
  engines.node pinned to >=24.0.0 <25.0.0; engines.pnpm pinned to
  >=11.0.8. Six no-op recursive scripts (build, lint, typecheck,
  test, test:e2e, format) delegate via pnpm -r run.

.npmrc
  engine-strict=true so pnpm rejects installs under the wrong Node
  major. strict-peer-dependencies=true so unmet peer ranges fail
  rather than warn. auto-install-peers=true so pnpm fills
  unambiguous peers automatically.

pnpm-lock.yaml
  Generated under pnpm 11.0.8. lockfileVersion 9; importers empty
  because the catalog is unreferenced. pnpm install --frozen-lockfile
  passes against this file from day one.

Reserved app directories (apps/zagrosi-mcp, apps/worker, apps/web,
each containing only .gitkeep) are silently skipped by pnpm because
they have no package.json. apps/api-gateway is a Cargo crate with no
package.json, so pnpm skips it for the same reason.

Signed-off-by: MicrosoftWindows96 <spam@zagrosi.com>
Local development stack: Postgres 18, Valkey 9, NATS 2.14, all bound
to 127.0.0.1 with healthchecks and named volumes.

deploy/docker/compose.yaml
  Three services with explicit project name "zagrosi". Required env
  vars (POSTGRES_USER, POSTGRES_PASSWORD) use ${VAR:?msg} so missing
  values abort with a clear error. Optional vars use ${VAR:-default}.
  Postgres PGDATA points at a subdirectory so the named volume mount
  does not shadow the entrypoint init scripts. NATS runs with -js
  for JetStream and exposes the monitor port (8222) on loopback only.
  No top-level version field (Compose v2). All published ports are
  127.0.0.1:-prefixed so the dev stack is unreachable from the LAN;
  the production Helm chart owns external exposure via Ingress and
  NetworkPolicy.
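
The two interpolation forms used in compose.yaml follow shell-style semantics: `${VAR:?msg}` fails when the variable is unset or empty, and `${VAR:-default}` substitutes the default in the same cases. A small model of that behaviour, with hypothetical function names and an illustrative default value, could read:

```rust
/// Models `${NAME:?msg}`: an unset or empty value is an error.
pub fn required(name: &str, value: Option<&str>, msg: &str) -> Result<String, String> {
    match value {
        Some(v) if !v.is_empty() => Ok(v.to_string()),
        _ => Err(format!("required variable {} is missing a value: {}", name, msg)),
    }
}

/// Models `${NAME:-default}`: an unset or empty value yields the default.
pub fn with_default(value: Option<&str>, default: &str) -> String {
    match value {
        Some(v) if !v.is_empty() => v.to_string(),
        _ => default.to_string(),
    }
}

fn main() {
    // Required variable present: resolves to its value.
    println!("{:?}", required("POSTGRES_USER", Some("zagrosi"), "set POSTGRES_USER"));

    // Required variable empty: aborts with the author-supplied message.
    println!("{:?}", required("POSTGRES_PASSWORD", Some(""), "set POSTGRES_PASSWORD"));

    // Optional variable unset: falls back to the default
    // (the value "5432" is illustrative, not taken from the compose file).
    println!("{}", with_default(None, "5432"));
}
```

The colon matters: the forms without it (`${VAR?msg}`, `${VAR-default}`) treat an empty string as a valid value.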

infra/valkey/valkey.conf
  Production-grade Valkey config: AOF appendfsync everysec, RDB save
  points (3600/1, 300/100, 60/10000), 512mb maxmemory with allkeys-lru
  eviction, slowlog. Binds 0.0.0.0 inside the container; defence
  against external exposure is the Compose loopback-only port mapping.
  Production deploys require auth via Kubernetes Secret.

infra/postgres/init/.gitkeep
  Reserves the directory bind-mounted at /docker-entrypoint-initdb.d.
  Future SQL init scripts (extension creation, role provisioning) land
  here in later splits.

.env.example
  Documents every variable referenced by compose.yaml, with the literal
  POSTGRES_PASSWORD=changeme-strong-password-required placeholder so
  contributors cannot accidentally ship the default. Quickstart points
  at the actual compose file path.

scripts/smoke-compose.sh (mode 100755)
  Brings the stack up, polls container Health via docker inspect with
  bounded timeouts (60s postgres, 30s valkey, 30s nats), runs three
  sanity probes (valkey-cli ping, pg_isready, NATS /healthz), tears
  down via trap cleanup EXIT. Probes go through a probe helper that
  dumps docker compose ps and per-service logs on failure. Self-
  contained env so CI can invoke it without a checked-in .env. Tested
  end-to-end on macOS: all probes green, cleanup ran, exit zero.
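
The bounded health polling that the smoke script performs in shell can be expressed as a generic helper. This is a sketch of the pattern, not a translation of the script: `wait_until` is a hypothetical name, and the probe closure stands in for a `docker inspect` health query.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Call `probe` until it returns true or `timeout` elapses.
/// Returns whether the probe ever succeeded.
pub fn wait_until<F>(timeout: Duration, interval: Duration, mut probe: F) -> bool
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        if probe() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(interval);
    }
}

fn main() {
    // Simulated service that reports healthy on the third check.
    let mut checks = 0;
    let healthy = wait_until(Duration::from_secs(2), Duration::from_millis(10), || {
        checks += 1;
        checks >= 3
    });
    println!("healthy: {} after {} checks", healthy, checks);

    // A service that never becomes healthy hits the timeout instead of hanging.
    let stuck = wait_until(Duration::from_millis(50), Duration::from_millis(10), || false);
    println!("stuck service healthy: {}", stuck);
}
```

With per-service budgets such as the 60-second and 30-second limits listed above, a hung container fails the run within a known time instead of stalling CI until the job-level timeout.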

Signed-off-by: MicrosoftWindows96 <spam@zagrosi.com>
Empty-by-default Helm chart at deploy/helm/. Every component toggle
defaults to enabled: false, so helm template emits zero Kubernetes
manifests. Forward-compatible with later splits adding real templates
that gate on the toggles.

Chart.yaml
  apiVersion v2, version 0.1.0, appVersion 0.1.0, kubeVersion >=1.30.0,
  dependencies present and empty. Maintainer email oss@zagrosi.com.

values.yaml
  Component toggles (apiGateway, worker, mcp, web, postgres, valkey,
  nats, ingress) all enabled: false. Observability mirrors
  zagrosi-core: otel.enabled false with empty endpoint, prometheus
  enabled false with empty bind, logFormat json. serviceAccount.create
  true with empty name (Helm derives via fullname). ingress.hosts is an
  explicit empty list.

templates/_helpers.tpl
  Bitnami-style helpers: zagrosi.name, zagrosi.fullname, zagrosi.chart,
  zagrosi.labels, zagrosi.selectorLabels (no version label, since
  Kubernetes spec.selector.matchLabels is immutable post-creation),
  zagrosi.componentSelectorLabels (accepts a dict containing Chart,
  Values, Release, component for per-component invocation), and
  zagrosi.serviceAccountName.

.helmignore
  Standard exclusion list (VCS, editor, OS, ci/, .github/,
  README.md.gotmpl).

helm lint --strict deploy/helm passes with zero warnings under Helm
4.1.4. helm template deploy/helm produces zero manifests.

Signed-off-by: MicrosoftWindows96 <spam@zagrosi.com>
Two follow-up fixes from the triple-Opus review of the chart skeleton:

- .helmignore: add a sensitive-patterns block (.env, *.key, *.pem,
  *.crt, *.p12, *.pfx, *.kubeconfig, *.tfstate, *.tfvars, secrets.yaml,
  *-secret.yaml). Defense-in-depth: the chart directory has no leak
  surface today, but as soon as a contributor drops a stray credential
  file there for local debugging, helm package would tar it into the
  chart .tgz and helm push would publish it. Adding the patterns now
  is cheap; the alternative is a supply-chain incident later.

- Chart.yaml: remove the icon URL until the asset is confirmed hosted
  at https://zagrosi.com/icon.png. helm lint --strict now reports the
  icon as informational ("recommended"), not a failure. Re-add once
  the asset is published with a stable cache policy.

helm lint --strict deploy/helm passes (1 chart, 0 failed). Empty-chart
invariant unchanged: helm template deploy/helm still emits zero
manifests.

Signed-off-by: MicrosoftWindows96 <spam@zagrosi.com>
Five workflows + a documented Rulesets API payload for the protected
main branch:

rust.yml
  Eight jobs: cargo fmt, dotenv lint, cargo clippy, cargo test (matrix
  with rust-test-summary aggregator for stable status name), cargo
  deny (replaces cargo audit via deny check advisories), cargo sbom
  (uses taiki-e/install-action to fetch a prebuilt cargo-cyclonedx
  binary instead of compiling from source), and compose smoke
  (invokes scripts/smoke-compose.sh end to end). Workflow-level env
  sets RUSTFLAGS=-D warnings, RUST_BACKTRACE=1.

web.yml
  Three jobs: pnpm lint, typecheck, test. Each pins pnpm 11.0.8
  explicitly via pnpm/action-setup, then sets up Node 24 with cache:
  pnpm, then runs pnpm install --frozen-lockfile before the script.
  All three exit zero today (no real packages yet); the workflow is
  ready to grow.

helm-lint.yml
  Single job: helm lint via helm/chart-testing-action with helm 4.1.4
  and chart-testing 3.13.0 explicitly pinned. fetch-depth 0 so ct
  lint can diff against main. No paths filter (a path-filtered
  required check stays Expected forever and blocks merges).

dco.yml
  Pure-shell Signed-off-by trailer check that loops over every commit
  in the PR or push range. Untrusted github.event values routed via
  env: block, never interpolated into shell text. On indeterminate
  range the workflow exits 1 with an ::error:: annotation rather
  than silently passing. Status check is informational; branch
  protection requires the cncf/dco2 App's separate DCO check.
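
The trailer check itself is simple enough to model. The workflow is pure shell and its exact matching rules are not shown here, so the following is an assumed approximation: a commit passes when some line has the form `Signed-off-by: Name <email>`. The function names are hypothetical.

```rust
/// True when `rest` looks like `Name <local@domain>`.
fn is_name_and_email(rest: &str) -> bool {
    match (rest.find(" <"), rest.ends_with('>')) {
        (Some(idx), true) => {
            let name = &rest[..idx];
            let email = &rest[idx + 2..rest.len() - 1];
            !name.trim().is_empty()
                && email.contains('@')
                && !email.starts_with('@')
                && !email.ends_with('@')
        }
        _ => false,
    }
}

/// True when any line of the commit message is a well-formed sign-off trailer.
pub fn has_signoff(message: &str) -> bool {
    message.lines().any(|line| match line.strip_prefix("Signed-off-by: ") {
        Some(rest) => is_name_and_email(rest.trim()),
        None => false,
    })
}

/// Return the messages in a range that lack a sign-off.
pub fn unsigned_commits<'a>(messages: &[&'a str]) -> Vec<&'a str> {
    messages
        .iter()
        .cloned()
        .filter(|m| !has_signoff(m))
        .collect()
}

fn main() {
    let range = [
        "feat(core): add loader\n\nSigned-off-by: MicrosoftWindows96 <spam@zagrosi.com>",
        "fix: typo in README",
    ];
    let missing = unsigned_commits(&range);
    if missing.is_empty() {
        println!("all commits signed off");
    } else {
        // The workflow described above would emit an ::error:: annotation and exit 1.
        println!("{} commit(s) missing Signed-off-by", missing.len());
    }
}
```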

commitlint.yml
  Single job using wagoid/commitlint-github-action with explicit
  configFile: commitlint.config.mjs. Lints every commit in the PR
  range plus the PR title. fetch-depth 0 so the action sees history.

branch-protection.json + branch-protection.json.LICENSE
  Modern Rulesets API payload (not the legacy branch-protection
  API). enforcement: active, bypass_actors: empty (admins enforced),
  rules cover deletion, non_fast_forward, required_linear_history,
  pull_request (0 approving reviews, dismiss stale on push), and
  required_status_checks listing all 13 required contexts. Sidecar
  .LICENSE file carries the SPDX header per the REUSE specification
  so the JSON itself stays a clean Rulesets API payload.

Cross-cutting:
- Every uses: reference is a 40-character SHA with a trailing # vTag
  comment.
- Every workflow declares minimal permissions (contents: read
  baseline; pull-requests: read only on commitlint).
- Every workflow declares concurrency: with cancel-in-progress only
  on pull_request events (push-to-main runs preserved for
  post-merge validation).
- Every job declares timeout-minutes:.
- actionlint clean. SHA-pin grep clean.
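
The SHA-pin rule in the first bullet can be stated as a predicate over the value of a `uses:` key. The sketch below is an assumed equivalent of the grep mentioned above, not the project's actual check; `is_sha_pinned` is a hypothetical name, the SHA in the demo is made up, and local (`./path`) or `docker://` references are out of scope.

```rust
/// True when a `uses:` value pins its action to a full 40-character hex SHA.
/// A trailing `# vTag` comment is ignored.
pub fn is_sha_pinned(uses: &str) -> bool {
    // Drop the trailing comment, if any, then split action from revision.
    let reference = uses.split('#').next().unwrap_or("").trim();
    match reference.rsplit_once('@') {
        Some((action, rev)) => {
            !action.is_empty()
                && rev.len() == 40
                && rev.chars().all(|c| c.is_ascii_hexdigit())
        }
        None => false,
    }
}

fn main() {
    // Hypothetical SHA used purely for illustration.
    let pinned = "actions/checkout@0123456789abcdef0123456789abcdef01234567 # v4";
    let floating = "actions/checkout@v4";

    println!("{} -> {}", pinned, is_sha_pinned(pinned));
    println!("{} -> {}", floating, is_sha_pinned(floating));
}
```

A tag such as `v4` can be moved to a different commit after review, whereas a full SHA cannot, which is why the tag survives only as a comment.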

Signed-off-by: MicrosoftWindows96 <spam@zagrosi.com>
…y policy

Repo-hygiene templates and community-health files.

.github/ISSUE_TEMPLATE/
  bug.yml, feature.yml, design-feedback.yml, config.yml. Each form
  carries the canonical 19-scope set or a sensible subset. Required
  fields use validations: required: true. design-feedback.yml omits
  a dropdown default so contributors must explicitly pick an area
  rather than mis-tagging issues to identity.

.github/PULL_REQUEST_TEMPLATE.md
  Six sections in order: Summary, Linked issue, Type of change (8
  Conventional Commits prefixes), Scope (19 entries from
  CONTRIBUTING.md allowlist), Test plan, Checklist (DCO sign-off,
  Conventional Commits subjects, status checks running locally,
  prose meets writing standards with a CONTRIBUTING.md link, no new
  dependencies without a clear reason).

CODE_OF_CONDUCT.md
  Verbatim Contributor Covenant 2.1 with [INSERT CONTACT METHOD]
  substituted to conduct@zagrosi.com. No SPDX header per the
  third-party-text exemption. Byte-identical to the canonical
  source aside from the contact substitution.

SECURITY.md
  Five sections: Reporting a vulnerability, Supported versions,
  Coordinated disclosure, Recognition, Scope. Standard 90-day
  disclosure window. Receipt acknowledged within five business
  days with a target of 72 hours. GitHub Security Advisories are
  the alternative private disclosure channel. Self-hosted instances
  and third-party dependencies are explicitly out of scope.

Signed-off-by: MicrosoftWindows96 <spam@zagrosi.com>
Final aggregation of the foundation phase.

documentation/governance.md
  Nine-section governance manual: branch protection (matches the
  Rulesets payload exactly, including the 13 required status checks
  and the bootstrap-and-postmerge sync procedure), release cadence
  (manual seven-step procedure, release-tooling migration path,
  lockfile-conflict policy, deprecation procedure), issue triage
  (severity-to-priority mapping, decision tree, stale-issue policy),
  maintainers (probationary period, off-boarding, bus factor),
  voting (categories, procedure, tie-break, worked examples),
  license posture (DCO not CLA, SPDX coverage, license-aware
  substitutions, contributor copyright), Code of Conduct
  enforcement (outcome catalogue, appeals, recusal, worked
  examples), security disclosure (90-day window, embargo via
  GitHub Temporary Private Forks, supported-versions transition
  matrix, worked example), and trademark (permitted and restricted
  uses, logo licence, worked examples, future custodianship).

documentation/CHANGELOG.md
  Keep a Changelog 1.1.0 format with SemVer workspace-wide. The
  [0.1.0] - 2026-05-08 entry enumerates every deliverable from the
  foundation phase, grouped by category: workspace and tooling,
  foundation library and apps, JavaScript workspace, dev
  infrastructure, Helm chart, CI, repo hygiene and community-health
  files, and public-facing documentation. Comparison-link footers
  point at the canonical repository.

README.md
  Stack table: PostgreSQL row updated to 18.
  Comparison table: price-range punctuation cleaned up.

CONTRIBUTING.md unchanged (verified em-dash, en-dash, and
prose-style clean at the close of the foundation phase).

Signed-off-by: MicrosoftWindows96 <spam@zagrosi.com>
Three independent fixes to make the foundation-phase PR's required
checks green:

commitlint.config.mjs
  Bump body-max-line-length from 100 to 200. The default 100-char
  limit forces hard line-wrapping in commit-message bodies that
  fights against natural prose paragraphs; 200 catches truly
  problematic bodies (single unwrapped lines exceeding two screen
  widths) while permitting the wrapped paragraphs the foundation
  commits already use.

apps/api-gateway/Cargo.toml
  Add an explicit version field to the zagrosi-core path dependency.
  Path dependencies without a version are flagged as wildcards by
  cargo-deny's bans rule, which the workspace lint set forbids. The
  version field also allows future publication of the api-gateway
  crate without a follow-up Cargo.toml edit.

.github/workflows/rust.yml
  Fix the cargo-sbom job's artifact-upload glob. cargo-cyclonedx
  0.5.9 emits per-crate <crate>.cdx.json files (not bom.json), so
  the upload-artifact step's path: '**/bom.json' matched zero files
  and tripped the if-no-files-found: error guard. The glob is now
  '**/*.cdx.json', which matches both crates/zagrosi-core/zagrosi-core.cdx.json
  and apps/api-gateway/api-gateway.cdx.json.

Signed-off-by: MicrosoftWindows96 <spam@zagrosi.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 30318d5126

Comment thread .github/branch-protection.json Outdated
Two follow-up fixes from the foundation-PR review:

.github/branch-protection.json
  Replace the required `DCO` context with the project's own
  `dco / dco` workflow context. The previous configuration assumed
  the cncf/dco2 GitHub App was installed; without the App, GitHub
  produces no `DCO` context and the required-check rule blocks
  every merge into main. The pure-shell Signed-off-by trailer check
  in `.github/workflows/dco.yml` provides the same guarantee
  without any external app dependency. The cncf/dco2 App remains
  supported as an additional layer when installed; its `DCO`
  context is no longer required.

commitlint.config.mjs
  Disable the body-max-line-length rule (level 0). Conventional
  commit bodies in this project use natural-prose paragraphs that
  routinely exceed 100 and 200 character limits. Hard-wrapping
  paragraphs at fixed column widths fights against the body
  format rather than improving it.

documentation/governance.md
  Update §1 prose to reflect the swap: all thirteen required
  checks now come from project workflows. The cncf/dco2 App is
  documented as supported-but-optional.

Signed-off-by: MicrosoftWindows96 <spam@zagrosi.com>
Bare top-level types like `ci:` and `docs:` are valid Conventional
Commits subjects per the spec; the scope segment is optional. The
project's commitlint config previously required a scope on every
commit (`scope-empty: [2, never]`), which is stricter than the spec
and rejects perfectly conformant commit subjects. Demoting the rule
to level 0 (disabled) leaves the scope-enum allowlist enforced for
commits that DO declare a scope, while permitting bare-type subjects.
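
The resulting policy (scope optional, but checked against the allowlist when present) can be modelled as follows. This is a simplified sketch, not commitlint's implementation: `check_subject` is a hypothetical name, the type segment is not validated, and the scopes in the demo are placeholders for the 19-entry allowlist in CONTRIBUTING.md.

```rust
/// Validate the scope part of a Conventional Commits subject.
/// A bare type such as `ci: ...` passes; a declared scope must be allowlisted.
pub fn check_subject(subject: &str, allowed_scopes: &[&str]) -> Result<(), String> {
    let header = match subject.split_once(": ") {
        Some((header, description)) if !description.trim().is_empty() => header,
        _ => return Err("expected `type(scope): description`".to_string()),
    };

    // A trailing `!` marks a breaking change and is not part of the scope.
    let header = header.trim_end_matches('!');

    match header.split_once('(') {
        // No scope declared: allowed once scope-empty is disabled.
        None => Ok(()),
        Some((_commit_type, rest)) => match rest.strip_suffix(')') {
            Some(scope) if allowed_scopes.contains(&scope) => Ok(()),
            Some(scope) => Err(format!("scope `{}` is not in the allowlist", scope)),
            None => Err("unterminated scope".to_string()),
        },
    }
}

fn main() {
    // Placeholder scopes; the real allowlist lives in CONTRIBUTING.md.
    let scopes = ["core", "ci", "helm"];

    for subject in [
        "ci: pin actions to SHAs",
        "feat(core): add layered config loader",
        "feat(unknown): something else",
    ]
    .iter()
    {
        println!("{:?} -> {:?}", subject, check_subject(subject, &scopes));
    }
}
```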

Signed-off-by: MicrosoftWindows96 <spam@zagrosi.com>
@zagrosi-code zagrosi-code merged commit 37326a7 into main May 8, 2026
14 checks passed
@zagrosi-code zagrosi-code deleted the feature/zg-1-foundation-setup branch May 8, 2026 12:53
@zagrosi-code zagrosi-code mentioned this pull request May 8, 2026