
feat(jangar): surface failure-domain lease holdbacks #5454

Merged
gregkonush merged 2 commits into main from codex/swarm-jangar-control-plane on May 5, 2026

@gregkonush (Member) commented May 5, 2026

Summary

  • Added Phase 0 failure_domain_leases, in shadow mode, to the Jangar control-plane status, with typed per-domain leases, a lease-set digest, rollback targets, evidence refs, and per-action holdback decisions.
  • Synthesized database, route, rollout, registry, storage, workflow artifact, NATS runtime-kit, and source-schema leases from existing probes plus read-only Kubernetes pod/event evidence.
  • Rendered the lease set in the control-plane status UI and documented operator/deployer validation in the Jangar README, agent runbook, and governing design.
  • Governing design for this PR: docs/agents/designs/75-jangar-failure-domain-leases-and-database-routability-holdbacks-2026-05-05.md.
  • Runtime requirement provenance: the latest general NATS context selected the P0 launcher-admission gap, governed by docs/agents/designs/61-jangar-runtime-kits-and-admission-passports-contract-2026-03-20.md and extended by docs/agents/designs/65-jangar-recovery-warrants-and-runtime-proof-cells-contract-2026-03-21.md. That launcher gate landed in fix(jangar): gate swarm schedule runners by admission passport #5444; this PR delivers the next accepted runtime evidence surface, failure-domain holdbacks.
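The PR text does not show the lease payload itself. As a hedged illustration only, the sketch below assumes a lease shape (`FailureDomainLease`, with hypothetical `healthy`/`degraded`/`unknown` statuses) and a hypothetical action-to-domain dependency map. It then shows two ideas the summary describes: a lease-set digest that does not depend on lease order, and per-action holdback decisions that are recorded but never enforced in shadow mode. Field names, status values, and the hashing scheme are assumptions, not Jangar's actual implementation.

```typescript
import { createHash } from "node:crypto";

// Hypothetical lease shape for illustration; Jangar's real payload fields may differ.
type LeaseStatus = "healthy" | "degraded" | "unknown";

interface FailureDomainLease {
  domain: string; // e.g. "database", "route", "registry"
  status: LeaseStatus;
  reasonCodes: string[];
  evidenceRefs: string[];
  rollbackTarget?: string;
}

interface HoldbackDecision {
  action: string;
  holdback: boolean; // would this action be held back?
  blockingDomains: string[];
  enforced: boolean; // false whenever mode is "shadow"
}

// Digest over a canonical serialization, so lease order does not change the result.
function leaseSetDigest(leases: FailureDomainLease[]): string {
  const canonical = [...leases]
    .sort((a, b) => (a.domain < b.domain ? -1 : a.domain > b.domain ? 1 : 0))
    .map((lease) => ({
      domain: lease.domain,
      status: lease.status,
      reasonCodes: [...lease.reasonCodes].sort(),
      evidenceRefs: [...lease.evidenceRefs].sort(),
      rollbackTarget: lease.rollbackTarget ?? null,
    }));
  const hex = createHash("sha256").update(JSON.stringify(canonical)).digest("hex");
  return `sha256:${hex}`;
}

// An action is held back when any domain it depends on lacks a healthy lease.
// A domain with no lease at all is treated as blocking.
function decideHoldbacks(
  leases: FailureDomainLease[],
  actionDomains: Record<string, string[]>,
  mode: "shadow" | "enforce",
): HoldbackDecision[] {
  const byDomain = new Map(
    leases.map((lease): [string, FailureDomainLease] => [lease.domain, lease]),
  );
  return Object.entries(actionDomains).map(([action, domains]) => {
    const blockingDomains = domains.filter((d) => byDomain.get(d)?.status !== "healthy");
    const holdback = blockingDomains.length > 0;
    return { action, holdback, blockingDomains, enforced: mode === "enforce" && holdback };
  });
}
```

Sorting before hashing is the design choice that matters here: probes that report the same leases in a different order yield the same digest, which is what makes a value like lease_set_digest usable as deployer evidence.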

Related Issues

None

Testing

  • PASS: bun install --frozen-lockfile --ignore-scripts
  • PASS: bun run test -- src/server/__tests__/supporting-primitives-controller.test.ts src/server/__tests__/control-plane-runtime-admission.test.ts src/server/__tests__/control-plane-status.test.ts src/server/__tests__/control-plane-failure-domain-leases.test.ts src/components/__tests__/agents-control-plane-status.test.tsx src/server/__tests__/kube-gateway.test.ts from services/jangar (6 files, 87 tests)
  • PASS: bun run lint from services/jangar
  • PASS: bunx tsc --noEmit --project tsconfig.paths.json from services/jangar
  • PASS: bun run docs:inventory:check from services/jangar
  • PASS: bun run lint:oxlint from services/jangar (0 errors; 85 existing warnings)
  • PASS: bun run lint:oxlint:type from services/jangar (0 errors; 284 existing warnings)
  • PASS: bun run check:module-sizes from services/jangar
  • PASS: bun run build from services/jangar
  • PASS: git diff --check origin/main...HEAD
  • PASS: gh pr checks 5454 --repo proompteng/lab --watch on head bc7d84624e5f9336d4d17b9b27a56b67865ec42d
  • PASS: GitHub jangar-ci / lint-and-typecheck / run
  • PASS: GitHub agents-ci / validate
  • PASS: GitHub agents-ci / integration (rerun passed in 29m1s after the first attempt was canceled while stuck without logs in local image build/preload)
  • PASS: GitHub post-merge Lint commit messages
  • PASS: Codex review posted no major issues at 2026-05-05T22:47:41Z, and GraphQL reports zero review threads

Screenshots (if applicable)

N/A. Status UI rendering is covered by services/jangar/src/components/__tests__/agents-control-plane-status.test.tsx.

Breaking Changes

None. The lease set is additive and runs with mode="shadow"; this PR does not change AgentRun admission or deploy-widening enforcement.

Risk, Rollback, and Handoff

  • Risk: lease synthesis is advisory in this phase; consumers could ignore it until later enforcement. Mitigation: the status payload and UI expose lease_set_digest, per-action holdbacks, and reason codes for deployer evidence.
  • Risk: Kubernetes pod/event RBAC gaps can reduce evidence quality. Mitigation: collection failures are captured as evidence errors and the lease surface remains additive.
  • Rollback: revert merge commit a4c52389366f74e58e0adfaa68be2e2c70e0dbab or ignore failure_domain_leases; because enforcement is not enabled, rollback does not require deleting evidence or changing admission paths.
  • Launcher-admission rollback remains the design-approved existing knob from fix(jangar): gate swarm schedule runners by admission passport #5444: set JANGAR_SWARM_RUNTIME_ADMISSION_ENFORCEMENT=false to return launcher admission to advisory mode while preserving passport projections.
  • Large-diff gate: this PR is 1,542 changed lines (1,538 additions and 4 deletions). Codex review posted no major issues after the final rebase, and review-thread inspection found zero threads.
  • Merge state: feat(jangar): surface failure-domain lease holdbacks #5454 was squash-merged at 2026-05-05T23:39:57Z as merge commit a4c52389366f74e58e0adfaa68be2e2c70e0dbab after required/visible checks were green.
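The RBAC mitigation above describes collection failures being captured as evidence errors instead of breaking the status surface. One way to get that behavior is to catch each probe's failure individually and degrade the affected lease to an unknown status. The sketch below uses hypothetical names (`collectEvidence`, `leaseStatusFrom`, `EvidenceResult`). It illustrates the approach under that assumption and is not the PR's code.

```typescript
// Hypothetical names for illustration; not the PR's actual collector.
interface EvidenceResult {
  source: string; // e.g. "pods", "events"
  ok?: boolean; // probe verdict when collection succeeded
  error?: string; // captured collection failure, e.g. an RBAC denial
}

// Run every probe, capturing failures per source instead of rejecting the whole batch.
async function collectEvidence(
  probes: Record<string, () => Promise<boolean>>,
): Promise<EvidenceResult[]> {
  return Promise.all(
    Object.entries(probes).map(async ([source, probe]): Promise<EvidenceResult> => {
      try {
        return { source, ok: await probe() };
      } catch (err) {
        return { source, error: err instanceof Error ? err.message : String(err) };
      }
    }),
  );
}

// Missing evidence degrades the lease to "unknown" rather than failing the status call.
function leaseStatusFrom(results: EvidenceResult[]): "healthy" | "degraded" | "unknown" {
  if (results.some((r) => r.error !== undefined)) return "unknown";
  return results.every((r) => r.ok === true) ? "healthy" : "degraded";
}
```

Keeping "unknown" distinct from "degraded" lets a consumer tell a permissions gap apart from a real failure-domain problem. Because the lease surface is additive, neither outcome changes admission in this phase.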

Checklist

  • Testing section documents the exact validation performed.
  • Screenshots and Breaking Changes sections are handled appropriately.
  • Documentation, release notes, and follow-ups are updated or tracked.

@gregkonush (Member, Author)

@codex review

@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, add credits to your account and enable them for code reviews in your settings.

@gregkonush (Member, Author) commented May 5, 2026

Marco release gate as of 2026-05-06T00:10Z: PR #5454 is merged and Jangar rollout-verified.

  • Merge outcome: feat(jangar): surface failure-domain lease holdbacks is merged at a4c52389366f74e58e0adfaa68be2e2c70e0dbab. I did not initiate the merge; I verified the completed gate.
  • Pre-merge gate: required checks were green or intentionally skipped on the PR head, review threads were empty, and Codex review had posted with no major issues before merge despite the >1000-line diff.
  • Build and promotion: jangar-build-push run 25408344788 succeeded and produced tag a4c52389, runtime digest sha256:8cd77a9dad5d0d9654b0c6131e2def641d1081c206d8a48d72671b4b679da7c3, and control-plane digest sha256:d47c8eac9c382605f6771818772e224a84259f6ea808098e37f56c5eb0d4209c. jangar-release run 25408676322 generated promotion PR chore(jangar): promote image a4c52389 #5587.
  • Rollout evidence: chore(jangar): promote image a4c52389 #5587 merged at c3c5e7633f0d88b7b374174d345eae6f13f3129c; post-merge argo-lint, kubeconform, and jangar-post-deploy-verify are green. The post-deploy job reported Argo sync=Synced health=Healthy revision=c3c5e7633f0d88b7b374174d345eae6f13f3129c.
  • Runtime evidence: the Jangar pod is Running with image registry.ide-newton.ts.net/lab/jangar:a4c52389@sha256:8cd77a9dad5d0d9654b0c6131e2def641d1081c206d8a48d72671b4b679da7c3, and /health returns status=ok.
  • Rollback path: revert promotion commit c3c5e7633f0d88b7b374174d345eae6f13f3129c, or revert source commit a4c52389366f74e58e0adfaa68be2e2c70e0dbab if failure-domain lease holdback behavior regresses.
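The runtime evidence above amounts to checking that the running image reference carries the tag and runtime digest produced by jangar-build-push. A minimal sketch of that comparison follows. `parseImageRef` and `matchesBuild` are hypothetical helpers, and the parser handles only the `repository:tag@digest` form seen here, not every OCI reference edge case.

```typescript
// Hypothetical helpers for illustration; not part of Jangar or its CI.
interface ImageRef {
  repository: string;
  tag?: string;
  digest?: string;
}

function parseImageRef(ref: string): ImageRef {
  let rest = ref;
  let digest: string | undefined;
  const at = rest.indexOf("@");
  if (at !== -1) {
    digest = rest.slice(at + 1);
    rest = rest.slice(0, at);
  }
  // The tag separator must follow the last "/", so a registry port is not read as a tag.
  let tag: string | undefined;
  const colon = rest.lastIndexOf(":");
  if (colon > rest.lastIndexOf("/")) {
    tag = rest.slice(colon + 1);
    rest = rest.slice(0, colon);
  }
  return { repository: rest, tag, digest };
}

// True only when both the tag and the content digest match the build output.
function matchesBuild(ref: string, expectedTag: string, expectedDigest: string): boolean {
  const parsed = parseImageRef(ref);
  return parsed.tag === expectedTag && parsed.digest === expectedDigest;
}
```

Run against the pod image reported above, the runtime digest from the build step should match. The control-plane digest should not match, because it belongs to a different image. The digest is the stronger check, since a tag alone can be moved to different content.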

@gregkonush (Member, Author)

@codex review

Retrying once after GitHub API quota recovered; merge remains blocked unless a Codex review is posted and all threads are resolved.

@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, add credits to your account and enable them for code reviews in your settings.

@gregkonush (Member, Author)

@codex review

Re-requesting after rebasing #5454 onto current main at e1a852c. Large-diff gate remains active; do not merge until Codex review is posted and all threads are resolved, or a maintainer explicitly waives the gate.

@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, add credits to your account and enable them for code reviews in your settings.

gregkonush force-pushed the codex/swarm-jangar-control-plane branch from e1a852c to bc7d846 on May 5, 2026 at 22:42
@gregkonush (Member, Author)

@codex review

Re-requesting after rebasing #5454 onto current main at bc7d846. Large-diff gate remains active; do not merge until Codex review is posted and all threads are resolved, or a maintainer explicitly waives the gate.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. You're on a roll.


gregkonush merged commit a4c5238 into main on May 5, 2026
20 of 21 checks passed
gregkonush deleted the codex/swarm-jangar-control-plane branch on May 5, 2026 at 23:39