fix(jangar): harden control-plane dependency health #5576

Merged
gregkonush merged 7 commits into main from codex/swarm-jangar-control-plane-verify
May 5, 2026

gregkonush commented May 5, 2026

Summary

  • Raises the Jangar Torghut /trading/status timeout default from 5s to 15s and sets the same value in the GitOps deployment.
  • Adds checked-out Postgres client error handling so transient pg disconnect events log instead of terminating the Jangar process.
  • Keeps the DB hardening modular by moving Postgres SSL/error helpers out of the oversized db.ts, refreshes the generated architecture inventory, and adds quant health lookup indexes.
  • Skips unscoped Torghut quant pipeline-health reads so /api/torghut/trading/control-plane/quant/health?window=1h remains a cheap rollout probe; scoped account plus window requests still read indexed stage health.
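The checked-out client error handling in the second bullet can be sketched roughly as follows. This is a minimal illustration and not the code in postgres-client-errors.ts: the names `guardCheckedOutClient`, `ErrorEmitterLike`, and `FakeClient` are hypothetical, and the sketch only assumes that a checked-out client emits an `error` event that throws when nothing is listening, as Node-style event emitters do.

```typescript
// Minimal event surface assumed for a checked-out client (hypothetical type;
// a real pg client exposes a wider EventEmitter interface).
type ErrorListener = (error: Error) => void;

interface ErrorEmitterLike {
  on(event: "error", listener: ErrorListener): unknown;
  off(event: "error", listener: ErrorListener): unknown;
}

// Attach a logging "error" listener for the duration of a checkout and return
// a function that detaches it again before the client is released.
function guardCheckedOutClient(
  client: ErrorEmitterLike,
  log: (message: string) => void,
): () => void {
  const onError: ErrorListener = (error) => {
    log(`postgres client error while checked out: ${error.message}`);
  };
  client.on("error", onError);
  return () => {
    client.off("error", onError);
  };
}

// Tiny in-memory stand-in used only to exercise the helper.
class FakeClient implements ErrorEmitterLike {
  private listeners: ErrorListener[] = [];
  on(_event: "error", listener: ErrorListener): this {
    this.listeners.push(listener);
    return this;
  }
  off(_event: "error", listener: ErrorListener): this {
    this.listeners = this.listeners.filter((l) => l !== listener);
    return this;
  }
  // Mimics EventEmitter: an "error" with no listener is thrown.
  emitError(error: Error): void {
    if (this.listeners.length === 0) throw error;
    for (const listener of this.listeners) listener(error);
  }
}

const logged: string[] = [];
const client = new FakeClient();
const detach = guardCheckedOutClient(client, (message) => logged.push(message));
client.emitError(new Error("Connection terminated unexpectedly"));
detach();
```

With the listener attached, the simulated disconnect is recorded in `logged` instead of being thrown. Detaching on release keeps listeners from accumulating when the same client is checked out repeatedly.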

Related Issues

None

Testing

  • PASS: bun install --frozen-lockfile --ignore-scripts
  • PASS: bun run build from packages/otel
  • PASS: bun run build from packages/temporal-bun-sdk
  • PASS: git diff --check origin/main...HEAD
  • PASS: bunx oxfmt --check services/jangar/src/routes/api/torghut/trading/control-plane/quant/health.ts services/jangar/src/routes/api/torghut/trading/control-plane/quant/-health.test.ts services/jangar/src/server/db.ts services/jangar/src/server/__tests__/db.test.ts services/jangar/src/server/postgres-client-errors.ts services/jangar/src/server/postgres-ssl.ts services/jangar/src/server/kysely-migrations.ts services/jangar/src/server/__tests__/kysely-migrations.test.ts services/jangar/src/server/torghut-quant-metrics-store.ts services/jangar/src/server/__tests__/torghut-quant-metrics-store.test.ts services/jangar/src/server/migrations/20260505_torghut_quant_metrics_latest_account_window_index.ts services/jangar/src/server/migrations/20260505_torghut_quant_pipeline_health_account_window_asof_index.ts docs/jangar/architecture-inventory.md
  • PASS: bunx vitest run --config vitest.config.ts src/routes/api/torghut/trading/control-plane/quant/-health.test.ts src/server/__tests__/db.test.ts src/server/__tests__/kysely-migrations.test.ts src/server/__tests__/torghut-quant-metrics-store.test.ts src/server/__tests__/control-plane-config.test.ts src/server/__tests__/control-plane-empirical-services.test.ts from services/jangar
  • PASS: bun run docs:inventory:check from services/jangar
  • PASS: bun run check:module-sizes from services/jangar
  • PASS: bun run lint from services/jangar
  • PASS: bun run lint:oxlint from services/jangar with existing warnings only
  • PASS: bun run lint:oxlint:type from services/jangar with existing warnings only
  • PASS: bun run tsc from services/jangar
  • PASS: bun run build from services/jangar
  • NOTE: the local full-suite bun run test from services/jangar still times out on unrelated tests that use the default 5s timeout on this low-CPU runner, even though the targeted quant and DB suites pass independently; GitHub CI is the merge gate before squash merge.

Screenshots (if applicable)

N/A

Breaking Changes

None

Checklist

  • Testing section documents the exact validation performed (or N/A with justification).
  • Screenshots and Breaking Changes sections are handled appropriately (removed or filled in).
  • Documentation, release notes, and follow-ups are updated or tracked.

gregkonush commented May 5, 2026

Marco release final 2026-05-05 23:26 UTC

Merge gate: complete. #5576 merged at 2026-05-05T23:07:18Z as d17e045 after all required checks passed or were skipped, including agents-ci integration in 14m02s.

Rollout evidence:

  • #5581 (chore(jangar): promote image d17e0457) promoted image tag d17e045 with digest sha256:90b75799cea530ef9b02736db3ae122d28c6ca4448a92e9305c27cf1b9ff7309.
  • #5580 (fix(jangar): break torghut status health cycle) merged the final GitOps env fix, pointing Jangar at Torghut /trading/autonomy with a 3000 ms timeout.
  • The latest post-deploy verifier passed for #5580 after deployment health/digest verification and Temporal routing sync.
  • Live pod jangar-5899b7c8d9-g9wfs is 2/2 Running with zero restarts; app image ID matches sha256:90b75799cea530ef9b02736db3ae122d28c6ca4448a92e9305c27cf1b9ff7309.
  • /health returns ok; control-plane status reports database healthy and empirical-service endpoint values using /trading/autonomy.
  • Torghut /trading/autonomy returns HTTP 200 in ~16 ms; quant health ?window=1h skips unscoped pipeline reads as intended.
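The skip behaviour verified in the last item can be illustrated with a small guard. This is a sketch under stated assumptions, not the route's actual code: the query parameter name `account` and the helpers `parseQuantHealthQuery` and `shouldReadPipelineHealth` are hypothetical; only the `window` parameter and the rule "read stage health only when both account and window are given" come from the PR text.

```typescript
// Hypothetical query shape; `account` as a parameter name is an assumption.
interface QuantHealthQuery {
  account: string | null;
  window: string | null;
}

// Treat missing or blank values as absent.
function cleanParam(value: string | null): string | null {
  const trimmed = value?.trim();
  return trimmed ? trimmed : null;
}

// Parse the scope parameters from a request path or absolute URL.
function parseQuantHealthQuery(url: string): QuantHealthQuery {
  // The base is only needed so that relative paths parse.
  const params = new URL(url, "http://localhost").searchParams;
  return {
    account: cleanParam(params.get("account")),
    window: cleanParam(params.get("window")),
  };
}

// Only scoped requests (account plus window) read pipeline stage health;
// an unscoped probe skips that read and stays cheap.
function shouldReadPipelineHealth(query: QuantHealthQuery): boolean {
  return query.account !== null && query.window !== null;
}

const base = "/api/torghut/trading/control-plane/quant/health";
const unscoped = shouldReadPipelineHealth(parseQuantHealthQuery(`${base}?window=1h`));
const scoped = shouldReadPipelineHealth(
  parseQuantHealthQuery(`${base}?account=example-account&window=1h`),
);
```

Here `unscoped` is `false` and `scoped` is `true`, so a rollout probe that passes only `?window=1h` never reaches the pipeline-health tables.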

Residual risk: dependency quorum is still blocked by stale empirical jobs, and Torghut /readyz still returns 503. This is a runtime dependency backlog, not a failed Jangar rollout.

gregkonush changed the title from "fix(jangar): extend torghut status timeout" to "fix(jangar): harden control-plane dependency health" May 5, 2026
gregkonush force-pushed the codex/swarm-jangar-control-plane-verify branch from 6d2b581 to e755ff7 May 5, 2026 22:33
gregkonush merged commit d17e045 into main May 5, 2026
25 checks passed
gregkonush deleted the codex/swarm-jangar-control-plane-verify branch May 5, 2026 23:07