Skip to content

ci: no-docker mock money-path lane + fix silent-pass orchestrator exit#3

Draft
shroominic wants to merge 1 commit into
mainfrom
fix/mock-money-ci-lane
Draft

ci: no-docker mock money-path lane + fix silent-pass orchestrator exit#3
shroominic wants to merge 1 commit into
mainfrom
fix/mock-money-ci-lane

Conversation

@shroominic

Copy link
Copy Markdown

Problem

routstr-testing is a working full-stack E2E harness but has no CI (no .github/ on main). The intended money-path guard is to drive a scenario through runner.orchestrate — and building that guard surfaced a latent CI-integrity bug: orchestrate.main() returned exit 0 regardless of the scenario outcome. A scenario whose tests FAIL (status=failed) or ERROR still exited 0, so a money-path CI lane would stay green while the swap/refund logic under test was red — the silent-pass / false-safety failure mode.

Verified before fix: SKIP_SYNC=1 python -m runner.orchestrate --scenario smoke_fail → exit 0.

Root cause

main() returned the run_id and unconditionally return 0; the run's real outcome lived only in the runs.db row and was never reflected in the process exit code.

Fix (surgical)

  • runner/orchestrate.py: main() reads the finalized run status from runs.db (new _read_run_status) and returns exit 1 on failed/error. The JSON summary gains an additive status field; orchestrate()'s run_id contract is unchanged.
  • server/runs.py: spawn_orchestrator tolerates the new non-zero exit — a recorded run still returns its id (UI shows status=failed); a non-zero exit with no run_id still raises OrchestratorError.

CI (.github/workflows/ci.yml, docs/ci.md)

  • pr lane (every PR/push, no Docker, no funds): ruff + mypy (scoped to the money-path runner/ package) + unit/mock pytest + two orchestrator guards — smoke MUST exit 0, smoke_fail MUST exit non-zero (inverted, so a silent-pass fails the build).
  • nightly lane (cron/manual, never on PRs): docker make up/down stub; the foreign-mint swap+retry scenarios are commented out until test: e2e foreign-mint swap + retry (covers routstr-core #549) #1 merges (pin ROUTSTR_CORE_REF=refs/pull/549/head for the retry scenario until core#549 lands).

Tooling made hermetic (pinned ruff/mypy in a dev group); cleared 10 pre-existing ruff errors so the lint gate is meaningful.

Test evidence

  • tests/test_orchestrate_exit_code.py: passed→0, failed→non-zero, error→non-zero. With the fix neutralized, the failed/error cases FAIL (2 failed, 1 passed); with the fix, 3 passed. (smoke_fail is a genuine assert 1 == 2, not a skip — a real negative control.)
  • Full pr-lane: ruff clean, mypy clean, 86 passed / 73 skipped; guards: smoke exit 0, smoke_fail exit 1.

Risk

Low–medium. The one behavior change (a failing UI scenario now exits non-zero) is handled so the run_id is still returned, pinned by a regression test. Nightly docker money scenarios are stubbed pending #1.

Follow-ups (not in this PR)

Independently adversarially reviewed: root cause + non-tautological regression tests reproduced, verdict solid / ready. Opened as draft for maintainer review.

🤖 Generated with Claude Code

… exit

Adds a two-lane GitHub Actions workflow (the repo had no CI at all) and
fixes the bug that would have made the money-path lane a false-safety net.

Root cause: runner.orchestrate.main() returned 0 regardless of the
scenario outcome — a scenario whose tests FAILED (status=failed) or
ERRORED (status=error) still exited 0. A CI guard that drives the
orchestrator (the intended money-path gate) would stay green while the
swap/refund logic under test was red — the silent-pass failure mode.

Fix (surgical):
- main() now reads the finalized run status from runs.db and exits 1 on
  failed/error; the JSON summary gains a "status" field (additive).
- server.runs.spawn_orchestrator tolerates the new non-zero exit: a run
  that was recorded (run_id summary present) still returns its id so the
  UI shows status=failed; a non-zero exit with no run_id still raises.

CI lanes (.github/workflows/ci.yml, docs/ci.md):
- pr: no Docker, no funds. ruff + mypy(runner/) + unit/mock pytest +
  orchestrator pipeline guards (smoke MUST exit 0; smoke_fail MUST exit
  non-zero, inverted so a silent-pass fails the build). The mock money
  guard tests/integration/test_spend_unit.py runs here.
- nightly: docker lane stub (make up/down) with the foreign-mint swap +
  retry scenario steps commented out until #1
  merges (it adds those services_required scenarios; not on main yet).

Tooling made hermetic: pinned ruff/mypy/types-PyYAML in a [dependency-
groups] dev group; [tool.ruff]/[tool.mypy] config added (mypy scoped to
the money-path runner/ package — server/ SQLModel typing is out of scope).
Cleared 10 pre-existing ruff lint errors so the lint gate is meaningful.

Regression tests:
- tests/test_orchestrate_exit_code.py — passed→0, failed→non-zero,
  error→non-zero (failed/error cases fail without the fix).
- tests/test_server_orchestrator_exit.py — server returns run_id on
  non-zero-with-run_id, raises on non-zero-without-run_id.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@9qeklajc

9qeklajc commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Took a proper pass through this (plus a second independent review) — the core change is solid. The exit-code bug is real and worth fixing: a scenario that fails/errors but still exits 0 makes the money-path lane false-green, which defeats the whole point of the harness. Reading the final status from runs.db and exiting non-zero is the right call, and the regression tests actually fail without the fix, so they're not just decoration. Nice.

One thing I'd genuinely fix before merge, then a couple of things I'd rather see split out so this doesn't turn into a mega-PR.

Fix before merge — the suite isn't hermetic against OPENAI_API_KEY. test_local_real_requires_api_key calls _resolve(sc, env={}) expecting no key, but _resolve_upstream overlays {**os.environ, ...} (orchestrate.py:338), so an ambient key leaks in. I reproduced it: OPENAI_API_KEY=dummy uv run pytest -k requires_api_key and the test no longer raises. It's pre-existing brittleness, but now that this PR adds a lane that runs the suite, it can flake on a runner that happens to have the var set. Easiest fix is monkeypatch.delenv("OPENAI_API_KEY", raising=False), or make _resolve_upstream not overlay os.environ when an explicit env is passed.

Could you split this into two PRs? Right now it's bundling three concerns and I think it'd review/land faster apart:

  1. ruff + mypy tooling on its own — and while you're in there, bump them. ruff==0.1.7 and mypy==1.9.0 are both from late 2023/early 2024; pinning the gate to rules that old means everyone on a modern ruff locally sees different results than CI. Latest versions please. This PR also quietly drops some greenlet platform wheels (s390x/riscv64) from uv.lock from the regen — fine to keep here, just call it out.
  2. the test hermeticity fix above, on its own.

That leaves this PR as purely "fix the silent-pass exit code + wire the CI lanes," which is the part I'm happy to approve as-is.

Couple of non-blockers, your call:

  • Exit mapping is fail-open — unknown/missing status falls through to 0. passed -> 0, everything else -> non-zero would be safer if a run ever lands in an unexpected state.
  • mypy is scoped to runner/ only, so the one source change in this PR (server/runs.py) isn't actually type-checked. It's got a dedicated test so not a big deal, just noting it.

tl;dr: pull the tooling bump and the test fix into their own PRs, land the env fix, and the exit-code fix here is good to go.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants