ci: no-docker mock money-path lane + fix silent-pass orchestrator exit by shroominic · Pull Request #3 · Routstr/routstr-testing

shroominic · 2026-06-19T07:33:44Z

Problem

routstr-testing is a working full-stack E2E harness but has no CI (no .github/ on main). The intended money-path guard is to drive a scenario through runner.orchestrate — and building that guard surfaced a latent CI-integrity bug: orchestrate.main() returned exit 0 regardless of the scenario outcome. A scenario whose tests FAIL (status=failed) or ERROR still exited 0, so a money-path CI lane would stay green while the swap/refund logic under test was red — the silent-pass / false-safety failure mode.

Verified before fix: SKIP_SYNC=1 python -m runner.orchestrate --scenario smoke_fail → exit 0.

Root cause

main() returned the run_id and unconditionally return 0; the run's real outcome lived only in the runs.db row and was never reflected in the process exit code.

Fix (surgical)

runner/orchestrate.py: main() reads the finalized run status from runs.db (new _read_run_status) and returns exit 1 on failed/error. The JSON summary gains an additive status field; orchestrate()'s run_id contract is unchanged.
server/runs.py: spawn_orchestrator tolerates the new non-zero exit — a recorded run still returns its id (UI shows status=failed); a non-zero exit with no run_id still raises OrchestratorError.

CI (`.github/workflows/ci.yml`, `docs/ci.md`)

pr lane (every PR/push, no Docker, no funds): ruff + mypy (scoped to the money-path runner/ package) + unit/mock pytest + two orchestrator guards — smoke MUST exit 0, smoke_fail MUST exit non-zero (inverted, so a silent-pass fails the build).
nightly lane (cron/manual, never on PRs): docker make up/down stub; the foreign-mint swap+retry scenarios are commented out until test: e2e foreign-mint swap + retry (covers routstr-core #549) #1 merges (pin ROUTSTR_CORE_REF=refs/pull/549/head for the retry scenario until core#549 lands).

Tooling made hermetic (pinned ruff/mypy in a dev group); cleared 10 pre-existing ruff errors so the lint gate is meaningful.

Test evidence

tests/test_orchestrate_exit_code.py: passed→0, failed→non-zero, error→non-zero. With the fix neutralized, the failed/error cases FAIL (2 failed, 1 passed); with the fix, 3 passed. (smoke_fail is a genuine assert 1 == 2, not a skip — a real negative control.)
Full pr-lane: ruff clean, mypy clean, 86 passed / 73 skipped; guards: smoke exit 0, smoke_fail exit 1.

Risk

Low–medium. The one behavior change (a failing UI scenario now exits non-zero) is handled so the run_id is still returned, pinned by a regression test. Nightly docker money scenarios are stubbed pending #1.

Follow-ups (not in this PR)

Merge test: e2e foreign-mint swap + retry (covers routstr-core #549) #1 (test/foreign-mint-swap, mergeable) to add the real swap_foreign_mint regression scenarios, then uncomment the nightly steps.
Delete the stale feat/remote-capable-scenarios branch (main is strictly ahead).

Independently adversarially reviewed: root cause + non-tautological regression tests reproduced, verdict solid / ready. Opened as draft for maintainer review.

🤖 Generated with Claude Code

… exit Adds a two-lane GitHub Actions workflow (the repo had no CI at all) and fixes the bug that would have made the money-path lane a false-safety net. Root cause: runner.orchestrate.main() returned 0 regardless of the scenario outcome — a scenario whose tests FAILED (status=failed) or ERRORED (status=error) still exited 0. A CI guard that drives the orchestrator (the intended money-path gate) would stay green while the swap/refund logic under test was red — the silent-pass failure mode. Fix (surgical): - main() now reads the finalized run status from runs.db and exits 1 on failed/error; the JSON summary gains a "status" field (additive). - server.runs.spawn_orchestrator tolerates the new non-zero exit: a run that was recorded (run_id summary present) still returns its id so the UI shows status=failed; a non-zero exit with no run_id still raises. CI lanes (.github/workflows/ci.yml, docs/ci.md): - pr: no Docker, no funds. ruff + mypy(runner/) + unit/mock pytest + orchestrator pipeline guards (smoke MUST exit 0; smoke_fail MUST exit non-zero, inverted so a silent-pass fails the build). The mock money guard tests/integration/test_spend_unit.py runs here. - nightly: docker lane stub (make up/down) with the foreign-mint swap + retry scenario steps commented out until #1 merges (it adds those services_required scenarios; not on main yet). Tooling made hermetic: pinned ruff/mypy/types-PyYAML in a [dependency- groups] dev group; [tool.ruff]/[tool.mypy] config added (mypy scoped to the money-path runner/ package — server/ SQLModel typing is out of scope). Cleared 10 pre-existing ruff lint errors so the lint gate is meaningful. Regression tests: - tests/test_orchestrate_exit_code.py — passed→0, failed→non-zero, error→non-zero (failed/error cases fail without the fix). - tests/test_server_orchestrator_exit.py — server returns run_id on non-zero-with-run_id, raises on non-zero-without-run_id. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

9qeklajc · 2026-06-19T21:34:03Z

Took a proper pass through this (plus a second independent review) — the core change is solid. The exit-code bug is real and worth fixing: a scenario that fails/errors but still exits 0 makes the money-path lane false-green, which defeats the whole point of the harness. Reading the final status from runs.db and exiting non-zero is the right call, and the regression tests actually fail without the fix, so they're not just decoration. Nice.

One thing I'd genuinely fix before merge, then a couple of things I'd rather see split out so this doesn't turn into a mega-PR.

Fix before merge — the suite isn't hermetic against OPENAI_API_KEY. test_local_real_requires_api_key calls _resolve(sc, env={}) expecting no key, but _resolve_upstream overlays {**os.environ, ...} (orchestrate.py:338), so an ambient key leaks in. I reproduced it: OPENAI_API_KEY=dummy uv run pytest -k requires_api_key and the test no longer raises. It's pre-existing brittleness, but now that this PR adds a lane that runs the suite, it can flake on a runner that happens to have the var set. Easiest fix is monkeypatch.delenv("OPENAI_API_KEY", raising=False), or make _resolve_upstream not overlay os.environ when an explicit env is passed.

Could you split this into two PRs? Right now it's bundling three concerns and I think it'd review/land faster apart:

ruff + mypy tooling on its own — and while you're in there, bump them. ruff==0.1.7 and mypy==1.9.0 are both from late 2023/early 2024; pinning the gate to rules that old means everyone on a modern ruff locally sees different results than CI. Latest versions please. This PR also quietly drops some greenlet platform wheels (s390x/riscv64) from uv.lock from the regen — fine to keep here, just call it out.
the test hermeticity fix above, on its own.

That leaves this PR as purely "fix the silent-pass exit code + wire the CI lanes," which is the part I'm happy to approve as-is.

Couple of non-blockers, your call:

Exit mapping is fail-open — unknown/missing status falls through to 0. passed -> 0, everything else -> non-zero would be safer if a run ever lands in an unexpected state.
mypy is scoped to runner/ only, so the one source change in this PR (server/runs.py) isn't actually type-checked. It's got a dedicated test so not a big deal, just noting it.

tl;dr: pull the tooling bump and the test fix into their own PRs, land the env fix, and the exit-code fix here is good to go.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: no-docker mock money-path lane + fix silent-pass orchestrator exit#3

ci: no-docker mock money-path lane + fix silent-pass orchestrator exit#3
shroominic wants to merge 1 commit into
mainfrom
fix/mock-money-ci-lane

shroominic commented Jun 19, 2026

Uh oh!

9qeklajc commented Jun 19, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

shroominic commented Jun 19, 2026

Problem

Root cause

Fix (surgical)

CI (.github/workflows/ci.yml, docs/ci.md)

Test evidence

Risk

Follow-ups (not in this PR)

Uh oh!

9qeklajc commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

CI (`.github/workflows/ci.yml`, `docs/ci.md`)

9qeklajc commented Jun 19, 2026 •

edited

Loading