Distributed on PyPI as quorum-gate; imported and invoked as quorum.
Quorum decides whether a code change is safe to keep. It is built around one idea: "it compiled" and "it passed all checks" are different claims, and only the second — backed by checks that can actually fail — is worth trusting.
You give it a list of independent check functions. Each one takes a throwaway
copy of your codebase and returns pass/fail with a reason. Some checks are
static ("does every file still compile?"), but the useful ones are
behavioral: they spin up a subprocess, import the candidate's modified code
in isolation, and actually run it — e.g. launch 16 threads at a spend tracker to
check for lost updates, or feed a tool_use/tool_result pair through a message
trimmer to confirm it's never split.
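To make the trimmer example concrete, here is a minimal sketch of what such a behavioral check might assert. The message shape (dicts with `type`, `id`, and `tool_use_id` keys) and the `trim(history, limit)` signature are assumptions made for illustration; they are not part of Quorum or of any particular message format.

```python
def pairs_intact(messages):
    """True if every tool_use has its tool_result in the list, and vice versa."""
    uses = {m["id"] for m in messages if m["type"] == "tool_use"}
    results = {m["tool_use_id"] for m in messages if m["type"] == "tool_result"}
    return uses == results


def check_trimmer(trim, history, limit):
    """Run a (hypothetical) trimmer and report in the (passed, reason) tuple form."""
    trimmed = trim(history, limit)
    ok = pairs_intact(trimmed)
    return ok, "pairs intact" if ok else "trimmer split a tool_use/tool_result pair"
```

A trimmer that simply keeps the last N messages will fail this check whenever the cut falls between a call and its result, which is the kind of failure a compile-only check cannot see.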
A change is promoted to your live files only if a quorum of checks agrees: it passes everything (or, in scored mode, strictly improves the score without breaking any check that was already passing). Otherwise it's discarded — and since it only ever touched a copy, there's nothing to roll back.
Two layers of isolation, and the second is the one that matters:
- Filesystem isolation: every check sees a fresh disposable copy. Nothing it writes survives or affects your real files.
- Process isolation: each check runs in its own subprocess with a wall-clock timeout. A broken candidate that infinite-loops, deadlocks 16 threads, segfaults, runs out of memory, or hard-exits takes down a disposable child and is reported back as TIMEOUT/CRASHED. It cannot crash or pollute the verifier that is judging it.
A naive test runner imports candidate code into its own process — one bad candidate can then hang or corrupt the judge. Quorum never does this.
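The two layers can be sketched with the standard library alone. This is not Quorum's implementation, only an illustration of the idea: `run_isolated` is a hypothetical helper, and the mapping from exit status to outcome (0 is PASSED, 1 is FAILED, anything else is CRASHED) is an assumption of this sketch.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_isolated(source_dir, check_code, timeout_s=5.0):
    """Run check_code against a throwaway copy of source_dir in a child process.

    Returns (outcome, detail). The exit-status convention is this sketch's own.
    """
    with tempfile.TemporaryDirectory() as tmp:
        copy = Path(tmp) / "candidate"
        shutil.copytree(source_dir, copy)  # filesystem isolation: work on a copy
        try:
            proc = subprocess.run(
                [sys.executable, "-c", check_code, str(copy)],
                cwd=copy,
                capture_output=True,
                text=True,
                timeout=timeout_s,  # wall-clock budget; the child is killed on expiry
            )
        except subprocess.TimeoutExpired:
            return "TIMEOUT", f"exceeded {timeout_s}s"
    # The copy is already deleted here; only the verdict survives.
    if proc.returncode == 0:
        return "PASSED", proc.stdout.strip()
    if proc.returncode == 1:
        return "FAILED", proc.stdout.strip()
    return "CRASHED", f"exit status {proc.returncode}"
```

Because the candidate only ever runs in the child, an infinite loop or a hard exit shows up in the parent as a return value rather than as a hang or a crash.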
    pip install quorum-gate     # PyYAML is only needed if you use --config
    # or, from a checkout:
    pip install -e .

The installed command is quorum; the import package is quorum.
A function check takes the path to the throwaway copy and returns a
CheckResult, a bool, or a (passed, reason[, score]) tuple. Register checks
on a module-level gate object:
    # checks.py
    import importlib
    import threading

    from quorum import Gate, CheckResult, Outcome

    gate = Gate()


    @gate.check(name="spend_tracker_no_lost_updates", timeout_s=20)
    def spend_tracker(codebase_path):
        core = importlib.import_module("myapp.core")
        tracker = core.SpendTracker()

        def hammer():
            # 5000 increments per thread; an unsynchronized tracker can lose some
            for _ in range(5000):
                tracker.add(1)

        threads = [threading.Thread(target=hammer) for _ in range(16)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        expected = 16 * 5000
        ok = tracker.total == expected
        return CheckResult("spend_tracker_no_lost_updates", ok,
                           "all increments recorded" if ok else f"lost {expected - tracker.total}",
                           Outcome.PASSED if ok else Outcome.FAILED,
                           score=tracker.total / expected)

A shell check lets you reuse tools you already have (pytest, mypy, ruff, a compile step), expressed in YAML:
    # quorum.yaml
    checks:
      - name: types
        command: "mypy myapp/"
        timeout_s: 60
      - name: unit_tests
        command: "pytest -q && echo passed=1"
        timeout_s: 120
        score_from: passed   # parses `passed=<number>` from stdout as the score

Run it from the command line:

    # pass/fail mode: promote iff every check passes
    quorum verify --candidate ./candidate --checks checks.py --config quorum.yaml

    # scored mode: promote iff candidate strictly improves the total score
    # without regressing any check that was passing on the baseline
    quorum scored --candidate ./candidate --baseline ./live --checks checks.py

    # actually copy a passing candidate over your live files
    quorum verify --candidate ./candidate --checks checks.py --promote --live ./live

(python -m quorum.cli ... works identically if you prefer not to rely on the console script.)
Exit code is 0 if promoted, 1 otherwise — so it drops straight into CI or a
patch-proposing loop.
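The `score_from: passed` setting in the YAML example is described as parsing `passed=<number>` from stdout. A minimal sketch of how such parsing could work is below; `parse_score` is a hypothetical helper, and taking the last match when the key appears more than once is an assumption, not documented Quorum behavior.

```python
import re


def parse_score(stdout, key):
    """Return the last `key=<number>` value found in stdout, or None if absent."""
    pattern = re.compile(rf"\b{re.escape(key)}=(-?\d+(?:\.\d+)?)")
    matches = pattern.findall(stdout)
    return float(matches[-1]) if matches else None
```

Note that pytest's own summary line ("3 passed in 0.10s") does not match, because the key must be followed directly by `=`; that is why the example command appends `echo passed=1`.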
pass/fail — promote iff every check passes. Simple and strict.
scored — each check can return a numeric score. Quorum runs the checks
against your baseline (the current live code) first to learn which checks were
already passing and what the baseline score was, then runs them against the
candidate. It promotes only if the candidate's total score is strictly
greater and no previously-passing check now fails. This is the mode for
"make it better without making anything worse" — e.g. an optimizer or an
agent proposing patches.
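The scored-mode rule can be written down as a small decision function. This is a sketch under stated assumptions: results are represented as a plain dict mapping check name to `(passed, score)`, the total score is taken to be the sum of per-check scores, and a check missing from the candidate run is treated as failing. None of these details are confirmed by Quorum itself.

```python
def should_promote(baseline, candidate):
    """Decide promotion from two {check_name: (passed, score)} dicts."""
    # Checks that passed on the baseline but do not pass on the candidate.
    regressed = [
        name
        for name, (passed, _) in baseline.items()
        if passed and not candidate.get(name, (False, 0.0))[0]
    ]
    base_total = sum(score for _, score in baseline.values())
    cand_total = sum(score for _, score in candidate.values())
    # Strict improvement AND no regression; a tie is not enough.
    return cand_total > base_total and not regressed
```

A candidate with a much higher total is still rejected if it breaks one check that the baseline passed, which is what distinguishes this rule from simply maximizing a score.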
The same flow is available from Python:

    from quorum import Gate


    def my_check(codebase_path):
        # Placeholder check; a real one would inspect or run code under codebase_path.
        return True, "placeholder check"


    g = Gate()
    g.add_function(my_check)
    g.add_shell("tests", "pytest -q")

    report = g.verify("./candidate")
    print(report.summary())
    if g.promote("./candidate", "./live", report):
        print("shipped")

Every check returns a reason, not just a verdict; that's the point. Outcomes are PASSED, FAILED, TIMEOUT (exceeded its budget), CRASHED (process died without a verdict), or ERROR (the check function itself raised).
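To illustrate how the accepted return forms (a bool, or a `(passed, reason[, score])` tuple) and the ERROR outcome could fit together, here is a small sketch. `SketchOutcome`, `SketchResult`, and `run_and_normalize` are hypothetical stand-ins written for this example; they are not Quorum's `Outcome` or `CheckResult`, whose exact fields are not shown here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SketchOutcome(Enum):
    PASSED = "PASSED"
    FAILED = "FAILED"
    TIMEOUT = "TIMEOUT"   # produced by the process supervisor, not by this function
    CRASHED = "CRASHED"   # likewise
    ERROR = "ERROR"


@dataclass
class SketchResult:
    name: str
    passed: bool
    reason: str
    outcome: SketchOutcome
    score: Optional[float] = None


def run_and_normalize(name, check, codebase_path):
    """Call a check and fold its return value into one result shape."""
    try:
        raw = check(codebase_path)
    except Exception as exc:  # the check itself raised: ERROR, not FAILED
        return SketchResult(name, False, f"{type(exc).__name__}: {exc}", SketchOutcome.ERROR)
    if isinstance(raw, SketchResult):
        return raw
    if isinstance(raw, bool):
        passed, reason, score = raw, "no reason given", None
    else:
        passed, reason, *rest = raw  # (passed, reason) or (passed, reason, score)
        score = rest[0] if rest else None
    outcome = SketchOutcome.PASSED if passed else SketchOutcome.FAILED
    return SketchResult(name, bool(passed), reason, outcome, score)
```

Keeping ERROR separate from FAILED means a bug in the check is never mistaken for a verdict on the candidate.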
To see each outcome, run the examples:

    quorum verify --candidate examples/candidate_fixed  --checks examples/checks.py   # promoted
    quorum verify --candidate examples/live_code        --checks examples/checks.py   # rejected: real bugs
    quorum verify --candidate examples/candidate_broken --checks examples/checks.py   # one check times out, Quorum survives