Inspect and manage fakellm-assert frozen judgment snapshots from the terminal. Part of the fakellm family.
fakellm-assert freezes a judge's verdict about a fuzzy assertion (satisfies("apologizes for the delay")) to .fakellm/judgments/judgments.json and replays it forever. Those frozen verdicts are artifacts you review in a git diff — and once you have more than a handful, you want to look at them, sanity-check them, and clean them up without hand-editing JSON. That's what this CLI is for.
pip install fakellm-cli
Requires Python 3.9+. fakellm-assert itself is not a hard dependency: the CLI reuses its types when they're importable, but it can also inspect a checked-in .fakellm/ store on a machine that only has the snapshots. Install them together with pip install fakellm-cli[assert] if you want both.
fakellm-cli list # every frozen verdict, with pass/fail counts
fakellm-cli list --verdict fail # just the failures
fakellm-cli show "apologizes" # one verdict in full: reasoning + response excerpt
fakellm-cli show aaaa1111 # (by fingerprint prefix or criterion substring)
fakellm-cli verify # integrity check: schema, verdict values, key match
fakellm-cli prune --verdict fail # preview removing all failing verdicts (dry run)
fakellm-cli prune --verdict fail --yes # actually remove them
fakellm-cli diff main/ feature/ # what changed between two snapshot dirs
fakellm-cli init # scaffold .fakellm/ and a conftest.py judge stub
Every read command takes --store PATH (pointing at either the judgments directory or the judgments.json file; default .fakellm/judgments) and --json for machine-readable output. Commands return a non-zero exit code on the condition you'd want to gate CI on: verify fails on integrity problems, and diff fails when a verdict flipped pass↔fail.
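The exit-code contract can be used from a small CI wrapper. The following is a minimal sketch, not part of fakellm-cli: the helper name failing_gates and the GATES list are hypothetical, and the only behavior it relies on is the non-zero exit code of verify and diff described above.

```python
import subprocess

# Commands copied from the usage list; the two directory names are examples.
GATES = [
    ["fakellm-cli", "verify"],
    ["fakellm-cli", "diff", "main/", "feature/"],
]


def failing_gates(commands, run=subprocess.run):
    """Run each command and return the ones that exited non-zero.

    `run` is injectable so the logic can be exercised without the CLI
    installed; it only needs to return an object with a `returncode`.
    """
    failed = []
    for cmd in commands:
        result = run(cmd)
        if result.returncode != 0:
            failed.append(cmd)
    return failed
```

A CI step could call failing_gates(GATES) and exit with status 1 if the returned list is non-empty.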
There is deliberately no fakellm-cli rejudge. Two reasons, both structural:
- The store doesn't keep enough to re-judge. A frozen record holds only a 280-character excerpt of the response, not the full text. Re-judging needs the exact response to recompute the fingerprint — and that lives in your test, not in the snapshot.
- Re-judging is a live model call that belongs in a reviewed run. fakellm-assert's whole point is that verdicts are produced exactly once, in an explicit pytest --fakellm-update, where a human reads the diff. A CLI that judged live would route around the one safety property the library exists to provide.
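To make the first reason concrete, here is a minimal sketch of why an excerpt cannot stand in for the full response. The fingerprint function below is hypothetical: the real algorithm and its exact inputs are not described here, so SHA-256 over the criterion, judge model, and response text is an assumption chosen only for illustration. The 280-character limit is the one stated above.

```python
import hashlib

EXCERPT_CHARS = 280  # excerpt length kept in a frozen record


def fingerprint(criterion: str, judge_model: str, response: str) -> str:
    """Hypothetical fingerprint: a hash over the inputs that identify a verdict."""
    payload = "\x1f".join([criterion, judge_model, response])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# A response longer than the excerpt limit.
full_response = "We are sorry for the delay. " * 20
excerpt = full_response[:EXCERPT_CHARS]

original = fingerprint("apologizes for the delay", "example-judge", full_response)
from_excerpt = fingerprint("apologizes for the delay", "example-judge", excerpt)

# The excerpt is a strict prefix of the response, so the two hashes differ:
# a verdict re-judged from the excerpt could not be stored under the original key.
excerpt_matches = original == from_excerpt
```

Whatever the real hashing scheme is, any fingerprint that depends on the full response text has this property once the response exceeds the excerpt length.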
So the division of labor is: pytest --fakellm-update produces verdicts; fakellm-cli manages them. When verify or diff tells you a verdict is stale or wrong, the fix is to prune it here and re-judge in pytest there.
A fingerprint includes the response text, so the "same" assertion against a regenerated response has a different fingerprint by design. diff therefore pairs verdicts across two stores by (criterion, judge_model) so it can actually catch a pass→fail flip, rather than reporting every drifted response as an unrelated add+remove.
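The pairing rule can be sketched as follows. This is an illustration under stated assumptions, not the CLI's implementation: the function name find_flips and the record field names (fingerprint, criterion, judge_model, verdict) are hypothetical stand-ins for whatever judgments.json actually stores.

```python
def find_flips(old_records, new_records):
    """Pair records by (criterion, judge_model) and report verdict changes.

    If several records share a key, this sketch keeps only the last one.
    """
    def index(records):
        return {(r["criterion"], r["judge_model"]): r for r in records}

    old, new = index(old_records), index(new_records)
    flips = []
    # Only keys present in both stores can flip; the rest are adds or removes.
    for key in sorted(old.keys() & new.keys()):
        before, after = old[key]["verdict"], new[key]["verdict"]
        if before != after:
            flips.append((key, before, after))
    return flips


# Same assertion, regenerated response: the fingerprints differ, the key does not.
main = [{"fingerprint": "aaaa1111", "criterion": "apologizes for the delay",
         "judge_model": "example-judge", "verdict": "pass"}]
feature = [{"fingerprint": "bbbb2222", "criterion": "apologizes for the delay",
            "judge_model": "example-judge", "verdict": "fail"}]

flips = find_flips(main, feature)
```

Pairing on fingerprint alone would have treated these two records as one removal and one unrelated addition, and the pass-to-fail change would go unreported.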
fakellm-cli init # once, to scaffold
# ... write satisfies() assertions, then:
pytest --fakellm-update # freeze verdicts (review the diff!)
fakellm-cli list # eyeball what got frozen
fakellm-cli verify # gate in CI alongside pytest
# later, when you intentionally change a prompt:
fakellm-cli prune --criterion "old wording" --yes
pytest --fakellm-update # re-freeze
License: MIT