fakellm-cli

Inspect and manage fakellm-assert frozen judgment snapshots from the terminal. Part of the fakellm family.

fakellm-assert freezes a judge's verdict about a fuzzy assertion (satisfies("apologizes for the delay")) to .fakellm/judgments/judgments.json and replays it forever. Those frozen verdicts are artifacts you review in a git diff — and once you have more than a handful, you want to look at them, sanity-check them, and clean them up without hand-editing JSON. That's what this CLI is for.

pip install fakellm-cli

Requires Python 3.9+. fakellm-assert itself is not a hard dependency: the CLI reuses its types when they're importable, but it can also inspect a checked-in .fakellm/ store on a machine that has only the snapshots. To install both together, use pip install "fakellm-cli[assert]" (the quotes stop shells such as zsh from treating the brackets as a glob pattern).

Commands

fakellm-cli list                         # every frozen verdict, with pass/fail counts
fakellm-cli list --verdict fail          # just the failures
fakellm-cli show "apologizes"            # one verdict in full: reasoning + response excerpt
fakellm-cli show aaaa1111                #   (by fingerprint prefix or criterion substring)
fakellm-cli verify                       # integrity check: schema, verdict values, key match
fakellm-cli prune --verdict fail         # preview removing all failing verdicts (dry run)
fakellm-cli prune --verdict fail --yes   #   actually remove them
fakellm-cli diff main/ feature/          # what changed between two snapshot dirs
fakellm-cli init                         # scaffold .fakellm/ and a conftest.py judge stub
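The `show` lookup rule (fingerprint prefix or criterion substring) fits in a few lines. This is a minimal sketch, not fakellm-cli's code. It assumes a hypothetical store layout in which judgments.json maps each fingerprint to a record with a `criterion` field. It also assumes that prefix matches take precedence over substring matches; the real tool may resolve the two differently.

```python
from typing import Dict, List

# Hypothetical store layout (an assumption, not fakellm-assert's documented schema):
# {fingerprint: {"criterion": str, "verdict": "pass" | "fail", ...}}
Store = Dict[str, dict]


def find_matches(store: Store, query: str) -> List[str]:
    """Resolve a `show` query to matching fingerprints.

    Assumed precedence: fingerprint-prefix matches first; only if none exist,
    fall back to a case-insensitive substring match on the criterion.
    """
    by_prefix = [fp for fp in store if fp.startswith(query)]
    if by_prefix:
        return sorted(by_prefix)
    needle = query.lower()
    return sorted(
        fp for fp, rec in store.items()
        if needle in rec.get("criterion", "").lower()
    )


store = {
    "aaaa1111c0ffee": {"criterion": "apologizes for the delay", "verdict": "pass"},
    "bbbb2222deadbe": {"criterion": "mentions a refund", "verdict": "fail"},
}
print(find_matches(store, "aaaa1111"))    # ['aaaa1111c0ffee']
print(find_matches(store, "Apologizes"))  # ['aaaa1111c0ffee']
```

When a query matches more than one record, a tool built this way would typically list the candidates and ask for a longer prefix instead of guessing.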

Every read command takes --store PATH (pointing at either the judgments dir or the judgments.json file; default .fakellm/judgments) and --json for machine-readable output. Commands return a non-zero exit code on the condition you'd want to gate CI on: verify fails on integrity problems, diff fails when a verdict flipped pass↔fail.
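A minimal sketch makes the `verify` contract concrete: the three checks, the `--store` path rule, and the exit-code convention. The record fields (`fingerprint`, `criterion`, `verdict`, `judge_model`) and the reading of "key match" as "the JSON key equals the fingerprint stored inside the record" are assumptions for illustration. fakellm-cli's actual checks and messages may differ.

```python
import json
import sys
from pathlib import Path
from typing import Dict, List

# Assumed record schema; the real fakellm-assert format may carry more fields.
REQUIRED_FIELDS = {"fingerprint", "criterion", "verdict", "judge_model"}
VALID_VERDICTS = {"pass", "fail"}


def resolve_store(path: str = ".fakellm/judgments") -> Path:
    """--store accepts either the judgments dir or the judgments.json file."""
    p = Path(path)
    return p if p.suffix == ".json" else p / "judgments.json"


def verify(store: Dict[str, object]) -> List[str]:
    """Return human-readable problems; an empty list means the store is clean."""
    problems = []
    for key, rec in store.items():
        if not isinstance(rec, dict):
            problems.append(f"{key}: record is not a JSON object")
            continue
        missing = REQUIRED_FIELDS - set(rec)
        if missing:
            problems.append(f"{key}: missing fields {sorted(missing)}")
        if rec.get("verdict") not in VALID_VERDICTS:
            problems.append(f"{key}: invalid verdict {rec.get('verdict')!r}")
        # Assumed meaning of "key match": the dict key equals the stored fingerprint.
        if "fingerprint" in rec and rec["fingerprint"] != key:
            problems.append(f"{key}: key does not match stored fingerprint")
    return problems


def main(path: str = ".fakellm/judgments") -> int:
    store = json.loads(resolve_store(path).read_text(encoding="utf-8"))
    problems = verify(store)
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0  # non-zero exit code fails the CI step
```

Returning a list of problems rather than stopping at the first one lets a single CI run report everything that needs fixing.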

What it does not do: re-judge

There is deliberately no fakellm-cli rejudge. Two reasons, both structural:

The store doesn't keep enough to re-judge. A frozen record holds only a 280-character excerpt of the response, not the full text. Re-judging needs the exact response to recompute the fingerprint — and that lives in your test, not in the snapshot.
Re-judging is a live model call that belongs in a reviewed run. fakellm-assert's whole point is that verdicts are produced exactly once, in an explicit pytest --fakellm-update, where a human reads the diff. A CLI that judged live would route around the one safety property the library exists to provide.
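The first reason is easy to demonstrate with any content hash. The sketch below uses a hypothetical fingerprint, SHA-256 over the criterion and the full response. fakellm-assert's real recipe is not described here and may hash additional inputs, but any fingerprint that covers the response text behaves the same way. The stored 280-character excerpt cannot reproduce it, and a regenerated response produces a new one.

```python
import hashlib

EXCERPT_CHARS = 280  # per the README, records keep only this much of the response


def fingerprint(criterion: str, response: str) -> str:
    """Hypothetical fingerprint: SHA-256 over the criterion and the full response.

    fakellm-assert's real recipe may differ (for example, by hashing more inputs).
    """
    h = hashlib.sha256()
    for part in (criterion, response):
        h.update(part.encode("utf-8"))
        h.update(b"\x00")  # separator so ("ab", "c") and ("a", "bc") hash differently
    return h.hexdigest()


criterion = "apologizes for the delay"
response = "Sorry for the delay! " * 40  # 840 characters, longer than the excerpt
excerpt = response[:EXCERPT_CHARS]

# The stored excerpt cannot reproduce the fingerprint of the full response...
print(fingerprint(criterion, excerpt) == fingerprint(criterion, response))  # False
# ...and a regenerated response yields a new fingerprint for the "same" assertion.
print(fingerprint(criterion, response + " ") == fingerprint(criterion, response))  # False
```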

So the division of labor is: pytest --fakellm-update produces verdicts; fakellm-cli manages them. When verify or diff tells you a verdict is stale or wrong, the fix is to prune it here and re-judge in pytest there.
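The prune half of that loop follows the dry-run-by-default pattern from the command list: `--verdict` and `--criterion` select records, and nothing is removed without `--yes`. The sketch below is an illustration under assumptions, not fakellm-cli's implementation. The `{fingerprint: record}` layout is hypothetical, criterion matching is assumed to be a case-insensitive substring match, refusing to run with no filter at all is a design choice made here, and writing the result back to judgments.json is left out.

```python
from typing import Dict, List, Optional, Tuple

Store = Dict[str, dict]  # hypothetical: {fingerprint: record}


def prune(store: Store, verdict: Optional[str] = None,
          criterion: Optional[str] = None,
          yes: bool = False) -> Tuple[List[str], Store]:
    """Select records by verdict and/or criterion substring.

    Without yes=True this is a dry run: it reports what would be removed and
    returns the store unchanged. With yes=True it returns a new store without them.
    """
    if verdict is None and criterion is None:
        raise ValueError("refusing to prune without a filter")

    def selected(rec: dict) -> bool:
        if verdict is not None and rec.get("verdict") != verdict:
            return False
        if criterion is not None and criterion.lower() not in rec.get("criterion", "").lower():
            return False
        return True

    doomed = sorted(fp for fp, rec in store.items() if selected(rec))
    if not yes:
        return doomed, store  # dry run: nothing removed
    doomed_set = set(doomed)
    return doomed, {fp: rec for fp, rec in store.items() if fp not in doomed_set}


store = {
    "a1": {"criterion": "apologizes for the delay", "verdict": "pass"},
    "b2": {"criterion": "old wording", "verdict": "fail"},
    "c3": {"criterion": "mentions a refund", "verdict": "fail"},
}
doomed, after = prune(store, verdict="fail")  # dry run
print(doomed, len(after))  # ['b2', 'c3'] 3
doomed, after = prune(store, criterion="old wording", yes=True)
print(doomed, sorted(after))  # ['b2'] ['a1', 'c3']
```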

`diff` matches on criterion, not fingerprint

A fingerprint includes the response text, so the "same" assertion against a regenerated response has a different fingerprint by design. diff therefore pairs verdicts across two stores by (criterion, judge_model) so it can actually catch a pass→fail flip, rather than reporting every drifted response as an unrelated add+remove.
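The pairing rule can be sketched as follows, using the same hypothetical record fields as above (`criterion`, `judge_model`, `verdict`). fakellm-cli's real output format is not reproduced here. Keyed on fingerprint, the example below would show `aaaa1111` removed and `cccc3333` added. Keyed on `(criterion, judge_model)`, it surfaces the pass→fail flip that should fail CI.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (criterion, judge_model)


def by_assertion(store: Dict[str, dict]) -> Dict[Key, str]:
    """Re-key a store by (criterion, judge_model), discarding the fingerprint.

    Caveat: if one store holds the same assertion against several responses,
    this simple version keeps only the last one seen.
    """
    return {(r["criterion"], r["judge_model"]): r["verdict"] for r in store.values()}


def diff(old: Dict[str, dict], new: Dict[str, dict]) -> Tuple[List[Key], List[Key], List[Key]]:
    a, b = by_assertion(old), by_assertion(new)
    added = sorted(b.keys() - a.keys())
    removed = sorted(a.keys() - b.keys())
    flipped = sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
    return added, removed, flipped


# Same assertion, regenerated response: different fingerprints, flipped verdict.
old_store = {"aaaa1111": {"criterion": "apologizes for the delay", "judge_model": "judge-x", "verdict": "pass"}}
new_store = {"cccc3333": {"criterion": "apologizes for the delay", "judge_model": "judge-x", "verdict": "fail"}}

added, removed, flipped = diff(old_store, new_store)
print(added, removed, flipped)  # [] [] [('apologizes for the delay', 'judge-x')]
exit_code = 1 if flipped else 0  # a pass<->fail flip is the condition that fails CI
```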

Typical workflow

fakellm-cli init                    # once, to scaffold
# ... write satisfies() assertions, then:
pytest --fakellm-update             # freeze verdicts (review the diff!)
fakellm-cli list                    # eyeball what got frozen
fakellm-cli verify                  # gate in CI alongside pytest
# later, when you intentionally change a prompt:
fakellm-cli prune --criterion "old wording" --yes
pytest --fakellm-update             # re-freeze

License

MIT
