1dg618/fakellm-cli

fakellm-cli

Inspect and manage fakellm-assert frozen judgment snapshots from the terminal. Part of the fakellm family.

fakellm-assert freezes a judge's verdict about a fuzzy assertion (satisfies("apologizes for the delay")) to .fakellm/judgments/judgments.json and replays it forever. Those frozen verdicts are artifacts you review in a git diff — and once you have more than a handful, you want to look at them, sanity-check them, and clean them up without hand-editing JSON. That's what this CLI is for.

pip install fakellm-cli

Requires Python 3.9+. fakellm-assert itself is not a hard dependency — the CLI reuses its types when they're importable but can also inspect a checked-in .fakellm/ store on a machine that only has the snapshots. Install them together with pip install fakellm-cli[assert] if you want both.
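
The optional-dependency behavior described above can be sketched roughly as follows. This is an illustration only: the import name `fakellm_assert` is an assumption (the README names only the distribution `fakellm-assert`), and `FallbackJudgment` is a hypothetical stand-in, not a type from either package.

```python
import importlib.util
from dataclasses import dataclass


def assert_library_available(module_name: str = "fakellm_assert") -> bool:
    """Return True if the optional library can be imported.

    The default module name is an assumption made for illustration.
    """
    return importlib.util.find_spec(module_name) is not None


@dataclass
class FallbackJudgment:
    """Hypothetical minimal record used when the library's types are absent."""
    criterion: str
    verdict: str
```

Probing with `find_spec` avoids importing the library just to find out whether it is installed, which keeps a snapshot-only machine working.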

Commands

fakellm-cli list                         # every frozen verdict, with pass/fail counts
fakellm-cli list --verdict fail          # just the failures
fakellm-cli show "apologizes"            # one verdict in full: reasoning + response excerpt
fakellm-cli show aaaa1111                #   (by fingerprint prefix or criterion substring)
fakellm-cli verify                       # integrity check: schema, verdict values, key match
fakellm-cli prune --verdict fail         # preview removing all failing verdicts (dry run)
fakellm-cli prune --verdict fail --yes   #   actually remove them
fakellm-cli diff main/ feature/          # what changed between two snapshot dirs
fakellm-cli init                         # scaffold .fakellm/ and a conftest.py judge stub
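
To make the three checks named for verify (schema, verdict values, key match) concrete, here is a minimal sketch. It assumes the store is a JSON object mapping each fingerprint to a record, and the field names in `REQUIRED_FIELDS` are guesses for illustration; the real schema may differ.

```python
VALID_VERDICTS = {"pass", "fail"}
# Assumed field names; the actual snapshot schema is not documented here.
REQUIRED_FIELDS = {"fingerprint", "criterion", "judge_model", "verdict"}


def check_store(store):
    """Return a list of integrity problems; an empty list means the store is OK."""
    problems = []
    for key, record in store.items():
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            # Schema check: skip further checks on an incomplete record.
            problems.append(f"{key}: missing fields {sorted(missing)}")
            continue
        if record["verdict"] not in VALID_VERDICTS:
            problems.append(f"{key}: invalid verdict {record['verdict']!r}")
        if record["fingerprint"] != key:
            problems.append(f"{key}: key does not match record fingerprint")
    return problems
```

A caller would turn a non-empty result into a non-zero exit code, which is the behavior that makes the check usable as a CI gate.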

Every read command takes --store PATH (pointing at either the judgments dir or the judgments.json file; default .fakellm/judgments) and --json for machine-readable output. Commands return a non-zero exit code on the condition you'd want to gate CI on: verify fails on integrity problems, diff fails when a verdict flipped pass↔fail.
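
The rule that --store accepts either the directory or the file can be sketched like this. `resolve_store` is a hypothetical helper, and deciding by file name is an assumption; the CLI may well inspect the filesystem instead.

```python
from pathlib import Path

DEFAULT_STORE = Path(".fakellm/judgments")  # default named in the README
SNAPSHOT_NAME = "judgments.json"


def resolve_store(path=None) -> Path:
    """Map a --store value to the judgments.json file it refers to."""
    p = Path(path) if path is not None else DEFAULT_STORE
    # A path that already names the JSON file is used as-is;
    # anything else is treated as the judgments directory.
    if p.name == SNAPSHOT_NAME:
        return p
    return p / SNAPSHOT_NAME
```

With this sketch, `resolve_store()`, `resolve_store(".fakellm/judgments")`, and `resolve_store(".fakellm/judgments/judgments.json")` all point at the same file.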

What it does not do: re-judge

There is deliberately no fakellm-cli rejudge. Two reasons, both structural:

  1. The store doesn't keep enough to re-judge. A frozen record holds only a 280-character excerpt of the response, not the full text. Re-judging needs the exact response to recompute the fingerprint — and that lives in your test, not in the snapshot.
  2. Re-judging is a live model call that belongs in a reviewed run. fakellm-assert's whole point is that verdicts are produced exactly once, in an explicit pytest --fakellm-update, where a human reads the diff. A CLI that judged live would route around the one safety property the library exists to provide.

So the division of labor is: pytest --fakellm-update produces verdicts; fakellm-cli manages them. When verify or diff tells you a verdict is stale or wrong, the fix is to prune it with fakellm-cli and then re-judge it with pytest --fakellm-update.
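
The first reason is easy to see in a small sketch. The fingerprint function below is hypothetical (SHA-256 over criterion, judge model, and response is an assumption, not the library's documented scheme), but any fingerprint that covers the full response behaves the same way once the text is truncated to 280 characters.

```python
import hashlib

EXCERPT_CHARS = 280  # excerpt length stated in the README


def fingerprint(criterion: str, judge_model: str, response: str) -> str:
    """Hypothetical fingerprint: SHA-256 over the three inputs."""
    payload = "\x1f".join([criterion, judge_model, response])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


full_response = "We are sorry for the delay. " * 20  # 560 characters
excerpt = full_response[:EXCERPT_CHARS]  # what the snapshot keeps

original = fingerprint("apologizes for the delay", "judge-model", full_response)
from_excerpt = fingerprint("apologizes for the delay", "judge-model", excerpt)

# The excerpt is a strict prefix of the response, so it cannot reproduce the key.
print(original == from_excerpt)  # prints False
```

Only the test that generated the response still has the full text, which is why re-judging has to happen in pytest.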

diff matches on criterion, not fingerprint

A fingerprint includes the response text, so the "same" assertion against a regenerated response has a different fingerprint by design. diff therefore pairs verdicts across two stores by (criterion, judge_model) so it can actually catch a pass→fail flip, rather than reporting every drifted response as an unrelated add+remove.
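
A minimal sketch of that pairing, assuming each record is a dict with `criterion`, `judge_model`, and `verdict` fields (the field names and the `find_flips` helper are illustrative, not the CLI's internals):

```python
def index_by_assertion(records):
    """Key each record by (criterion, judge_model), ignoring the fingerprint."""
    return {(r["criterion"], r["judge_model"]): r for r in records}


def find_flips(old_records, new_records):
    """Return (key, old_verdict, new_verdict) for every verdict that changed."""
    old = index_by_assertion(old_records)
    new = index_by_assertion(new_records)
    flips = []
    for key in sorted(old.keys() & new.keys()):
        if old[key]["verdict"] != new[key]["verdict"]:
            flips.append((key, old[key]["verdict"], new[key]["verdict"]))
    return flips


# Same assertion, regenerated response: the fingerprints differ, the pair still matches.
main_store = [{"fingerprint": "aaaa1111", "criterion": "apologizes for the delay",
               "judge_model": "judge-model", "verdict": "pass"}]
feature_store = [{"fingerprint": "bbbb2222", "criterion": "apologizes for the delay",
                  "judge_model": "judge-model", "verdict": "fail"}]

flips = find_flips(main_store, feature_store)
exit_code = 1 if flips else 0  # non-zero when a verdict flipped
```

Matching on fingerprint instead would report `aaaa1111` as removed and `bbbb2222` as added, and the pass-to-fail flip would go unnoticed.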

Typical workflow

fakellm-cli init                    # once, to scaffold
# ... write satisfies() assertions, then:
pytest --fakellm-update             # freeze verdicts (review the diff!)
fakellm-cli list                    # eyeball what got frozen
fakellm-cli verify                  # gate in CI alongside pytest
# later, when you intentionally change a prompt:
fakellm-cli prune --criterion "old wording" --yes
pytest --fakellm-update             # re-freeze
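
Because verify signals problems through its exit code, the CI step in this workflow can be a thin wrapper. The sketch below is one possible shape, not something shipped with the CLI; `run_gate` and `CI_COMMANDS` are hypothetical names.

```python
import subprocess
import sys

# Commands taken from the workflow above; run in order.
CI_COMMANDS = [["pytest"], ["fakellm-cli", "verify"]]


def run_gate(commands):
    """Run each command in order; return the first non-zero exit code, else 0."""
    for argv in commands:
        result = subprocess.run(argv)
        if result.returncode != 0:
            print(f"gate failed: {' '.join(argv)}", file=sys.stderr)
            return result.returncode
    return 0
```

A CI script would finish with `sys.exit(run_gate(CI_COMMANDS))` so that a failing test run or a corrupt snapshot store fails the build.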

License

MIT
