docs(verifier): add README for the new rubric verifier by miguelg719 · Pull Request #2139 · browserbase/stagehand

miguelg719 · 2026-05-15T21:36:59Z

Summary

Adds a README at packages/core/lib/v3/verifier/README.md that documents the rubric verifier landing across PRs #2129-#2138. Pure documentation — no code changes.

Covers:

- What it produces: the Verdict shape and the outcome-vs-process distinction (load-bearing rule: render failures count against outcomeSuccess regardless of cause; effort credit goes to processScore)
- Pipeline diagram (mermaid): Step 0a rubric → Step 1 canonical evidence (SSIM/MSE dedup + downscale) → Step 2 batched relevance → Top-K → Approach A or Approach B
- Approach A vs B: fault-isolated per-criterion path vs single fused multimodal call, with accuracy + cost tradeoffs
- Configuration: every env var (VERIFIER_APPROACH, VERIFIER_OPTIONAL_STEPS, VERIFIER_RELEVANCE_BATCH_SIZE, VERIFIER_MAX_PARALLEL, VERIFIER_TOP_K, dedup thresholds, cache controls, STAGEHAND_EVALUATOR_BACKEND)
- Public API: V3Evaluator.verify() / .generateRubric() and the hands-off runWithVerifier adapter
- On-disk trajectory layout: fara-compatible task_data.json + trajectory.json + screenshots/{agent,probe}/N.png + times.json + scores/mmrubric_*.json
- Offline re-scoring: evals verify <dir> usage for ~30s prompt iteration cycles
- External harness adapters: Codex / Claude Code path with tier-1-only evidence
- Prompts library + error taxonomy: one line per file/category, describing what each does
- Performance characteristics: 28-50 LLM calls / ~$0.12 per run / 3-10× cheaper than the FARA original
- Known limitations: Step 0a parse robustness, tier-1 image dedup gap, processScore non-determinism, Browserbase quota interaction
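
The outcome-vs-process rule in the first item can be illustrated with a small sketch. Only `outcomeSuccess` and `processScore` are named in this PR; the `CriterionResult` type, its fields, the `toVerdict` helper, and the averaging used for the process score are hypothetical and stand in for whatever the README and implementation PRs actually define.

```typescript
// Hypothetical sketch: only `outcomeSuccess` and `processScore` come from the PR text.
interface CriterionResult {
  description: string;
  satisfied: boolean; // did the final state meet this criterion?
  effortCredit: number; // 0..1, partial credit for what the agent attempted
}

interface Verdict {
  outcomeSuccess: boolean;
  processScore: number;
}

function toVerdict(criteria: CriterionResult[], renderFailed: boolean): Verdict {
  const allSatisfied =
    criteria.length > 0 && criteria.every((c) => c.satisfied);

  // Assumed aggregation: a plain mean of per-criterion effort credit.
  const processScore =
    criteria.length === 0
      ? 0
      : criteria.reduce((sum, c) => sum + c.effortCredit, 0) / criteria.length;

  // A render failure counts against the outcome regardless of cause,
  // while the effort credit is left untouched in processScore.
  return { outcomeSuccess: allSatisfied && !renderFailed, processScore };
}
```

Under these assumptions, a run that satisfies every criterion but hits a render failure reports `outcomeSuccess: false` while keeping its full `processScore`.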

Why standalone

Independent of the 10 implementation PRs so it can land at any stack position without rebase churn. Targets the same directory where the verifier source files will live (packages/core/lib/v3/verifier/) — readers will find it naturally once the rest merges.

Test plan

- gh pr view <num> renders the mermaid block correctly
- Cross-references to other PRs (#2129 "feat(verifier): add evaluator backend facade" through #2138 "feat(evals): add verifier benchmark instrumentation") and packages/evals/framework/verifierAdapter.ts etc. point at the right files post-merge
- All env-var names match the constants in the implementation PRs
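
For the env-var check in the last item, it can help to see how such knobs are typically consumed. The sketch below reads four of the variables named in the Summary through a single function. The variable names come from this PR; the `VerifierConfig` shape, the helper names, the default values, and the "positive integer" validation are all placeholders, not the implementation's behaviour.

```typescript
// Env-var names are taken from this PR; defaults and validation are placeholders.
type Env = Record<string, string | undefined>;

interface VerifierConfig {
  approach: string;
  relevanceBatchSize: number;
  maxParallel: number;
  topK: number;
}

function readPositiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return n;
}

function readVerifierConfig(env: Env): VerifierConfig {
  return {
    approach: env["VERIFIER_APPROACH"] ?? "A", // placeholder default
    relevanceBatchSize: readPositiveInt(env, "VERIFIER_RELEVANCE_BATCH_SIZE", 8),
    maxParallel: readPositiveInt(env, "VERIFIER_MAX_PARALLEL", 4),
    topK: readPositiveInt(env, "VERIFIER_TOP_K", 5),
  };
}
```

Passing the environment in as a plain object keeps the reader easy to test; in a Node process the caller would hand it `process.env`.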

🤖 Generated with Claude Code

Summary by cubic

Adds a README for the rubric-based verifier at packages/core/lib/v3/verifier/, documenting outputs, pipeline (Approach A vs B), configuration, public API, on-disk layout, offline re-scoring, adapters, prompts, taxonomy, performance, and limitations. Docs only; no code changes.

Migration
- To route V3Evaluator.verify() through the new verifier, set STAGEHAND_EVALUATOR_BACKEND=verifier.
- To re-score saved trajectories without running an agent, use evals verify <trajectory-dir>.
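
Before re-scoring a saved trajectory, a harness might first check that the directory matches the on-disk layout listed in the Summary. The sketch below is one way to do that over a list of relative paths. The file and directory names come from this PR; the helper names and the assumption that the three top-level JSON files are all mandatory are hypothetical.

```typescript
// Assumption: these three top-level files are required. The README is authoritative.
const REQUIRED_FILES = ["task_data.json", "trajectory.json", "times.json"];

// Returns the required files that are absent from the given relative paths.
function missingTrajectoryFiles(presentPaths: string[]): string[] {
  const present = new Set(presentPaths);
  return REQUIRED_FILES.filter((f) => !present.has(f));
}

// Matches screenshots/agent/N.png or screenshots/probe/N.png with numeric N.
function isScreenshotPath(path: string): boolean {
  return /^screenshots\/(agent|probe)\/\d+\.png$/.test(path);
}

// Matches scores/mmrubric_*.json.
function isScorePath(path: string): boolean {
  return /^scores\/mmrubric_[^/]*\.json$/.test(path);
}
```

A caller could list the directory, pass the relative paths to `missingTrajectoryFiles`, and refuse to re-score when the result is non-empty.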

Written for commit 690572f. Summary will update on new commits.

Documents the verifier system landing across PRs #2129-#2138: pipeline architecture, Approach A vs B tradeoffs, env knobs, on-disk trajectory layout, offline `bench verify` usage, external harness adapter integration, prompts library, error taxonomy, and known limitations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

changeset-bot · 2026-05-15T21:37:06Z

⚠️ No Changeset found

Latest commit: 690572f

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


miguelg719 wants to merge 1 commit into main from miguelgonzalez/verifier-readme
