EvalView can now generate a draft regression suite in one command. Use it when you have an agent endpoint or traffic logs but no meaningful test suite yet.
```
evalview generate --agent http://localhost:8000
```

What it does:
- probes the agent with diverse prompts
- discovers tool-aware behavior paths
- clusters duplicate trajectories
- writes draft YAML tests to `tests/generated/`
- writes `tests/generated/generated.report.json`
Use this when:
- you have no production traffic yet
- you want a fast draft suite from a running endpoint
- you need tool-path coverage quickly
```
evalview generate --from-log traffic.jsonl
```

Supported log formats:

- `jsonl`
- `openai`
- `evalview`
Use this when:
- you already have staging or production logs
- you want to bootstrap tests from real behavior
- you do not want to probe a live endpoint directly
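This doc does not pin down the shape of individual log entries. As a purely illustrative example, a minimal `jsonl` entry might carry one interaction per line — every field name below is an assumption, not EvalView's documented schema:

```
{"input": "What's the weather in Paris?", "output": "18°C and sunny.", "tools": ["weather_api"]}
```

Check your EvalView version's log-format reference before relying on specific field names.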
Each generated test is native EvalView YAML with:
- inferred tool expectations
- output contains / not_contains checks when stable
- thresholds from observed behavior
- generation metadata in `meta`
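For illustration, the assertion portion of a draft might look like the following sketch. The key names (`expect`, `tools_called`, `threshold`, and so on) are assumptions for illustration only — this doc does not specify the schema:

```
name: weather-lookup-tool-path      # illustrative test name
input: "What's the weather in Paris?"
expect:
  tools_called:
    - weather_api                   # inferred tool expectation
  output:
    contains: ["Paris"]             # emitted only when stable across runs
    not_contains: ["I cannot"]
  threshold: 0.8                    # derived from observed behavior
```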
Generated tests are marked as drafts:
```
meta:
  generated_by: evalview generate
  review_status: draft
  confidence: high
  rationale: "Observed tool path: weather_api"
  behavior_class: tool_path
```

Generated tests are intentionally blocked from snapshotting until reviewed.
```
evalview snapshot tests/generated --approve-generated
```

That command:
- updates generated YAML files to `review_status: approved`
- stamps `approved_at`
- snapshots approved tests as baselines
If you forget the flag, EvalView refuses to baseline draft-generated tests.
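After a successful `--approve-generated` run, a test's metadata might read as follows. The timestamp value and its exact format are illustrative, not documented here:

```
meta:
  generated_by: evalview generate
  review_status: approved               # flipped from draft
  approved_at: "2025-01-15T10:32:00Z"   # illustrative timestamp
  confidence: high
  behavior_class: tool_path
```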
Every generated suite writes a report file at `tests/generated/generated.report.json`. Turn it into a PR comment:
```
evalview ci comment --results tests/generated/generated.report.json --dry-run
```

The comment includes:
- discovered tools
- generated behavior paths
- coverage gaps
- approval instructions
From a live endpoint:

```
evalview generate --agent http://localhost:8000
evalview ci comment --results tests/generated/generated.report.json --dry-run
evalview snapshot tests/generated --approve-generated
evalview check tests/generated
```

From existing logs:

```
evalview generate --from-log traffic.jsonl
evalview snapshot tests/generated --approve-generated
evalview check tests/generated
```

Full usage:

```
evalview generate [OPTIONS]
```
```
Options:
  --agent URL                Agent endpoint URL
  --from-log PATH            Build suite from logs instead of live probing
  --log-format FORMAT        auto|jsonl|openai|evalview
  --budget N                 Max probe runs or imported entries
  --out DIR                  Output directory (default: tests/generated)
  --seed FILE                Newline-delimited seed prompts
  --include-tools LIST       Focus on specific tools
  --exclude-tools LIST       Avoid specific tools
  --allow-live-side-effects  Allow side-effecting prompts
  --dry-run                  Preview without writing files
```

Capabilities:

- zero-to-suite onboarding
- tool-path clustering
- clarification and multi-turn draft detection
- schema-aware probing when tool discovery is available
- safe-mode probing by default
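As a sketch of combining the options above — the seed prompts and tool name are illustrative, not part of EvalView's documentation:

```
# Seed the probe with a few domain prompts, focus on one tool,
# cap the budget, and preview without writing any files.
printf '%s\n' \
  "What's the weather in Paris?" \
  "Will it rain in Oslo tomorrow?" > seeds.txt

evalview generate \
  --agent http://localhost:8000 \
  --seed seeds.txt \
  --include-tools weather_api \
  --budget 25 \
  --dry-run
```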
Current limitations:

- generated assertions are conservative by design
- multi-turn generation is currently strongest for clarification-followup flows
- safety contracts are inferred heuristically unless the tool schema is explicit
That is intentional. EvalView generates draft regression tests, not blind truth claims.
Related commands:

- `generate`: create the first draft suite
- `capture`: record real interactions as tests
- `expand`: create variations from an existing seed test
- `record`: interactive one-by-one test recording
Use `generate` for onboarding, `capture` for real traffic, and `expand` when you already have a strong seed test.