* feat: transcript import pipeline — grade existing Claude/Codex/Copilot sessions offline (#872)
Add `agentv import` command with Claude, Codex, and Copilot subcommands
that read existing AI coding sessions from disk and normalize them into
a tool-agnostic transcript JSONL format.
Add `--transcript` flag to `agentv eval` that skips provider invocation
and grades pre-recorded transcripts, enabling offline evaluation without
re-running sessions.
Rename `agentv trace` → `agentv inspect` (kept trace as deprecated alias).
Key changes:
- New parsers: codex-parser.ts, transcript-provider.ts
- New discovery: codex-session-discovery.ts
- Updated import output to spec format (input, output, source, token_usage, etc.)
- TranscriptProvider implements Provider interface for eval pipeline integration
- Re-export copilot parser/discovery from import barrel for CLI access
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
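A minimal sketch of how a transcript-backed provider might replay recorded output instead of invoking a model. The `input`, `output`, and `source` field names and `lineCount` come from this change; `SketchTranscriptProvider`, `next()`, and the response shape are hypothetical, and the real Provider interface in agentv may differ.

```typescript
// Hypothetical entry shape: only a few of the spec fields are modelled here.
interface TranscriptEntry {
  input: string;
  output: string;
  source?: string;
}

// Hypothetical response shape, not the real agentv Provider contract.
interface SketchResponse {
  output: string;
}

class SketchTranscriptProvider {
  readonly kind = "transcript";
  private readonly entries: TranscriptEntry[];
  private cursor = 0;

  constructor(jsonl: string) {
    // One JSON object per non-empty line.
    this.entries = jsonl
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line.length > 0)
      .map((line) => JSON.parse(line) as TranscriptEntry);
  }

  get lineCount(): number {
    return this.entries.length;
  }

  // Return the next recorded output; throw once every entry has been consumed.
  next(): SketchResponse {
    const entry = this.entries[this.cursor];
    if (entry === undefined) {
      throw new Error(`Transcript exhausted after ${this.entries.length} entry(s)`);
    }
    this.cursor += 1;
    return { output: entry.output };
  }

  // Async wrapper so the sketch resembles a provider call that would hit a model.
  async invoke(): Promise<SketchResponse> {
    return this.next();
  }
}
```

Because each call consumes one entry, any extra call (for example, a grader reusing the same provider) runs past the end of the transcript.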
* fix: prevent transcript provider from being used as LLM grader
When --transcript is used without --grader-target, the orchestrator's
grader resolution would fall back to using the transcript provider as
the grader, exhausting the transcript on the second invoke() call.
Fix: return undefined from resolveGraderProvider when the target is a
transcript provider so LLM-based evaluators skip gracefully.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: use LLM_GRADER_CAPABLE_KINDS allowlist for grader resolution
Replace the transcript-specific point check with a proper allowlist of
provider kinds that can return structured JSON for LLM grading.
Previously, resolveGraderProvider would blindly fall back to using the
eval target as its own grader when no grader_target was configured. This
silently broke for transcript, copilot-log, cli, and any other provider
that can't produce grader responses.
Now only providers in LLM_GRADER_CAPABLE_KINDS (openai, openrouter,
azure, anthropic, gemini, agentv, mock) are used as fallback graders.
All others return undefined, causing LLM-based evaluators to skip with
a clear error rather than fail silently.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
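A rough sketch of the allowlist fallback described above. The kind names are taken from this commit message; `SketchProvider`, `resolveGraderProviderSketch`, and the parameter layout are hypothetical, and returning an explicitly configured grader unchecked is an assumption rather than confirmed behavior.

```typescript
// Provider kinds listed in this commit as able to return structured JSON for grading.
const LLM_GRADER_CAPABLE_KINDS: ReadonlySet<string> = new Set([
  "openai",
  "openrouter",
  "azure",
  "anthropic",
  "gemini",
  "agentv",
  "mock",
]);

// Hypothetical minimal provider shape for illustration only.
interface SketchProvider {
  kind: string;
}

function resolveGraderProviderSketch(
  evalTarget: SketchProvider,
  graderTarget?: SketchProvider,
): SketchProvider | undefined {
  // Assumption: an explicitly configured grader is used as-is.
  if (graderTarget !== undefined) {
    return graderTarget;
  }
  // Fall back to the eval target only when its kind is on the allowlist.
  return LLM_GRADER_CAPABLE_KINDS.has(evalTarget.kind) ? evalTarget : undefined;
}
```

With this shape, `transcript`, `copilot-log`, and `cli` targets resolve to `undefined` unless a grader target is supplied, so the caller can report the missing grader instead of invoking a provider that cannot grade.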
* refactor: hard-remove agentv trace, replace with agentv inspect
Delete the trace/ command directory entirely (no deprecated alias).
Update all imports from trace/utils to inspect/utils.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
`Transcript has ${transcriptProvider.lineCount} entry(s) but eval defines ${totalTests} test(s). Each transcript line maps positionally to one test case.`,
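The message above implies a length check between the transcript and the eval's test cases. A hedged sketch of such a check follows; `checkTranscriptLength` is a hypothetical helper, and whether the real code throws, warns, or returns the message is not shown in this diff.

```typescript
// Hypothetical helper: returns the mismatch message, or undefined when counts agree.
function checkTranscriptLength(lineCount: number, totalTests: number): string | undefined {
  if (lineCount === totalTests) {
    return undefined;
  }
  return (
    `Transcript has ${lineCount} entry(s) but eval defines ${totalTests} test(s). ` +
    `Each transcript line maps positionally to one test case.`
  );
}
```

Since lines map to tests by position, a count mismatch means some test would be graded against the wrong transcript line or against none at all.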