Description
Implement the FRAMES (Google 2024) dataset loader and accuracy evaluator.
Part of epic #2827. See spec: .local/specs/zeph-bench/spec.md section 6.
Scope
- FramesLoader parsing FRAMES JSON schema into Vec<Scenario>
- Accuracy evaluator (exact match against gold answer)
- Download and cache to ~/.local/share/zeph/bench/frames/
- Unit tests with a synthetic fixture
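As a rough illustration of the evaluator and cache items above, here is a minimal std-only sketch. Everything in it is an assumption rather than the spec's design: the `Scenario` fields, the function names `exact_match`, `accuracy`, and `frames_cache_dir`, and in particular the normalization step (trimming and ASCII case folding), which the scope does not specify. JSON parsing is left out because it depends on the schema in the spec.

```rust
use std::path::PathBuf;

/// Hypothetical scenario shape; the real `Scenario` type comes from the spec.
#[derive(Debug, Clone)]
pub struct Scenario {
    pub question: String,
    pub gold_answer: String,
}

/// Assumed normalization: trim surrounding whitespace and ignore ASCII case.
/// Drop this if the spec requires byte-for-byte equality.
fn normalize(s: &str) -> String {
    s.trim().to_ascii_lowercase()
}

/// Exact match of a prediction against the gold answer, after normalization.
pub fn exact_match(prediction: &str, gold: &str) -> bool {
    normalize(prediction) == normalize(gold)
}

/// Fraction of scenarios whose prediction matches the gold answer.
/// Returns `None` when there are no scenarios or the lengths differ.
pub fn accuracy(scenarios: &[Scenario], predictions: &[String]) -> Option<f64> {
    if scenarios.is_empty() || scenarios.len() != predictions.len() {
        return None;
    }
    let correct = scenarios
        .iter()
        .zip(predictions)
        .filter(|(s, p)| exact_match(p, &s.gold_answer))
        .count();
    Some(correct as f64 / scenarios.len() as f64)
}

/// Cache directory under the given home directory (hypothetical helper;
/// the caller decides how the home directory is resolved).
pub fn frames_cache_dir(home: &str) -> PathBuf {
    PathBuf::from(home).join(".local/share/zeph/bench/frames")
}
```

A synthetic fixture for the unit tests could be as small as two scenarios with one correct and one incorrect prediction, for which `accuracy` should return `Some(0.5)`.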
Acceptance Criteria
- zeph bench download --dataset frames works