Description
Implement the GAIA (HuggingFace leaderboard) dataset loader supporting levels 1, 2, and 3.
Part of epic #2827. See spec: .local/specs/zeph-bench/spec.md section 6.
Scope
- GaiaLoader parsing GAIA JSON schema; support --level 1|2|3 filter flag
- Accuracy evaluator (exact match normalized per GAIA spec)
- Download from HuggingFace Hub; cache to ~/.local/share/zeph/bench/gaia/
- Unit tests with a synthetic fixture
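The evaluator bullet above could look roughly like the following. This is a minimal Python sketch for illustration only: the issue does not state the implementation language, and the function names (`exact_match`, `_normalize_text`, `_as_number`) are hypothetical. The normalization rules are a simplified assumption (numeric answers compared as numbers, everything else compared after lowercasing and stripping punctuation and extra whitespace). Section 6 of the spec and the GAIA scoring rules remain authoritative.

```python
import re
import string


def _normalize_text(value: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace (assumed rules)."""
    value = value.strip().lower()
    value = value.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", value).strip()


def _as_number(value: str):
    """Return the value as a float if it parses as a number, else None.

    Thousands separators and the '$' and '%' symbols are ignored (assumption).
    """
    cleaned = value.strip().replace(",", "").replace("$", "").replace("%", "")
    try:
        return float(cleaned)
    except ValueError:
        return None


def exact_match(prediction: str, reference: str) -> bool:
    """Hypothetical normalized exact-match check between a prediction and a reference."""
    ref_number = _as_number(reference)
    if ref_number is not None:
        # Numeric reference: the prediction must also parse and be equal.
        pred_number = _as_number(prediction)
        return pred_number is not None and pred_number == ref_number
    # Textual reference: compare the normalized strings.
    return _normalize_text(prediction) == _normalize_text(reference)
```

Accuracy over a run would then be the fraction of scenarios for which `exact_match` returns `True`. List-valued answers are not handled in this sketch.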
Acceptance Criteria
- --level filter restricts scenarios correctly
- zeph bench download --dataset gaia works
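The level-filter criterion can be checked against a synthetic fixture along these lines. This is a Python sketch under stated assumptions, not the project's implementation: `Scenario`, `load_scenarios`, and `FIXTURE` are hypothetical names, the input is assumed to be JSON Lines, and the field names `task_id`, `Question`, `Level`, and `Final answer` are assumed from the public GAIA metadata and should be verified against the spec.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Scenario:
    task_id: str
    question: str
    level: int
    answer: str


def load_scenarios(jsonl_text: str, level: Optional[int] = None) -> List[Scenario]:
    """Parse JSON Lines records, optionally keeping only the given level (1, 2, or 3)."""
    if level is not None and level not in (1, 2, 3):
        raise ValueError(f"level must be 1, 2, or 3, got {level!r}")
    scenarios = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the file
        record = json.loads(line)
        scenario = Scenario(
            task_id=record["task_id"],
            question=record["Question"],
            level=int(record["Level"]),  # coerce in case the level is stored as a string
            answer=record["Final answer"],
        )
        if level is None or scenario.level == level:
            scenarios.append(scenario)
    return scenarios


# Synthetic fixture: one made-up record per level, no real GAIA data.
FIXTURE = "\n".join(
    json.dumps(record)
    for record in [
        {"task_id": "t1", "Question": "q1", "Level": 1, "Final answer": "a1"},
        {"task_id": "t2", "Question": "q2", "Level": "2", "Final answer": "a2"},
        {"task_id": "t3", "Question": "q3", "Level": 3, "Final answer": "a3"},
    ]
)
```

With this fixture, `load_scenarios(FIXTURE, level=2)` should return only the `t2` scenario, and omitting `level` should return all three.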