
feat(bench): implement GAIA dataset loader #2839

@bug-ops

Description

Implement the dataset loader for GAIA (the HuggingFace leaderboard benchmark), supporting difficulty levels 1, 2, and 3.

Part of epic #2827. See the spec at .local/specs/zeph-bench/spec.md, section 6.
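A minimal Rust sketch of level handling, assuming scenarios have already been parsed into an in-memory record. GaiaLevel, GaiaScenario, and filter_by_level are hypothetical names, not existing zeph APIs, and the struct fields only loosely mirror GAIA's task_id, Question, Level, and Final answer columns. Routing both the raw Level value and the --level flag through one FromStr impl gives a single validation path.

```rust
use std::str::FromStr;

/// GAIA difficulty level; the issue scopes support to levels 1, 2 and 3.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GaiaLevel {
    One,
    Two,
    Three,
}

impl FromStr for GaiaLevel {
    type Err = String;

    /// Accepts "1", "2" or "3" (surrounding whitespace ignored), e.g. from --level.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim() {
            "1" => Ok(Self::One),
            "2" => Ok(Self::Two),
            "3" => Ok(Self::Three),
            other => Err(format!("invalid GAIA level {other:?}; expected 1, 2 or 3")),
        }
    }
}

/// Hypothetical parsed scenario; fields loosely mirror GAIA's
/// task_id / Question / Level / Final answer columns.
#[derive(Debug, Clone)]
pub struct GaiaScenario {
    pub task_id: String,
    pub question: String,
    pub level: GaiaLevel,
    pub final_answer: String,
}

/// Keep only scenarios at `level`; `None` (no --level flag) keeps everything.
pub fn filter_by_level(scenarios: Vec<GaiaScenario>, level: Option<GaiaLevel>) -> Vec<GaiaScenario> {
    match level {
        None => scenarios,
        Some(wanted) => scenarios.into_iter().filter(|s| s.level == wanted).collect(),
    }
}
```

Keeping the level as an enum rather than a raw integer means an out-of-range value such as "4" is rejected once, at parse time, instead of silently producing an empty scenario set.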

Scope

  • GaiaLoader that parses the GAIA JSON schema, with a --level 1|2|3 filter flag
  • Accuracy evaluator (exact match after normalizing answers per the GAIA spec)
  • Download from the HuggingFace Hub and cache to ~/.local/share/zeph/bench/gaia/
  • Unit tests with a synthetic fixture
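The cache location above can be resolved with a small pure function, sketched here in Rust. gaia_cache_dir is a hypothetical name; honoring an absolute XDG_DATA_HOME before falling back to ~/.local/share is an assumption borrowed from the XDG Base Directory convention, since the issue only names the default path. Taking the environment values as parameters keeps the function testable without mutating process state.

```rust
use std::path::{Path, PathBuf};

/// Resolve the GAIA cache directory, defaulting to ~/.local/share/zeph/bench/gaia/.
/// Assumption: an absolute XDG_DATA_HOME overrides ~/.local/share (relative or
/// empty values are ignored, as the XDG spec requires). Returns None if no
/// usable HOME is available for the fallback.
pub fn gaia_cache_dir(xdg_data_home: Option<&str>, home: Option<&str>) -> Option<PathBuf> {
    let data_home = match xdg_data_home {
        Some(x) if !x.is_empty() && Path::new(x).is_absolute() => PathBuf::from(x),
        _ => {
            let home = home.filter(|h| !h.is_empty())?;
            PathBuf::from(home).join(".local").join("share")
        }
    };
    Some(data_home.join("zeph").join("bench").join("gaia"))
}
```

At the call site, the values would come from std::env::var("XDG_DATA_HOME").ok() and std::env::var("HOME").ok() (via as_deref()), and the download step would call std::fs::create_dir_all on the result before writing files.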

Acceptance Criteria

  • The loader parses the GAIA schema for all three levels
  • The --level filter restricts scenarios to the selected level
  • The accuracy evaluator matches the GAIA normalization rules on the fixture
  • zeph bench download --dataset gaia downloads and caches the dataset
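The normalization criterion can be sketched as a Rust port of the question_scorer logic in GAIA's reference scorer, assuming these rules (verify against the upstream scorer before relying on them): numeric ground truths are compared as floats after stripping "$", "%" and ","; ground truths containing "," or ";" are split and compared element by element, keeping punctuation; any other answer is compared after removing whitespace and ASCII punctuation and lowercasing. The function names are hypothetical.

```rust
/// Separators treated as list delimiters in a ground-truth answer.
fn is_list_sep(c: char) -> bool {
    c == ',' || c == ';'
}

/// True if `s` parses as a float (surrounding whitespace ignored).
fn is_float(s: &str) -> bool {
    s.trim().parse::<f64>().is_ok()
}

/// Strip "$", "%" and "," then parse; unparseable answers become +inf so they never match.
fn normalize_number_str(s: &str) -> f64 {
    let cleaned: String = s.chars().filter(|&c| !matches!(c, '$' | '%' | ',')).collect();
    cleaned.trim().parse::<f64>().unwrap_or(f64::INFINITY)
}

/// Remove all whitespace, lowercase, and optionally drop ASCII punctuation.
fn normalize_str(s: &str, remove_punct: bool) -> String {
    s.chars()
        .filter(|c| !c.is_whitespace())
        .filter(|c| !(remove_punct && c.is_ascii_punctuation()))
        .flat_map(char::to_lowercase)
        .collect()
}

/// Hypothetical exact-match scorer modeled on GAIA's question_scorer.
pub fn question_scorer(model_answer: &str, ground_truth: &str) -> bool {
    if is_float(ground_truth) {
        // Numeric ground truth: compare as floats.
        let gt: f64 = ground_truth.trim().parse().unwrap();
        return normalize_number_str(model_answer) == gt;
    }
    if ground_truth.contains(is_list_sep) {
        // List ground truth: same length, element-wise comparison.
        let gt: Vec<&str> = ground_truth.split(is_list_sep).collect();
        let ma: Vec<&str> = model_answer.split(is_list_sep).collect();
        if gt.len() != ma.len() {
            return false;
        }
        return gt.iter().zip(&ma).all(|(g, m)| {
            if is_float(g) {
                normalize_number_str(m) == g.trim().parse::<f64>().unwrap()
            } else {
                // Punctuation is kept inside list elements.
                normalize_str(m, false) == normalize_str(g, false)
            }
        });
    }
    // Plain string ground truth.
    normalize_str(model_answer, true) == normalize_str(ground_truth, true)
}
```

A synthetic fixture for the acceptance criterion could then pair answers such as "$1,000" with "1000" and "Sea Gull" with "seagull" (both expected to match) and "apple" with "apple, banana" (expected to fail on list length).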

Metadata

Labels

P2 (High value, medium complexity), enhancement (New feature or request)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests