feat(bench): implement LOCOMO, FRAMES, and GAIA dataset loaders by bug-ops · Pull Request #2842 · bug-ops/zeph

bug-ops · 2026-04-08T13:13:00Z

Summary

- Add shared Scenario / Evaluator traits and metric functions (token_f1, exact_match, gaia_normalized_exact_match) in scenario.rs
- LocomoLoader parses the lmlab/locomo JSON array (one Scenario per QA pair); LocomoEvaluator uses token F1 with threshold 0.5
- FramesLoader parses google/frames-benchmark JSONL and stores reasoning_types in metadata; FramesEvaluator uses exact match
- GaiaLoader parses gaia-benchmark/GAIA JSONL with an optional --level 1|2|3 filter; GaiaEvaluator uses GAIA-normalized exact match (strip articles, punctuation, collapse whitespace)
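
For readers unfamiliar with the metric, here is a minimal sketch of a whitespace-token overlap F1 with a 0.5 pass threshold. It is not the code from scenario.rs: the lowercasing step, the handling of empty inputs, the inclusive `>=` comparison, and the helper name `passes_locomo` are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Whitespace-token overlap F1 between a prediction and a reference answer.
/// Assumption: tokens are lowercased before comparison (the PR does not say).
pub fn token_f1(prediction: &str, reference: &str) -> f64 {
    let pred: Vec<String> = prediction.split_whitespace().map(str::to_lowercase).collect();
    let gold: Vec<String> = reference.split_whitespace().map(str::to_lowercase).collect();

    if pred.is_empty() || gold.is_empty() {
        // Assumption: two empty answers count as a match; one empty side scores zero.
        return if pred.is_empty() && gold.is_empty() { 1.0 } else { 0.0 };
    }

    // Count reference tokens so a repeated token is matched at most as often as it occurs.
    let mut remaining: HashMap<&str, usize> = HashMap::new();
    for tok in &gold {
        *remaining.entry(tok.as_str()).or_insert(0) += 1;
    }

    let mut overlap = 0usize;
    for tok in &pred {
        if let Some(count) = remaining.get_mut(tok.as_str()) {
            if *count > 0 {
                *count -= 1;
                overlap += 1;
            }
        }
    }
    if overlap == 0 {
        return 0.0;
    }

    let precision = overlap as f64 / pred.len() as f64;
    let recall = overlap as f64 / gold.len() as f64;
    2.0 * precision * recall / (precision + recall)
}

/// Hypothetical helper: a LOCOMO-style pass/fail decision at threshold 0.5.
/// Assumption: the threshold is inclusive.
pub fn passes_locomo(prediction: &str, reference: &str) -> bool {
    token_f1(prediction, reference) >= 0.5
}
```

With these assumptions, a prediction of "the cat sat" against the reference "the cat" has precision 2/3 and recall 1, so F1 is 0.8 and the answer passes.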

Closes #2836, #2837, #2839. Part of epic #2827.

Test plan

52 unit tests across all new modules (synthetic fixtures for each loader + evaluator)
cargo nextest run -p zeph-bench --lib — 52 passed, 0 skipped
Full workspace: 7788 tests passed
cargo +nightly fmt --check — clean
cargo clippy -p zeph-bench --all-targets -- -D warnings — clean

Closes #2836, #2837, #2839

Add shared Scenario/Evaluator traits and metric functions:
- token_f1: whitespace-token overlap F1 score
- exact_match: case-insensitive, punctuation-stripped equality
- gaia_normalized_exact_match: strips articles, punctuation, collapses whitespace

Loaders and evaluators:
- LocomoLoader: parses lmlab/locomo JSON array; one Scenario per QA pair; LocomoEvaluator uses token F1 with threshold 0.5
- FramesLoader: parses google/frames-benchmark JSONL; stores reasoning_types in metadata; FramesEvaluator uses exact match
- GaiaLoader: parses gaia-benchmark/GAIA JSONL with optional --level filter; GaiaEvaluator uses GAIA-normalized exact match

52 unit tests across all new modules; all 7788 workspace tests pass.
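
The two exact-match variants described in the commit message differ only in their normalization. The sketch below shows one way that could look; it is not the PR's implementation. Assumptions: punctuation means ASCII punctuation, articles are the English "a", "an", and "the", the GAIA variant also lowercases, and the helpers `normalize` and `gaia_normalize` are hypothetical names.

```rust
/// Hypothetical helper: lowercase, drop ASCII punctuation, collapse whitespace.
fn normalize(text: &str) -> String {
    let cleaned: String = text
        .chars()
        .filter(|c| !c.is_ascii_punctuation())
        .collect::<String>()
        .to_lowercase();
    // split_whitespace + join collapses runs of whitespace and trims both ends.
    cleaned.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Case-insensitive, punctuation-stripped equality.
pub fn exact_match(prediction: &str, reference: &str) -> bool {
    normalize(prediction) == normalize(reference)
}

/// Hypothetical helper: `normalize` plus removal of English articles.
fn gaia_normalize(text: &str) -> String {
    normalize(text)
        .split_whitespace()
        .filter(|tok| !matches!(*tok, "a" | "an" | "the"))
        .collect::<Vec<_>>()
        .join(" ")
}

/// Equality after stripping articles and punctuation and collapsing whitespace.
pub fn gaia_normalized_exact_match(prediction: &str, reference: &str) -> bool {
    gaia_normalize(prediction) == gaia_normalize(reference)
}
```

Under this sketch, "The  Eiffel Tower." matches "eiffel tower" with the GAIA variant but not with plain exact_match, because only the former drops the leading article. Note that stripping all punctuation also turns "3.14" into "314"; whether the real implementation treats numeric answers specially is not stated in the PR.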

github-actions bot added the enhancement, rust, dependencies, and size/XL labels and removed the enhancement label on Apr 8, 2026

This was linked to issues Apr 8, 2026

feat(bench): implement FRAMES dataset loader #2837

Closed

feat(bench): implement GAIA dataset loader #2839

Closed

bug-ops enabled auto-merge (squash) April 8, 2026 13:16

bug-ops force-pushed the locomo-frames-gaia-dataset-loaders branch from 39eb57c to 2fe5fce Compare April 8, 2026 13:20

github-actions bot added the enhancement label and removed the dependencies label on Apr 8, 2026

bug-ops merged commit d58a1b2 into main Apr 8, 2026
29 checks passed

bug-ops deleted the locomo-frames-gaia-dataset-loaders branch April 8, 2026 13:28

bug-ops merged 1 commit into main from locomo-frames-gaia-dataset-loaders