feat(bench): add ragcsv benchmark for CSV-based RAG evaluation #869

Merged: starpit merged 1 commit into IBM:main from starpit:feat/ragcsv-bench on Feb 18, 2026

@starpit commented Feb 18, 2026

Summary

  • Adds a standalone ragcsv benchmark that reads a CSV of evaluation queries, executes each as a span query with document fragments as context, then grades response accuracy via a second LLM call
  • Includes a python_repr_to_json parser that handles mixed single- and double-quoted Python repr output and the None/True/False keywords
  • Configurable via RAGCSV_* env vars (model, concurrency, limit, debug)
  • Reports quantile statistics for accuracy and response time
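
To illustrate the kind of conversion the python_repr_to_json bullet describes, here is a minimal std-only sketch. It is not the code from this PR: the function body, its handling of escapes, and its pass-through of unknown identifiers are all assumptions made for illustration. It rewrites single-quoted strings as JSON strings and maps None/True/False to null/true/false, leaving everything else untouched.

```rust
/// Illustrative sketch only (not the PR's implementation): convert Python
/// `repr` text into JSON text by rewriting string quoting and keywords.
fn python_repr_to_json(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            // String literal in either quoting style: emit a JSON string.
            '\'' | '"' => {
                let quote = c;
                out.push('"');
                while let Some(ch) = chars.next() {
                    match ch {
                        '\\' => match chars.next() {
                            // \' is not a valid JSON escape; emit a bare apostrophe.
                            Some('\'') => out.push('\''),
                            // Other escapes are copied as-is (assumes they are JSON-compatible).
                            Some(e) => {
                                out.push('\\');
                                out.push(e);
                            }
                            None => out.push_str("\\\\"),
                        },
                        // The matching closing quote ends the literal.
                        q if q == quote => break,
                        // A double quote inside a single-quoted string must be escaped.
                        '"' => out.push_str("\\\""),
                        other => out.push(other),
                    }
                }
                out.push('"');
            }
            // Bare identifier outside a string: translate Python keywords.
            c if c.is_ascii_alphabetic() || c == '_' => {
                let mut word = String::from(c);
                while let Some(&n) = chars.peek() {
                    if n.is_ascii_alphanumeric() || n == '_' {
                        word.push(n);
                        chars.next();
                    } else {
                        break;
                    }
                }
                match word.as_str() {
                    "None" => out.push_str("null"),
                    "True" => out.push_str("true"),
                    "False" => out.push_str("false"),
                    _ => out.push_str(&word),
                }
            }
            // Punctuation, digits, and whitespace pass through unchanged.
            other => out.push(other),
        }
    }
    out
}
```

Calling python_repr_to_json on a dict repr yields text that a JSON parser can consume. Python-only constructs such as tuples, sets, or \x escapes are out of scope for this sketch and would need extra handling.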

Test plan

  • cargo check --bench ragcsv compiles with zero warnings
  • RAGCSV_FILE=... RAGCSV_LIMIT=3 RAGCSV_DEBUG=1 cargo bench --bench ragcsv verifies CSV parsing and query construction
  • RAGCSV_FILE=... RAGCSV_CONCURRENCY=2 RAGCSV_LIMIT=10 cargo bench --bench ragcsv verifies concurrent execution and final report
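
The commands above drive the benchmark entirely through environment variables. A minimal sketch of how such settings might be read follows. RAGCSV_FILE, RAGCSV_LIMIT, RAGCSV_DEBUG, and RAGCSV_CONCURRENCY appear in this PR; the name RAGCSV_MODEL, the struct, the function names, and every default value are assumptions for illustration, not the benchmark's actual behavior.

```rust
use std::env;

/// Hypothetical settings holder; field names and defaults are assumptions.
#[derive(Debug, PartialEq)]
struct RagCsvConfig {
    file: Option<String>,
    model: Option<String>, // read from RAGCSV_MODEL, an assumed variable name
    concurrency: usize,
    limit: Option<usize>,
    debug: bool,
}

/// Build the config from any key lookup, so the parsing can be tested
/// without mutating the real process environment.
fn config_from<F: Fn(&str) -> Option<String>>(get: F) -> RagCsvConfig {
    RagCsvConfig {
        file: get("RAGCSV_FILE"),
        model: get("RAGCSV_MODEL"),
        // Assumed default: one request at a time when unset, invalid, or zero.
        concurrency: get("RAGCSV_CONCURRENCY")
            .and_then(|v| v.trim().parse::<usize>().ok())
            .filter(|n| *n > 0)
            .unwrap_or(1),
        // Assumed default: no limit when unset or unparsable.
        limit: get("RAGCSV_LIMIT").and_then(|v| v.trim().parse::<usize>().ok()),
        // Assumed convention: any value other than "", "0", or "false" enables debug.
        debug: get("RAGCSV_DEBUG")
            .map(|v| {
                let v = v.trim().to_ascii_lowercase();
                !(v.is_empty() || v == "0" || v == "false")
            })
            .unwrap_or(false),
    }
}

/// Read the settings from the real process environment.
fn config_from_env() -> RagCsvConfig {
    config_from(|key| env::var(key).ok())
}
```

Separating config_from from config_from_env keeps the parsing rules testable with a plain closure, which avoids races between tests that would otherwise share process-wide environment state.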

🤖 Generated with Claude Code

Standalone benchmark that reads a CSV of evaluation queries, executes
each as a span query with document fragments as context, then grades
response accuracy via a second LLM call. Reports quantile statistics
for accuracy and response time.
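
As a rough illustration of the quantile reporting mentioned above, the sketch below computes a five-number summary with linear interpolation. Which quantiles the benchmark actually reports, and how it interpolates, is not stated in this PR; the function names and the choice of min, 25th, 50th, 75th percentile, and max are assumptions.

```rust
/// Quantile `q` in [0, 1] of an ascending-sorted slice, using linear
/// interpolation between neighbouring samples. Returns None for empty input.
fn quantile(sorted: &[f64], q: f64) -> Option<f64> {
    if sorted.is_empty() {
        return None;
    }
    let q = q.clamp(0.0, 1.0);
    // Fractional index into the sorted samples.
    let pos = q * (sorted.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    let frac = pos - lo as f64;
    Some(sorted[lo] + (sorted[hi] - sorted[lo]) * frac)
}

/// Hypothetical summary: [min, p25, median, p75, max] over the finite samples.
fn summarize(samples: &[f64]) -> Option<[f64; 5]> {
    // Drop NaN and infinite values, then sort ascending.
    let mut v: Vec<f64> = samples.iter().copied().filter(|x| x.is_finite()).collect();
    v.sort_by(|a, b| a.total_cmp(b));
    Some([
        quantile(&v, 0.0)?,
        quantile(&v, 0.25)?,
        quantile(&v, 0.5)?,
        quantile(&v, 0.75)?,
        quantile(&v, 1.0)?,
    ])
}
```

The same summarize call could be applied once to per-query accuracy grades and once to response times in seconds, giving two comparable rows in a final report.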

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Nick Mitchell <nickm@us.ibm.com>
@starpit starpit merged commit 1356c6b into IBM:main Feb 18, 2026
36 of 38 checks passed
@starpit starpit deleted the feat/ragcsv-bench branch February 18, 2026 13:26