feat(bench): add ragcsv benchmark for CSV-based RAG evaluation by starpit · Pull Request #869 · IBM/spnl

starpit · 2026-02-18T00:18:05Z

Summary

Adds a standalone ragcsv benchmark that reads a CSV of evaluation queries, executes each as a span query with document fragments as context, then grades response accuracy via a second LLM call
Includes a python_repr_to_json parser that handles mixed single- and double-quoted Python repr output and the None/True/False keywords
Configurable via RAGCSV_* env vars (model, concurrency, limit, debug)
Reports quantile statistics for accuracy and response time
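
To illustrate the kind of conversion python_repr_to_json performs, here is a minimal sketch, not the code from this PR. It assumes the input is a well-formed Python repr of dicts, lists, strings, numbers and keywords; the function body and its edge-case handling are hypothetical. It rewrites single-quoted strings as JSON strings and maps None/True/False to null/true/false only outside string literals.

```rust
/// Hypothetical sketch: convert Python `repr` text into JSON text.
/// Not the implementation from the PR; escapes such as `\x00` are passed through untouched.
pub fn python_repr_to_json(input: &str) -> String {
    let chars: Vec<char> = input.chars().collect();
    let mut out = String::with_capacity(input.len());
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c == '\'' || c == '"' {
            // String literal: re-emit it with JSON double quotes.
            let quote = c;
            out.push('"');
            i += 1;
            while i < chars.len() && chars[i] != quote {
                match chars[i] {
                    '\\' if i + 1 < chars.len() => {
                        let next = chars[i + 1];
                        if next == '\'' {
                            // \' is not a valid JSON escape; emit a bare apostrophe.
                            out.push('\'');
                        } else {
                            // Keep other escape sequences as written.
                            out.push('\\');
                            out.push(next);
                        }
                        i += 2;
                    }
                    '"' => {
                        // Unescaped double quote inside a single-quoted string.
                        out.push_str("\\\"");
                        i += 1;
                    }
                    other => {
                        out.push(other);
                        i += 1;
                    }
                }
            }
            out.push('"');
            i += 1; // skip the closing quote
        } else if c.is_ascii_alphabetic() || c == '_' {
            // Bare word outside a string: translate Python keywords.
            let start = i;
            while i < chars.len() && (chars[i].is_ascii_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            let word: String = chars[start..i].iter().collect();
            match word.as_str() {
                "None" => out.push_str("null"),
                "True" => out.push_str("true"),
                "False" => out.push_str("false"),
                _ => out.push_str(&word),
            }
        } else {
            // Punctuation, digits and whitespace are copied as-is.
            out.push(c);
            i += 1;
        }
    }
    out
}
```

Scanning character by character, rather than doing a global search and replace, keeps a keyword such as None intact when it appears inside a quoted string.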

Test plan

cargo check --bench ragcsv compiles with zero warnings
RAGCSV_FILE=... RAGCSV_LIMIT=3 RAGCSV_DEBUG=1 cargo bench --bench ragcsv verifies CSV parsing and query construction
RAGCSV_FILE=... RAGCSV_CONCURRENCY=2 RAGCSV_LIMIT=10 cargo bench --bench ragcsv verifies concurrent execution and final report
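
For readers trying the commands above, the following is a rough sketch of how the RAGCSV_* variables might be collected into a config struct. Only RAGCSV_FILE, RAGCSV_LIMIT, RAGCSV_CONCURRENCY and RAGCSV_DEBUG appear in this PR; the name RAGCSV_MODEL, the struct, the default concurrency of 1 and the rule for what counts as a truthy debug value are all assumptions made for illustration.

```rust
use std::env;

/// Hypothetical config struct; field names and defaults are assumptions.
#[derive(Debug, PartialEq)]
pub struct RagCsvConfig {
    pub file: Option<String>,
    pub model: Option<String>,
    pub concurrency: usize,
    pub limit: Option<usize>,
    pub debug: bool,
}

impl RagCsvConfig {
    /// Build the config from any key lookup, which keeps parsing testable
    /// without mutating the process environment.
    pub fn from_lookup(get: impl Fn(&str) -> Option<String>) -> Self {
        RagCsvConfig {
            file: get("RAGCSV_FILE"),
            // RAGCSV_MODEL is an assumed name; the PR only says "model".
            model: get("RAGCSV_MODEL"),
            // Assumed default: run one query at a time when unset or invalid.
            concurrency: get("RAGCSV_CONCURRENCY")
                .and_then(|v| v.parse::<usize>().ok())
                .filter(|&n| n > 0)
                .unwrap_or(1),
            // No limit unless a valid number is given.
            limit: get("RAGCSV_LIMIT").and_then(|v| v.parse::<usize>().ok()),
            // Assumed rule: any value other than "" or "0" enables debug output.
            debug: get("RAGCSV_DEBUG")
                .map(|v| !v.is_empty() && v != "0")
                .unwrap_or(false),
        }
    }

    /// Read the same keys from the real process environment.
    pub fn from_env() -> Self {
        Self::from_lookup(|key| env::var(key).ok())
    }
}
```

A benchmark entry point would call RagCsvConfig::from_env() once at startup and pass the result down, so invalid values fall back to defaults in one place.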

🤖 Generated with Claude Code

Standalone benchmark that reads a CSV of evaluation queries, executes each as a span query with document fragments as context, then grades response accuracy via a second LLM call. Reports quantile statistics for accuracy and response time.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Nick Mitchell <nickm@us.ibm.com>
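
The quantile report for accuracy and response time could be computed along the following lines. This is a sketch under assumptions: the PR does not say which quantiles are reported or how they are interpolated, so the five-number summary (min, 25%, median, 75%, max) and the linear interpolation between neighbouring samples are choices made here, and both function names are hypothetical.

```rust
/// Linear-interpolation quantile over an already sorted slice.
/// Returns None for empty input or q outside [0, 1].
pub fn quantile(sorted: &[f64], q: f64) -> Option<f64> {
    if sorted.is_empty() || !(0.0..=1.0).contains(&q) {
        return None;
    }
    // Fractional index into the sorted samples.
    let pos = q * (sorted.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    let frac = pos - lo as f64;
    Some(sorted[lo] + (sorted[hi] - sorted[lo]) * frac)
}

/// Hypothetical five-number summary: min, 25%, median, 75%, max.
pub fn summarize(samples: &[f64]) -> Option<[f64; 5]> {
    let mut sorted = samples.to_vec();
    // total_cmp gives a total order over f64, so sorting never panics.
    sorted.sort_by(|a, b| a.total_cmp(b));
    let qs = [0.0, 0.25, 0.5, 0.75, 1.0];
    let mut out = [0.0; 5];
    for (slot, q) in out.iter_mut().zip(qs.iter()) {
        *slot = quantile(&sorted, *q)?;
    }
    Some(out)
}
```

The same summarize call would work for both metrics, for example once over per-query accuracy scores and once over per-query response times in seconds.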

starpit added the made with opus4.6 label Feb 18, 2026

starpit merged commit 1356c6b into IBM:main Feb 18, 2026
36 of 38 checks passed

starpit deleted the feat/ragcsv-bench branch February 18, 2026 13:26

starpit merged 1 commit into IBM:main from starpit:feat/ragcsv-bench
