feat(evaluation): add option to save eval results to CSV#6182

Open: vaibhav-patel wants to merge 2 commits into google:main from vaibhav-patel:fix/2652-agentevaluator-csv-export

@vaibhav-patel

Summary

Adds an opt-in way to persist AgentEvaluator results to a CSV file, as requested in #2652. A new output_file parameter is added to both AgentEvaluator.evaluate and AgentEvaluator.evaluate_eval_set. When it is provided, per-invocation results for every metric (both passing and failing) are written to the given path; when it is omitted (the default), behavior is unchanged.

Motivation

Running agent evals from pytest currently only prints detailed tables (and only for failing metrics). Users want to save results to disk for later inspection/tracking. This implements the CSV suggestion from #2652, adapted to the current evaluation architecture (the codebase has changed substantially since the issue was filed).

What changed

  • New output_file: Optional[str] = None on AgentEvaluator.evaluate and evaluate_eval_set.
  • Results are flattened to one row per metric per invocation. Columns: eval_set_id, eval_id, metric_name, threshold, score, eval_status, prompt, expected_response, actual_response, expected_tool_calls, actual_tool_calls.
  • Parent directory is created if needed; rows are appended (header written once) so evaluating a directory of .test.json files accumulates results in a single file.
  • Reuses existing content/tool-call formatting helpers; CSV writing uses pandas, already declared in the eval optional dependencies.
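
For reviewers who want to picture the append semantics described above, here is a minimal sketch, not the code in this PR. It assumes a hypothetical helper named `append_rows` and uses the standard-library `csv` module instead of pandas so that it runs without extra dependencies; only the column names are taken from the list above.

```python
import csv
from pathlib import Path

# Column order copied from the PR description.
COLUMNS = [
    "eval_set_id", "eval_id", "metric_name", "threshold", "score",
    "eval_status", "prompt", "expected_response", "actual_response",
    "expected_tool_calls", "actual_tool_calls",
]


def append_rows(output_file, rows):
    """Hypothetical helper: append dict rows to a CSV, writing the header once."""
    path = Path(output_file)
    # Create the parent directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    # Only a new or empty file gets a header line.
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as f:
        # restval="" leaves columns blank when a row lacks that key.
        writer = csv.DictWriter(f, fieldnames=COLUMNS, restval="")
        if write_header:
            writer.writeheader()
        writer.writerows(rows)
```

Calling `append_rows` once per .test.json file with the same path would accumulate all rows under a single header, which is the behavior the third bullet describes.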

Backward compatibility

Fully backward-compatible. The feature is disabled unless output_file is set.

Testing

Added tests/unittests/evaluation/test_agent_evaluator.py covering row flattening, the missing-expected-invocation case, single-file writing with directory creation, and append-without-duplicate-header. All pass; the full tests/unittests/evaluation/ suite still collects cleanly.
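
As an illustration of the row-flattening and missing-expected-invocation cases mentioned above, the following is a rough sketch under stated assumptions. `Invocation`, `MetricResult`, `flatten_results`, and the status strings are hypothetical stand-ins invented for this example; they are not the ADK evaluation types or the helpers the PR reuses.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Invocation:
    """Hypothetical stand-in for one agent invocation."""
    prompt: str
    response: str
    tool_calls: List[str] = field(default_factory=list)


@dataclass
class MetricResult:
    """Hypothetical stand-in for one metric's result on one invocation."""
    metric_name: str
    threshold: float
    score: Optional[float]
    eval_status: str
    actual: Invocation
    expected: Optional[Invocation] = None  # may be absent


def flatten_results(eval_set_id, eval_id, results):
    """Produce one flat dict per metric per invocation."""
    rows = []
    for r in results:
        exp = r.expected
        rows.append({
            "eval_set_id": eval_set_id,
            "eval_id": eval_id,
            "metric_name": r.metric_name,
            "threshold": r.threshold,
            "score": "" if r.score is None else r.score,
            "eval_status": r.eval_status,
            "prompt": r.actual.prompt,
            # Expected columns stay empty when there is no expected invocation.
            "expected_response": exp.response if exp else "",
            "actual_response": r.actual.response,
            "expected_tool_calls": "; ".join(exp.tool_calls) if exp else "",
            "actual_tool_calls": "; ".join(r.actual.tool_calls),
        })
    return rows
```

A test in this style would build one result with an expected invocation and one without, then assert that the second row has blank expected columns rather than raising.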

Fixes #2652.

Add an optional `output_file` parameter to `AgentEvaluator.evaluate` and
`AgentEvaluator.evaluate_eval_set`. When set, per-invocation evaluation
results for every metric (both passing and failing) are flattened and
written to the given path as a CSV file, making it easy to persist and
inspect results from pytest-based eval runs.

The option is disabled by default, so existing behavior is unchanged. The
parent directory is created if needed, and rows are appended so results
from a directory of test files accumulate in a single file. CSV writing
reuses the existing text/tool-call formatting helpers and relies on pandas,
which is already part of the `eval` optional dependencies.

Fixes google#2652.
@google-cla

google-cla Bot commented Jun 22, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@vaibhav-patel

Author

@googlebot I signed it!

Development

Successfully merging this pull request may close these issues.

AgentEvaluator - save eval results to a csv
