fix(eval): handle unevaluated final response v2 results by pragnyanramtha · Pull Request #5728 · google/adk-python

pragnyanramtha · 2026-05-17T00:12:23Z

Summary

Fixes a small aggregation edge case in FinalResponseMatchV2Evaluator: when every per-invocation result is skipped or not evaluated, the evaluator currently divides by zero while computing the overall score.

Root Cause

aggregate_invocation_results() filters out results whose score is None or whose eval_status is NOT_EVALUATED, but it unconditionally computes:

overall_score = num_valid / num_evaluated

If all judge samples fail to produce a usable score, num_evaluated remains 0 and evaluation crashes instead of returning a not-evaluated aggregate result. Other ADK evaluators handle this condition by returning overall_score=None and overall_eval_status=NOT_EVALUATED.
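The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the ADK source: `EvalStatus`, `InvocationResult`, and `aggregate_unguarded` are simplified stand-ins written for illustration, and only the filter-then-divide shape is taken from the description.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class EvalStatus(Enum):  # stand-in for ADK's status enum, not the real class
    PASSED = 1
    FAILED = 2
    NOT_EVALUATED = 3


@dataclass
class InvocationResult:  # hypothetical, simplified per-invocation result
    score: Optional[float]
    eval_status: EvalStatus


def aggregate_unguarded(results: List[InvocationResult]) -> float:
    """Filter out unusable results, then divide unconditionally."""
    num_valid = 0.0
    num_evaluated = 0
    for result in results:
        # Skip results that carry no usable score.
        if result.score is None or result.eval_status == EvalStatus.NOT_EVALUATED:
            continue
        num_evaluated += 1
        num_valid += result.score
    # With zero evaluable results this line raises ZeroDivisionError.
    return num_valid / num_evaluated


all_skipped = [
    InvocationResult(score=None, eval_status=EvalStatus.NOT_EVALUATED),
    InvocationResult(score=None, eval_status=EvalStatus.NOT_EVALUATED),
]

try:
    aggregate_unguarded(all_skipped)
    crashed = False
except ZeroDivisionError:
    crashed = True
```

Here `crashed` ends up `True` because every result is filtered out and the denominator stays at zero.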

Change

Return an EvaluationResult with overall_score=None and overall_eval_status=NOT_EVALUATED when no FinalResponseMatchV2 invocation results are evaluable.
Add a focused regression test for all-skipped/all-not-evaluated invocation results.
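The two changes listed above can be sketched as follows. This is an illustrative approximation rather than the merged patch: `AggregateResult` stands in for `EvaluationResult`, the other types are simplified stand-ins, and the `threshold` parameter with its pass/fail rule is an assumption added only to make the example complete.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class EvalStatus(Enum):  # stand-in for ADK's status enum, not the real class
    PASSED = 1
    FAILED = 2
    NOT_EVALUATED = 3


@dataclass
class InvocationResult:  # hypothetical, simplified per-invocation result
    score: Optional[float]
    eval_status: EvalStatus


@dataclass
class AggregateResult:  # hypothetical stand-in for EvaluationResult
    overall_score: Optional[float]
    overall_eval_status: EvalStatus


def aggregate_guarded(
    results: List[InvocationResult], threshold: float = 0.5
) -> AggregateResult:
    """Aggregate scores, returning NOT_EVALUATED when nothing is evaluable."""
    evaluable = [
        r
        for r in results
        if r.score is not None and r.eval_status != EvalStatus.NOT_EVALUATED
    ]
    # Guard: no usable results means there is no score to report.
    if not evaluable:
        return AggregateResult(None, EvalStatus.NOT_EVALUATED)
    overall = sum(r.score for r in evaluable) / len(evaluable)
    # Assumed pass rule: mean score at or above the threshold passes.
    status = EvalStatus.PASSED if overall >= threshold else EvalStatus.FAILED
    return AggregateResult(overall, status)


def test_all_not_evaluated_returns_not_evaluated():
    # Covers both skip conditions: a missing score and a NOT_EVALUATED status.
    results = [
        InvocationResult(None, EvalStatus.NOT_EVALUATED),
        InvocationResult(0.7, EvalStatus.NOT_EVALUATED),
    ]
    aggregate = aggregate_guarded(results)
    assert aggregate.overall_score is None
    assert aggregate.overall_eval_status is EvalStatus.NOT_EVALUATED
```

Returning `None` rather than `0.0` keeps "nothing could be scored" distinguishable from "everything scored zero", which matches the behavior the description attributes to other ADK evaluators.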

Validation

uv sync --extra test
uv run pytest tests/unittests/evaluation/test_final_response_match_v2.py

Result: 18 passed, 20 warnings.

Full unit suite was not run; this patch is limited to FinalResponseMatchV2 aggregation and its targeted unit test file.

pragnyanramtha · 2026-05-20T18:18:19Z

Refreshed this branch with current main in f61da061.

Validation rerun:

uv run --extra test pytest tests/unittests/evaluation/test_final_response_match_v2.py -q (18 passed)
uv run --extra dev pyink --check src/google/adk/evaluation/final_response_match_v2.py tests/unittests/evaluation/test_final_response_match_v2.py
git diff --check

pragnyanramtha · 2026-05-20T18:49:08Z

Refreshed this branch with current main in f7a83e9b.

Validation rerun:

uv run --extra test pytest tests/unittests/evaluation/test_final_response_match_v2.py -q (18 passed, 20 experimental warnings)
uv run --extra dev pyink --check src/google/adk/evaluation/final_response_match_v2.py tests/unittests/evaluation/test_final_response_match_v2.py
uv run --extra dev isort --check-only src/google/adk/evaluation/final_response_match_v2.py tests/unittests/evaluation/test_final_response_match_v2.py
git diff --check

pragnyanramtha · 2026-05-20T23:27:07Z

Pushed 430f312d to address the failing CI pre-commit job.

The failure was from repository-wide hooks updating existing files outside this PR's evaluator patch:

add newline at EOF for src/google/adk/cli/browser/assets/config/runtime-config.json
apply pyink formatting to tests/unittests/cli/utils/test_gcp_utils.py

Validation:

uv run --extra dev pre-commit run --all-files -> passed
uv run --extra test pytest tests/unittests/evaluation/test_final_response_match_v2.py -q -> 18 passed, 20 experimental warnings
git diff --check -> passed

rohityan · 2026-05-20T23:38:13Z

Hi @pragnyanramtha, thank you for your contribution! We appreciate you taking the time to submit this pull request. Your PR has been received by the team and is currently under review. We will provide feedback as soon as we have an update to share.

rohityan · 2026-05-20T23:38:24Z

Hi @sasha-gitg, can you please review this?

pragnyanramtha · 2026-05-21T03:20:02Z

I noticed the request clarification label is still on this PR, but I do not see a specific clarification request in the thread. The branch is current and all checks are green after 430f312d; happy to clarify or adjust anything if there is a maintainer question I missed.

Merge #5728

## Summary

Fixes a small aggregation edge case in `FinalResponseMatchV2Evaluator`: when every per-invocation result is skipped or not evaluated, the evaluator currently divides by zero while computing the overall score.

## Root Cause

`aggregate_invocation_results()` filters out results whose `score` is `None` or whose `eval_status` is `NOT_EVALUATED`, but it unconditionally computes:

```python
overall_score = num_valid / num_evaluated
```

If all judge samples fail to produce a usable score, `num_evaluated` remains `0` and evaluation crashes instead of returning a not-evaluated aggregate result. Other ADK evaluators handle this condition by returning `overall_score=None` and `overall_eval_status=NOT_EVALUATED`.

## Change

- Return an `EvaluationResult` with `overall_score=None` and `overall_eval_status=NOT_EVALUATED` when no FinalResponseMatchV2 invocation results are evaluable.
- Add a focused regression test for all-skipped/all-not-evaluated invocation results.

## Validation

```bash
uv sync --extra test
uv run pytest tests/unittests/evaluation/test_final_response_match_v2.py
```

Result: `18 passed, 20 warnings`.

Full unit suite was not run; this patch is limited to FinalResponseMatchV2 aggregation and its targeted unit test file.

Co-authored-by: Haran Rajkumar <haranrk@google.com>
COPYBARA_INTEGRATE_REVIEW=#5728 from pragnyanramtha:pragnyan/final-response-v2-no-eval-guard 3d5ab73
PiperOrigin-RevId: 933818272

adk-bot · 2026-06-17T18:02:33Z

Thank you @pragnyanramtha for your contribution! 🎉

Your changes have been successfully imported and merged via Copybara in commit 5cfef01.

Closing this PR as the changes are now in the main branch.

fix(eval): handle no evaluated final response v2 results

f814359

pragnyanramtha marked this pull request as ready for review May 17, 2026 00:15

Merge branch 'main' into pragnyan/final-response-v2-no-eval-guard

22c8a0f

rohityan self-assigned this May 18, 2026

rohityan and others added 5 commits May 18, 2026 11:40

Merge branch 'main' into pragnyan/final-response-v2-no-eval-guard

1c75271

Merge remote-tracking branch 'upstream/main' into pragnyan/final-resp…

53ce1af

…onse-v2-no-eval-guard

Merge remote-tracking branch 'upstream/main' into pragnyan/final-resp…

060f329

…onse-v2-no-eval-guard

Merge branch 'main' into pragnyan/final-response-v2-no-eval-guard

095c893

Merge branch 'main' into pragnyan/final-response-v2-no-eval-guard

f004759

rohityan added the v2 Affects only 2.0 version label May 19, 2026

pragnyanramtha added 2 commits May 20, 2026 05:03

Merge remote-tracking branch 'upstream/main' into pragnyan/final-resp…

26095e5

…onse-v2-no-eval-guard

Merge remote-tracking branch 'upstream/main' into pragnyan/final-resp…

f61da06

…onse-v2-no-eval-guard

Merge remote-tracking branch 'upstream/main' into pragnyan/final-resp…

f7a83e9

…onse-v2-no-eval-guard

Merge remote-tracking branch 'upstream/main' into pragnyan/final-resp…

ac57c85

…onse-v2-no-eval-guard

rohityan added request clarification [Status] The maintainer need clarification or more information from the author and removed v2 Affects only 2.0 version labels May 20, 2026

chore: satisfy repository pre-commit hooks

430f312

rohityan added the eval [Component] This issue is related to evaluation label May 20, 2026

rohityan requested a review from sasha-gitg May 20, 2026 23:37

Merge branch 'main' into pragnyan/final-response-v2-no-eval-guard

3d5ab73

rohityan added needs review [Status] The PR/issue is awaiting review from the maintainer and removed request clarification [Status] The maintainer need clarification or more information from the author labels May 29, 2026

haranrk assigned haranrk and unassigned rohityan Jun 17, 2026

adk-bot added the merged [Status] This PR is merged label Jun 17, 2026

adk-bot closed this Jun 17, 2026

pragnyanramtha wants to merge 13 commits into google:main from pragnyanramtha:pragnyan/final-response-v2-no-eval-guard


4 participants
