fix(eval): include function-call events in invocation_events when skip_summarization is set by Koushik-Salammagari · Pull Request #5417 · google/adk-python

Koushik-Salammagari · 2026-04-20T19:25:58Z

Link to Issue or Description of Change

Fixes #5410

Description

EvaluationGenerator.convert_events_to_eval_invocations builds
invocation_events (the intermediate tool-call record used by
TrajectoryEvaluator) by collecting all qualifying events and then excluding
the final_event from the list.

The final event is identified via event.is_final_response(), but
is_final_response() returns True for any event with
skip_summarization=True — even events that contain function_call parts
(e.g. tools that use skip_summarization to surface their result directly
without an LLM summarization step). Those events were silently dropped from
invocation_events, causing get_all_tool_calls() to return [] for the
actual invocation. The result: tool_trajectory_avg_score was always 0.0
even when the tool name and args matched the expected exactly.

Root cause: is_final_response() conflates "final user-visible response"
with "should be excluded from tool trajectory". When skip_summarization=True
the function-call event is both the final response and an intermediate step
that must appear in the trajectory.

Fix: in the list comprehension that builds invocation_events, keep an
event even when it equals final_event if it contains function calls:

# before
if e is not final_event

# after
if e is not final_event or e.get_function_calls()

Changes

- src/google/adk/evaluation/evaluation_generator.py: one-line fix
- tests/unittests/evaluation/test_evaluation_generator.py: regression test that verifies tool calls are preserved when skip_summarization=True
- tests/unittests/evaluation/test_trajectory_evaluator.py: end-to-end tests for InvocationEvents intermediate_data format (exact match → 1.0, mismatch → 0.0)
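
To see why a dropped event turns into a score of 0.0, here is a simplified exact-match scorer. It is a sketch under stated assumptions, not the `TrajectoryEvaluator` implementation: `exact_match_score` and `average_score` are hypothetical helpers, and tool calls are modelled as plain `(name, args)` tuples.

```python
def exact_match_score(actual_calls, expected_calls):
    """Return 1.0 when both tool-call lists match exactly, else 0.0."""
    return 1.0 if actual_calls == expected_calls else 0.0


def average_score(invocations):
    """Average the per-invocation scores over (actual, expected) pairs."""
    scores = [exact_match_score(actual, expected) for actual, expected in invocations]
    return sum(scores) / len(scores) if scores else 0.0


expected = [("get_weather", {"city": "Paris"})]

# Tool call preserved in the trajectory: exact match.
matched = average_score([(expected, expected)])

# Tool call dropped from the trajectory: an empty list is compared with the expected calls.
dropped = average_score([([], expected)])

print(matched, dropped)
```
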

Testing Plan

pytest tests/unittests/evaluation/test_trajectory_evaluator.py \
       tests/unittests/evaluation/test_evaluation_generator.py -v
======================== 47 passed in 1.23s ============================

rohityan · 2026-04-20T22:43:37Z

Hi @Koushik-Salammagari, thank you for your contribution! We appreciate you taking the time to submit this pull request. Please fix the formatting errors by running autoformat.sh.

Koushik-Salammagari · 2026-05-02T19:56:12Z

Hi @rohityan — rebased onto latest main and ran pre-commit run --all-files (all checks pass: isort, pyink, addlicense, mdformat). Ready for re-review!

rohityan · 2026-05-08T23:33:16Z

Hi @Jacksunwei, can you please review this?

…in thread pool

When RunConfig.tool_thread_pool_config is enabled, _call_tool_in_thread_pool used None as a sentinel to distinguish "FunctionTool ran in thread pool" from "non-FunctionTool sync tool, needs async fallback". Because None is also a valid return value from any FunctionTool whose underlying function has no explicit return statement (implicit None), the sentinel check failed and execution fell through to tool.run_async(), invoking the function a second time silently.

Replace the None sentinel with a dedicated _SYNC_TOOL_RESULT_UNSET object so that a legitimate None result from a FunctionTool is correctly returned on the first execution, without triggering the async fallback path.

Fixes google#5284

…ases

Per reviewer feedback: collapse the two near-identical None tests into a single @pytest.mark.parametrize test, and add falsy-but-not-None cases (0, '', {}, False) to prove the sentinel is identity-based and does not mishandle any falsy return value from a FunctionTool.

…p_summarization is set

EvaluationGenerator.convert_events_to_eval_invocations builds invocation_events by excluding the final_event from intermediate steps. However, is_final_response() returns True for any event with skip_summarization=True, even when that event contains function calls (e.g. tools using skip_summarization to bypass LLM summarization). Such events were incorrectly excluded from invocation_events, causing get_all_tool_calls() to return an empty list and tool_trajectory_avg_score to always be 0.0 despite matching tool calls.

Fix: keep an event in invocation_events even if it is the final_event when it contains function calls.

Fixes google#5410

Koushik-Salammagari · 2026-06-03T23:55:02Z

Hi @rohityan @Jacksunwei — rebased onto latest main to resolve the merge conflict; the branch is mergeable again and CI is re-running. Ready whenever you have a moment to review. Thanks!

_call_tool_in_thread_pool uses early returns for each tool category (sync FunctionTool, async tool, and the sync non-FunctionTool fallback), so a sync FunctionTool that returns None exits immediately via its own return and never falls through to the fallback path. The _SYNC_TOOL_RESULT_UNSET sentinel added to guard that case is never referenced anywhere, and its comment describes a fallthrough the code structure already prevents. Remove the dead definition and comment.
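
The early-return structure described here can be sketched as follows. This is an illustration only, assuming three branches as the commit message lists them: `call_tool` and `returns_none` are hypothetical, and the thread pool is omitted.

```python
import asyncio
import inspect


async def call_tool(func, is_function_tool):
    # Branch 1: sync FunctionTool-like callable; returns even when the result is None.
    if is_function_tool and not inspect.iscoroutinefunction(func):
        return func()
    # Branch 2: async tool.
    if inspect.iscoroutinefunction(func):
        return await func()
    # Branch 3: sync non-FunctionTool fallback.
    return func()


calls = []


def returns_none():
    calls.append("ran")  # implicit None return


result = asyncio.run(call_tool(returns_none, is_function_tool=True))
print(result, len(calls))
```

Since each branch returns on its own, a `None` result never reaches the fallback and no sentinel is needed to tell the cases apart.
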

adk-bot · 2026-06-08T17:16:11Z

I have conducted a thorough, read-only analysis of PR #5417 in accordance with the adk-pr-analyze workflow.

I have generated a premium PR Analysis Report which is now saved and available:
📂 PR Analysis Report (pr_analysis_report.md)

Key Takeaways & Recommendations:

Bug Resolution: The proposed fix correctly solves Issue #5410. It ensures that intermediate evaluation event trajectories do not discard critical final_event data when skip_summarization=True is utilized.
Quality & Standard: Alignment with ADK framework boundaries is excellent. There are high-quality regression and integration tests protecting against future regressions.
Core Style Nit:
- In evaluation_generator.py, the author annotated final_event with:
```
final_event: Optional[Event] = None
```
- Recommendation: Request the author to update this to ADK's preferred modern standard union syntax:
```
final_event: Event | None = None
```

Next Steps for You:

Recommendation: Approve/Merge with Nits. The change is ready for integration, with only a minor type-hint modernization suggested.
Let me know if you would like me to draft a GitHub review comment or perform further checks!

…p_summarization is set

Merge #5417

### Link to Issue or Description of Change

Fixes #5410

### Description

`EvaluationGenerator.convert_events_to_eval_invocations` builds `invocation_events` (the intermediate tool-call record used by `TrajectoryEvaluator`) by collecting all qualifying events and then excluding the `final_event` from the list.

The final event is identified via `event.is_final_response()`, but `is_final_response()` returns `True` for **any** event with `skip_summarization=True` — even events that contain `function_call` parts (e.g. tools that use `skip_summarization` to surface their result directly without an LLM summarization step). Those events were silently dropped from `invocation_events`, causing `get_all_tool_calls()` to return `[]` for the actual invocation. The result: `tool_trajectory_avg_score` was always **0.0** even when the tool name and args matched the expected exactly.

**Root cause:** `is_final_response()` conflates "final user-visible response" with "should be excluded from tool trajectory". When `skip_summarization=True` the function-call event is both the final response *and* an intermediate step that must appear in the trajectory.

**Fix:** in the list comprehension that builds `invocation_events`, keep an event even when it equals `final_event` if it contains function calls:

```python
# before
if e is not final_event

# after
if e is not final_event or e.get_function_calls()
```

### Changes

- `src/google/adk/evaluation/evaluation_generator.py`: one-line fix
- `tests/unittests/evaluation/test_evaluation_generator.py`: regression test that verifies tool calls are preserved when `skip_summarization=True`
- `tests/unittests/evaluation/test_trajectory_evaluator.py`: end-to-end tests for `InvocationEvents` intermediate_data format (exact match → 1.0, mismatch → 0.0)

### Testing Plan

```
pytest tests/unittests/evaluation/test_trajectory_evaluator.py \
       tests/unittests/evaluation/test_evaluation_generator.py -v
======================== 47 passed in 1.23s ============================
```

Co-authored-by: George Weale <gweale@google.com>
COPYBARA_INTEGRATE_REVIEW=#5417 from Koushik-Salammagari:fix/trajectory-eval-skip-summarization ce8087f
PiperOrigin-RevId: 933236523

adk-bot · 2026-06-16T19:29:31Z

Thank you @Koushik-Salammagari for your contribution! 🎉

Your changes have been successfully imported and merged via Copybara in commit 5b16a86.

Closing this PR as the changes are now in the main branch.

adk-bot added the eval [Component] This issue is related to evaluation label Apr 20, 2026

rohityan self-assigned this Apr 20, 2026

rohityan added the request clarification [Status] The maintainer need clarification or more information from the author label Apr 20, 2026

surajksharma07 mentioned this pull request Apr 23, 2026

tool_trajectory_avg_score returns 0.0 even when tool name and args match exactly #5410

Closed

Koushik-Salammagari force-pushed the fix/trajectory-eval-skip-summarization branch 2 times, most recently from 7e902ee to 709d857 Compare May 2, 2026 17:54

rohityan added needs review [Status] The PR/issue is awaiting review from the maintainer and removed request clarification [Status] The maintainer need clarification or more information from the author labels May 8, 2026

rohityan requested a review from Jacksunwei May 8, 2026 23:33

Koushik-Salammagari added 6 commits June 3, 2026 16:51

style: apply pyink formatting to thread pool test file

0f68dc9

fix(eval): add type annotation and guard to resolve mypy union-attr e…

fb4a8cd

…rror

style: apply pyink formatting to evaluation_generator.py

f860460

Koushik-Salammagari force-pushed the fix/trajectory-eval-skip-summarization branch from 4d7739c to f860460 Compare June 3, 2026 23:53

GWeale assigned GWeale and unassigned rohityan Jun 15, 2026

adk-bot added the merged [Status] This PR is merged label Jun 16, 2026

adk-bot closed this Jun 16, 2026

Koushik-Salammagari wants to merge 7 commits into google:main from Koushik-Salammagari:fix/trajectory-eval-skip-summarization

4 participants
