## Bug

`AgentEvaluator.evaluate()` raises a pydantic `ValidationError` when processing eval cases that use `conversation_scenario` (LLM-backed user simulation) instead of explicit `conversation` arrays.
## Error

```
pydantic_core._pydantic_core.ValidationError: 1 validation error for _EvalMetricResultWithInvocation
expected_invocation
  Input should be a valid dictionary or instance of Invocation [type=model_type, input_value=None, input_type=NoneType]
```

Raised at `agent_evaluator.py:639`.
## Root Cause

Type mismatch between the public and private invocation result models: the public `EvalMetricResultPerInvocation` (`eval_metrics.py:323`) correctly declares `expected_invocation` as `Optional[Invocation]`, while the private `_EvalMetricResultWithInvocation` (`agent_evaluator.py:93`) declares it as a required `Invocation`.

`local_eval_service.py:285-287` intentionally sets `expected_invocation=None` when `eval_case.conversation` is `None` (i.e., when using `conversation_scenario`):
```python
EvalMetricResultPerInvocation(
    actual_invocation=actual,
    expected_invocation=eval_case.conversation[idx]
    if eval_case.conversation
    else None,  # <-- None for conversation_scenario cases
)
```
This `None` flows through `_get_eval_metric_results_with_invocation()` at line 636 and into `_EvalMetricResultWithInvocation()` at line 639, which rejects it.
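For context, the failure mode can be reproduced in isolation with minimal stand-in models (the class names below mirror the ADK ones but are simplified, hypothetical sketches, not the actual ADK classes):

```python
from pydantic import BaseModel, ValidationError


class Invocation(BaseModel):
    """Simplified stand-in for the ADK Invocation model."""
    user_content: str


class ResultWithInvocation(BaseModel):
    """Stand-in for _EvalMetricResultWithInvocation: the field is required."""
    actual_invocation: Invocation
    expected_invocation: Invocation  # required -> rejects None


try:
    # What local_eval_service effectively passes for scenario-based cases.
    ResultWithInvocation(
        actual_invocation=Invocation(user_content="hi"),
        expected_invocation=None,
    )
except ValidationError as e:
    err_type = e.errors()[0]["type"]

print(err_type)  # model_type
```

The error type `model_type` matches the traceback above: pydantic v2 emits it when a required model-typed field receives something that is neither a dict nor an instance of the model.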
## Fix

Two changes are needed in `agent_evaluator.py`:

1. Make the field optional (line 93):

```python
expected_invocation: Optional[Invocation] = None
```
2. Guard the downstream attribute accesses (lines 439-449 in `_print_details`):

```python
"prompt": AgentEvaluator._convert_content_to_text(
    per_invocation_result.expected_invocation.user_content
    if per_invocation_result.expected_invocation else None
),
"expected_response": AgentEvaluator._convert_content_to_text(
    per_invocation_result.expected_invocation.final_response
    if per_invocation_result.expected_invocation else None
),
...
"expected_tool_calls": AgentEvaluator._convert_tool_calls_to_text(
    per_invocation_result.expected_invocation.intermediate_data
    if per_invocation_result.expected_invocation else None
),
```
Both `_convert_content_to_text` and `_convert_tool_calls_to_text` already accept `Optional` parameters and handle `None` gracefully.
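Applied to the same simplified stand-in models from above (hypothetical sketches, not the actual ADK classes), the two-part fix behaves as intended: construction with a missing `expected_invocation` succeeds, and the guarded access degrades to `None`:

```python
from typing import Optional

from pydantic import BaseModel


class Invocation(BaseModel):
    """Simplified stand-in for the ADK Invocation model."""
    user_content: str


class ResultWithInvocation(BaseModel):
    """Stand-in for _EvalMetricResultWithInvocation after the fix."""
    actual_invocation: Invocation
    expected_invocation: Optional[Invocation] = None  # fix: accept None


# A scenario-based eval case has no expected invocation.
r = ResultWithInvocation(actual_invocation=Invocation(user_content="hi"))

# The guard pattern from _print_details: fall back to None instead of
# dereferencing a missing expected_invocation.
prompt = (
    r.expected_invocation.user_content
    if r.expected_invocation
    else None
)
print(prompt)  # None
```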
## Reproduction

Evalset with `conversation_scenario` (no explicit `conversation` array):

```json
{
  "eval_set_id": "test",
  "eval_cases": [{
    "eval_id": "scenario_1",
    "conversation_scenario": {
      "starting_prompt": "Hello",
      "conversation_plan": "Ask the agent a question and accept the answer."
    },
    "session_input": {"app_name": "my_agent", "user_id": "user1", "state": {}}
  }]
}
```
```python
# test_eval.py
import pytest
from google.adk.evaluation import AgentEvaluator


@pytest.mark.asyncio
async def test_scenario():
    await AgentEvaluator.evaluate(
        "my_agent",
        "path/to/evalset.json",
        num_runs=1,
    )
```
The run crashes during post-processing in `_get_eval_metric_results_with_invocation`, after all metrics have been computed.
## Environment

- `google-adk[eval]>=1.28.0` (also confirmed on `main` branch — same code)
- Python 3.11