Problem
When an agent emits text before a tool call (e.g. presenting a plan), then calls a tool, then emits more text (e.g. an explanation), `rubric_based_final_response_quality_v1` only sends the post-tool-call text to the judge as `final_response`.
The pre-tool-call text is stored in `intermediate_data.invocation_events` (confirmed by inspecting the eval results), but `format_auto_rater_prompt` in `rubric_based_final_response_quality_v1.py` only extracts tool calls/responses from `intermediate_data` — the text content in those events is ignored.
Impact
Rubrics that check for content in the pre-tool-call text always fail, even though the agent correctly produced that content. For example, if an agent:
1. Streams a plan to the user (text)
2. Calls a tool
3. Streams an explanation of changes (text)
The judge only sees step 3. Rubrics checking for the plan (step 1) always fail.
Expected behavior
There should be an option to evaluate the agent's full response — all text emitted during the invocation, not just the text after the last tool call. This follows the pattern of `evaluate_intermediate_nl_responses` on `HallucinationsCriterion`.
Proposed solution
Add `evaluate_full_response: bool = False` to `RubricsBasedCriterion`. When true, concatenate text from `intermediate_data.invocation_events` + `final_response` before sending to the judge.
PR: #5216
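A minimal sketch of the proposed concatenation logic, using simplified stand-in types (the `Event` shape and `build_judged_response` helper below are hypothetical, not the actual ADK data structures):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Event:
    """Hypothetical stand-in for an invocation event."""
    text: str = ""                   # natural-language content, if any
    tool_call: Optional[str] = None  # name of a tool call, if any

def build_judged_response(
    events: list,
    final_response: str,
    evaluate_full_response: bool = False,
) -> str:
    """Assemble the text sent to the judge.

    With evaluate_full_response=False (current behavior), only the final
    response is judged. With True, all text emitted during the invocation
    is concatenated, in order, ahead of the final response.
    """
    if not evaluate_full_response:
        return final_response
    parts = [e.text for e in events if e.text]  # skip tool-call-only events
    parts.append(final_response)
    return "\n\n".join(parts)

events = [
    Event(text="Here is my plan: ..."),  # step 1: pre-tool-call text
    Event(tool_call="edit_file"),        # step 2: tool call, no text
]
# Current behavior drops the plan; the flag preserves it.
print(build_judged_response(events, "I made the changes.", evaluate_full_response=True))
```

With the flag off, the judge sees only `"I made the changes."`; with it on, the plan text is included, so rubrics targeting step 1 can pass.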