Finish/tool turns drop SDK usage metadata while equivalent plain text turns do report it

## Summary

`google-antigravity==0.1.0` appears not to expose token usage metadata for a local-harness `Agent` turn that completes through the built-in `finish` structured-output flow. The same local `Agent` setup does expose usage metadata (`response.usage_metadata`, backed by `usageMetadata` on the wire) for a plain text response, so this looks specific to the finish/tool completion path rather than to the Agent API in general.

I am not asking for billing guarantees from the SDK. The request is for the same model usage metadata that is already exposed on normal text turns to also be attached to the turn/step when the model terminates through the built-in `finish` flow.

## Environment

- Package: `google-antigravity==0.1.0`
- Python: 3.14
- Host: Linux x86_64
- Execution strategy: local harness
- Model tested: Gemini through `GeminiConfig`

## Reproduction

Set `GEMINI_API_KEY` in the environment and run:

```python
from __future__ import annotations

import asyncio
import json
import os
import tempfile
from pathlib import Path

from google.antigravity import (
    Agent,
    CapabilitiesConfig,
    GeminiConfig,
    GenerationConfig,
    LocalAgentConfig,
    ModelConfig,
    ModelEntry,
)


MODEL = os.environ.get("GEMINI_MODEL", "gemini-3.1-flash-lite")
FINISH_SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "required": ["summary", "content"],
    "properties": {
        "summary": {"type": "string"},
        "content": {"type": "string"},
    },
}


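def _config(root: Path, *, use_finish: bool) -> LocalAgentConfig:
    # The two cases differ only in the finish tool, its schema, the response
    # schema, and the system instructions; everything else is identical.
    capabilities = CapabilitiesConfig(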
        enabled_tools=["finish"] if use_finish else [],
        compaction_threshold=50000,
        finish_tool_schema_json=json.dumps(FINISH_SCHEMA, separators=(",", ":"), sort_keys=True)
        if use_finish
        else None,
    )
    return LocalAgentConfig(
        system_instructions="Return only through finish." if use_finish else "Answer briefly.",
        capabilities=capabilities,
        policies=[],
        workspaces=[],
        save_dir=str(root / "save"),
        app_data_dir=str(root / "app-data"),
        response_schema=FINISH_SCHEMA if use_finish else None,
        skills_paths=[],
        gemini_config=GeminiConfig(
            api_key=os.environ["GEMINI_API_KEY"],
            models=ModelConfig(
                default=ModelEntry(
                    name=MODEL,
                    generation=GenerationConfig(thinking_level=None),
                ),
                image_generation=ModelEntry(
                    name=MODEL,
                    generation=GenerationConfig(thinking_level=None),
                ),
            ),
        ),
    )


async def run_case(*, use_finish: bool) -> None:
    with tempfile.TemporaryDirectory(prefix="ag-usage-debug-") as temp_dir:
        root = Path(temp_dir)
        async with Agent(_config(root, use_finish=use_finish)) as agent:
            response = await agent.chat(
                "Call finish with summary and content saying usage debug completed."
                if use_finish
                else "Say hello in one sentence."
            )
            await response.resolve()
            print("use_finish:", use_finish)
            print(
                "response.usage_metadata:",
                None if response.usage_metadata is None else response.usage_metadata.model_dump(),
            )
            print(
                "conversation.last_turn_usage:",
                None
                if agent.conversation.last_turn_usage is None
                else agent.conversation.last_turn_usage.model_dump(),
            )
            print("conversation.total_usage:", agent.conversation.total_usage.model_dump())


async def main() -> None:
    await run_case(use_finish=False)
    await run_case(use_finish=True)


asyncio.run(main())
```

## Observed Result

The plain text case reports usage, for example:

```text
use_finish: False
response.usage_metadata: {'prompt_token_count': 2167, 'cached_content_token_count': 0, 'candidates_token_count': 17, 'thoughts_token_count': 0, 'total_token_count': 2184}
conversation.last_turn_usage: {'prompt_token_count': 2167, 'cached_content_token_count': 0, 'candidates_token_count': 17, 'thoughts_token_count': 0, 'total_token_count': 2184}
conversation.total_usage: {'prompt_token_count': 2167, 'cached_content_token_count': 0, 'candidates_token_count': 17, 'thoughts_token_count': 0, 'total_token_count': 2184}
```

The finish case completes successfully and returns structured output, but usage is absent:

```text
use_finish: True
response.usage_metadata: None
conversation.last_turn_usage: None
conversation.total_usage: {'prompt_token_count': 0, 'cached_content_token_count': 0, 'candidates_token_count': 0, 'thoughts_token_count': 0, 'total_token_count': 0}
```

With raw local-connection WebSocket logging enabled, the final event of the plain text turn includes `usageMetadata`, while the final `stepUpdate` events of the `finish` turn do not include `usageMetadata` at all.
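To check a capture for this, each logged event can be searched for the key at any nesting depth. The sketch below assumes the raw events were saved one JSON object per line; that capture format and the two sample events are hypothetical and are not real SDK payloads. Only the `usageMetadata` key and the `stepUpdate` name come from the observation above.

```python
import json
from typing import Any, Iterable


def has_key(node: Any, key: str) -> bool:
    """Return True if `key` appears as a dict key anywhere inside a decoded JSON value."""
    if isinstance(node, dict):
        return key in node or any(has_key(value, key) for value in node.values())
    if isinstance(node, list):
        return any(has_key(item, key) for item in node)
    return False


def events_with_usage(lines: Iterable[str]) -> list[bool]:
    """For each non-empty JSON line, report whether it carries usageMetadata."""
    return [has_key(json.loads(line), "usageMetadata") for line in lines if line.strip()]


# Hypothetical sample events for illustration only; real payloads may differ.
SAMPLE_LOG = [
    '{"stepUpdate": {"text": "Hello.", "usageMetadata": {"totalTokenCount": 2184}}}',
    '{"stepUpdate": {"toolCall": {"name": "finish", "args": {"summary": "done"}}}}',
]

print(events_with_usage(SAMPLE_LOG))
```

Searching recursively avoids depending on where in the event the field would be attached, which is the open question in this report.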

## Expected Result

For a turn that completes through the built-in `finish` flow, the SDK should expose the model usage metadata for that model turn in at least one of the same public places used by regular text turns:

- `response.usage_metadata`
- `agent.conversation.last_turn_usage`
- `agent.conversation.total_usage`
- or the final `Step.usage_metadata`

The exact step/event that owns the metadata is less important than having the SDK expose authoritative usage for the turn.
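The expectation can be phrased as a small check that an integration could run after each turn. This is a minimal sketch, not SDK code: `FakeUsage`, `reports_tokens`, and `usage_sources` are hypothetical names, and the stand-in objects only mimic the attributes and the `model_dump()` call used in the reproduction script. `Step.usage_metadata` is left out because the reproduction does not show how steps are reached.

```python
from types import SimpleNamespace
from typing import Any, Optional


class FakeUsage:
    """Hypothetical stand-in for an SDK usage object; only model_dump() is mimicked."""

    def __init__(self, total: int) -> None:
        self._total = total

    def model_dump(self) -> dict[str, int]:
        return {"total_token_count": self._total}


def reports_tokens(usage: Optional[Any]) -> bool:
    """True if a usage object is present and reports at least one token."""
    if usage is None:
        return False
    return usage.model_dump().get("total_token_count", 0) > 0


def usage_sources(response: Any, conversation: Any) -> dict[str, bool]:
    """Map each public location named above to whether it reports usage."""
    return {
        "response.usage_metadata": reports_tokens(response.usage_metadata),
        "conversation.last_turn_usage": reports_tokens(conversation.last_turn_usage),
        "conversation.total_usage": reports_tokens(conversation.total_usage),
    }


# Stand-ins shaped like the two observed results in this report.
text_turn = usage_sources(
    SimpleNamespace(usage_metadata=FakeUsage(2184)),
    SimpleNamespace(last_turn_usage=FakeUsage(2184), total_usage=FakeUsage(2184)),
)
finish_turn = usage_sources(
    SimpleNamespace(usage_metadata=None),
    SimpleNamespace(last_turn_usage=None, total_usage=FakeUsage(0)),
)

print(any(text_turn.values()), any(finish_turn.values()))
```

With the real objects, `any(usage_sources(response, agent.conversation).values())` being true after a `finish` turn would satisfy the request in this issue.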

## Why This Matters

We are integrating Antigravity as a governed external-agent runtime in Cortex. Cortex keeps strict boundaries for:

- provider credentials and local-harness process state,
- transcript redaction and raw evidence,
- mediated tool activity,
- audit/evidence records,
- usage and reporting projections.

For governance reasons, we should not estimate or reconstruct token counts from prompts, transcripts, or provider-specific side channels. We need the SDK/local harness to surface the authoritative usage metadata for the actual Gemini model turn, including when the turn terminates through the structured `finish` flow that applications use for reliable final results.

## Additional Notes

This issue is intentionally scoped to usage metadata availability. The finish flow itself succeeds and structured output is available. The problem is that the successful finish/tool turn appears to drop or omit the usage metadata that normal text turns expose.

