Finish/tool turns drop SDK usage metadata while equivalent plain text turns do report it #24

@abdes

Summary

google-antigravity==0.1.0 appears not to expose token usage metadata for a local-harness Agent turn that completes through the built-in finish structured-output flow. The same local Agent setup does expose usageMetadata for a plain text response, so this looks specific to the finish/tool completion path rather than the Agent API in general.

I am not asking for billing guarantees from the SDK. The request is for the same model usage metadata that is already exposed on normal text turns to also be attached to the turn/step when the model terminates through the built-in finish flow.

Environment

  • Package: google-antigravity==0.1.0
  • Python: 3.14
  • Host: Linux x86_64
  • Execution strategy: local harness
  • Model tested: Gemini through GeminiConfig

Reproduction

Set GEMINI_API_KEY in the environment (and optionally GEMINI_MODEL to override the default model name), then run:

from __future__ import annotations

import asyncio
import json
import os
import tempfile
from pathlib import Path

from google.antigravity import (
    Agent,
    CapabilitiesConfig,
    GeminiConfig,
    GenerationConfig,
    LocalAgentConfig,
    ModelConfig,
    ModelEntry,
)


MODEL = os.environ.get("GEMINI_MODEL", "gemini-3.1-flash-lite")
FINISH_SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "required": ["summary", "content"],
    "properties": {
        "summary": {"type": "string"},
        "content": {"type": "string"},
    },
}


def _config(root: Path, *, use_finish: bool) -> LocalAgentConfig:
    capabilities = CapabilitiesConfig(
        enabled_tools=["finish"] if use_finish else [],
        compaction_threshold=50000,
        finish_tool_schema_json=json.dumps(FINISH_SCHEMA, separators=(",", ":"), sort_keys=True)
        if use_finish
        else None,
    )
    return LocalAgentConfig(
        system_instructions="Return only through finish." if use_finish else "Answer briefly.",
        capabilities=capabilities,
        policies=[],
        workspaces=[],
        save_dir=str(root / "save"),
        app_data_dir=str(root / "app-data"),
        response_schema=FINISH_SCHEMA if use_finish else None,
        skills_paths=[],
        gemini_config=GeminiConfig(
            api_key=os.environ["GEMINI_API_KEY"],
            models=ModelConfig(
                default=ModelEntry(
                    name=MODEL,
                    generation=GenerationConfig(thinking_level=None),
                ),
                image_generation=ModelEntry(
                    name=MODEL,
                    generation=GenerationConfig(thinking_level=None),
                ),
            ),
        ),
    )


async def run_case(*, use_finish: bool) -> None:
    # Run one single-turn chat in a throwaway directory and print every public usage field.
    with tempfile.TemporaryDirectory(prefix="ag-usage-debug-") as temp_dir:
        root = Path(temp_dir)
        async with Agent(_config(root, use_finish=use_finish)) as agent:
            response = await agent.chat(
                "Call finish with summary and content saying usage debug completed."
                if use_finish
                else "Say hello in one sentence."
            )
            await response.resolve()
            print("use_finish:", use_finish)
            print(
                "response.usage_metadata:",
                None if response.usage_metadata is None else response.usage_metadata.model_dump(),
            )
            print(
                "conversation.last_turn_usage:",
                None
                if agent.conversation.last_turn_usage is None
                else agent.conversation.last_turn_usage.model_dump(),
            )
            print("conversation.total_usage:", agent.conversation.total_usage.model_dump())


async def main() -> None:
    await run_case(use_finish=False)
    await run_case(use_finish=True)


asyncio.run(main())

Observed Result

The plain text case reports usage, for example:

use_finish: False
response.usage_metadata: {'prompt_token_count': 2167, 'cached_content_token_count': 0, 'candidates_token_count': 17, 'thoughts_token_count': 0, 'total_token_count': 2184}
conversation.last_turn_usage: {'prompt_token_count': 2167, 'cached_content_token_count': 0, 'candidates_token_count': 17, 'thoughts_token_count': 0, 'total_token_count': 2184}
conversation.total_usage: {'prompt_token_count': 2167, 'cached_content_token_count': 0, 'candidates_token_count': 17, 'thoughts_token_count': 0, 'total_token_count': 2184}

The finish case completes successfully and returns structured output, but usage is absent:

use_finish: True
response.usage_metadata: None
conversation.last_turn_usage: None
conversation.total_usage: {'prompt_token_count': 0, 'cached_content_token_count': 0, 'candidates_token_count': 0, 'thoughts_token_count': 0, 'total_token_count': 0}

With raw local-connection WebSocket logging enabled, the final event of the plain text turn includes usageMetadata, while none of the final stepUpdate events of the finish turn do.
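
For anyone repeating that check, here is a minimal sketch of how captured events could be scanned for the field, assuming each logged event is available as a JSON string. The helper names and the sample payload shapes are hypothetical and do not reflect the harness's actual wire format; only the usageMetadata and stepUpdate key names come from the observation above.

```python
import json
from typing import Any


def has_usage_metadata(node: Any) -> bool:
    """Return True if a 'usageMetadata' key appears anywhere in a decoded JSON value."""
    if isinstance(node, dict):
        if "usageMetadata" in node:
            return True
        return any(has_usage_metadata(value) for value in node.values())
    if isinstance(node, list):
        return any(has_usage_metadata(item) for item in node)
    return False


def events_with_usage(raw_events: list[str]) -> list[bool]:
    """Decode each logged event and report whether it carries usageMetadata."""
    return [has_usage_metadata(json.loads(raw)) for raw in raw_events]


# Hypothetical payloads, shaped only for illustration (not the real wire format).
sample_events = [
    '{"stepUpdate": {"text": "Hello.", "usageMetadata": {"totalTokenCount": 2184}}}',
    '{"stepUpdate": {"toolCall": {"name": "finish"}}}',
]
print(events_with_usage(sample_events))
```

The search is recursive because the nesting depth of the field inside an event is not known here.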

Expected Result

For a turn that completes through the built-in finish flow, the SDK should expose the model usage metadata for that model turn in at least one of the same public places used by regular text turns:

  • response.usage_metadata
  • agent.conversation.last_turn_usage
  • agent.conversation.total_usage
  • or the final Step.usage_metadata

The exact step/event that owns the metadata is less important than having the SDK expose authoritative usage for the turn.
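
One way a caller might look up usage across those locations, as a sketch only. It relies on the attribute names used in the reproduction script (usage_metadata, last_turn_usage, total_usage, total_token_count). The helper find_turn_usage and the SimpleNamespace stand-ins are hypothetical and not part of the SDK, and Step.usage_metadata is left out because how to reach the final step is not shown in this report.

```python
from types import SimpleNamespace
from typing import Any, Optional


def find_turn_usage(response: Any, conversation: Any) -> Optional[tuple[str, Any]]:
    """Return (source, usage) from the first place that reports usage, else None."""
    candidates = [
        ("response.usage_metadata", getattr(response, "usage_metadata", None)),
        ("conversation.last_turn_usage", getattr(conversation, "last_turn_usage", None)),
    ]
    for source, usage in candidates:
        if usage is not None:
            return source, usage
    # In the observed output total_usage is a zero-filled object rather than None,
    # so an all-zero total is treated as "no usage reported". Note that this value
    # is cumulative, so it only equals per-turn usage in a single-turn conversation.
    total = getattr(conversation, "total_usage", None)
    if total is not None and getattr(total, "total_token_count", 0) > 0:
        return "conversation.total_usage", total
    return None


# Stand-ins mirroring the two observed cases (hypothetical objects, not SDK types).
plain_usage = SimpleNamespace(total_token_count=2184)
plain_response = SimpleNamespace(usage_metadata=plain_usage)
plain_conversation = SimpleNamespace(last_turn_usage=plain_usage, total_usage=plain_usage)

finish_response = SimpleNamespace(usage_metadata=None)
finish_conversation = SimpleNamespace(
    last_turn_usage=None,
    total_usage=SimpleNamespace(total_token_count=0),
)

print(find_turn_usage(plain_response, plain_conversation))
print(find_turn_usage(finish_response, finish_conversation))
```

With the behavior reported above, the finish case finds nothing in any of the three locations, which is the gap this issue asks to close.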

Why This Matters

We are integrating Antigravity as a governed external-agent runtime in Cortex. Cortex keeps strict boundaries for:

  • provider credentials and local-harness process state,
  • transcript redaction and raw evidence,
  • mediated tool activity,
  • audit/evidence records,
  • usage and reporting projections.

For governance reasons, we should not estimate or reconstruct token counts from prompts, transcripts, or provider-specific side channels. We need the SDK/local harness to surface the authoritative usage metadata for the actual Gemini model turn, including when the turn terminates through the structured finish flow that applications use for reliable final results.

Additional Notes

This issue is intentionally scoped to usage metadata availability. The finish flow itself succeeds and structured output is available. The problem is that the successful finish/tool turn appears to drop or omit the usage metadata that normal text turns expose.

Labels: bug (Something isn't working)
