Skip to content

[backend] GeminiLLMBackend — native protocol implementation + tool-use translation + conformance #329

@dep0we

Description

@dep0we

Parent

Part of meta-issue #325 (expand LLM provider catalog). Architectural decision to land in core is documented there.

What's different about Gemini

Unlike Grok / Deepseek / OpenRouter (all OpenAI-compatible at the wire level), Gemini exposes a native protocol with its own:

  • Request/response shape (generateContent endpoint, not /chat/completions)
  • Tool-use format (function declarations + function calls / responses; different from OpenAI's tool_calls and Anthropic's tool_use blocks)
  • Content-block structure (parts array with text / inlineData / functionCall / functionResponse)
  • Streaming semantics (Server-Sent Events with different chunk schema)
  • Multi-turn convention (alternating user / model roles; tool responses inline as model parts)
  • System instruction handling (separate systemInstruction field, not a message role)

Implementation mirrors atomic_agents/llm/anthropic.py rather than moonshot.py. ~300-500 LOC with translation at the call() boundary. Full adversarial review cadence (3 Opus rounds minimum) per CLAUDE.md taste rule 11.

Why this is the biggest scope of the 4

OpenAI-shape backends (Grok / Deepseek / OpenRouter) are factories over an already-tested openai_compat.py layer. Bugs in the layer affect all of them uniformly. Gemini introduces a SECOND translation surface — every tool-use round trip, every cache directive, every multi-modal input has to translate between canonical types and Gemini's parts-array format. That's net-new code paths every conformance test exercises.

Implementation

atomic_agents/llm/gemini.py (new file, ~300-500 LOC)

Structural mirror of atomic_agents/llm/anthropic.py:

  • GeminiLLMBackend class implementing SyncLLMBackend Protocol
  • _translate_outbound_messages(canonical_messages) -> gemini_contents — canonical message list → Gemini contents array. System messages route to systemInstruction. Tool definitions route to tools[].functionDeclarations. Tool results route to parts[].functionResponse.
  • _translate_inbound_response(gemini_response) -> _RawLLMResponse — Gemini's candidates[0].content.parts → canonical text + LLMToolUse[]. Multiple parts (mixed text + function_call) merge into a single _RawLLMResponse.
  • _get_capabilities(model) returns LLMCapabilities per model (Gemini 2.0 Flash, 2.5 Pro, etc., with their respective context windows + tool-use support + caching tiers).
  • HTTP client via httpx (consistent with anthropic.py).
  • API key via Authorization: Bearer <key> or ?key=<key> URL param depending on which endpoint surface — decide in plan-eng-review (Google's docs recommend Bearer for v1beta+ but URL param is still supported and simpler for testing).

atomic_agents/_llm.py — key resolver

def _get_gemini_key() -> str:
    return _get_key(
        env_vars=["ATOMIC_AGENTS_GEMINI_KEY", "GOOGLE_API_KEY", "GEMINI_API_KEY"],
        keychain_name="atomic-agents-gemini",
        config_key="gemini",
    )

GOOGLE_API_KEY is Google's canonical env var (covers Gemini + other Google AI Studio APIs); GEMINI_API_KEY accepted for operator convenience.

atomic_agents/_costs.py — pricing table

# Gemini pricing as of impl date. Source: https://ai.google.dev/gemini-api/docs/pricing.
"gemini/gemini-2.0-flash":      {"input": 0.10, "output": 0.40},
"gemini/gemini-2.0-flash-lite": {"input": 0.075, "output": 0.30},
"gemini/gemini-2.5-pro":        {"input": 1.25, "output": 5.00},
"gemini/gemini-2.5-flash":      {"input": 0.30, "output": 2.50},

Verify exact numbers at impl time; Gemini pricing has tier brackets (long-context surcharge above 128K) that may need additional table entries. Resolve in plan-eng-review.

Registration in atomic_agents/llm/__init__.py

Add GeminiLLMBackend() to _register_default_backends(). Skip silently when no key resolves.

Doctor integration

check_provider_keys extends to Gemini. Same shape as the OpenAI-compat providers.

Spec/31 amendments (substantial)

  • §"Reference implementations" gains a third entry: GeminiLLMBackend (gemini.py). Note that this is a native-protocol backend, not an OpenAI-compat factory.
  • §"Tool definition / tool result translation" gets a new sub-bullet documenting Gemini's functionDeclarations / functionCall / functionResponse translation contract.
  • §"Default model" considers whether the framework's default-model recommendation should change. (Recommendation: NO — Anthropic stays primary; Gemini is opt-in via provider: gemini in model.md.)
  • §"Conformance requirements" — verify Gemini-specific edge cases are exercised: multi-turn with mixed text + function_call parts, system instruction handling, long-context request shape, tool-result-as-part round-trip.

Conformance suite parametrization

Gemini's parts-array shape needs dedicated conformance tests beyond what the OpenAI-compat backends cover:

  • test_gemini_multi_part_response_merges_to_single_RawLLMResponse
  • test_gemini_system_instruction_routes_separately
  • test_gemini_tool_result_as_function_response_part
  • test_gemini_long_context_pricing_tier_handling (if pricing tiers exist)
  • test_gemini_streaming_chunk_translation (deferred to StreamingLLMBackend; out of scope for this issue)

_costs.py cache-tier accounting

Gemini has implicit/explicit caching (explicit via cachedContent API; implicit caching is automatic). If caching pricing matters for cost-guardrail accuracy, add cached-input tier entries:

"gemini/gemini-2.5-pro":          {"input": 1.25, "output": 5.00},
"gemini/gemini-2.5-pro-cached":   {"input": 0.31, "output": 5.00},  # 75% discount on cached input

Decide in plan-eng-review whether to surface caching as a _RawLLMResponse field (so cost accounting reflects actual cached-input usage) or punt to v1.1.

Acceptance criteria

  • atomic_agents/llm/gemini.py shipped with GeminiLLMBackend satisfying SyncLLMBackend Protocol.
  • Translation tested round-trip: canonical → Gemini wire → canonical for text-only, tool-call, tool-result, multi-turn, and system-instruction scenarios.
  • Registration skips silently when no Gemini key resolves.
  • _get_gemini_key() resolver with 3-env-var ladder.
  • Pricing table entries for Gemini 2.0 Flash + 2.5 Pro + 2.5 Flash + 2.0 Flash Lite (or current catalog at impl time).
  • Doctor check_provider_keys enumerates Gemini.
  • Conformance suite parametrizes Gemini + adds Gemini-specific tests for parts-array translation correctness.
  • spec/31 amendments to §"Reference implementations" + §"Tool definition / tool result translation" + §"Conformance requirements".
  • CHANGELOG entry covering native-protocol-backend addition and translation contract.
  • Smoke test exercising one real Gemini call via mocked HTTP (e.g. respx).
  • 3+ rounds of Opus adversarial review per CLAUDE.md taste rule 11; convergence target Round 3 LOW only.

Out of scope

  • Gemini streaming (reserved StreamingLLMBackend Protocol per spec/31).
  • Gemini async (reserved AsyncLLMBackend Protocol).
  • Multi-modal image / video / audio inputs — defer to a v1.1 follow-up that extends LLMToolDefinition / canonical types to support binary content blocks.
  • Vertex AI (separate auth + endpoint surface from AI Studio; different product). Vertex would be its own backend if added.
  • Gemini Code Execution tool (gated behind a separate API).
  • Explicit cachedContent API for prompt-template caching — possible v1.1 once cost-tier accounting decision is settled.

Risk register

  • Translation correctness is the biggest risk. Gemini's parts-array shape has many valid permutations; canonical-types round-trip must preserve every one. Heavy unit-test discipline required.
  • Long-context pricing tiers. Gemini's "long-context surcharge above 128K tokens" needs _costs.py table entries OR a per-call tier-resolution function. Decide in plan-eng-review.
  • Cached-input accounting. Implicit caching means a single prompt may be cheaper than the table suggests. If we don't track this, the framework's reported cost overstates. Cross-cutting concern; same shape as Anthropic prompt caching.
  • API surface churn. Google has changed Gemini's API surface meaningfully across versions (v1 → v1beta → v1beta2). Pin a specific API version in the backend config; document the version in spec/31.

References

Metadata

Metadata

Assignees

No one assigned

    Labels

    backendProtocol-pattern backend abstractions (memory, logs, locks, etc.)enhancementNew feature or requestspecImplementation of an Atomic Agents spec doc

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions