[backend] GeminiLLMBackend — native protocol implementation + tool-use translation + conformance

## Parent

Part of meta-issue #325 (expand LLM provider catalog). Architectural decision to land in core is documented there.

## What's different about Gemini

Unlike Grok / Deepseek / OpenRouter (all OpenAI-compatible at the wire level), Gemini exposes a **native protocol** with its own:

- Request/response shape (`generateContent` endpoint, not `/chat/completions`)
- Tool-use format (function declarations + function calls / responses; different from OpenAI's `tool_calls` and Anthropic's `tool_use` blocks)
- Content-block structure (`parts` array with text / inlineData / functionCall / functionResponse)
- Streaming semantics (Server-Sent Events with different chunk schema)
- Multi-turn convention (alternating user / model roles; tool responses inline as model parts)
- System instruction handling (separate `systemInstruction` field, not a message role)

Implementation mirrors `atomic_agents/llm/anthropic.py` rather than `moonshot.py`. ~300-500 LOC with translation at the `call()` boundary. Full adversarial review cadence (3 Opus rounds minimum) per CLAUDE.md taste rule 11.

## Why this is the biggest scope of the 4

OpenAI-shape backends (Grok / Deepseek / OpenRouter) are factories over an already-tested `openai_compat.py` layer. Bugs in the layer affect all of them uniformly. Gemini introduces a SECOND translation surface — every tool-use round trip, every cache directive, every multi-modal input has to translate between canonical types and Gemini's `parts`-array format. That's net-new code paths every conformance test exercises.

## Implementation

### `atomic_agents/llm/gemini.py` (new file, ~300-500 LOC)

Structural mirror of `atomic_agents/llm/anthropic.py`:

- `GeminiLLMBackend` class implementing `SyncLLMBackend` Protocol
- `_translate_outbound_messages(canonical_messages) -> gemini_contents` — canonical message list → Gemini `contents` array. System messages route to `systemInstruction`. Tool definitions route to `tools[].functionDeclarations`. Tool results route to `parts[].functionResponse`.
- `_translate_inbound_response(gemini_response) -> _RawLLMResponse` — Gemini's `candidates[0].content.parts` → canonical text + `LLMToolUse[]`. Multiple `parts` (mixed text + function_call) merge into a single `_RawLLMResponse`.
- `_get_capabilities(model)` returns `LLMCapabilities` per model (Gemini 2.0 Flash, 2.5 Pro, etc., with their respective context windows + tool-use support + caching tiers).
- HTTP client via `httpx` (consistent with anthropic.py).
- API key via `Authorization: Bearer <key>` or `?key=<key>` URL param depending on which endpoint surface — decide in plan-eng-review (Google's docs recommend Bearer for v1beta+ but URL param is still supported and simpler for testing).

### `atomic_agents/_llm.py` — key resolver

```python
def _get_gemini_key() -> str:
    return _get_key(
        env_vars=["ATOMIC_AGENTS_GEMINI_KEY", "GOOGLE_API_KEY", "GEMINI_API_KEY"],
        keychain_name="atomic-agents-gemini",
        config_key="gemini",
    )
```

`GOOGLE_API_KEY` is Google's canonical env var (covers Gemini + other Google AI Studio APIs); `GEMINI_API_KEY` accepted for operator convenience.

### `atomic_agents/_costs.py` — pricing table

```python
# Gemini pricing as of impl date. Source: https://ai.google.dev/gemini-api/docs/pricing.
"gemini/gemini-2.0-flash":      {"input": 0.10, "output": 0.40},
"gemini/gemini-2.0-flash-lite": {"input": 0.075, "output": 0.30},
"gemini/gemini-2.5-pro":        {"input": 1.25, "output": 5.00},
"gemini/gemini-2.5-flash":      {"input": 0.30, "output": 2.50},
```

Verify exact numbers at impl time; Gemini pricing has tier brackets (long-context surcharge above 128K) that may need additional table entries. Resolve in plan-eng-review.

### Registration in `atomic_agents/llm/__init__.py`

Add `GeminiLLMBackend()` to `_register_default_backends()`. Skip silently when no key resolves.

### Doctor integration

`check_provider_keys` extends to Gemini. Same shape as the OpenAI-compat providers.

### Spec/31 amendments (substantial)

- §"Reference implementations" gains a third entry: `GeminiLLMBackend (gemini.py)`. Note that this is a native-protocol backend, not an OpenAI-compat factory.
- §"Tool definition / tool result translation" gets a new sub-bullet documenting Gemini's `functionDeclarations` / `functionCall` / `functionResponse` translation contract.
- §"Default model" considers whether the framework's default-model recommendation should change. (Recommendation: NO — Anthropic stays primary; Gemini is opt-in via `provider: gemini` in `model.md`.)
- §"Conformance requirements" — verify Gemini-specific edge cases are exercised: multi-turn with mixed text + function_call parts, system instruction handling, long-context request shape, tool-result-as-part round-trip.

### Conformance suite parametrization

Gemini's `parts`-array shape needs dedicated conformance tests beyond what the OpenAI-compat backends cover:

- `test_gemini_multi_part_response_merges_to_single_RawLLMResponse`
- `test_gemini_system_instruction_routes_separately`
- `test_gemini_tool_result_as_function_response_part`
- `test_gemini_long_context_pricing_tier_handling` (if pricing tiers exist)
- `test_gemini_streaming_chunk_translation` (deferred to StreamingLLMBackend; out of scope for this issue)

### `_costs.py` cache-tier accounting

Gemini has implicit/explicit caching (explicit via `cachedContent` API; implicit caching is automatic). If caching pricing matters for cost-guardrail accuracy, add cached-input tier entries:

```python
"gemini/gemini-2.5-pro":          {"input": 1.25, "output": 5.00},
"gemini/gemini-2.5-pro-cached":   {"input": 0.31, "output": 5.00},  # 75% discount on cached input
```

Decide in plan-eng-review whether to surface caching as a `_RawLLMResponse` field (so cost accounting reflects actual cached-input usage) or punt to v1.1.

## Acceptance criteria

- [ ] `atomic_agents/llm/gemini.py` shipped with `GeminiLLMBackend` satisfying `SyncLLMBackend` Protocol.
- [ ] Translation tested round-trip: canonical → Gemini wire → canonical for text-only, tool-call, tool-result, multi-turn, and system-instruction scenarios.
- [ ] Registration skips silently when no Gemini key resolves.
- [ ] `_get_gemini_key()` resolver with 3-env-var ladder.
- [ ] Pricing table entries for Gemini 2.0 Flash + 2.5 Pro + 2.5 Flash + 2.0 Flash Lite (or current catalog at impl time).
- [ ] Doctor `check_provider_keys` enumerates Gemini.
- [ ] Conformance suite parametrizes Gemini + adds Gemini-specific tests for `parts`-array translation correctness.
- [ ] spec/31 amendments to §"Reference implementations" + §"Tool definition / tool result translation" + §"Conformance requirements".
- [ ] CHANGELOG entry covering native-protocol-backend addition and translation contract.
- [ ] Smoke test exercising one real Gemini call via mocked HTTP (e.g. respx).
- [ ] 3+ rounds of Opus adversarial review per CLAUDE.md taste rule 11; convergence target Round 3 LOW only.

## Out of scope

- Gemini streaming (reserved `StreamingLLMBackend` Protocol per spec/31).
- Gemini async (reserved `AsyncLLMBackend` Protocol).
- Multi-modal image / video / audio inputs — defer to a v1.1 follow-up that extends `LLMToolDefinition` / canonical types to support binary content blocks.
- Vertex AI (separate auth + endpoint surface from AI Studio; different product). Vertex would be its own backend if added.
- Gemini Code Execution tool (gated behind a separate API).
- Explicit `cachedContent` API for prompt-template caching — possible v1.1 once cost-tier accounting decision is settled.

## Risk register

- **Translation correctness is the biggest risk.** Gemini's `parts`-array shape has many valid permutations; canonical-types round-trip must preserve every one. Heavy unit-test discipline required.
- **Long-context pricing tiers.** Gemini's "long-context surcharge above 128K tokens" needs `_costs.py` table entries OR a per-call tier-resolution function. Decide in plan-eng-review.
- **Cached-input accounting.** Implicit caching means a single prompt may be cheaper than the table suggests. If we don't track this, the framework's reported cost overstates. Cross-cutting concern; same shape as Anthropic prompt caching.
- **API surface churn.** Google has changed Gemini's API surface meaningfully across versions (v1 → v1beta → v1beta2). Pin a specific API version in the backend config; document the version in spec/31.

## References

- Parent: #325.
- Pattern: `atomic_agents/llm/anthropic.py` (closest existing analog; both native protocols).
- Architecture: spec/31 (gets substantial amendments).
- Gemini API docs: https://ai.google.dev/gemini-api/docs/api-overview.
- Gemini tool use: https://ai.google.dev/gemini-api/docs/function-calling.
- Gemini pricing: https://ai.google.dev/gemini-api/docs/pricing.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[backend] GeminiLLMBackend — native protocol implementation + tool-use translation + conformance #329

Parent

What's different about Gemini

Why this is the biggest scope of the 4

Implementation

`atomic_agents/llm/gemini.py` (new file, ~300-500 LOC)

`atomic_agents/_llm.py` — key resolver

`atomic_agents/_costs.py` — pricing table

Registration in `atomic_agents/llm/init.py`

Doctor integration

Spec/31 amendments (substantial)

Conformance suite parametrization

`_costs.py` cache-tier accounting

Acceptance criteria

Out of scope

Risk register

References

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

[backend] GeminiLLMBackend — native protocol implementation + tool-use translation + conformance #329

Description

Parent

What's different about Gemini

Why this is the biggest scope of the 4

Implementation

atomic_agents/llm/gemini.py (new file, ~300-500 LOC)

atomic_agents/_llm.py — key resolver

atomic_agents/_costs.py — pricing table

Registration in atomic_agents/llm/__init__.py

Doctor integration

Spec/31 amendments (substantial)

Conformance suite parametrization

_costs.py cache-tier accounting

Acceptance criteria

Out of scope

Risk register

References

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

`atomic_agents/llm/gemini.py` (new file, ~300-500 LOC)

`atomic_agents/_llm.py` — key resolver

`atomic_agents/_costs.py` — pricing table

Registration in `atomic_agents/llm/init.py`

`_costs.py` cache-tier accounting