Parent
Part of meta-issue #325 (expand LLM provider catalog). Architectural decision to land in core is documented there.
What's different about Gemini
Unlike Grok / Deepseek / OpenRouter (all OpenAI-compatible at the wire level), Gemini exposes a native protocol with its own:
- Request/response shape (
generateContent endpoint, not /chat/completions)
- Tool-use format (function declarations + function calls / responses; different from OpenAI's
tool_calls and Anthropic's tool_use blocks)
- Content-block structure (
parts array with text / inlineData / functionCall / functionResponse)
- Streaming semantics (Server-Sent Events with different chunk schema)
- Multi-turn convention (alternating user / model roles; tool responses inline as model parts)
- System instruction handling (separate
systemInstruction field, not a message role)
Implementation mirrors atomic_agents/llm/anthropic.py rather than moonshot.py. ~300-500 LOC with translation at the call() boundary. Full adversarial review cadence (3 Opus rounds minimum) per CLAUDE.md taste rule 11.
Why this is the biggest scope of the 4
OpenAI-shape backends (Grok / Deepseek / OpenRouter) are factories over an already-tested openai_compat.py layer. Bugs in the layer affect all of them uniformly. Gemini introduces a SECOND translation surface — every tool-use round trip, every cache directive, every multi-modal input has to translate between canonical types and Gemini's parts-array format. That's net-new code paths every conformance test exercises.
Implementation
atomic_agents/llm/gemini.py (new file, ~300-500 LOC)
Structural mirror of atomic_agents/llm/anthropic.py:
GeminiLLMBackend class implementing SyncLLMBackend Protocol
_translate_outbound_messages(canonical_messages) -> gemini_contents — canonical message list → Gemini contents array. System messages route to systemInstruction. Tool definitions route to tools[].functionDeclarations. Tool results route to parts[].functionResponse.
_translate_inbound_response(gemini_response) -> _RawLLMResponse — Gemini's candidates[0].content.parts → canonical text + LLMToolUse[]. Multiple parts (mixed text + function_call) merge into a single _RawLLMResponse.
_get_capabilities(model) returns LLMCapabilities per model (Gemini 2.0 Flash, 2.5 Pro, etc., with their respective context windows + tool-use support + caching tiers).
- HTTP client via
httpx (consistent with anthropic.py).
- API key via
Authorization: Bearer <key> or ?key=<key> URL param depending on which endpoint surface — decide in plan-eng-review (Google's docs recommend Bearer for v1beta+ but URL param is still supported and simpler for testing).
atomic_agents/_llm.py — key resolver
def _get_gemini_key() -> str:
return _get_key(
env_vars=["ATOMIC_AGENTS_GEMINI_KEY", "GOOGLE_API_KEY", "GEMINI_API_KEY"],
keychain_name="atomic-agents-gemini",
config_key="gemini",
)
GOOGLE_API_KEY is Google's canonical env var (covers Gemini + other Google AI Studio APIs); GEMINI_API_KEY accepted for operator convenience.
atomic_agents/_costs.py — pricing table
# Gemini pricing as of impl date. Source: https://ai.google.dev/gemini-api/docs/pricing.
"gemini/gemini-2.0-flash": {"input": 0.10, "output": 0.40},
"gemini/gemini-2.0-flash-lite": {"input": 0.075, "output": 0.30},
"gemini/gemini-2.5-pro": {"input": 1.25, "output": 5.00},
"gemini/gemini-2.5-flash": {"input": 0.30, "output": 2.50},
Verify exact numbers at impl time; Gemini pricing has tier brackets (long-context surcharge above 128K) that may need additional table entries. Resolve in plan-eng-review.
Registration in atomic_agents/llm/__init__.py
Add GeminiLLMBackend() to _register_default_backends(). Skip silently when no key resolves.
Doctor integration
check_provider_keys extends to Gemini. Same shape as the OpenAI-compat providers.
Spec/31 amendments (substantial)
- §"Reference implementations" gains a third entry:
GeminiLLMBackend (gemini.py). Note that this is a native-protocol backend, not an OpenAI-compat factory.
- §"Tool definition / tool result translation" gets a new sub-bullet documenting Gemini's
functionDeclarations / functionCall / functionResponse translation contract.
- §"Default model" considers whether the framework's default-model recommendation should change. (Recommendation: NO — Anthropic stays primary; Gemini is opt-in via
provider: gemini in model.md.)
- §"Conformance requirements" — verify Gemini-specific edge cases are exercised: multi-turn with mixed text + function_call parts, system instruction handling, long-context request shape, tool-result-as-part round-trip.
Conformance suite parametrization
Gemini's parts-array shape needs dedicated conformance tests beyond what the OpenAI-compat backends cover:
test_gemini_multi_part_response_merges_to_single_RawLLMResponse
test_gemini_system_instruction_routes_separately
test_gemini_tool_result_as_function_response_part
test_gemini_long_context_pricing_tier_handling (if pricing tiers exist)
test_gemini_streaming_chunk_translation (deferred to StreamingLLMBackend; out of scope for this issue)
_costs.py cache-tier accounting
Gemini has implicit/explicit caching (explicit via cachedContent API; implicit caching is automatic). If caching pricing matters for cost-guardrail accuracy, add cached-input tier entries:
"gemini/gemini-2.5-pro": {"input": 1.25, "output": 5.00},
"gemini/gemini-2.5-pro-cached": {"input": 0.31, "output": 5.00}, # 75% discount on cached input
Decide in plan-eng-review whether to surface caching as a _RawLLMResponse field (so cost accounting reflects actual cached-input usage) or punt to v1.1.
Acceptance criteria
Out of scope
- Gemini streaming (reserved
StreamingLLMBackend Protocol per spec/31).
- Gemini async (reserved
AsyncLLMBackend Protocol).
- Multi-modal image / video / audio inputs — defer to a v1.1 follow-up that extends
LLMToolDefinition / canonical types to support binary content blocks.
- Vertex AI (separate auth + endpoint surface from AI Studio; different product). Vertex would be its own backend if added.
- Gemini Code Execution tool (gated behind a separate API).
- Explicit
cachedContent API for prompt-template caching — possible v1.1 once cost-tier accounting decision is settled.
Risk register
- Translation correctness is the biggest risk. Gemini's
parts-array shape has many valid permutations; canonical-types round-trip must preserve every one. Heavy unit-test discipline required.
- Long-context pricing tiers. Gemini's "long-context surcharge above 128K tokens" needs
_costs.py table entries OR a per-call tier-resolution function. Decide in plan-eng-review.
- Cached-input accounting. Implicit caching means a single prompt may be cheaper than the table suggests. If we don't track this, the framework's reported cost overstates. Cross-cutting concern; same shape as Anthropic prompt caching.
- API surface churn. Google has changed Gemini's API surface meaningfully across versions (v1 → v1beta → v1beta2). Pin a specific API version in the backend config; document the version in spec/31.
References
Parent
Part of meta-issue #325 (expand LLM provider catalog). Architectural decision to land in core is documented there.
What's different about Gemini
Unlike Grok / Deepseek / OpenRouter (all OpenAI-compatible at the wire level), Gemini exposes a native protocol with its own:
generateContentendpoint, not/chat/completions)tool_callsand Anthropic'stool_useblocks)partsarray with text / inlineData / functionCall / functionResponse)systemInstructionfield, not a message role)Implementation mirrors
atomic_agents/llm/anthropic.pyrather thanmoonshot.py. ~300-500 LOC with translation at thecall()boundary. Full adversarial review cadence (3 Opus rounds minimum) per CLAUDE.md taste rule 11.Why this is the biggest scope of the 4
OpenAI-shape backends (Grok / Deepseek / OpenRouter) are factories over an already-tested
openai_compat.pylayer. Bugs in the layer affect all of them uniformly. Gemini introduces a SECOND translation surface — every tool-use round trip, every cache directive, every multi-modal input has to translate between canonical types and Gemini'sparts-array format. That's net-new code paths every conformance test exercises.Implementation
atomic_agents/llm/gemini.py(new file, ~300-500 LOC)Structural mirror of
atomic_agents/llm/anthropic.py:GeminiLLMBackendclass implementingSyncLLMBackendProtocol_translate_outbound_messages(canonical_messages) -> gemini_contents— canonical message list → Geminicontentsarray. System messages route tosystemInstruction. Tool definitions route totools[].functionDeclarations. Tool results route toparts[].functionResponse._translate_inbound_response(gemini_response) -> _RawLLMResponse— Gemini'scandidates[0].content.parts→ canonical text +LLMToolUse[]. Multipleparts(mixed text + function_call) merge into a single_RawLLMResponse._get_capabilities(model)returnsLLMCapabilitiesper model (Gemini 2.0 Flash, 2.5 Pro, etc., with their respective context windows + tool-use support + caching tiers).httpx(consistent with anthropic.py).Authorization: Bearer <key>or?key=<key>URL param depending on which endpoint surface — decide in plan-eng-review (Google's docs recommend Bearer for v1beta+ but URL param is still supported and simpler for testing).atomic_agents/_llm.py— key resolverGOOGLE_API_KEYis Google's canonical env var (covers Gemini + other Google AI Studio APIs);GEMINI_API_KEYaccepted for operator convenience.atomic_agents/_costs.py— pricing tableVerify exact numbers at impl time; Gemini pricing has tier brackets (long-context surcharge above 128K) that may need additional table entries. Resolve in plan-eng-review.
Registration in
atomic_agents/llm/__init__.pyAdd
GeminiLLMBackend()to_register_default_backends(). Skip silently when no key resolves.Doctor integration
check_provider_keysextends to Gemini. Same shape as the OpenAI-compat providers.Spec/31 amendments (substantial)
GeminiLLMBackend (gemini.py). Note that this is a native-protocol backend, not an OpenAI-compat factory.functionDeclarations/functionCall/functionResponsetranslation contract.provider: geminiinmodel.md.)Conformance suite parametrization
Gemini's
parts-array shape needs dedicated conformance tests beyond what the OpenAI-compat backends cover:test_gemini_multi_part_response_merges_to_single_RawLLMResponsetest_gemini_system_instruction_routes_separatelytest_gemini_tool_result_as_function_response_parttest_gemini_long_context_pricing_tier_handling(if pricing tiers exist)test_gemini_streaming_chunk_translation(deferred to StreamingLLMBackend; out of scope for this issue)_costs.pycache-tier accountingGemini has implicit/explicit caching (explicit via
cachedContentAPI; implicit caching is automatic). If caching pricing matters for cost-guardrail accuracy, add cached-input tier entries:Decide in plan-eng-review whether to surface caching as a
_RawLLMResponsefield (so cost accounting reflects actual cached-input usage) or punt to v1.1.Acceptance criteria
atomic_agents/llm/gemini.pyshipped withGeminiLLMBackendsatisfyingSyncLLMBackendProtocol._get_gemini_key()resolver with 3-env-var ladder.check_provider_keysenumerates Gemini.parts-array translation correctness.Out of scope
StreamingLLMBackendProtocol per spec/31).AsyncLLMBackendProtocol).LLMToolDefinition/ canonical types to support binary content blocks.cachedContentAPI for prompt-template caching — possible v1.1 once cost-tier accounting decision is settled.Risk register
parts-array shape has many valid permutations; canonical-types round-trip must preserve every one. Heavy unit-test discipline required._costs.pytable entries OR a per-call tier-resolution function. Decide in plan-eng-review.References
atomic_agents/llm/anthropic.py(closest existing analog; both native protocols).