Background
Recent Gemma variants ship a "thinking mode" capability whose output uses raw special tokens rather than the <think>...</think> tags that the existing ThinkTagParser recognises. See:
https://ai.google.dev/gemma/docs/capabilities/thinking
When vllm serve is invoked with the right reasoning-parser flag, vLLM extracts these specials server-side and populates delta.reasoning_content directly, at which point BaseAgent.astep_stream already routes them correctly. But for deployments that don't or can't set the vLLM flag (or for runtimes that lack the parser), the Gemma reasoning leaks into delta.content, polluting the visible answer and the assistant message threaded back into the next turn.
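For deployments that can't rely on server-side extraction, the client could in principle split the stream itself. The sketch below is a hypothetical illustration only: the start/end marker strings are placeholders, not Gemma's actual special tokens (confirming those is exactly the investigation proposed below), and the class name is invented here.

```python
class SpecialTokenReasoningParser:
    """Stateful splitter routing streamed text between assumed marker tokens.

    `start`/`end` are placeholder marker strings, not Gemma's real specials.
    feed() returns (reasoning_delta, content_delta) per chunk and correctly
    handles markers that arrive split across chunk boundaries.
    """

    def __init__(self, start: str, end: str) -> None:
        self.start, self.end = start, end
        self.in_reasoning = False
        self.buf = ""

    def feed(self, chunk: str) -> tuple[str, str]:
        self.buf += chunk
        reasoning: list[str] = []
        content: list[str] = []
        while self.buf:
            marker = self.end if self.in_reasoning else self.start
            idx = self.buf.find(marker)
            if idx == -1:
                # Emit everything except a tail that could be a partial marker.
                safe = len(self.buf) - self._partial_marker_len(marker)
                out, self.buf = self.buf[:safe], self.buf[safe:]
                (reasoning if self.in_reasoning else content).append(out)
                break
            out, self.buf = self.buf[:idx], self.buf[idx + len(marker):]
            (reasoning if self.in_reasoning else content).append(out)
            self.in_reasoning = not self.in_reasoning
        return "".join(reasoning), "".join(content)

    def _partial_marker_len(self, marker: str) -> int:
        # Length of the longest buffer suffix that is a proper marker prefix.
        for n in range(min(len(marker) - 1, len(self.buf)), 0, -1):
            if marker.startswith(self.buf[-n:]):
                return n
        return 0
```

Whether this pure-Python approach is viable depends on whether the specials survive detokenisation into the text stream at all, which is the open question below.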
Suggested approach
Once #177 (the implicit-open Nemotron path) lands, follow the same pattern:
- Investigate Gemma's actual output shape (what tokens, in what order, with what stream-time framing)
- If Gemma's output can be parsed in pure Python like the <think> model family, add a Gemma-specific parser branch and match "gemma" in create_reasoning_parser
- If the format is too tightly coupled to tokenizer specials to parse without a tokenizer, document the recommended --reasoning-parser flag for vLLM and have create_reasoning_parser return None for Gemma (matching current behaviour for models that emit reasoning_content natively)
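The second option could be sketched roughly as follows. The function name create_reasoning_parser matches the issue, but ThinkTagParser here is a stand-in stub and the "gemma" substring match is an assumption about how the factory dispatches:

```python
class ThinkTagParser:
    """Stand-in stub for the existing <think>...</think> parser."""

def create_reasoning_parser(model_name: str):
    """Hypothetical dispatch: None for Gemma, tag parser otherwise."""
    name = model_name.lower()
    if "gemma" in name:
        # Until the Gemma format is confirmed parseable in pure Python,
        # return None and rely on server-side extraction (vLLM's
        # --reasoning-parser flag), as for models that emit
        # reasoning_content natively.
        return None
    # Default: tag-based parsing for <think> model families.
    return ThinkTagParser()
```

On the server side the documented recommendation would then look like `vllm serve MODEL --reasoning-parser <name>`; the correct parser name for Gemma is not confirmed here and would come out of the investigation above.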
Reference
packages/fipsagents/src/fipsagents/baseagent/reasoning.py