
feat: route Gemma's reasoning output to the Thinking panel #178

@rdwj

Description

Background

Recent Gemma variants ship a "thinking mode" capability whose output uses raw special tokens rather than the <think>...</think> tags that the existing ThinkTagParser recognises. See:

https://ai.google.dev/gemma/docs/capabilities/thinking
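
For contrast, the <think> family can be handled entirely in plain text. A minimal sketch of that kind of tag-based parsing (illustrative only, not the repository's ThinkTagParser):

```python
# Illustrative sketch of <think>-style parsing, NOT the repository's
# ThinkTagParser: because the markers are ordinary text, a regex over
# the decoded output is enough to separate reasoning from the answer.
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_think(text: str) -> tuple[str, str]:
    reasoning = "\n".join(m.group(1).strip() for m in THINK_RE.finditer(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer
```

Gemma's markers, by contrast, are tokenizer specials, so whether an equivalent pure-Python split is possible is exactly what step 1 below needs to establish.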

When vllm serve is invoked with the right reasoning-parser flag, vLLM extracts these special tokens server-side and populates delta.reasoning_content directly, at which point BaseAgent.astep_stream already routes them correctly. But for deployments that don't or can't set the vLLM flag (or for runtimes that lack the parser), Gemma's reasoning leaks into delta.content, polluting both the visible answer and the assistant message threaded back into the next turn.
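
For reference, the happy-path routing looks roughly like the sketch below. This is illustrative, not the actual BaseAgent.astep_stream; it assumes the OpenAI-compatible streaming chunks that vllm serve emits, and the two callbacks stand in for the Thinking panel and the answer view.

```python
# Illustrative sketch of the routing described above (not the actual
# BaseAgent.astep_stream). Assumes OpenAI-compatible streaming chunks
# as served by vllm serve.
async def route_stream(stream, on_thinking, on_answer):
    async for chunk in stream:
        delta = chunk.choices[0].delta
        # Only populated when vLLM's server-side reasoning parser is on.
        reasoning = getattr(delta, "reasoning_content", None)
        if reasoning:
            on_thinking(reasoning)    # -> Thinking panel
        if delta.content:
            on_answer(delta.content)  # -> visible answer
```

Without the flag, reasoning_content is never set and everything arrives through delta.content, which is the leak this issue is about.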

Suggested approach

Once #177 (the implicit-open Nemotron path) lands, follow the same pattern:

  1. Investigate Gemma's actual output shape (what tokens, in what order, with what stream-time framing)
  2. If Gemma's output can be parsed in pure Python like the <think> model family, add a Gemma-specific parser branch and match "gemma" in create_reasoning_parser (see the sketch after this list)
  3. If the format is too tightly coupled to tokenizer specials to parse without a tokenizer, document the recommended --reasoning-parser flag for vLLM and have create_reasoning_parser return None for Gemma (matching current behaviour for models that emit reasoning_content natively)
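
A rough sketch of what option 2 might look like. Everything here is hypothetical: the Gemma marker strings are placeholders pending step 1, and it assumes ThinkTagParser can be constructed with alternative markers (a small Gemma-specific subclass would serve equally well if it can't):

```python
# Hypothetical sketch of option 2. The marker strings are PLACEHOLDERS;
# the real tokens must come out of the investigation in step 1.
# ThinkTagParser / create_reasoning_parser are the existing names from
# this issue (imports omitted).
GEMMA_OPEN = "<placeholder_start_of_thinking>"   # NOT Gemma's real token
GEMMA_CLOSE = "<placeholder_end_of_thinking>"    # NOT Gemma's real token

def create_reasoning_parser(model_name: str):
    name = model_name.lower()
    if "gemma" in name:
        # Same streaming-parser pattern as the <think> family, keyed on
        # Gemma's markers. Assumes ThinkTagParser accepts alternative
        # open/close markers.
        return ThinkTagParser(open_tag=GEMMA_OPEN, close_tag=GEMMA_CLOSE)
    # ...existing branches for the <think> model family stay as-is...
    # Models whose runtime emits reasoning_content natively (option 3)
    # get no client-side parser.
    return None
```

If step 1 shows the markers exist only as tokenizer specials with no stable string rendering, option 3 applies instead: the gemma branch collapses to return None, and the recommended --reasoning-parser invocation for vllm serve is documented instead.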
