

github-actions[bot] edited this page Apr 18, 2026 · 6 revisions

Tool Calling in TabbyAPI

Tool calling is available for supported models and is enabled by selecting a tool format in the model config. This can also be specified per model in tabby_config.yml.

Most tool-calling models are also reasoning models, so it is recommended to enable reasoning as well, with the appropriate reasoning tags (these cannot currently be inferred from the model's template):

model:
    reasoning: true
    reasoning_start_token: "<think>"
    reasoning_end_token: "</think>"
    tool_format: qwen3_5
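To illustrate what the reasoning tags above do, here is a minimal sketch (not TabbyAPI's actual implementation) of how a completion containing reasoning can be split into a hidden-reasoning segment and a final answer using those configured tokens:

```python
# Illustrative only: the tag strings mirror the reasoning_start_token /
# reasoning_end_token values in the config above.
REASONING_START = "<think>"
REASONING_END = "</think>"

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); if no tags are present, all text is answer."""
    start = text.find(REASONING_START)
    end = text.find(REASONING_END)
    if start == -1 or end == -1:
        return "", text.strip()
    reasoning = text[start + len(REASONING_START):end].strip()
    answer = text[end + len(REASONING_END):].strip()
    return reasoning, answer

reasoning, answer = split_reasoning("<think>User wants JSON.</think>{\"ok\": true}")
```

Because the tags cannot be inferred from the template, a mismatch between the configured tokens and what the model actually emits means the reasoning is never stripped, which is why setting them explicitly matters.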

Supported formats

Below are the currently recognized formats:

| tool_format | Aliases | Model types |
| --- | --- | --- |
| qwen3_coder | qwen3_5, step3_5 | Qwen3-Coder, Qwen3-Next, Qwen3.5 |
| minimax_m2 | | Minimax-M2, Minimax-M2.1, Minimax-M2.5 |
| glm4_5 | glm4_6, glm4_7 | GLM4.5, GLM4.6, GLM4.7 |
| mistral_old ¹ | | (older Mistral-family models) |
| mistral | | Codestral 2508+, Devstral-Small 2507+, Magistral-Medium 2506+, Magistral-Small 2506+, Ministral-3 2512+, Mistral-Medium-3.1 2508+, Mistral-Small-3.2 2506+ |
| gemma4 | | Gemma 4-it |

¹ Older Mistral models tend to have unreliable tool calling support, and even newer ones are often released without official chat templates, or with templates that omit any tool formatting. Tokenization also changes frequently between model releases. Your mileage may vary.

Clients

TabbyAPI should support any software that uses the OAI tool calling API. However, the standard is still evolving: no two clients agree on exactly what it looks like, and models are trained with different assumptions as well. Below are notes on various client software and how each relates to TabbyAPI's tool calling support.
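For reference, this is the general shape of the OpenAI-style request body such clients send to the /v1/chat/completions endpoint. The tool name and parameters here ("get_weather", "city") are hypothetical placeholders, and "my-model" stands in for whatever model is loaded:

```python
# Sketch of a standard OpenAI-compatible tool calling request payload.
# Tool name, parameters, and model name are illustrative, not real values.
payload = {
    "model": "my-model",
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}
```

Clients diverge mainly in the details around this shape (streaming deltas, parallel calls, how strictly they validate arguments), which is where the per-client notes below come in.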

OpenCode

  • OpenCode by default forces categorical sampling, overriding TabbyAPI's defaults with top-P = 1.0. This confuses some models, so if you're experiencing occasional random gibberish in your output, check your OpenCode config and make sure sampling is configured there, e.g.:

    "agent": {
      "build": {
        "top_p": 0.8
      },
      "plan": {
        "top_p": 0.8
      }
    }
  • OpenCode doesn't explicitly enable reasoning in the request by default. For some models this doesn't matter; for others (e.g. Gemma4) you can set force_enable_thinking: true in TabbyAPI.
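Regardless of client, tool calls come back in the standard OpenAI response shape, with arguments encoded as a JSON string. A minimal sketch of extracting them from a (hand-constructed, hypothetical) assistant message:

```python
import json

# Hypothetical assistant message in the standard OpenAI shape; a real one
# would come from the /v1/chat/completions response.
message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_0",
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": "{\"city\": \"Paris\"}",
            },
        }
    ],
}

def extract_tool_calls(message: dict) -> list[tuple[str, dict]]:
    """Return (name, parsed-arguments) pairs from an assistant message."""
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls

calls = extract_tool_calls(message)
```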
