10. Tool Calling
Tool calling is available for supported models and is enabled by selecting a tool format in the model config. This can also be specified per model using `tabby_config.yml`.
Most tool-calling models are also reasoning models, and it is recommended to enable reasoning as well, with the appropriate reasoning tags (these cannot currently be inferred from the model's template):
```yaml
model:
  reasoning: true
  reasoning_start_token: "<think>"
  reasoning_end_token: "</think>"
  tool_format: qwen3_5
```

Below are the currently recognized formats:
| tool_format | Aliases | Model types |
|---|---|---|
| qwen3_coder | qwen3_5, step3_5 | Qwen3-Coder, Qwen3-Next, Qwen3.5 |
| minimax_m2 | | Minimax-M2, Minimax-M2.1, Minimax-M2.5 |
| glm4_5 | glm4_6, glm4_7 | GLM4.5, GLM4.6, GLM4.7 |
| mistral_old ¹ | | (older Mistral-family models) |
| mistral | | Codestral 2508+, Devstral-Small 2507+, Magistral-Medium 2506+, Magistral-Small 2506+, Ministral-3 2512+, Mistral-Medium-3.1 2508+, Mistral-Small-3.2 2506+ |
| gemma4 | | Gemma 4-it |
¹ Older Mistral models tend to have unreliable tool-calling support, and even newer ones are often released without official chat templates, or with templates that omit any tool formatting. Tokenization also changes frequently between model releases. YMMV.
TabbyAPI should support any software that uses the OAI tool-calling API. But the standard is still evolving: no two clients agree on exactly what it looks like, and models are trained with different assumptions as well. Notes on various client software and how it relates to TabbyAPI's tool-calling support are collected below.
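For reference, the OAI-style request and response shapes involved look roughly like this (a sketch only; the model name, tool definition, and field values are illustrative placeholders, not TabbyAPI specifics):

```python
import json

# Minimal OpenAI-style chat completion payload with one tool definition.
# The model name and the get_weather tool are illustrative placeholders.
payload = {
    "model": "Qwen3-Coder",
    "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

# A tool-calling reply carries tool_calls instead of plain content;
# the arguments arrive as a JSON-encoded string the client must parse.
response_message = {
    "role": "assistant",
    "tool_calls": [
        {
            "id": "call_0",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Oslo"}'},
        }
    ],
}

args = json.loads(response_message["tool_calls"][0]["function"]["arguments"])
print(args["city"])  # → Oslo
```

The selected `tool_format` controls how TabbyAPI renders these tool definitions and parses the model's tool-call output; the wire format the client sees stays OAI-shaped either way.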
- OpenCode by default forces categorical sampling, overriding TabbyAPI's defaults with top-P = 1.0. This confuses some models, so if you're experiencing occasional random gibberish in your output, check your OpenCode config to make sure sampling is configured there, e.g.:

  ```json
  "agent": { "build": { "top_p": 0.8 }, "plan": { "top_p": 0.8 } }
  ```
- OpenCode doesn't explicitly enable reasoning in the request by default. For some models this doesn't matter; for others (e.g. Gemma4) you can set `force_enable_thinking: true` in TabbyAPI.
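A sketch of how that setting could sit alongside the reasoning options shown earlier (the placement of `force_enable_thinking` under the `model` block is an assumption, not confirmed TabbyAPI layout):

```yaml
model:
  reasoning: true
  reasoning_start_token: "<think>"
  reasoning_end_token: "</think>"
  force_enable_thinking: true  # placement under `model` is an assumption
```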