# 24.1. Custom Chat Templates
**Changes Made**
- Updated documentation to reflect new control prefix handling for `/think` and `/no_think` commands
- Added details about empty think block injection in no_think mode
- Enhanced explanation of control prefix detection in message history
- Updated flowchart to include control prefix processing
- Added new section on control prefix handling

## Table of Contents
- Introduction
- Project Structure
- Core Components
- Architecture Overview
- Detailed Component Analysis
- Control Prefix Handling
- Dependency Analysis
- Performance Considerations
- Troubleshooting Guide
- Conclusion
## Introduction
This document provides comprehensive documentation for the custom chat templates implementation in the Oxide-Lab repository. The system enables model-specific prompt formatting for different language models such as Qwen and Llama 3 through a flexible template system that integrates frontend and backend components. The documentation covers the architecture, implementation details, and integration points between the frontend prompt definitions in prompts.ts and the backend tokenizer logic in tokenizer.rs. Special attention is given to how the system handles model-specific requirements, injects special tokens, and provides fallback mechanisms for unsupported models.
## Project Structure
The project structure reveals a clear separation between frontend and backend components, with the chat template functionality spanning both domains. The frontend components are located in the src/lib/chat directory, while the backend implementation resides in src-tauri/src. This separation allows for independent development and testing of the user interface and core model processing logic.
```mermaid
graph TB
subgraph "Frontend"
A[src/lib/chat]
A1[prompts.ts]
A2[controller.ts]
A3[sanitize.ts]
end
subgraph "Backend"
B[src-tauri/src]
B1[core/tokenizer.rs]
B2[api/mod.rs]
B3[models/qwen3.rs]
B4[state.rs]
end
A1 --> |"Tauri API calls"| B2
B2 --> |"Template rendering"| B1
B3 --> |"Model state"| B4
```
**Diagram sources**
- [prompts.ts](file://src/lib/chat/prompts.ts)
- [tokenizer.rs](file://src-tauri/src/core/tokenizer.rs)
- [api/mod.rs](file://src-tauri/src/api/mod.rs)
- [qwen3.rs](file://src-tauri/src/models/qwen3.rs)
- [state.rs](file://src-tauri/src/state.rs)
**Section sources**
- [prompts.ts](file://src/lib/chat/prompts.ts)
- [tokenizer.rs](file://src-tauri/src/core/tokenizer.rs)
## Core Components
The custom chat template system consists of several core components that work together to provide model-specific prompt formatting. The frontend component (prompts.ts) handles the initial template request and fallback logic, while the backend components (tokenizer.rs and api/mod.rs) manage template extraction, rendering, and special token handling. The system is designed to first attempt backend-based template rendering using the model's native chat template, falling back to a hardcoded Qwen-compatible format when necessary.
**Section sources**
- [prompts.ts](file://src/lib/chat/prompts.ts)
- [tokenizer.rs](file://src-tauri/src/core/tokenizer.rs)
- [api/mod.rs](file://src-tauri/src/api/mod.rs)
## Architecture Overview
The architecture follows a client-server pattern where the frontend requests template information from the backend and uses it to format prompts appropriately. When a user submits a chat message, the frontend first queries the backend for the current model's chat template. If a template is available, it sends the message history to the backend for rendering using the minijinja template engine. If no template is available, it falls back to a hardcoded Qwen-compatible format.
```mermaid
sequenceDiagram
participant Frontend
participant Backend
participant Tokenizer
participant Model
Frontend->>Backend : get_chat_template()
Backend->>Tokenizer : extract_chat_template()
Tokenizer-->>Backend : Returns template string
Backend-->>Frontend : Returns template (or null)
alt Template exists
Frontend->>Backend : render_prompt(messages)
Backend->>Backend : minijinja.render(template, messages)
Backend-->>Frontend : Returns rendered prompt
else No template
Frontend->>Frontend : Build Qwen-compatible prompt
Frontend-->>Backend : Send formatted prompt
end
Backend->>Model : generate_stream(prompt)
```
**Diagram sources**
- [prompts.ts](file://src/lib/chat/prompts.ts)
- [tokenizer.rs](file://src-tauri/src/core/tokenizer.rs)
- [api/mod.rs](file://src-tauri/src/api/mod.rs)
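The sequence above can be sketched in TypeScript. This is an illustrative outline, not the actual prompts.ts implementation: `invoke` is injected as a parameter so the logic can be exercised without a Tauri runtime, and the fallback formatting is reduced to its essentials.

```typescript
// Hypothetical sketch of the template-query-then-render flow described above.
// `invoke` is injected so the logic runs without a Tauri runtime; the real
// frontend uses `invoke` from the Tauri API with the same command names.
type Invoke = (cmd: string, args?: Record<string, unknown>) => Promise<unknown>;

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

async function buildPrompt(invoke: Invoke, messages: ChatMessage[]): Promise<string> {
  // 1. Ask the backend for the model's native chat template.
  const template = (await invoke("get_chat_template")) as string | null;
  if (template) {
    // 2a. Template exists: let the backend render it with minijinja.
    return (await invoke("render_prompt", { messages })) as string;
  }
  // 2b. No template: fall back to a hardcoded Qwen-compatible (ChatML) format.
  const parts = messages.map(
    (m) => `<|im_start|>${m.role}\n${m.content}<|im_end|>\n`,
  );
  return parts.join("") + "<|im_start|>assistant\n";
}
```

The design keeps the fallback entirely on the frontend, so models without a native template never incur the extra `render_prompt` round-trip.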
## Detailed Component Analysis
### Frontend Template Processing
The frontend implementation in prompts.ts provides the primary interface for chat template processing. It first attempts to retrieve the chat template from the backend and uses it for rendering when available. The function buildPromptWithChatTemplate serves as the main entry point, handling both the primary template rendering path and the fallback mechanism.
```mermaid
flowchart TD
Start([buildPromptWithChatTemplate]) --> GetTemplate["invoke('get_chat_template')"]
GetTemplate --> TemplateExists{"Template exists?"}
TemplateExists --> |Yes| PrepareMessages["Prepare messages with sanitizeForPrompt"]
PrepareMessages --> RenderBackend["invoke('render_prompt', messages)"]
RenderBackend --> ReturnResult["Return rendered prompt"]
TemplateExists --> |No| FallbackFormat["Use Qwen-compatible format"]
FallbackFormat --> ProcessHistory["Iterate through message history"]
ProcessHistory --> FormatMessages["Format messages with <|im_start|> tags"]
FormatMessages --> AddAssistant["Add <|im_start|>assistant\n"]
AddAssistant --> HandleControl["Check for /think or /no_think commands"]
HandleControl --> ReturnResult
ReturnResult --> End([Return final prompt])
```
**Diagram sources**
- [prompts.ts](file://src/lib/chat/prompts.ts)
**Section sources**
- [prompts.ts](file://src/lib/chat/prompts.ts)
### Backend Template Rendering
The backend template rendering system is implemented in api/mod.rs and leverages the minijinja templating engine to process model-specific chat templates. The render_prompt function receives message history from the frontend and applies the template using a context that includes messages, add_generation_prompt, and tools parameters.
```mermaid
flowchart TD
A["render_prompt(state, messages)"] --> B["Get chat_template from state"]
B --> C{"Template available?"}
C --> |No| D["Return error: chat_template not available"]
C --> |Yes| E["Create minijinja Environment"]
E --> F["Add template 'tpl' with template string"]
F --> G["Get template reference"]
G --> H["Create context with messages, add_generation_prompt=true, tools=[]"]
H --> I["Render template with context"]
I --> J{"Render successful?"}
J --> |Yes| K["Return rendered string"]
J --> |No| L["Return render error"]
```
**Diagram sources**
- [api/mod.rs](file://src-tauri/src/api/mod.rs)

**Section sources**
- [api/mod.rs](file://src-tauri/src/api/mod.rs)
### Tokenizer and Special Token Management
The tokenizer component in tokenizer.rs plays a crucial role in chat template functionality by extracting templates from model metadata and managing special tokens. The system identifies special tokens like `<|im_start|>`, `<|im_end|>`, `<think>`, and `</think>` and marks them as special tokens in the tokenizer vocabulary to ensure proper handling during tokenization.
```mermaid
classDiagram
class Tokenizer {
+get_vocab(include_special : bool) Map[String, u32]
+add_special_tokens(tokens : Vec[AddedToken]) Result
+to_string(include_special : bool) Result[String]
}
class AddedToken {
+content : String
+single_word : bool
+lstrip : bool
+rstrip : bool
}
class HashMap {
+get(key : String) Option[Value]
}
class TokenizerConfig {
+added_tokens : Vec[Value]
+special_tokens : Vec[Value]
+chat_template : Option[String]
}
Tokenizer --> AddedToken : "uses in add_special_tokens"
Tokenizer --> HashMap : "uses get_vocab"
Tokenizer --> TokenizerConfig : "uses in extract_chat_template"
```
**Diagram sources**
- [tokenizer.rs](file://src-tauri/src/core/tokenizer.rs)
**Section sources**
- [tokenizer.rs](file://src-tauri/src/core/tokenizer.rs)
## Control Prefix Handling
The system supports control prefixes (`/think` and `/no_think`) that allow users to control the chain-of-thought (CoT) behavior for individual messages. The control prefix is determined by examining the last user message in the conversation history.
When the `/no_think` command is detected, the system ensures that an empty `<think>...</think>` block is added to the prompt, which explicitly disables CoT reasoning. This handling occurs in both the backend template rendering path and the frontend fallback path.
```mermaid
flowchart TD
A["buildPromptWithChatTemplate"] --> B["Find last user message"]
B --> C{"Message starts with /no_think?"}
C --> |Yes| D["Set control = no_think"]
C --> |No| E{"Message starts with /think?"}
E --> |Yes| F["Set control = think"]
E --> |No| G["Set control = null"]
D --> H["Process template rendering"]
F --> H
G --> H
H --> I{"Using backend template?"}
I --> |Yes| J["Render with minijinja"]
J --> K{"control = no_think?"}
K --> |Yes| L{"Contains <think>...</think>?"}
L --> |No| M["Append empty <think>\n\n</think>\n\n"]
L --> |Yes| N["Return rendered prompt"]
M --> N
I --> |No| O["Use Qwen-compatible format"]
O --> P{"control = no_think?"}
P --> |Yes| Q["Append empty <think>\n\n</think>\n\n"]
P --> |No| R["Return formatted prompt"]
Q --> R
```
**Diagram sources**
- [prompts.ts](file://src/lib/chat/prompts.ts) - *Control prefix detection logic*
- [prompts.ts](file://src/lib/chat/prompts.ts) - *Empty think block injection for backend templates*
- [prompts.ts](file://src/lib/chat/prompts.ts) - *Empty think block injection for fallback templates*

**Section sources**
- [prompts.ts](file://src/lib/chat/prompts.ts)
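The detection and injection steps described above can be sketched as follows. This is an illustrative TypeScript outline, not the actual prompts.ts code; the function names are hypothetical.

```typescript
// Hypothetical sketch of control-prefix handling: only the last user message
// is examined, and the command must appear at the start of that message.
type Control = "think" | "no_think" | null;

interface Msg {
  role: string;
  content: string;
}

function detectControl(messages: Msg[]): Control {
  const lastUser = [...messages].reverse().find((m) => m.role === "user");
  if (!lastUser) return null;
  // Check /no_think first, since "/no_think" does not start with "/think".
  if (lastUser.content.startsWith("/no_think")) return "no_think";
  if (lastUser.content.startsWith("/think")) return "think";
  return null;
}

// In no_think mode, append an empty think block unless the prompt already
// contains one; this explicitly disables chain-of-thought reasoning. The same
// step applies to both backend-rendered and fallback prompts.
function applyControl(prompt: string, control: Control): string {
  if (control === "no_think" && !/<think>[\s\S]*?<\/think>/.test(prompt)) {
    return prompt + "<think>\n\n</think>\n\n";
  }
  return prompt;
}
```

Checking `/no_think` before `/think` matters because a naive prefix test for `/think` alone would never fire for `/no_think`, but the reverse ordering keeps the two commands unambiguous.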
## Dependency Analysis
The chat template system has well-defined dependencies between components, with clear interfaces and data flow patterns. The frontend depends on the backend for template information and rendering, while the backend components depend on the tokenizer for template extraction and special token management.
```mermaid
graph TD
A[prompts.ts] --> |Tauri API| B[api/mod.rs]
B --> |Calls| C[tokenizer.rs]
B --> |Uses| D[minijinja]
C --> |Extracts from| E[GGUF metadata]
F[qwen3.rs] --> |Provides model| G[state.rs]
G --> |Stores| H[tokenizer]
G --> |Stores| I[chat_template]
B --> |Accesses| G
```
**Diagram sources**
- [prompts.ts](file://src/lib/chat/prompts.ts)
- [api/mod.rs](file://src-tauri/src/api/mod.rs)
- [tokenizer.rs](file://src-tauri/src/core/tokenizer.rs)
- [qwen3.rs](file://src-tauri/src/models/qwen3.rs)
- [state.rs](file://src-tauri/src/state.rs)
**Section sources**
- [prompts.ts](file://src/lib/chat/prompts.ts)
- [api/mod.rs](file://src-tauri/src/api/mod.rs)
- [tokenizer.rs](file://src-tauri/src/core/tokenizer.rs)
## Performance Considerations
The chat template system is designed with performance in mind, minimizing redundant operations and leveraging efficient data structures. The template extraction from GGUF metadata occurs once during model loading, and the rendered template is cached in the model state. The use of minijinja for template rendering provides efficient string processing with minimal overhead. The fallback mechanism in the frontend avoids unnecessary backend calls when no template is available, reducing latency for models without native template support.
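One way to realize the avoid-redundant-lookups idea on the frontend could look like the following. This is purely illustrative and not the actual implementation (the real system caches the template in backend model state); `modelId` and the cache map are assumptions introduced for the sketch.

```typescript
// Illustrative sketch: memoize the template lookup per loaded model so
// repeated prompts skip the backend round-trip. `modelId` is an assumed
// identifier for the currently loaded model.
type Invoke = (cmd: string, args?: Record<string, unknown>) => Promise<unknown>;

const templateCache = new Map<string, string | null>();

async function getTemplateCached(
  invoke: Invoke,
  modelId: string,
): Promise<string | null> {
  if (templateCache.has(modelId)) {
    // Cache hit: no backend call needed.
    return templateCache.get(modelId)!;
  }
  // Cache miss: query the backend once and remember the result,
  // including a null result for models without a native template.
  const tpl = (await invoke("get_chat_template")) as string | null;
  templateCache.set(modelId, tpl);
  return tpl;
}
```

Caching the null result is the important detail: it is what lets the fallback path skip the backend query entirely on subsequent messages.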
## Troubleshooting Guide
When encountering issues with chat template rendering, consider the following troubleshooting steps:
1. **Template not being applied**: Check whether the model's tokenizer contains a chat_template field in its configuration. Use the get_chat_template API to verify template availability.
2. **Special tokens not recognized**: Verify that special tokens like `<|im_start|>` and `<|im_end|>` exist in the tokenizer vocabulary. The mark_special_chat_tokens function automatically identifies and registers these tokens.
3. **Incorrect message formatting**: Ensure message roles (user, assistant) match the expected values in the template. The system is case-sensitive for role names.
4. **Template rendering errors**: Check the template syntax for minijinja compatibility. Common issues include unbalanced brackets or incorrect variable names.
5. **Control prefix issues**: When using `/think` or `/no_think` commands, ensure the command is at the beginning of the user message. The system only checks the last user message for control prefixes.
6. **Empty think block not appearing**: When using `/no_think`, verify that the empty `<think>\n\n</think>\n\n` block is being added to the prompt. This should happen in both backend-rendered and fallback templates.
**Section sources**
- [prompts.ts](file://src/lib/chat/prompts.ts)
- [tokenizer.rs](file://src-tauri/src/core/tokenizer.rs)
- [api/mod.rs](file://src-tauri/src/api/mod.rs)
## Conclusion
The custom chat template system in Oxide-Lab provides a flexible and extensible framework for handling model-specific prompt formatting. By combining frontend and backend components, it supports both native template rendering through minijinja and fallback mechanisms for models without template support. The system effectively handles special tokens and provides a clean interface for integrating new models with their specific formatting requirements. This architecture ensures compatibility with various language models while maintaining performance and reliability.
**Referenced Files in This Document**
- [prompts.ts](file://src/lib/chat/prompts.ts) - *Updated to support control prefixes and think tag handling*
- [tokenizer.rs](file://src-tauri/src/core/tokenizer.rs) - *Contains tokenizer and special token management*
- [api/mod.rs](file://src-tauri/src/api/mod.rs) - *Backend API implementation for chat templates*
- [template.rs](file://src-tauri/src/api/template.rs) - *Template rendering logic*
- [qwen3.rs](file://src-tauri/src/models/qwen3.rs) - *Qwen3 model implementation*
- [state.rs](file://src-tauri/src/state.rs) - *Model state management*