19.5.3. Prompt Structure And Conversation Formatting
- Introduction
- Qwen3 Model Architecture and Prompt Formatting
- Frontend Parser Implementation
- Conversation Formatting and Special Tokens
- Multi-turn Conversation Examples
- Common Mistakes and Troubleshooting
- Guidelines for Custom Template Development
This document provides comprehensive documentation on prompt structure and conversation formatting for different model architectures, with a focus on Qwen3-specific requirements. It explains how system messages, user queries, and assistant responses are formatted with appropriate delimiters and special tokens. The document also covers the role of BOS/EOS tokens and separator sequences in conversation continuity, and how the frontend parser in impl.ts prepares message sequences before tokenization.
The Qwen3 model architecture is implemented in both the candle-transformers library and the application-specific wrapper in src-tauri. The model configuration includes parameters such as vocabulary size, hidden size, number of attention heads, and rotary embedding settings.
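These configuration fields can be mirrored on the frontend as a plain interface. The shape follows the Config class in the diagram below; the concrete values are illustrative (roughly those published for small Qwen3 variants) and are not taken from this repository:

```typescript
// Frontend mirror of the Rust Config struct. Field names follow the class
// diagram; the example values are illustrative, not from this repo.
interface Qwen3Config {
  vocab_size: number;
  hidden_size: number;
  intermediate_size: number;
  num_hidden_layers: number;
  num_attention_heads: number;
  head_dim: number;
  attention_bias: boolean;
  num_key_value_heads: number;
  max_position_embeddings: number;
  rope_theta: number;
  rms_norm_eps: number;
}

// Example instance with Qwen3-0.6B-like hyperparameters (illustrative).
const exampleConfig: Qwen3Config = {
  vocab_size: 151936,
  hidden_size: 1024,
  intermediate_size: 3072,
  num_hidden_layers: 28,
  num_attention_heads: 16,
  head_dim: 128,
  attention_bias: false,
  num_key_value_heads: 8,
  max_position_embeddings: 40960,
  rope_theta: 1_000_000,
  rms_norm_eps: 1e-6,
};

// Grouped-query attention: each KV head is shared by this many query heads
// (this ratio is the num_kv_groups field of Qwen3Attention).
const kvGroups =
  exampleConfig.num_attention_heads / exampleConfig.num_key_value_heads;
console.log(kvGroups);
```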
classDiagram
class Config {
+vocab_size : usize
+hidden_size : usize
+intermediate_size : usize
+num_hidden_layers : usize
+num_attention_heads : usize
+head_dim : usize
+attention_bias : bool
+num_key_value_heads : usize
+max_position_embeddings : usize
+rope_theta : f64
+rms_norm_eps : f64
+hidden_act : Activation
}
class Qwen3RotaryEmbedding {
+sin : Tensor
+cos : Tensor
+new(dtype : DType, cfg : &Config, dev : &Device) Result<Self>
+apply(q : &Tensor, k : &Tensor, offset : usize) Result<(Tensor, Tensor)>
}
class Qwen3MLP {
+gate_proj : Linear
+up_proj : Linear
+down_proj : Linear
+act_fn : Activation
+new(cfg : &Config, vb : VarBuilder) Result<Self>
+forward(x : &Tensor) Result<Tensor>
}
class Qwen3Attention {
+q_proj : Linear
+k_proj : Linear
+v_proj : Linear
+o_proj : Linear
+q_norm : RmsNorm
+k_norm : RmsNorm
+num_heads : usize
+num_kv_heads : usize
+num_kv_groups : usize
+head_dim : usize
+hidden_size : usize
+rotary_emb : Arc<Qwen3RotaryEmbedding>
+kv_cache : KvCache
+new(cfg : &Config, rotary_emb : Arc<Qwen3RotaryEmbedding>, vb : VarBuilder) Result<Self>
+forward(x : &Tensor, attn_mask : Option<&Tensor>, offset : usize) Result<Tensor>
}
class ModelWeights {
+inner : candle_transformers::models::quantized_qwen3::ModelWeights
+from_gguf(content : Content, reader : &mut R, device : &Device, _context_length : usize, _flag : bool) Result<Self, String>
}
Config <.. Qwen3RotaryEmbedding : configures
Config <.. Qwen3MLP : configures
Config <.. Qwen3Attention : configures
DecoderLayer *-- Qwen3MLP
DecoderLayer *-- Qwen3Attention
Qwen3Attention o-- Qwen3RotaryEmbedding : Arc
ModelWeights *-- DecoderLayer
Diagram sources
- qwen3.rs
Section sources
- qwen3.rs
- qwen3.rs
The frontend parser in impl.ts is responsible for processing the stream of text and identifying special tags that control the rendering of the conversation. It handles various types of content including thinking blocks, code blocks, tool calls, and media.
flowchart TD
Start([Parse Stream]) --> CheckState["Check Current State"]
CheckState --> |inThink| ProcessThink["Process Think Block"]
ProcessThink --> |End Found| EmitThink["Emit HTML for Think Block"]
ProcessThink --> |No End| BufferThink["Buffer Content"]
EmitThink --> ResetThink["Reset inThink State"]
ResetThink --> Continue["Continue Parsing"]
CheckState --> |inCode| ProcessCode["Process Code Block"]
ProcessCode --> |End Found| EmitCode["Emit HTML for Code Block"]
ProcessCode --> |No End| BufferCode["Buffer Content"]
EmitCode --> ResetCode["Reset inCode State"]
ResetCode --> Continue
CheckState --> |inToolCall| ProcessToolCall["Process Tool Call"]
ProcessToolCall --> |End Found| EmitToolCall["Emit HTML for Tool Call"]
ProcessToolCall --> |No End| BufferToolCall["Buffer Content"]
EmitToolCall --> ResetToolCall["Reset inToolCall State"]
ResetToolCall --> Continue
CheckState --> |inToolResponse| ProcessToolResponse["Process Tool Response"]
ProcessToolResponse --> |End Found| EmitToolResponse["Emit HTML for Tool Response"]
ProcessToolResponse --> |No End| BufferToolResponse["Buffer Content"]
EmitToolResponse --> ResetToolResponse["Reset inToolResponse State"]
ResetToolResponse --> Continue
CheckState --> |inMedia| ProcessMedia["Process Media Block"]
ProcessMedia --> |End Found| EmitMedia["Emit HTML for Media"]
ProcessMedia --> |No End| BufferMedia["Buffer Content"]
EmitMedia --> ResetMedia["Reset inMedia State"]
ResetMedia --> Continue
CheckState --> |Normal Text| FindTags["Find Special Tags"]
FindTags --> |Found| ProcessTag["Process Tag"]
ProcessTag --> Continue
FindTags --> |Not Found| EmitText["Emit Plain Text"]
EmitText --> Continue
Continue --> |More Input| CheckState
Continue --> |End| End([Complete])
Diagram sources
- impl.ts
Section sources
- impl.ts
- constants.ts
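The state machine above can be sketched as a minimal streaming tag scanner. This is a simplified illustration covering only the think state; the real parser in impl.ts tracks inCode, inToolCall, inToolResponse, and inMedia as well, and must also handle tags split across stream chunks, which this sketch does not:

```typescript
// Minimal sketch of a streaming parser for <think>...</think> blocks.
// Hypothetical class, not the actual impl.ts code. Limitation: a tag split
// across two feed() chunks is flushed as plain text; the real parser must
// retain a partial-tag suffix in its buffer.
class ThinkStreamParser {
  private inThink = false;
  private buffer = "";
  private out: string[] = [];

  feed(chunk: string): void {
    this.buffer += chunk;
    for (;;) {
      if (this.inThink) {
        const end = this.buffer.indexOf("</think>");
        if (end === -1) return; // buffer until the closing tag arrives
        // Emit the buffered reasoning wrapped for rendering.
        this.out.push(`<div class="think">${this.buffer.slice(0, end)}</div>`);
        this.buffer = this.buffer.slice(end + "</think>".length);
        this.inThink = false; // reset state, continue parsing
      } else {
        const start = this.buffer.indexOf("<think>");
        if (start === -1) {
          // No special tag found: flush the buffer as plain text.
          if (this.buffer) this.out.push(this.buffer);
          this.buffer = "";
          return;
        }
        if (start > 0) this.out.push(this.buffer.slice(0, start));
        this.buffer = this.buffer.slice(start + "<think>".length);
        this.inThink = true;
      }
    }
  }

  html(): string {
    return this.out.join("");
  }
}

const p = new ThinkStreamParser();
p.feed("Hello <think>reasoning</think> world");
console.log(p.html()); // → Hello <div class="think">reasoning</div> world
```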
The conversation formatting system uses a combination of special tokens and HTML-like tags to structure the conversation. The primary delimiters are <|im_start|> and <|im_end|> for message boundaries; the role (system, user, or assistant) is written as a plain identifier immediately after the opening delimiter, as in <|im_start|>user.
The system also supports special control sequences such as <think> for chain-of-thought reasoning, and various media tags like <|image|>, <|audio|>, and <|video|>. BOS/EOS tokens are represented by <s> and </s> respectively.
flowchart LR
A[User Input] --> B{Check for /think or /no_think}
B --> |Has /think| C[Set control = "think"]
B --> |Has /no_think| D[Set control = "no_think"]
B --> |Neither| E[Set control = null]
F[Process History] --> G{Message Role}
G --> |User| H[Format as <|im_start|>user\n{content}<|im_end|>\n]
G --> |Assistant| I[Format as <|im_start|>assistant\n{content}<|im_end|>\n]
J[Add Current Assistant] --> K[Append <|im_start|>assistant\n]
L{Control = "no_think"} --> |Yes| M[Append empty think block]
L --> |No| N[Proceed normally]
O[Final Prompt] --> P[Return formatted string]
Diagram sources
- prompts.ts
- tokenizer.rs
Section sources
- prompts.ts
- tokenizer.rs
- types.ts
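The flow above can be sketched as a prompt builder. The function name and Message shape below are assumptions for illustration, not the actual exports of prompts.ts:

```typescript
// Sketch of ChatML-style prompt assembly with /think and /no_think handling.
// buildPrompt and Message are hypothetical names; see prompts.ts for the
// actual implementation.
type Role = "system" | "user" | "assistant";
interface Message { role: Role; content: string; }

function buildPrompt(history: Message[], userInput: string): string {
  // Detect the /think or /no_think control prefix on the new user turn.
  let control: "think" | "no_think" | null = null;
  if (userInput.includes("/no_think")) control = "no_think";
  else if (userInput.includes("/think")) control = "think";

  // Format prior turns as <|im_start|>{role}\n{content}<|im_end|>\n.
  let prompt = "";
  for (const m of history) {
    prompt += `<|im_start|>${m.role}\n${m.content}<|im_end|>\n`;
  }
  prompt += `<|im_start|>user\n${userInput}<|im_end|>\n`;

  // The trailing assistant header prompts the model to generate.
  prompt += "<|im_start|>assistant\n";

  // With /no_think, pre-fill an empty think block so the model skips
  // chain-of-thought reasoning.
  if (control === "no_think") prompt += "<think>\n\n</think>\n";
  return prompt;
}

console.log(buildPrompt([], "Hello, how are you?"));
```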
Here are examples of properly formatted multi-turn conversations:
Simple Conversation:
<|im_start|>user
Hello, how are you?<|im_end|>
<|im_start|>assistant
I'm doing well, thank you for asking! How can I help you today?<|im_end|>
<|im_start|>user
Can you tell me about Qwen3?<|im_end|>
<|im_start|>assistant
Conversation with Chain-of-Thought:
<|im_start|>user
/think What is the capital of France?<|im_end|>
<|im_start|>assistant
<think>
France is a country in Europe. The capital city is a major cultural and political center. I recall that Paris is the capital of France.
</think>
The capital of France is Paris.<|im_end|>
<|im_start|>user
And what about Germany?<|im_end|>
<|im_start|>assistant
<think>
Germany is another European country. Its capital is a major city with historical significance. I believe Berlin is the capital of Germany.
</think>
The capital of Germany is Berlin.<|im_end|>
Conversation with Code Output:
<|im_start|>user
Write a Python function to calculate factorial<|im_end|>
<|im_start|>assistant
Here's a Python function to calculate factorial:
<|python|>
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n - 1)

# Example usage
print(factorial(5))  # Output: 120
<|/python|><|im_end|>
<|im_start|>user
Thanks!<|im_end|>
<|im_start|>assistant
You're welcome! Let me know if you need any clarification on the code.<|im_end|>
Common Mistakes:
- Incorrect tag nesting: Failing to properly close tags like <think> or <|python|> can disrupt parsing
- Missing message delimiters: Omitting <|im_start|> or <|im_end|> breaks conversation structure
- Improper role tags: Using incorrect role identifiers prevents proper message routing
- Unescaped special characters: Not properly escaping content that contains special sequences

Troubleshooting Guide:
- Issue: Parser stops processing mid-conversation
- Solution: Check for unclosed tags or malformed special sequences
- Issue: Content appears as raw text instead of formatted output
- Solution: Verify that special tokens are correctly formatted and not escaped
- Issue: Model fails to generate response
- Solution: Ensure the final <|im_start|>assistant\n is present to prompt generation
- Issue: Chain-of-thought not displaying
- Solution: Verify that the /think command is used at the beginning of the user message
Section sources
- impl.ts
- prompts.ts
- constants.ts
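Several of the mistakes listed above can be caught before a prompt is sent to the model. The checker below is a minimal sketch of such a pre-flight validation; the function is hypothetical and not part of impl.ts or prompts.ts:

```typescript
// Sketch of a pre-flight validator for the common prompt mistakes above.
// Hypothetical helper, not part of the codebase.
function validatePrompt(prompt: string): string[] {
  const problems: string[] = [];
  const count = (s: string, needle: string) => s.split(needle).length - 1;

  // Delimiter balance: every <|im_start|> except the final generation
  // prompt should have a matching <|im_end|>.
  const starts = count(prompt, "<|im_start|>");
  const ends = count(prompt, "<|im_end|>");
  if (starts - ends > 1 || ends > starts) {
    problems.push(`unbalanced delimiters: ${starts} im_start vs ${ends} im_end`);
  }

  // Unclosed paired tags disrupt the streaming parser.
  if (count(prompt, "<think>") !== count(prompt, "</think>")) {
    problems.push("unclosed <think> tag");
  }
  if (count(prompt, "<|python|>") !== count(prompt, "<|/python|>")) {
    problems.push("unclosed <|python|> tag");
  }

  // The trailing assistant header must be present to prompt generation.
  if (!prompt.trimEnd().endsWith("<|im_start|>assistant")) {
    problems.push("missing trailing <|im_start|>assistant");
  }
  return problems;
}

console.log(
  validatePrompt("<|im_start|>user\nHi<|im_end|>\n<|im_start|>assistant\n"),
); // → []
```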
When developing custom templates, follow these guidelines:
- Consistent Delimiters: Use consistent opening and closing tags for all special content
- Proper Nesting: Ensure all tags are properly nested and closed in the correct order
- Escape Special Characters: Properly escape content that might contain template delimiters
- State Management: Track parser state to handle streaming content correctly
- Error Recovery: Implement graceful recovery from malformed input
- Performance: Optimize parsing algorithms for efficiency, especially for streaming use cases
For Qwen3-specific templates, maintain compatibility with the <|im_start|> and <|im_end|> message delimiters, and support the <think> block for chain-of-thought reasoning. When extending functionality, add new tag pairs following the same pattern as existing tags.
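Following the guideline to mirror the existing tag pattern, new tag pairs can be described as data rather than hard-coded parser branches. The registry shape below is an illustrative sketch, not the actual structure in constants.ts, and the <|chart|> tag is a made-up extension example:

```typescript
// Sketch: describing paired tags as data so custom tags follow the same
// pattern as <think> and <|python|>. Names and shapes are hypothetical.
interface TagPair {
  open: string;
  close: string;
  render: (inner: string) => string;
}

const TAG_REGISTRY: TagPair[] = [
  { open: "<think>", close: "</think>",
    render: (s) => `<div class="think">${s}</div>` },
  { open: "<|python|>", close: "<|/python|>",
    render: (s) => `<pre><code class="language-python">${s}</code></pre>` },
  // A custom extension follows the identical open/close/render pattern:
  { open: "<|chart|>", close: "<|/chart|>",
    render: (s) => `<figure class="chart">${s}</figure>` },
];

// Render the first complete tagged block found in a string. Non-streaming,
// for illustration only; the real parser consumes the stream incrementally.
function renderFirstTag(text: string): string | null {
  for (const t of TAG_REGISTRY) {
    const a = text.indexOf(t.open);
    if (a === -1) continue;
    const b = text.indexOf(t.close, a + t.open.length);
    if (b === -1) continue; // opened but not yet closed
    return t.render(text.slice(a + t.open.length, b));
  }
  return null;
}

console.log(renderFirstTag("<think>plan steps</think>"));
// → <div class="think">plan steps</div>
```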
Section sources
- prompts.ts
- impl.ts
- constants.ts
Referenced Files in This Document
- qwen3.rs
- qwen3.rs
- impl.ts
- prompts.ts
- constants.ts
- tokenizer.rs
- types.ts