19.5.3. Prompt Structure And Conversation Formatting
- Introduction
- Qwen3 Model Architecture and Prompt Formatting
- Frontend Parser Implementation
- Conversation Formatting and Special Tokens
- Multi-turn Conversation Examples
- Common Mistakes and Troubleshooting
- Guidelines for Custom Template Development
This document provides comprehensive documentation on prompt structure and conversation formatting for different model architectures, with a focus on Qwen3-specific requirements. It explains how system messages, user queries, and assistant responses are formatted with appropriate delimiters and special tokens. The document also covers the role of BOS/EOS tokens and separator sequences in conversation continuity, and how the frontend parser in impl.ts prepares message sequences before tokenization.
The Qwen3 model architecture is implemented in both the candle-transformers library and the application-specific wrapper in src-tauri. The model configuration includes parameters such as vocabulary size, hidden size, number of attention heads, and rotary embedding settings.
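These configuration fields can be mirrored on the frontend as a plain interface. The shape follows the Config class in the diagram below; the concrete values are illustrative (roughly those published for small Qwen3 variants) and are not taken from this repository:

```typescript
// Frontend mirror of the Rust Config struct. Field names follow the class
// diagram; the example values are illustrative, not from this repo.
interface Qwen3Config {
  vocab_size: number;
  hidden_size: number;
  intermediate_size: number;
  num_hidden_layers: number;
  num_attention_heads: number;
  head_dim: number;
  attention_bias: boolean;
  num_key_value_heads: number;
  max_position_embeddings: number;
  rope_theta: number;
  rms_norm_eps: number;
}

// Example instance with Qwen3-0.6B-like hyperparameters (illustrative).
const exampleConfig: Qwen3Config = {
  vocab_size: 151936,
  hidden_size: 1024,
  intermediate_size: 3072,
  num_hidden_layers: 28,
  num_attention_heads: 16,
  head_dim: 128,
  attention_bias: false,
  num_key_value_heads: 8,
  max_position_embeddings: 40960,
  rope_theta: 1_000_000,
  rms_norm_eps: 1e-6,
};

// Grouped-query attention: each KV head is shared by this many query heads
// (this ratio is the num_kv_groups field of Qwen3Attention).
const kvGroups =
  exampleConfig.num_attention_heads / exampleConfig.num_key_value_heads;
console.log(kvGroups);
```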
classDiagram
class Config {
+vocab_size : usize
+hidden_size : usize
+intermediate_size : usize
+num_hidden_layers : usize
+num_attention_heads : usize
+head_dim : usize
+attention_bias : bool
+num_key_value_heads : usize
+max_position_embeddings : usize
+rope_theta : f64
+rms_norm_eps : f64
+hidden_act : Activation
}
class Qwen3RotaryEmbedding {
+sin : Tensor
+cos : Tensor
+new(dtype : DType, cfg : &Config, dev : &Device) Result<Self>
+apply(q : &Tensor, k : &Tensor, offset : usize) Result<(Tensor, Tensor)>
}
class Qwen3MLP {
+gate_proj : Linear
+up_proj : Linear
+down_proj : Linear
+act_fn : Activation
+new(cfg : &Config, vb : VarBuilder) Result<Self>
+forward(x : &Tensor) Result<Tensor>
}
class Qwen3Attention {
+q_proj : Linear
+k_proj : Linear
+v_proj : Linear
+o_proj : Linear
+q_norm : RmsNorm
+k_norm : RmsNorm
+num_heads : usize
+num_kv_heads : usize
+num_kv_groups : usize
+head_dim : usize
+hidden_size : usize
+rotary_emb : Arc<Qwen3RotaryEmbedding>
+kv_cache : KvCache
+new(cfg : &Config, rotary_emb : Arc<Qwen3RotaryEmbedding>, vb : VarBuilder) Result<Self>
+forward(x : &Tensor, attn_mask : Option<&Tensor>, offset : usize) Result<Tensor>
}
class ModelWeights {
+inner : candle_transformers::models::quantized_qwen3::ModelWeights
+from_gguf(content : Content, reader : &mut R, device : &Device, _context_length : usize, _flag : bool) Result<Self, String>
}
Config <.. Qwen3RotaryEmbedding : configures
Config <.. Qwen3MLP : configures
Config <.. Qwen3Attention : configures
DecoderLayer *-- Qwen3MLP
DecoderLayer *-- Qwen3Attention
Qwen3Attention o-- Qwen3RotaryEmbedding : Arc
ModelWeights *-- DecoderLayer
Diagram sources
- qwen3.rs
Section sources
- qwen3.rs
- qwen3.rs
The frontend parser in impl.ts is responsible for processing the stream of text and identifying special tags that control the rendering of the conversation. It handles various types of content including thinking blocks, code blocks, tool calls, and media.
flowchart TD
Start([Parse Stream]) --> CheckState["Check Current State"]
CheckState --> |inThink| ProcessThink["Process Think Block"]
ProcessThink --> |End Found| EmitThink["Emit HTML for Think Block"]
ProcessThink --> |No End| BufferThink["Buffer Content"]
EmitThink --> ResetThink["Reset inThink State"]
ResetThink --> Continue["Continue Parsing"]
CheckState --> |inCode| ProcessCode["Process Code Block"]
ProcessCode --> |End Found| EmitCode["Emit HTML for Code Block"]
ProcessCode --> |No End| BufferCode["Buffer Content"]
EmitCode --> ResetCode["Reset inCode State"]
ResetCode --> Continue
CheckState --> |inToolCall| ProcessToolCall["Process Tool Call"]
ProcessToolCall --> |End Found| EmitToolCall["Emit HTML for Tool Call"]
ProcessToolCall --> |No End| BufferToolCall["Buffer Content"]
EmitToolCall --> ResetToolCall["Reset inToolCall State"]
ResetToolCall --> Continue
CheckState --> |inToolResponse| ProcessToolResponse["Process Tool Response"]
ProcessToolResponse --> |End Found| EmitToolResponse["Emit HTML for Tool Response"]
ProcessToolResponse --> |No End| BufferToolResponse["Buffer Content"]
EmitToolResponse --> ResetToolResponse["Reset inToolResponse State"]
ResetToolResponse --> Continue
CheckState --> |inMedia| ProcessMedia["Process Media Block"]
ProcessMedia --> |End Found| EmitMedia["Emit HTML for Media"]
ProcessMedia --> |No End| BufferMedia["Buffer Content"]
EmitMedia --> ResetMedia["Reset inMedia State"]
ResetMedia --> Continue
CheckState --> |Normal Text| FindTags["Find Special Tags"]
FindTags --> |Found| ProcessTag["Process Tag"]
ProcessTag --> Continue
FindTags --> |Not Found| EmitText["Emit Plain Text"]
EmitText --> Continue
Continue --> |More Input| CheckState
Continue --> |End| End([Complete])
Diagram sources
- impl.ts
Section sources
- impl.ts
- constants.ts
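The state machine above can be sketched as a minimal streaming tag scanner. This is a simplified illustration covering only the think state; the real parser in impl.ts tracks inCode, inToolCall, inToolResponse, and inMedia as well, and must also handle tags split across stream chunks, which this sketch does not:

```typescript
// Minimal sketch of a streaming parser for <think>...</think> blocks.
// Hypothetical class, not the actual impl.ts code. Limitation: a tag split
// across two feed() chunks is flushed as plain text; the real parser must
// retain a partial-tag suffix in its buffer.
class ThinkStreamParser {
  private inThink = false;
  private buffer = "";
  private out: string[] = [];

  feed(chunk: string): void {
    this.buffer += chunk;
    for (;;) {
      if (this.inThink) {
        const end = this.buffer.indexOf("</think>");
        if (end === -1) return; // buffer until the closing tag arrives
        // Emit the buffered reasoning wrapped for rendering.
        this.out.push(`<div class="think">${this.buffer.slice(0, end)}</div>`);
        this.buffer = this.buffer.slice(end + "</think>".length);
        this.inThink = false; // reset state, continue parsing
      } else {
        const start = this.buffer.indexOf("<think>");
        if (start === -1) {
          // No special tag found: flush the buffer as plain text.
          if (this.buffer) this.out.push(this.buffer);
          this.buffer = "";
          return;
        }
        if (start > 0) this.out.push(this.buffer.slice(0, start));
        this.buffer = this.buffer.slice(start + "<think>".length);
        this.inThink = true;
      }
    }
  }

  html(): string {
    return this.out.join("");
  }
}

const p = new ThinkStreamParser();
p.feed("Hello <think>reasoning</think> world");
console.log(p.html()); // → Hello <div class="think">reasoning</div> world
```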
The conversation formatting system uses a combination of special tokens and HTML-like tags to structure the conversation. The primary delimiters are <|im_start|> and <|im_end|> for message boundaries; the role (system, user, or assistant) is written as a plain identifier immediately after the opening delimiter, as in <|im_start|>user.
The system also supports special control sequences such as <think> for chain-of-thought reasoning, and various media tags like <|image|>, <|audio|>, and <|video|>. BOS/EOS tokens are represented by <s> and </s> respectively.
flowchart LR
A[User Input] --> B{Check for /think or /no_think}
B --> |Has /think| C[Set control = "think"]
B --> |Has /no_think| D[Set control = "no_think"]
B --> |Neither| E[Set control = null]
F[Process History] --> G{Message Role}
G --> |User| H[Format as <|im_start|>user\n{content}<|im_end|>\n]
G --> |Assistant| I[Format as <|im_start|>assistant\n{content}<|im_end|>\n]
J[Add Current Assistant] --> K[Append <|im_start|>assistant\n]
L{Control = "no_think"} --> |Yes| M[Append empty think block]
L --> |No| N[Proceed normally]
O[Final Prompt] --> P[Return formatted string]
Diagram sources
- prompts.ts
- tokenizer.rs
Section sources
- prompts.ts
- tokenizer.rs
- types.ts
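The flow above can be sketched as a prompt builder. The function name and Message shape below are assumptions for illustration, not the actual exports of prompts.ts:

```typescript
// Sketch of ChatML-style prompt assembly with /think and /no_think handling.
// buildPrompt and Message are hypothetical names; see prompts.ts for the
// actual implementation.
type Role = "system" | "user" | "assistant";
interface Message { role: Role; content: string; }

function buildPrompt(history: Message[], userInput: string): string {
  // Detect the /think or /no_think control prefix on the new user turn.
  let control: "think" | "no_think" | null = null;
  if (userInput.includes("/no_think")) control = "no_think";
  else if (userInput.includes("/think")) control = "think";

  // Format prior turns as <|im_start|>{role}\n{content}<|im_end|>\n.
  let prompt = "";
  for (const m of history) {
    prompt += `<|im_start|>${m.role}\n${m.content}<|im_end|>\n`;
  }
  prompt += `<|im_start|>user\n${userInput}<|im_end|>\n`;

  // The trailing assistant header prompts the model to generate.
  prompt += "<|im_start|>assistant\n";

  // With /no_think, pre-fill an empty think block so the model skips
  // chain-of-thought reasoning.
  if (control === "no_think") prompt += "<think>\n\n</think>\n";
  return prompt;
}

console.log(buildPrompt([], "Hello, how are you?"));
```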
Here are examples of properly formatted multi-turn conversations:
Simple Conversation:
<|im_start|>user
Hello, how are you?<|im_end|>
<|im_start|>assistant
I'm doing well, thank you for asking! How can I help you today?<|im_end|>
<|im_start|>user
Can you tell me about Qwen3?<|im_end|>
<|im_start|>assistant
Conversation with Chain-of-Thought:
<|im_start|>user
/think What is the capital of France?<|im_end|>
<|im_start|>assistant
<think>
France is a country in Europe. The capital city is a major cultural and political center. I recall that Paris is the capital of France.
</think>
The capital of France is Paris.<|im_end|>
<|im_start|>user
And what about Germany?<|im_end|>
<|im_start|>assistant
<think>
Germany is another European country. Its capital is a major city with historical significance. I believe Berlin is the capital of Germany.
</think>
The capital of Germany is Berlin.<|im_end|>
Conversation with Code Output:
<|im_start|>user
Write a Python function to calculate factorial<|im_end|>
<|im_start|>assistant
Here's a Python function to calculate factorial:
<|python|>
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n - 1)

# Example usage
print(factorial(5))  # Output: 120
<|/python|><|im_end|>
<|im_start|>user
Thanks!<|im_end|>
<|im_start|>assistant
You're welcome! Let me know if you need any clarification on the code.<|im_end|>
Common Mistakes:
- Incorrect tag nesting: Failing to properly close tags like <think> or <|python|> can disrupt parsing
- Missing message delimiters: Omitting <|im_start|> or <|im_end|> breaks conversation structure
- Improper role tags: Using incorrect role identifiers prevents proper message routing
- Unescaped special characters: Not properly escaping content that contains special sequences

Troubleshooting Guide:
- Issue: Parser stops processing mid-conversation
- Solution: Check for unclosed tags or malformed special sequences
- Issue: Content appears as raw text instead of formatted output
- Solution: Verify that special tokens are correctly formatted and not escaped
- Issue: Model fails to generate response
- Solution: Ensure the final <|im_start|>assistant\n is present to prompt generation
- Issue: Chain-of-thought not displaying
- Solution: Verify that the /think command is used at the beginning of the user message
Section sources
- impl.ts
- prompts.ts
- constants.ts
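Several of the mistakes listed above can be caught before a prompt is sent to the model. The checker below is a minimal sketch of such a pre-flight validation; the function is hypothetical and not part of impl.ts or prompts.ts:

```typescript
// Sketch of a pre-flight validator for the common prompt mistakes above.
// Hypothetical helper, not part of the codebase.
function validatePrompt(prompt: string): string[] {
  const problems: string[] = [];
  const count = (s: string, needle: string) => s.split(needle).length - 1;

  // Delimiter balance: every <|im_start|> except the final generation
  // prompt should have a matching <|im_end|>.
  const starts = count(prompt, "<|im_start|>");
  const ends = count(prompt, "<|im_end|>");
  if (starts - ends > 1 || ends > starts) {
    problems.push(`unbalanced delimiters: ${starts} im_start vs ${ends} im_end`);
  }

  // Unclosed paired tags disrupt the streaming parser.
  if (count(prompt, "<think>") !== count(prompt, "</think>")) {
    problems.push("unclosed <think> tag");
  }
  if (count(prompt, "<|python|>") !== count(prompt, "<|/python|>")) {
    problems.push("unclosed <|python|> tag");
  }

  // The trailing assistant header must be present to prompt generation.
  if (!prompt.trimEnd().endsWith("<|im_start|>assistant")) {
    problems.push("missing trailing <|im_start|>assistant");
  }
  return problems;
}

console.log(
  validatePrompt("<|im_start|>user\nHi<|im_end|>\n<|im_start|>assistant\n"),
); // → []
```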
When developing custom templates, follow these guidelines:
- Consistent Delimiters: Use consistent opening and closing tags for all special content
- Proper Nesting: Ensure all tags are properly nested and closed in the correct order
- Escape Special Characters: Properly escape content that might contain template delimiters
- State Management: Track parser state to handle streaming content correctly
- Error Recovery: Implement graceful recovery from malformed input
- Performance: Optimize parsing algorithms for efficiency, especially for streaming use cases
For Qwen3-specific templates, maintain compatibility with the <|im_start|> and <|im_end|> message delimiters, and support the <think> block for chain-of-thought reasoning. When extending functionality, add new tag pairs following the same pattern as existing tags.
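Following the guideline to mirror the existing tag pattern, new tag pairs can be described as data rather than hard-coded parser branches. The registry shape below is an illustrative sketch, not the actual structure in constants.ts, and the <|chart|> tag is a made-up extension example:

```typescript
// Sketch: describing paired tags as data so custom tags follow the same
// pattern as <think> and <|python|>. Names and shapes are hypothetical.
interface TagPair {
  open: string;
  close: string;
  render: (inner: string) => string;
}

const TAG_REGISTRY: TagPair[] = [
  { open: "<think>", close: "</think>",
    render: (s) => `<div class="think">${s}</div>` },
  { open: "<|python|>", close: "<|/python|>",
    render: (s) => `<pre><code class="language-python">${s}</code></pre>` },
  // A custom extension follows the identical open/close/render pattern:
  { open: "<|chart|>", close: "<|/chart|>",
    render: (s) => `<figure class="chart">${s}</figure>` },
];

// Render the first complete tagged block found in a string. Non-streaming,
// for illustration only; the real parser consumes the stream incrementally.
function renderFirstTag(text: string): string | null {
  for (const t of TAG_REGISTRY) {
    const a = text.indexOf(t.open);
    if (a === -1) continue;
    const b = text.indexOf(t.close, a + t.open.length);
    if (b === -1) continue; // opened but not yet closed
    return t.render(text.slice(a + t.open.length, b));
  }
  return null;
}

console.log(renderFirstTag("<think>plan steps</think>"));
// → <div class="think">plan steps</div>
```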
Section sources
- prompts.ts
- impl.ts
- constants.ts
Referenced Files in This Document
- qwen3.rs
- qwen3.rs
- impl.ts
- prompts.ts
- constants.ts
- tokenizer.rs
- types.ts