---
title: Chat UI
description: A built-in chat interface for interactive conversations with your LLM agents, supporting rich media
---

## 📝 Rich Markdown & Syntax Highlighting

Full markdown rendering with syntax highlighting for popular programming languages:

<Screenshot src="/img/llms-syntax.webp" />

Code blocks include:
- Copy to clipboard on hover
- Language detection
- Line numbers
- Syntax highlighting

## Compact Feature

The **Compact** feature helps you manage long conversations by summarizing the current thread into a more concise version. This lets you continue your conversation with the AI while significantly reducing token usage and costs, without losing the context of your discussion.

## When to use it

The **Compact** button appears automatically at the bottom of your thread when:
* The conversation has more than **10 messages**.
* **OR** you have used more than **40%** of the model's context limit.

<ScreenshotsGallery className="mb-8" gridClass="grid grid-cols-1 md:grid-cols-2 gap-4" images={{
  'Compact Button': '/img/compact-button.webp',
  'Compact Button Intensity': '/img/compact-intensity.webp',
}} />
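
To make the two trigger conditions concrete, here is a minimal sketch of the visibility check, assuming a rough token count for the thread and the active model's context limit (the function and parameter names are illustrative, not the app's actual internals):

```python
# Illustrative only: when the Compact button would appear, based on the
# documented thresholds (more than 10 messages OR >40% of the context limit).
def should_offer_compact(message_count: int, token_count: int,
                         context_limit: int) -> bool:
    return message_count > 10 or token_count > 0.4 * context_limit

# A 25-message thread always qualifies; a 6-message thread qualifies once
# it exceeds 40% of a 128k-token context (51,200 tokens).
print(should_offer_compact(25, 3_000, 128_000))   # True (message count)
print(should_offer_compact(6, 60_000, 128_000))   # True (token usage)
print(should_offer_compact(6, 10_000, 128_000))   # False
```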

## What it does

When activated, the Compact feature:
1. **Analyzes** your current conversation thread.
2. **Creates a new thread** with a summarized version of the chat history.
3. **Preserves key information** while discarding redundant or less important details.
4. **Targets roughly 30%** of the original context size, giving you much more room to continue.

<Screenshot src="/img/compact-result.webp" />

<Info>Your original thread is preserved! Compact creates a *new* thread, so you can always go back to the full history if needed.</Info>
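
The 30% target is simple arithmetic. A hedged sketch of the headroom you gain (the function name is illustrative):

```python
# Illustrative arithmetic for the documented ~30% compaction target.
def compact_headroom(token_count: int, context_limit: int,
                     target_ratio: float = 0.3) -> int:
    target = int(token_count * target_ratio)  # compacted thread size
    return context_limit - target             # room left to continue

# A 60,000-token thread in a 128k context compacts to ~18,000 tokens,
# leaving ~110,000 tokens of headroom instead of 68,000.
print(compact_headroom(60_000, 128_000))  # 110000
```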

## Benefits

* **Save Costs**: Reduces the number of tokens sent to the LLM, lowering the cost per request
* **Extend Conversations**: Frees up context window space, preventing you from hitting the model's hard limit
* **Improve Focus**: Helps the AI focus on the current state of the conversation rather than getting distracted by old history

## Customizing Compact Behavior

The Compact feature is fully customizable through your [~/.llms/llms.json](https://github.com/ServiceStack/llms/blob/main/llms/llms.json) configuration file. You can modify the AI model used, the system prompt, and the user message template to tailor the compaction process to your needs.

### Configuration Location

Add a `compact` section to your [~/.llms/llms.json](https://github.com/ServiceStack/llms/blob/main/llms/llms.json) file under the `default` key:

```json
{
  "default": {
    "compact": {
      "model": "Gemini 2.5 Flash Lite",
      "messages": [
        { "role": "system", "content": "Your system prompt here..." },
        { "role": "user", "content": "Your user message template here..." }
      ]
    }
  }
}
```

### Choosing a Model

You can specify any configured model for the compaction task. Fast, cost-effective models like **Gemini 2.5 Flash Lite** or **Claude 3.5 Haiku** are good choices since compaction is a straightforward summarization task.

### Template Placeholders

The user message template supports the following placeholders, which are replaced with the actual thread data:

| Placeholder | Description |
|-------------|-------------|
| `{message_count}` | The total number of messages in the conversation being compacted |
| `{token_count}` | The approximate token count of the original conversation |
| `{target_tokens}` | The target token count for the compacted result (default: 30% of the original) |
| `{messages_json}` | The full conversation history as a JSON array of message objects |

### Example User Message Template

```
Compact the following conversation while preserving all context needed to
continue it coherently. The conversation has {message_count} messages totaling
approximately {token_count} tokens. Target approximately {target_tokens} tokens.

<conversation>
{messages_json}
</conversation>

Return your response as a JSON object with a single "messages" key containing
the compacted array.
```
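
To see how the placeholders come together, here is a minimal sketch of rendering the user message template with simple string substitution. It mirrors the placeholder table above; the helper name and the 30% default are illustrative, not the app's actual code:

```python
import json

# The example template from above, with the documented placeholders.
TEMPLATE = """Compact the following conversation while preserving all context needed to
continue it coherently. The conversation has {message_count} messages totaling
approximately {token_count} tokens. Target approximately {target_tokens} tokens.

<conversation>
{messages_json}
</conversation>

Return your response as a JSON object with a single "messages" key containing
the compacted array."""

def render_compact_prompt(messages: list[dict], token_count: int) -> str:
    # Illustrative substitution; target defaults to ~30% of the original.
    return TEMPLATE.format(
        message_count=len(messages),
        token_count=token_count,
        target_tokens=int(token_count * 0.3),
        messages_json=json.dumps(messages, indent=2))

thread = [
    {"role": "user", "content": "How do I parse JSON in Python?"},
    {"role": "assistant", "content": "Use the json module: json.loads(...)"},
]
print(render_compact_prompt(thread, token_count=24))
```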

### Customization Tips

- **Adjust the target ratio**: Modify the system prompt to request more or less aggressive compaction
- **Preserve specific content**: Add instructions to always keep certain types of information (code, URLs, decisions)
- **Change the output format**: Customize how the AI structures the compacted conversation
- **Use specialized models**: For technical conversations, you might prefer a model with stronger code understanding

## 🧠 Reasoning Support

Specialized rendering for reasoning models with thinking processes:

<Screenshot src="/img/llms-reasoning.webp" />

Shows:
- Thinking process (collapsed by default)
- Final response
- Clear separation between reasoning and output

## 📊 Token Metrics

See token usage for every message and conversation:

<Screenshot src="/img/llms-tokens-usage.webp" />

Displayed metrics:
- Per-message token count
- Thread total tokens
- Input vs output tokens
- Total cost
- Response time

## ✏️ Edit & Redo

Edit previous messages or retry with different parameters:

- **Edit**: Modify user messages and rerun
- **Redo**: Regenerate AI responses
- Hover over messages to see options