
Commit 91bb9b3

docs(ai-chat): add prompt caching guide
1 parent 911a1cf commit 91bb9b3

2 files changed

Lines changed: 203 additions & 0 deletions


docs/ai-chat/prompt-caching.mdx

Lines changed: 202 additions & 0 deletions
@@ -0,0 +1,202 @@
---
title: "Prompt caching"
sidebarTitle: "Prompt caching"
description: "Cache the stable prefix of your agent's prompt with Anthropic prompt caching to cut token cost and latency on every turn."
---

import RcBanner from "/snippets/ai-chat-rc-banner.mdx";

<RcBanner />

**Prompt caching lets a provider reuse the unchanged prefix of your prompt across requests, billing it at a fraction of the input price and skipping re-processing.** With Anthropic, cache reads cost about 10% of the base input-token price. A long, stable system prompt is written to the cache once (5-minute cache writes cost 1.25× the base input price) and then read cheaply on every later turn, while a growing conversation history pays the write price only for the turns it adds.
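
Here is a back-of-the-envelope cost model for one fixed prefix reused across a chat. It rests on several assumptions: Anthropic's 5-minute-cache multipliers (1.25× base input for a write, 0.1× for a read), a prefix that never changes or expires between turns, and no accounting for output tokens. The function names are illustrative only and are not part of any SDK.

```ts
// Rough cost model for re-sending the same prompt prefix on every turn.
// Costs are in "base input token equivalents": multiply by your model's
// per-token input price to get money. Assumes the prefix never changes and
// never expires between turns; output tokens are ignored.

const CACHE_WRITE_MULTIPLIER = 1.25; // Anthropic 5-minute cache write
const CACHE_READ_MULTIPLIER = 0.1; // Anthropic cache read

/** Cost of sending `prefixTokens` uncached on each of `turns` turns. */
function uncachedPrefixCost(prefixTokens: number, turns: number): number {
  return prefixTokens * turns;
}

/** Cost when turn 1 writes the cache and every later turn reads it. */
function cachedPrefixCost(prefixTokens: number, turns: number): number {
  if (turns <= 0) return 0;
  const write = prefixTokens * CACHE_WRITE_MULTIPLIER;
  const reads = (turns - 1) * prefixTokens * CACHE_READ_MULTIPLIER;
  return write + reads;
}

// Example: a 10,000-token system prompt over a 10-turn chat.
const plain = uncachedPrefixCost(10_000, 10); // 100,000
const cached = cachedPrefixCost(10_000, 10); // 12,500 + 9 × 1,000 = 21,500
console.log({ plain, cached, savedFraction: 1 - cached / plain });
```

The single-turn case also matters. `cachedPrefixCost(10_000, 1)` is 12,500, which is more than the uncached 10,000, because a cache that is never read only adds the write premium.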

Caching is a **byte-exact prefix match**: any change in the prefix invalidates everything after it. A multi-turn agent is the ideal case. The system prompt, tools, and earlier turns are identical from one turn to the next, so the cacheable prefix only grows. `chat.agent` is built to keep that prefix stable across turns, suspends, and resumes. This page shows how to place the cache breakpoints and how to verify that they're hitting.
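
The sketch below is a simplified model of the prefix rule. A prompt is treated as an ordered list of serialized blocks, and only the leading run of identical blocks is reusable. The block strings and the helper are hypothetical. Real providers match on the serialized request, not on labeled strings like these.

```ts
// Simplified model of prefix caching: a prompt is an ordered list of
// serialized blocks (tools, then system, then messages). Only the leading run
// of byte-identical blocks can be served from cache. The first difference
// invalidates every block after it, even blocks that are themselves unchanged.

function reusablePrefixBlocks(previous: string[], next: string[]): number {
  let i = 0;
  while (i < previous.length && i < next.length && previous[i] === next[i]) {
    i++;
  }
  return i;
}

const turn1 = ["tools:v1", "system:You are a helpful assistant.", "user:Hi"];

// Turn 2 appends to the history, so the whole turn-1 prompt is a reusable prefix.
const turn2 = [...turn1, "assistant:Hello!", "user:What's prompt caching?"];

// Same as turn 2, but a timestamp was interpolated into the system prompt.
const turn2Stamped = [
  "tools:v1",
  "system:You are a helpful assistant. Now: 12:00:01",
  ...turn2.slice(2),
];

console.log(reusablePrefixBlocks(turn1, turn2)); // 3
console.log(reusablePrefixBlocks(turn2, turn2Stamped)); // 1 (only the tools block)
```

In the stamped case, the conversation messages are unchanged but still can't be reused, because they sit after the block that changed.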

Caching is provider-specific. This guide covers Anthropic (`@ai-sdk/anthropic`), where you opt in per breakpoint with `providerOptions.anthropic.cacheControl`. Other providers cache differently, and most cache automatically. See [Other providers](#other-providers).

## What you cache, and where

A request renders in the order `tools` → `system` → `messages`. There are three prefix regions worth caching, in that order:

| Region | How to cache it | Stability |
| --- | --- | --- |
| Tool definitions | Covered by the system-prompt breakpoint: tools render before `system`, so they are part of the cached prefix | Stable as long as your tool set doesn't change between turns. They render at position 0, so changing them invalidates everything |
| System prompt | `cacheControl` / `systemProviderOptions` on `chat.toStreamTextOptions()`, or `providerOptions` on `chat.prompt.set()` | Set once and never changes. This is the highest-value target |
| Conversation history | `prepareMessages` adds a breakpoint to the last message | Grows append-only across turns |

`chat.agent` preserves `providerOptions` through message persistence and rehydration, so a breakpoint you place survives a suspend/resume or a page refresh. Even so, the recommended way to place message breakpoints is `prepareMessages` (below), not baking `cacheControl` into stored messages. `prepareMessages` runs on every prompt-assembly path, including after compaction, so the breakpoint is always in the right place.

## Cache the system prompt

The system prompt (your `chat.prompt` text plus any skills preamble) is usually the largest stable block, so it's the first thing to cache. By default, `chat.toStreamTextOptions()` returns `system` as a plain string. When you opt into caching, it returns a structured system message that carries the cache breakpoint instead.

There are three ways to opt in, depending on where you want to express it.

**`cacheControl` at the `streamText` call site** is the Anthropic-flavored one-liner:

```ts /trigger/chat.ts
import { chat } from "@trigger.dev/sdk/ai";
import { streamText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

// Placeholder for your large, stable instruction block. Keep it byte-identical
// across turns: no timestamps, IDs, or other per-turn values.
const SYSTEM_PROMPT = "...";

export const myChat = chat.agent({
  id: "my-chat",
  onChatStart: async () => {
    chat.prompt.set(SYSTEM_PROMPT);
  },
  run: async ({ messages, signal }) => {
    return streamText({
      model: anthropic("claude-sonnet-4-5"),
      // Caches the system block with a 5-minute breakpoint.
      ...chat.toStreamTextOptions({ cacheControl: { type: "ephemeral" } }),
      messages,
      abortSignal: signal,
    });
  },
});
```

**`systemProviderOptions`** is the provider-agnostic form. Pass the raw `providerOptions` and it composes with any provider:

```ts /trigger/chat.ts
return streamText({
  model: anthropic("claude-sonnet-4-5"),
  ...chat.toStreamTextOptions({
    systemProviderOptions: { anthropic: { cacheControl: { type: "ephemeral" } } },
  }),
  messages,
  abortSignal: signal,
});
```

**`providerOptions` on `chat.prompt.set()`** co-locates the intent with where the prompt is defined. It carries through to `toStreamTextOptions()` with no call-site change:

```ts /trigger/chat.ts
onChatStart: async () => {
  chat.prompt.set(SYSTEM_PROMPT, {
    providerOptions: { anthropic: { cacheControl: { type: "ephemeral" } } },
  });
},
run: async ({ messages, signal }) => {
  return streamText({
    model: anthropic("claude-sonnet-4-5"),
    ...chat.toStreamTextOptions(), // already cached
    messages,
    abortSignal: signal,
  });
},
```

If more than one is set, the call-site option wins. `systemProviderOptions` overrides `cacheControl`, and both override the `providerOptions` passed to `chat.prompt.set`. There's no deep merge: the most specific option replaces the others entirely.
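
The precedence rule can be written as a small function. This models the documented behavior and is not the SDK's implementation. `resolveSystemProviderOptions` and its input type are hypothetical. The sketch also assumes the `cacheControl` shorthand is equivalent to `{ anthropic: { cacheControl } }`.

```ts
// Models the documented precedence for the system block's provider options.
// Hypothetical names; not the SDK's implementation.

type ProviderOptions = Record<string, Record<string, unknown>>;

interface SystemCacheInputs {
  // From chat.prompt.set(text, { providerOptions })
  promptProviderOptions?: ProviderOptions;
  // From chat.toStreamTextOptions({ cacheControl }), the Anthropic shorthand
  cacheControl?: Record<string, unknown>;
  // From chat.toStreamTextOptions({ systemProviderOptions })
  systemProviderOptions?: ProviderOptions;
}

function resolveSystemProviderOptions(inputs: SystemCacheInputs): ProviderOptions | undefined {
  // The most specific option wins and replaces the others wholesale (no deep merge).
  if (inputs.systemProviderOptions) return inputs.systemProviderOptions;
  if (inputs.cacheControl) return { anthropic: { cacheControl: inputs.cacheControl } };
  return inputs.promptProviderOptions;
}

const resolved = resolveSystemProviderOptions({
  promptProviderOptions: { anthropic: { cacheControl: { type: "ephemeral", ttl: "1h" } } },
  cacheControl: { type: "ephemeral" },
});
// The 1h TTL from chat.prompt.set is dropped, not merged:
console.log(JSON.stringify(resolved)); // {"anthropic":{"cacheControl":{"type":"ephemeral"}}}
```

Replacing the options outright, rather than merging them, means a lower-priority option can't leak fields such as `ttl` into the result.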

<Note>
Use the 1-hour cache for prefixes that sit idle for more than 5 minutes between turns: `cacheControl: { type: "ephemeral", ttl: "1h" }`. One-hour cache writes cost 2× the base input price, compared with 1.25× for the 5-minute cache, so the longer TTL pays off only when reads actually span the longer window.
</Note>
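
To see when the 1-hour TTL pays off, compare both TTLs for turns that arrive at a fixed interval. The sketch assumes that the prefix never changes, that each cache read refreshes the TTL, and that costs follow Anthropic's multipliers (1.25× for a 5-minute write, 2× for a 1-hour write, 0.1× for a read). `prefixCostWithTtl` is a hypothetical helper.

```ts
// Compare 5-minute and 1-hour cache TTLs for a fixed prefix reused every
// `gapMinutes`. Assumes the prefix never changes and each read refreshes the
// TTL. Costs are in base-input-token equivalents.

type Ttl = "5m" | "1h";

const WRITE_MULTIPLIER: Record<Ttl, number> = { "5m": 1.25, "1h": 2 };
const TTL_MINUTES: Record<Ttl, number> = { "5m": 5, "1h": 60 };
const READ_MULTIPLIER = 0.1;

function prefixCostWithTtl(prefixTokens: number, turns: number, gapMinutes: number, ttl: Ttl): number {
  let cost = 0;
  for (let turn = 0; turn < turns; turn++) {
    // The first turn always writes. Later turns read only if the cache is still warm.
    const warm = turn > 0 && gapMinutes <= TTL_MINUTES[ttl];
    cost += prefixTokens * (warm ? READ_MULTIPLIER : WRITE_MULTIPLIER[ttl]);
  }
  return cost;
}

// 10,000-token prefix, 6 turns, 20 minutes apart: the 5-minute cache expires every time.
const fiveMin = prefixCostWithTtl(10_000, 6, 20, "5m"); // 6 writes at 1.25x = 75,000
const oneHour = prefixCostWithTtl(10_000, 6, 20, "1h"); // 1 write at 2x + 5 reads at 0.1x = 25,000
console.log({ fiveMin, oneHour });
```

With a 3-minute gap instead, the 5-minute cache stays warm (17,500 vs 25,000). In that case the 1-hour write premium costs more than it saves.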

## Cache the conversation history

Place a breakpoint on the last message, and the entire conversation prefix up to that point is cached, so the next turn reads it back instead of re-processing it. Do this in [`prepareMessages`](/ai-chat/reference#chatagentoptions). It transforms model messages once, and `chat.agent` applies it on every path that builds a prompt (each turn, and both compaction rebuild paths), so the breakpoint always lands on the real last message.

```ts /trigger/chat.ts
import { chat } from "@trigger.dev/sdk/ai";
import { streamText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

export const myChat = chat.agent({
  id: "my-chat",
  prepareMessages: async ({ messages }) => {
    if (messages.length === 0) return messages;
    const last = messages[messages.length - 1];
    return [
      ...messages.slice(0, -1),
      {
        ...last,
        providerOptions: {
          ...last.providerOptions,
          // Keep any Anthropic options already on the message; add the breakpoint.
          anthropic: {
            ...last.providerOptions?.anthropic,
            cacheControl: { type: "ephemeral" },
          },
        },
      },
    ];
  },
  run: async ({ messages, signal }) => {
    return streamText({
      model: anthropic("claude-sonnet-4-5"),
      // System-block breakpoint; composes with the message breakpoint above.
      ...chat.toStreamTextOptions({ cacheControl: { type: "ephemeral" } }),
      messages,
      abortSignal: signal,
    });
  },
});
```

The system breakpoint and the conversation breakpoint compose. The system block is written once and stays warm as long as turns keep arriving within the TTL, because each read refreshes it. Each turn then extends the cached message prefix.

<Note>
Anthropic allows **at most 4** cache breakpoints per request, and a prefix must be at least ~1024 tokens (model-dependent) to be cached at all. Shorter prefixes silently don't cache. One system breakpoint plus one rolling message breakpoint is the typical setup and leaves headroom.
</Note>
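
You can check both limits before sending a request. The sketch below uses a hypothetical request shape rather than any SDK type: blocks with approximate token counts and a breakpoint flag. It takes the minimum cacheable size as a parameter because that value depends on the model.

```ts
// Pre-flight check for Anthropic-style breakpoints under a simplified model:
// a request is an ordered list of blocks (tools, system, messages) with
// approximate token counts, some flagged as cache breakpoints.

interface PromptBlock {
  label: string;
  tokens: number;
  breakpoint?: boolean;
}

const MAX_BREAKPOINTS = 4; // Anthropic's per-request limit

function checkBreakpoints(blocks: PromptBlock[], minCacheableTokens = 1024): string[] {
  const problems: string[] = [];
  const count = blocks.filter((b) => b.breakpoint).length;
  if (count > MAX_BREAKPOINTS) {
    problems.push(`${count} breakpoints; Anthropic allows at most ${MAX_BREAKPOINTS}`);
  }
  let prefixTokens = 0;
  for (const block of blocks) {
    prefixTokens += block.tokens;
    // The cached prefix runs from the start of the request through this block.
    if (block.breakpoint && prefixTokens < minCacheableTokens) {
      problems.push(`prefix ending at "${block.label}" is ${prefixTokens} tokens, below the ${minCacheableTokens}-token minimum`);
    }
  }
  return problems;
}

// One system breakpoint plus one rolling breakpoint on the last message.
const typical: PromptBlock[] = [
  { label: "tools", tokens: 800 },
  { label: "system", tokens: 2500, breakpoint: true },
  { label: "user#1", tokens: 40 },
  { label: "assistant#1", tokens: 300 },
  { label: "user#2", tokens: 60, breakpoint: true },
];
console.log(checkBreakpoints(typical)); // []
```

A short system prompt with its own breakpoint, such as `[{ label: "system", tokens: 300, breakpoint: true }]`, produces one problem. That case is the silent non-caching the note warns about.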

## Caching and compaction

Compaction rewrites the conversation prefix by replacing earlier turns with a summary, so it necessarily invalidates the cached message prefix at that point. That's a one-time reset, not a regression. Because `prepareMessages` also runs on the compaction rebuild and result paths, the new, shorter prefix gets a fresh breakpoint and re-warms on the next turn. Your system-prompt cache is unaffected, since compaction never touches the system block. See [Compaction](/ai-chat/compaction) for how the summary is produced.

## Other providers

Caching is provider-specific, and most providers don't use per-block breakpoints at all:

- **OpenAI** and **Google Gemini** cache automatically. OpenAI caches any prompt prefix of 1024 tokens or more, and Gemini 2.5 caches implicitly (from 1024 tokens on Flash and 2048 on Pro). Neither needs a breakpoint, so the system-caching options above are a no-op for them. `chat.agent` already gives automatic caching exactly what it needs: a byte-stable prefix that only grows across turns. Keep the system prompt frozen and the prefix above the model's minimum, and cache reads happen on their own. (OpenAI's optional `providerOptions.openai.promptCacheKey` improves cache-hit routing across requests. It's a top-level option, not a system-block breakpoint.)

- **Anthropic** and **Amazon Bedrock** take an explicit breakpoint on the system block: Anthropic via `cacheControl`, Bedrock via `cachePoint`. Both go through the provider-agnostic `systemProviderOptions`:

  ```ts /trigger/chat.ts
  // Amazon Bedrock (@ai-sdk/amazon-bedrock)
  return streamText({
    model: bedrock(BEDROCK_MODEL_ID), // placeholder: a Bedrock model that supports prompt caching
    ...chat.toStreamTextOptions({
      systemProviderOptions: { bedrock: { cachePoint: { type: "default" } } },
    }),
    messages,
    abortSignal: signal,
  });
  ```

The `cacheControl` shorthand is Anthropic-only. On any other breakpoint-based provider, use `systemProviderOptions` (or `providerOptions` on `chat.prompt.set`).

Usage reporting is normalized. Providers report cache tokens under different names (`cachedPromptTokens`, `cachedContentTokenCount`, `cacheReadInputTokens`). The AI SDK maps them all into the same `inputTokenDetails.cacheReadTokens` / `cacheWriteTokens` fields, which `previousTurnUsage` and `totalUsage` carry and the dashboard shows. The [verify step](#verify-caching-is-working) is therefore the same regardless of provider.

## Verify caching is working

Each turn's usage carries cache token counts. `chat.agent` accumulates them across turns and hands them to `run` as `previousTurnUsage` (last turn) and `totalUsage` (whole chat). Both are `LanguageModelUsage` objects:

```ts /trigger/chat.ts
run: async ({ messages, signal, previousTurnUsage }) => {
  // previousTurnUsage is the usage of the turn before this one. Turn 1's usage
  // (logged during turn 2) should show a cache write. From turn 2's usage on,
  // cacheReadTokens should be > 0 on a stable prefix.
  console.log("cache read", previousTurnUsage?.inputTokenDetails?.cacheReadTokens);
  console.log("cache write", previousTurnUsage?.inputTokenDetails?.cacheWriteTokens);

  return streamText({
    model: anthropic("claude-sonnet-4-5"),
    ...chat.toStreamTextOptions({ cacheControl: { type: "ephemeral" } }),
    messages,
    abortSignal: signal,
  });
},
```

The first turn writes the cache (`cacheWriteTokens > 0`, `cacheReadTokens` is 0). Every later turn on an unchanged prefix reads it (`cacheReadTokens > 0`). The dashboard surfaces the same numbers on the AI span as **Cache write** and **Cache read**, so you can confirm hits per run without logging.
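
To turn those numbers into a quick check, you could classify each turn's usage as a write, a hit, or a miss. The sketch below uses a local stand-in type (`UsageLike`) for the `LanguageModelUsage` fields this page reads. It assumes that `inputTokens` is the total input count, including cached tokens. `cacheStatus` is a hypothetical helper.

```ts
// Classify one turn's usage as a cache hit, write, or miss. `UsageLike` is a
// local stand-in for the LanguageModelUsage fields used on this page. It
// assumes `inputTokens` is the total input count, including cached tokens.

interface UsageLike {
  inputTokens?: number;
  inputTokenDetails?: { cacheReadTokens?: number; cacheWriteTokens?: number };
}

type CacheStatus = "hit" | "write" | "miss";

function cacheStatus(usage: UsageLike | undefined): { status: CacheStatus; readRatio: number } {
  const read = usage?.inputTokenDetails?.cacheReadTokens ?? 0;
  const write = usage?.inputTokenDetails?.cacheWriteTokens ?? 0;
  const total = usage?.inputTokens ?? 0;
  const readRatio = total > 0 ? read / total : 0;
  // A growing history can read the old prefix and write the new turn in one
  // request; any read counts as a hit.
  if (read > 0) return { status: "hit", readRatio };
  if (write > 0) return { status: "write", readRatio };
  return { status: "miss", readRatio };
}

const example = cacheStatus({
  inputTokens: 4000,
  inputTokenDetails: { cacheReadTokens: 3000, cacheWriteTokens: 500 },
});
console.log(example.status, example.readRatio); // hit 0.75
```

Called with `previousTurnUsage` inside `run`, the first turn's usage should classify as `"write"` and later turns on an unchanged prefix as `"hit"`. A run of `"miss"` results, or a `"write"` on every turn, suggests the prefix is shifting.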

If `cacheReadTokens` stays 0 across turns with an identical prefix, a silent invalidator is shifting the bytes. See the warning below.

<Warning>
Anything that changes the prefix between turns silently kills the cache. Keep the system prompt **byte-stable**: never interpolate a timestamp, request ID, or per-turn value into `chat.prompt`. Don't change the **model** or the **tool set** mid-conversation (tools render at position 0, so adding one invalidates everything after them). Inject dynamic per-turn context as a late message via [pending messages](/ai-chat/pending-messages) or [background injection](/ai-chat/background-injection), not into the cached prefix.
</Warning>
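
The pattern the warning describes keeps the system prompt frozen and moves dynamic data to the end of the prompt. In plain TypeScript it looks roughly like the sketch below. `buildTurn`, `SimpleMessage`, and the context-message format are hypothetical stand-ins, not the `chat.agent` API. In a real agent, pending messages or background injection deliver the late message.

```ts
// Sketch: keep the cached prefix byte-stable by putting per-turn data in a
// message at the end, never in the system prompt. Names are hypothetical.

interface SimpleMessage {
  role: "user" | "assistant";
  content: string;
}

// Placeholder system prompt; it must not change between turns.
const SYSTEM_PROMPT = "You are a helpful assistant.";

function buildTurn(history: SimpleMessage[], now: Date): { system: string; messages: SimpleMessage[] } {
  return {
    // Never interpolate `now` here: it would change the prefix on every turn.
    system: SYSTEM_PROMPT,
    messages: [
      ...history,
      // Per-turn context goes last, after everything that should stay cached.
      { role: "user", content: `Context: the current time is ${now.toISOString()}` },
    ],
  };
}

const history: SimpleMessage[] = [{ role: "user", content: "Hi" }];
const first = buildTurn(history, new Date("2024-01-01T00:00:00Z"));
const second = buildTurn(history, new Date("2024-01-01T00:05:00Z"));
console.log(first.system === second.system); // true
```

Only the final message differs between the two calls, so everything before it stays eligible for a cache hit.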

## Next steps

<CardGroup cols={2}>
  <Card title="Compaction" icon="compress" href="/ai-chat/compaction">
    Keep long conversations within token limits, and re-warm the cache afterward.
  </Card>
  <Card title="Fast starts" icon="bolt" href="/ai-chat/fast-starts">
    Cut cold-start latency so a cached prefix is the only thing between a message and a reply.
  </Card>
  <Card title="chat.agent reference" icon="book" href="/ai-chat/reference#chatagentoptions">
    Full option surface, including `prepareMessages` and `toStreamTextOptions`.
  </Card>
  <Card title="Building agents: backend" icon="server" href="/ai-chat/backend">
    The three ways to build a chat backend and when to reach for each.
  </Card>
</CardGroup>

docs/docs.json

Lines changed: 1 addition & 0 deletions
@@ -123,6 +123,7 @@
  "ai/prompts",
  "ai-chat/fast-starts",
  "ai-chat/compaction",
+ "ai-chat/prompt-caching",
  "ai-chat/pending-messages",
  "ai-chat/background-injection",
  "ai-chat/actions",
