Goal
Add Anthropic prompt caching and conversation summarization to reduce costs and manage context window in multi-turn conversations.
1. Prompt Caching
What: Anthropic's prompt caching caches the static system prompt and tool definitions so subsequent requests in a session pay 0.1x on those tokens instead of 1x. First request pays 1.25x (cache write), then reads are 0.1x with a 5-minute TTL.
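For reference, a minimal sketch of what a cache breakpoint looks like in the raw Anthropic SDK (the model name, prompt constant, and user message are placeholders):

```python
import anthropic

STATIC_SYSTEM_PROMPT = "You are an aviation weather assistant. ..."  # placeholder

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # placeholder model
    max_tokens=1024,
    # cache_control on the system block marks the prefix up to this point
    # (tool definitions + system prompt) as cacheable for ~5 minutes
    system=[
        {
            "type": "text",
            "text": STATIC_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Decode this METAR: KSFO 121756Z ..."}],
)
```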
What we learned:
- AnthropicPromptCachingMiddleware from langchain_anthropic.middleware exists (verified in langchain-anthropic 1.3.2) but only works with create_agent() — not compatible with our custom StateGraph
- ChatAnthropic natively supports cache_control as a kwarg — it applies the cache breakpoint to the last eligible message block in _generate()
- .bind(cache_control={"type": "ephemeral"}) works and creates a RunnableBinding that passes cache_control on every invocation (sketch after this list)
- Our _PROVIDER_REGISTRY in langgraph_runner.py already has an extra_kwargs lambda for each provider — cache_control can be added there for anthropic
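A minimal sketch of the .bind() variant (model name is a placeholder):

```python
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")  # placeholder model
# Returns a RunnableBinding; cache_control is forwarded to _generate()
# on every invocation, placing the breakpoint on the last eligible block.
cached_llm = llm.bind(cache_control={"type": "ephemeral"})
```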
Open question: The kwarg approach applies cache_control to the last eligible message block. For optimal caching, we want the breakpoint on the system prompt + tool definitions (static), not on user messages (which change each turn). We need to verify whether with_structured_output() (used by the planner) preserves the cache_control kwarg propagation, and whether the "last eligible" heuristic puts the breakpoint in the right place.
Approach: Add cache_control={"type": "ephemeral"} to the anthropic provider's extra_kwargs in _PROVIDER_REGISTRY. This is a one-line change once we verify the breakpoint placement is correct.
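Roughly what that would look like; the actual registry structure in langgraph_runner.py may differ, so the entry shape below is an assumption and only the cache_control line is the real addition:

```python
# Sketch only: real entry shape in langgraph_runner.py may differ.
_PROVIDER_REGISTRY = {
    "anthropic": {
        # existing extra_kwargs lambda, with the proposed one-line addition
        "extra_kwargs": lambda cfg: {
            "cache_control": {"type": "ephemeral"},
        },
    },
    # ... other providers unchanged
}
```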
2. Conversation Summarization
What: Automatically summarize older conversation messages once they exceed a token threshold, preventing unbounded message growth in multi-turn sessions.
What we learned:
SummarizationMiddleware from langchain.agents.middleware exists (verified in langchain 1.0.4) but only works with create_agent() — not compatible with our custom StateGraph
- Our agent state uses messages: Annotated[List[BaseMessage], operator.add] — messages accumulate indefinitely via the checkpointer (state sketch after this list)
- Currently no message cleanup or trimming happens
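For context, the state shape this implies (the class name AgentState is an assumption):

```python
import operator
from typing import Annotated, List

from langchain_core.messages import BaseMessage
from typing_extensions import TypedDict

class AgentState(TypedDict):
    # operator.add makes this append-only: each turn's messages are
    # concatenated onto the history persisted by the checkpointer
    messages: Annotated[List[BaseMessage], operator.add]
```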
Recommended approach: Preprocessing graph node (Option A from exploration)
- Add a node before the planner that checks message count/tokens (see the sketch after this list)
- When threshold exceeded: summarize older messages into a system message, keep recent N messages
- Controlled via behavior config JSON (threshold, keep count, enabled flag)
- Most LangGraph-idiomatic, easy to test, no checkpointer modifications needed
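A minimal sketch of that node, reusing AgentState from above; the model, function name, and thresholds are assumptions, and note that the current operator.add reducer can only append, so actually dropping old messages would mean switching to langgraph's add_messages reducer (which honors RemoveMessage):

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import RemoveMessage, SystemMessage

summarizer_llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")  # placeholder

SUMMARY_THRESHOLD = 20  # hypothetical: message count (or swap in a token check)
KEEP_RECENT = 6         # hypothetical: recent messages kept verbatim

def summarize_node(state: AgentState) -> dict:
    msgs = state["messages"]
    if len(msgs) <= SUMMARY_THRESHOLD:
        return {}  # below threshold: pass through unchanged

    old = msgs[:-KEEP_RECENT]  # everything except the most recent N
    summary = summarizer_llm.invoke(
        [SystemMessage(content="Summarize this conversation concisely."), *old]
    )
    # NOTE: RemoveMessage only takes effect under the add_messages reducer;
    # with operator.add this update would just append, so changing the
    # reducer is part of the work here.
    return {
        "messages": [
            *(RemoveMessage(id=m.id) for m in old),
            SystemMessage(content=f"Summary of earlier turns: {summary.content}"),
        ]
    }
```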
Files to modify:
- shared/aviation_agent/graph.py — add summarizer node
- shared/aviation_agent/behavior_config.py — add summarization config
- configs/aviation_agent/*.json — add summarization settings (example below)
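The config block might look like this; all key names and values are assumptions pending the behavior_config.py changes:

```json
{
  "summarization": {
    "enabled": true,
    "token_threshold": 4000,
    "keep_recent_messages": 6
  }
}
```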
Priority
Prompt caching is a quick win (small change, immediate cost savings for Anthropic configs). Summarization is more involved but important for production multi-turn usage.