
Anthropic prompt caching and conversation summarization #30

@roznet

Description

Goal

Add Anthropic prompt caching and conversation summarization to reduce costs and manage context window in multi-turn conversations.

1. Prompt Caching

What: Anthropic's prompt caching stores the static system prompt and tool definitions so that subsequent requests in a session pay 0.1x on those tokens instead of 1x. The first request pays 1.25x (cache write); reads after that cost 0.1x, with a 5-minute TTL.
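The multipliers above make the savings easy to quantify. A back-of-envelope sketch (rates normalized to 1.0 per token; real per-token prices are model-specific):

```python
# Back-of-envelope cost model for Anthropic prompt caching, using the
# multipliers above: cache write = 1.25x, cache read = 0.1x, uncached = 1x.
def session_cost(cached_tokens: int, turns: int, base_rate: float = 1.0) -> float:
    """Cost of the static (cached) prefix across a multi-turn session."""
    if turns == 0:
        return 0.0
    write = 1.25 * cached_tokens * base_rate               # first request writes the cache
    reads = 0.1 * cached_tokens * base_rate * (turns - 1)  # subsequent requests read it
    return write + reads

def uncached_cost(cached_tokens: int, turns: int, base_rate: float = 1.0) -> float:
    return 1.0 * cached_tokens * base_rate * turns

# A 2,000-token system prompt + tools over 10 turns:
# cached:   1.25*2000 + 0.1*2000*9 = 2500 + 1800 = 4300
# uncached: 2000*10 = 20000  (~78% savings on the static prefix)
```

The break-even point is the second request: a single-turn session actually pays a 25% premium for the cache write, so caching only pays off for multi-turn conversations inside the 5-minute TTL.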

What we learned:

  • AnthropicPromptCachingMiddleware from langchain_anthropic.middleware exists (verified in langchain-anthropic 1.3.2), but it only works with create_agent() and is not compatible with our custom StateGraph
  • ChatAnthropic natively supports cache_control as a kwarg — it applies the cache breakpoint to the last eligible message block in _generate()
  • .bind(cache_control={"type": "ephemeral"}) works and creates a RunnableBinding that passes cache_control on every invocation
  • Our _PROVIDER_REGISTRY in langgraph_runner.py already has an extra_kwargs lambda for each provider — cache_control can be added there for anthropic

Open question: The kwarg approach applies cache_control to the last eligible message block. For optimal caching, we want the breakpoint on the system prompt + tool definitions (static), not on user messages (which change each turn). Need to verify if with_structured_output() (used by planner) preserves the cache_control kwarg propagation, and whether the "last eligible" heuristic puts the breakpoint in the right place.

Approach: Add cache_control={"type": "ephemeral"} to the anthropic provider's extra_kwargs in _PROVIDER_REGISTRY. This is a one-line change once we verify the breakpoint placement is correct.

2. Conversation Summarization

What: Automatically summarize older conversation messages once they exceed a token threshold, preventing unbounded message growth in multi-turn sessions.

What we learned:

  • SummarizationMiddleware from langchain.agents.middleware exists (verified in langchain 1.0.4), but it only works with create_agent() and is not compatible with our custom StateGraph
  • Our agent state uses messages: Annotated[List[BaseMessage], operator.add] — messages accumulate indefinitely via the checkpointer
  • Currently no message cleanup or trimming happens
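The reason history grows without bound is the reducer on the messages channel: Annotated[List[BaseMessage], operator.add] tells LangGraph to merge each node's returned messages by list concatenation. A minimal demonstration of that merge semantics (plain strings stand in for BaseMessage objects):

```python
import operator

# The state's reducer is operator.add, so a node returning new messages
# appends to the checkpointed history rather than replacing it.
history = ["system", "user-1", "ai-1"]
node_output = ["user-2", "ai-2"]

merged = operator.add(history, node_output)  # list concatenation
# merged == ["system", "user-1", "ai-1", "user-2", "ai-2"]
```

Every turn therefore adds to the list the checkpointer persists, which is why a trimming/summarization step has to be introduced explicitly.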

Recommended approach: Preprocessing graph node (Option A from exploration)

  • Add a node before the planner that checks message count/tokens
  • When threshold exceeded: summarize older messages into a system message, keep recent N messages
  • Controlled via behavior config JSON (threshold, keep count, enabled flag)
  • Most LangGraph-idiomatic, easy to test, no checkpointer modifications needed
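The trimming logic for the node above can be sketched in plain Python. This is not the real graph.py implementation: the actual node would take and return the LangGraph state dict, use token counts rather than message counts if configured, and call the LLM to produce the summary (the string below is a placeholder); the parameter names mirror the hypothetical config knobs:

```python
def maybe_summarize(messages, max_messages=30, keep_recent=10, enabled=True):
    """Preprocessing step: if history exceeds the threshold, fold the older
    messages into a single summary entry and keep the most recent ones."""
    if not enabled or len(messages) <= max_messages:
        return messages  # under threshold: pass through unchanged

    older, recent = messages[:-keep_recent], messages[-keep_recent:]

    # Placeholder for an LLM call that condenses `older` into a short
    # system-message summary.
    summary = f"[summary of {len(older)} earlier messages]"

    return [summary] + recent
```

Because the node runs before the planner, the planner (and every downstream node) only ever sees the compacted history, and the checkpointer needs no changes.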

Files to modify:

  • shared/aviation_agent/graph.py — add summarizer node
  • shared/aviation_agent/behavior_config.py — add summarization config
  • configs/aviation_agent/*.json — add summarization settings
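A possible shape for the new block in configs/aviation_agent/*.json (key names are illustrative, mirroring the threshold / keep count / enabled flag described above; the real schema lives in behavior_config.py):

```json
{
  "summarization": {
    "enabled": true,
    "max_messages": 30,
    "keep_recent": 10
  }
}
```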

Priority

Prompt caching is a quick win (small change, immediate cost savings for Anthropic configs). Summarization is more involved but important for production multi-turn usage.
