Goal
Add Anthropic prompt caching and conversation summarization to reduce costs and manage context window in multi-turn conversations.
1. Prompt Caching
What: Anthropic's prompt caching caches the static system prompt and tool definitions so subsequent requests in a session pay 0.1x on those tokens instead of 1x. First request pays 1.25x (cache write), then reads are 0.1x with a 5-minute TTL.
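For reference, a minimal sketch of what a cache breakpoint looks like in the raw Anthropic SDK (the model name, prompt constant, and user message are placeholders):

```python
import anthropic

STATIC_SYSTEM_PROMPT = "You are an aviation weather assistant. ..."  # placeholder

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # placeholder model
    max_tokens=1024,
    # cache_control on the system block marks the prefix up to this point
    # (tool definitions + system prompt) as cacheable for ~5 minutes
    system=[
        {
            "type": "text",
            "text": STATIC_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Decode this METAR: KSFO 121756Z ..."}],
)
```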
What we learned:
- AnthropicPromptCachingMiddleware from langchain_anthropic.middleware exists (verified in langchain-anthropic 1.3.2) but only works with create_agent() — not compatible with our custom StateGraph
- ChatAnthropic natively supports cache_control as a kwarg — it applies the cache breakpoint to the last eligible message block in _generate()
- .bind(cache_control={"type": "ephemeral"}) works and creates a RunnableBinding that passes cache_control on every invocation (sketch after this list)
- Our _PROVIDER_REGISTRY in langgraph_runner.py already has an extra_kwargs lambda for each provider — cache_control can be added there for anthropic
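A minimal sketch of the .bind() variant (model name is a placeholder):

```python
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")  # placeholder model
# Returns a RunnableBinding; cache_control is forwarded to _generate()
# on every invocation, placing the breakpoint on the last eligible block.
cached_llm = llm.bind(cache_control={"type": "ephemeral"})
```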
Open question: The kwarg approach applies cache_control to the last eligible message block. For optimal caching, we want the breakpoint on the system prompt + tool definitions (static), not on user messages (which change each turn). We need to verify whether with_structured_output() (used by the planner) preserves the cache_control kwarg propagation, and whether the "last eligible" heuristic puts the breakpoint in the right place.
Approach: Add cache_control={"type": "ephemeral"} to the anthropic provider's extra_kwargs in _PROVIDER_REGISTRY. This is a one-line change once we verify the breakpoint placement is correct.
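Roughly what that would look like; the actual registry structure in langgraph_runner.py may differ, so the entry shape below is an assumption and only the cache_control line is the real addition:

```python
# Sketch only: real entry shape in langgraph_runner.py may differ.
_PROVIDER_REGISTRY = {
    "anthropic": {
        # existing extra_kwargs lambda, with the proposed one-line addition
        "extra_kwargs": lambda cfg: {
            "cache_control": {"type": "ephemeral"},
        },
    },
    # ... other providers unchanged
}
```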
2. Conversation Summarization
What: Automatically summarize older conversation messages once they exceed a token threshold, preventing unbounded message growth in multi-turn sessions.
What we learned:
SummarizationMiddleware from langchain.agents.middleware exists (verified in langchain 1.0.4) but only works with create_agent() — not compatible with our custom StateGraph
- Our agent state uses messages: Annotated[List[BaseMessage], operator.add] — messages accumulate indefinitely via the checkpointer (state sketch after this list)
- Currently no message cleanup or trimming happens
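For context, the state shape this implies (the class name AgentState is an assumption):

```python
import operator
from typing import Annotated, List

from langchain_core.messages import BaseMessage
from typing_extensions import TypedDict

class AgentState(TypedDict):
    # operator.add makes this append-only: each turn's messages are
    # concatenated onto the history persisted by the checkpointer
    messages: Annotated[List[BaseMessage], operator.add]
```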
Recommended approach: Preprocessing graph node (Option A from exploration)
- Add a node before the planner that checks message count/tokens (see the sketch after this list)
- When threshold exceeded: summarize older messages into a system message, keep recent N messages
- Controlled via behavior config JSON (threshold, keep count, enabled flag)
- Most LangGraph-idiomatic, easy to test, no checkpointer modifications needed
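A minimal sketch of that node, reusing AgentState from above; the model, function name, and thresholds are assumptions, and note that the current operator.add reducer can only append, so actually dropping old messages would mean switching to langgraph's add_messages reducer (which honors RemoveMessage):

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import RemoveMessage, SystemMessage

summarizer_llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")  # placeholder

SUMMARY_THRESHOLD = 20  # hypothetical: message count (or swap in a token check)
KEEP_RECENT = 6         # hypothetical: recent messages kept verbatim

def summarize_node(state: AgentState) -> dict:
    msgs = state["messages"]
    if len(msgs) <= SUMMARY_THRESHOLD:
        return {}  # below threshold: pass through unchanged

    old = msgs[:-KEEP_RECENT]  # everything except the most recent N
    summary = summarizer_llm.invoke(
        [SystemMessage(content="Summarize this conversation concisely."), *old]
    )
    # NOTE: RemoveMessage only takes effect under the add_messages reducer;
    # with operator.add this update would just append, so changing the
    # reducer is part of the work here.
    return {
        "messages": [
            *(RemoveMessage(id=m.id) for m in old),
            SystemMessage(content=f"Summary of earlier turns: {summary.content}"),
        ]
    }
```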
Files to modify:
- shared/aviation_agent/graph.py — add summarizer node
- shared/aviation_agent/behavior_config.py — add summarization config
- configs/aviation_agent/*.json — add summarization settings (example below)
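The config block might look like this; all key names and values are assumptions pending the behavior_config.py changes:

```json
{
  "summarization": {
    "enabled": true,
    "token_threshold": 4000,
    "keep_recent_messages": 6
  }
}
```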
Priority
Prompt caching is a quick win (small change, immediate cost savings for Anthropic configs). Summarization is more involved but important for production multi-turn usage.