Problem
The proxy already does raw body passthrough to preserve upstream prompt caching, but routing is static (weight-based). Two providers serving the same model may have very different cache hit rates, but traffic doesn't favor the one with the warmer cache.
Proposal
Track per-provider cache hit/miss signals from responses (e.g., Anthropic's x-cache-hit header or the cache_read_input_tokens field in token usage) and use them to influence routing:
- Record cache hit rates per (provider, model) pair
- When multiple providers can serve a request, boost routing weight for the provider with higher cache hit rate
- This directly reduces input token costs and latency
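The tracking-plus-boost idea above could look something like the following sketch. This is illustrative only: the class, parameter names, and the EWMA/boost scheme are assumptions, not the proxy's actual design. It smooths a per-(provider, model) hit rate from each response's cached-token ratio, then scales that pair's base routing weight.

```python
import random
from collections import defaultdict

class CacheAwareRouter:
    """Hypothetical sketch: boost routing weight for providers with
    warmer prompt caches. Names and scheme are illustrative only."""

    def __init__(self, base_weights, alpha=0.2, boost=2.0):
        self.base_weights = dict(base_weights)  # {(provider, model): weight}
        self.alpha = alpha                      # EWMA smoothing factor
        self.boost = boost                      # max multiplier at 100% hit rate
        self.hit_rate = defaultdict(float)      # smoothed hit rate per pair

    def record(self, provider, model, cache_read_tokens, input_tokens):
        # Score a response by the fraction of input tokens served from cache
        # (e.g., cache_read_input_tokens / total input tokens).
        hit = cache_read_tokens / input_tokens if input_tokens else 0.0
        key = (provider, model)
        self.hit_rate[key] = (1 - self.alpha) * self.hit_rate[key] + self.alpha * hit

    def effective_weight(self, provider, model):
        # Scale the static base weight by 1..boost as hit rate rises,
        # so a cold provider keeps its configured weight.
        key = (provider, model)
        return self.base_weights.get(key, 0.0) * (
            1 + (self.boost - 1) * self.hit_rate[key]
        )

    def pick(self, model, providers):
        # Weighted random choice among candidate providers for this model.
        weights = [self.effective_weight(p, model) for p in providers]
        return random.choices(providers, weights=weights, k=1)[0]
```

An EWMA (rather than a lifetime average) lets the router react when a provider's cache goes cold, and capping the multiplier at `boost` keeps a hot cache from starving the other providers entirely.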
Expected Benefit
- Lower latency on cache hits (no re-processing of cached prefix)
- Reduced token costs (cache reads are cheaper)
- Smarter traffic distribution without manual weight tuning
Related
Part of stability & performance exploration vs direct Claude Code CLI → LLM gateway connections.