feat(claude): configurable prompt-cache TTL (ATTUNE_RAG_CACHE_TTL=1h) #185
Merged
ClaudeProvider hard-coded the 5-minute `ephemeral` cache_control marker at
both emit sites (generate() prefix + citations first-document). Add a
`_cache_control()` helper that resolves the marker from `ATTUNE_RAG_CACHE_TTL`:
`1h` (case/whitespace-insensitive) → `{"type": "ephemeral", "ttl": "1h"}` for
workloads that issue clusters of related queries within the hour (dashboards,
benchmark sweeps), at the same per-token rate. Any other value (unset / `5m`)
yields the prior `{"type": "ephemeral"}` — byte-identical default.
Read per-call (not a module global) so the env flips behavior without
re-import. Single source of truth reused at both sites; no public API change.
- _cache_control() helper + both call sites use it
- wire-shape unit tests: helper resolution (unset/5m/1h/" 1H "/garbage),
generate prefix + citations first-doc carry the 1h shape, defaults unchanged,
i==0 invariant preserved (mocked client, no live API)
- README: when to set 1h vs default
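The `i == 0` invariant from the list above can be sketched as follows. `build_citation_documents` is a hypothetical name invented for this illustration, not a function from the PR; the sketch only shows the idea that the first citation document alone carries the marker, whichever TTL shape is in effect.

```python
from typing import Any


def build_citation_documents(
    texts: list[str], cache_control: dict[str, str]
) -> list[dict[str, Any]]:
    """Hypothetical builder: only the first document (i == 0) is marked."""
    blocks: list[dict[str, Any]] = []
    for i, text in enumerate(texts):
        block: dict[str, Any] = {
            "type": "document",
            "source": {"type": "text", "media_type": "text/plain", "data": text},
            "citations": {"enabled": True},
        }
        if i == 0:
            # Copy so callers cannot mutate the marker shared across requests.
            block["cache_control"] = dict(cache_control)
        blocks.append(block)
    return blocks


docs = build_citation_documents(
    ["alpha", "beta", "gamma"], {"type": "ephemeral", "ttl": "1h"}
)
marked = [i for i, d in enumerate(docs) if "cache_control" in d]
```

A wire-shape test in this style needs no network access: it asserts on the request payload that a mocked client would receive.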
Implements specs/long-cache-ttl-citations (tasks 1-5). Task 6 (live cache-hit
smoke) deferred — needs API access; gated behind @pytest.mark.live.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
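The deferred live smoke boils down to one check over two consecutive calls: the first should write the cache and the second should read from it. A minimal sketch of that check, using plain dicts in place of real responses, might look like this. `looks_like_cache_hit` is a hypothetical helper name; the two keys are the cache-related usage fields reported by the Anthropic Messages API, and how the real test would obtain them is an assumption.

```python
def looks_like_cache_hit(first_usage: dict, second_usage: dict) -> bool:
    """True when call 1 wrote the cache and call 2 read from it.

    Sketch only: a live test would take these values from the usage
    object of two real responses rather than from hand-built dicts.
    """
    # The fields may be absent or None, so treat both cases as zero.
    wrote = (first_usage.get("cache_creation_input_tokens") or 0) > 0
    read = (second_usage.get("cache_read_input_tokens") or 0) > 0
    return wrote and read
```

In a real suite this assertion would sit inside a test decorated with `@pytest.mark.live`, so it only runs when API access is available.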
Security scan: ✅ No findings at or above severity
Perf delta: within baseline. Blocking on regression in:
Downstream attune-gui: green. Blocking gate (Phase 4 W3.2 onwards). Failure here fails the job and blocks merge.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
What
Implements `specs/long-cache-ttl-citations` (tasks 1–5; design + tasks were approved 2026-06-14). No API key needed.

`ClaudeProvider` hard-coded the 5-minute `ephemeral` `cache_control` marker at both emit sites. This adds a single `_cache_control()` helper resolved from a new env var:
- `1h` → `{"type": "ephemeral", "ttl": "1h"}` (extended window, same per-token price)
- `5m` / anything else → `{"type": "ephemeral"}`, byte-identical to prior behavior

Used at both sites: the `generate()` cached prefix + the citations first-document block (the `i == 0` invariant is preserved). Read per-call so the env flips behavior without re-import. No public API change.

Tests (no live API)
- `_cache_control()` resolution: unset / `5m` / `1h` / `" 1H "` (case+whitespace) / `30m` / empty
- `1h` shape at both emit sites; defaults unchanged; only the first citation doc is marked

Deferred
Task 6, a `@pytest.mark.live` cache-hit smoke (assert `cache_creation → cache_read` over two calls + confirm the SDK accepts `ttl`), needs API access; gated behind the `live` marker per testing-conventions.

Spec
Tasks 1–5 → done in `specs/long-cache-ttl-citations/tasks.md` (status update follows in the umbrella repo).

🤖 Generated with Claude Code