feat(claude): configurable prompt-cache TTL (ATTUNE_RAG_CACHE_TTL=1h) #185

Merged: silversurfer562 merged 1 commit into main from feat/cache-ttl-citations on Jun 17, 2026

@silversurfer562

Member

What

Implements specs/long-cache-ttl-citations (tasks 1–5; design + tasks were approved 2026-06-14). No API key needed.

ClaudeProvider hard-coded the 5-minute ephemeral cache_control marker at both emit sites. This adds a single _cache_control() helper resolved from a new env var:

export ATTUNE_RAG_CACHE_TTL=1h   # default: 5m
  • 1h → {"type": "ephemeral", "ttl": "1h"} (extended window; cache reads bill at the same rate, but 1-hour cache writes are priced higher than 5-minute writes)
  • unset / 5m / anything else → {"type": "ephemeral"}, byte-identical to prior behavior

Used at both sites: generate() cached prefix + the citations first-document block (the i == 0 invariant is preserved). Read per-call so the env flips behavior without re-import. No public API change.

Tests (no live API)

  • _cache_control() resolution: unset / 5m / 1h / " 1H " (case+whitespace) / 30m / empty
  • generate prefix + citations first-doc carry the 1h shape; defaults unchanged; only the first citation doc is marked
  • Full providers suite: 42 passed, ruff clean

Deferred

Task 6 — a @pytest.mark.live cache-hit smoke (assert cache_creation → cache_read over two calls + confirm the SDK accepts ttl) — needs API access; gated behind the live marker per testing-conventions.
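
The deferred smoke check might look something like the sketch below. It is an assumption about how Task 6 could be written, not code from the spec: looks_like_cache_hit is a hypothetical helper, and the usage objects here are stand-ins. In a live test they would come from two consecutive provider calls, using the cache_creation_input_tokens and cache_read_input_tokens fields that the Anthropic API reports on a response's usage.

```python
from types import SimpleNamespace


def looks_like_cache_hit(first_usage, second_usage) -> bool:
    """Hypothetical check: the first call writes the cache, the second reads it."""
    wrote_cache = (first_usage.cache_creation_input_tokens or 0) > 0
    read_cache = (second_usage.cache_read_input_tokens or 0) > 0
    return wrote_cache and read_cache


# Stand-in usage objects with made-up token counts, for illustration only.
first = SimpleNamespace(cache_creation_input_tokens=1200, cache_read_input_tokens=0)
second = SimpleNamespace(cache_creation_input_tokens=0, cache_read_input_tokens=1200)
cold = SimpleNamespace(cache_creation_input_tokens=0, cache_read_input_tokens=0)
```

A live version would sit behind @pytest.mark.live and would also confirm that the request carrying "ttl": "1h" is accepted by the SDK.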

Spec

Tasks 1–5 → done in specs/long-cache-ttl-citations/tasks.md (status update follows in the umbrella repo).

🤖 Generated with Claude Code

ClaudeProvider hard-coded the 5-minute `ephemeral` cache_control marker at
both emit sites (generate() prefix + citations first-document). Add a
`_cache_control()` helper that resolves the marker from `ATTUNE_RAG_CACHE_TTL`:
`1h` (case/whitespace-insensitive) → `{"type": "ephemeral", "ttl": "1h"}` for
workloads that issue clusters of related queries within the hour (dashboards,
benchmark sweeps). Cache reads bill at the same per-token rate; 1-hour cache
writes are priced higher than 5-minute writes. Any other value (unset / `5m`)
yields the prior `{"type": "ephemeral"}`, a byte-identical default.

Read per-call (not a module global) so the env flips behavior without
re-import. Single source of truth reused at both sites; no public API change.

- _cache_control() helper + both call sites use it
- wire-shape unit tests: helper resolution (unset/5m/1h/" 1H "/garbage),
  generate prefix + citations first-doc carry the 1h shape, defaults unchanged,
  i==0 invariant preserved (mocked client, no live API)
- README: when to set 1h vs default

Implements specs/long-cache-ttl-citations (tasks 1-5). Task 6 (live cache-hit
smoke) deferred — needs API access; gated behind @pytest.mark.live.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@vercel

vercel Bot commented Jun 17, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| attune-rag | Ready | Preview, Comment | Jun 17, 2026 9:22am |

@github-actions

Contributor

Security scan

✅ No findings at or above severity high.

@github-actions

Contributor

Perf delta — within baseline

Blocking on regression in: keyword_retriever_retrieve.cpu, rag_pipeline_run.cpu. Other metrics are advisory and don't block merge.

| Metric | Baseline mean (s) | Current mean (s) | Δ | Threshold (s) | Status |
|---|---|---|---|---|---|
| directory_corpus_load.cpu | 0.000055 | 0.000049 | -10.9% | 0.000059 | ok |
| directory_corpus_load.wall | 0.000054 | 0.000049 | -9.3% | 0.000058 | ok |
| keyword_retriever_retrieve.cpu | 0.005437 | 0.003510 | -35.4% | 0.005631 | ok |
| keyword_retriever_retrieve.wall | 0.005437 | 0.003510 | -35.4% | 0.005632 | ok |
| rag_pipeline_run.cpu | 0.000633 | 0.000544 | -14.1% | 0.000793 | ok |
| rag_pipeline_run.wall | 0.000632 | 0.000544 | -13.9% | 0.000793 | ok |
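
The Δ column is consistent with a relative change against the baseline mean. The sketch below reproduces that arithmetic; percent_delta and within_threshold are hypothetical names, and treating "ok" as "current mean at or below the threshold" is an assumption about the gate, not something the report states.

```python
def percent_delta(baseline: float, current: float) -> float:
    """Relative change of the current mean versus the baseline, in percent."""
    return (current - baseline) / baseline * 100.0


def within_threshold(current: float, threshold: float) -> bool:
    """Assumed gate rule: pass when the current mean does not exceed the threshold."""
    return current <= threshold


# Values taken from the keyword_retriever_retrieve.cpu row of the report.
delta = percent_delta(0.005437, 0.003510)
print(f"{delta:.1f}%")                       # -35.4%
print(within_threshold(0.003510, 0.005631))  # True
```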

@github-actions

Contributor

Downstream attune-gui — green

Blocking gate (Phase 4 W3.2 onwards). Failure here fails the job and blocks merge.

| Field | Value |
|---|---|
| attune-rag PR SHA | 9af23f9b5d58ae96d89dd665ecad732427b2c383 |
| attune-gui ref | feature/attune-rag-0.2-editor-rename |
| Test selector | pytest sidecar/tests -k 'editor or rag' |
| Status | ✅ pass |
| Workflow run | link |

@codecov

codecov Bot commented Jun 17, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.


@silversurfer562 merged commit 28a753f into main on Jun 17, 2026
23 checks passed