Summary
Implement request-level cache observability so BB-Agent can measure cache reuse directly and make cache-hit rate the primary optimization target.
Why
The KV cache refactor should be driven by measurable outcomes, not architecture changes alone. We already store provider cache_read_tokens / cache_write_tokens, but we are missing the request-level telemetry needed to answer:
- what changed between two consecutive requests,
- how much stable prefix was reused,
- whether a turn was warm or cold,
- whether compaction or tool waits caused divergence,
- whether follow-up TTFT improves after cache-friendly refactors.
This is the highest-priority issue in the KV-cache plan.
Scope
Add request instrumentation for every provider call.
Required metrics
- request/session/turn identifiers
- provider/model
- full request hash
- stable prefix hash
- system prompt hash
- tool defs hash
- previous request hash
- first divergence byte/token estimate
- reused prefix length estimate
- cache read tokens
- cache write tokens
- input/output tokens
- request start time
- first stream event time
- first text delta time
- request finished time
- TTFT
- total latency
- tool wait time
- resume latency
- post-compaction flag
- mutation flags for system/context/request rewrite
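One possible shape for the per-request record, sketched below. This is illustrative only: the struct and field names are assumptions, not BB-Agent's actual types, and timestamps are modeled as offsets from a session epoch so records serialize cleanly.

```rust
use std::time::Duration;

// Hypothetical per-request telemetry record; names are illustrative,
// not the actual BB-Agent types or field names.
#[derive(Debug, Clone, Default)]
struct RequestCacheMetrics {
    request_id: String,
    session_id: String,
    turn: u32,
    provider: String,
    model: String,
    full_request_hash: u64,
    stable_prefix_hash: u64,
    previous_request_hash: Option<u64>,
    first_divergence_byte: Option<usize>,
    reused_prefix_bytes_estimate: usize,
    cache_read_tokens: u64,
    cache_write_tokens: u64,
    input_tokens: u64,
    output_tokens: u64,
    // Offsets from a session epoch rather than wall-clock instants.
    request_start: Duration,
    first_stream_event: Option<Duration>,
    first_text_delta: Option<Duration>,
    request_finished: Option<Duration>,
    tool_wait: Duration,
    post_compaction: bool,
    system_mutated: bool,
    context_mutated: bool,
    request_rewritten: bool,
}

impl RequestCacheMetrics {
    /// TTFT: first text delta relative to request start.
    fn ttft(&self) -> Option<Duration> {
        self.first_text_delta
            .map(|t| t.saturating_sub(self.request_start))
    }

    /// Total latency: request finish relative to request start.
    fn total_latency(&self) -> Option<Duration> {
        self.request_finished
            .map(|t| t.saturating_sub(self.request_start))
    }
}
```

Deriving TTFT and total latency from stored offsets (rather than storing them redundantly) keeps the record free of internally inconsistent values.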
Code touchpoints
- crates/cli/src/turn_runner/runner.rs
- crates/provider/src/types.rs
- crates/provider/src/streaming.rs
- crates/cli/src/session_info.rs
- new instrumentation module(s)
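A minimal sketch of what the new instrumentation module might do to fingerprint requests and estimate divergence, assuming consecutive requests are compared as serialized bytes. Note std's DefaultHasher is deterministic only within a process, which is enough for in-session comparison but not for cross-run fingerprints.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Fingerprint a serialized request. Within one process, equal bytes
/// always produce equal hashes, so consecutive requests are comparable.
fn fingerprint(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

/// First byte where `curr` diverges from `prev`.
/// `None` means `curr` is a pure extension of `prev` (or identical),
/// i.e. the cache-friendly append-only case.
fn first_divergence(prev: &[u8], curr: &[u8]) -> Option<usize> {
    let common = prev
        .iter()
        .zip(curr.iter())
        .take_while(|(a, b)| a == b)
        .count();
    if common == prev.len().min(curr.len()) && curr.len() >= prev.len() {
        None
    } else {
        Some(common)
    }
}
```

The common-prefix count doubles as the reused-prefix-length estimate; a truncated follow-up (curr shorter than prev) is reported as divergence at the truncation point.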
Acceptance criteria
- repeated turns emit comparable request fingerprints
- metrics clearly show warm vs cold behavior
- cache-hit proxy can be derived from cache_read_tokens and prefix reuse estimate
- TTFT is recorded and inspectable for follow-up turns
- compaction and hook-driven mutations are visible in telemetry
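The cache-hit proxy named in the criteria above could be derived roughly as follows. The normalization (cache_read_tokens over input tokens) and the 0.5 warm threshold are assumptions for illustration, not values from the plan:

```rust
// Hypothetical warm/cold proxy: fraction of input tokens the provider
// reports as served from cache.
fn cache_hit_ratio(cache_read_tokens: u64, input_tokens: u64) -> f64 {
    if input_tokens == 0 {
        return 0.0;
    }
    cache_read_tokens as f64 / input_tokens as f64
}

/// A turn is considered warm when most of its input was cache reads
/// AND the byte-level prefix-reuse estimate agrees, guarding against
/// provider-reported token counts disagreeing with what we sent.
fn is_warm(
    cache_read_tokens: u64,
    input_tokens: u64,
    reused_prefix_bytes: usize,
    request_bytes: usize,
) -> bool {
    let token_ratio = cache_hit_ratio(cache_read_tokens, input_tokens);
    let byte_ratio = if request_bytes == 0 {
        0.0
    } else {
        reused_prefix_bytes as f64 / request_bytes as f64
    };
    token_ratio >= 0.5 && byte_ratio >= 0.5
}
```

Requiring both signals to agree is one way to make "warm vs cold" robust against either metric alone being noisy.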
Reference
knowledge/internal/KV_CACHE_REFACTOR_MASTER_PLAN.md