Summary
Implement request-level cache observability so BB-Agent can measure cache reuse directly and make cache-hit rate the primary optimization target.
Why
The KV cache refactor should be driven by measurable outcomes, not architecture changes alone. We already store provider cache_read_tokens / cache_write_tokens, but we are missing the request-level telemetry needed to answer:
- what changed between two consecutive requests,
- how much stable prefix was reused,
- whether a turn was warm or cold,
- whether compaction or tool waits caused divergence,
- whether follow-up TTFT improves after cache-friendly refactors.
This is the highest-priority issue in the KV-cache plan.
Scope
Add request instrumentation for every provider call.
Required metrics
- request/session/turn identifiers
- provider/model
- full request hash
- stable prefix hash
- system prompt hash
- tool defs hash
- previous request hash
- first divergence byte/token estimate
- reused prefix length estimate
- cache read tokens
- cache write tokens
- input/output tokens
- request start time
- first stream event time
- first text delta time
- request finished time
- TTFT
- total latency
- tool wait time
- resume latency
- post-compaction flag
- mutation flags for system/context/request rewrite
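One possible shape for the per-request record, sketched below. This is illustrative only: the struct and field names are assumptions, not BB-Agent's actual types, and timestamps are modeled as offsets from a session epoch so records serialize cleanly.

```rust
use std::time::Duration;

// Hypothetical per-request telemetry record; names are illustrative,
// not the actual BB-Agent types or field names.
#[derive(Debug, Clone, Default)]
struct RequestCacheMetrics {
    request_id: String,
    session_id: String,
    turn: u32,
    provider: String,
    model: String,
    full_request_hash: u64,
    stable_prefix_hash: u64,
    previous_request_hash: Option<u64>,
    first_divergence_byte: Option<usize>,
    reused_prefix_bytes_estimate: usize,
    cache_read_tokens: u64,
    cache_write_tokens: u64,
    input_tokens: u64,
    output_tokens: u64,
    // Offsets from a session epoch rather than wall-clock instants.
    request_start: Duration,
    first_stream_event: Option<Duration>,
    first_text_delta: Option<Duration>,
    request_finished: Option<Duration>,
    tool_wait: Duration,
    post_compaction: bool,
    system_mutated: bool,
    context_mutated: bool,
    request_rewritten: bool,
}

impl RequestCacheMetrics {
    /// TTFT: first text delta relative to request start.
    fn ttft(&self) -> Option<Duration> {
        self.first_text_delta
            .map(|t| t.saturating_sub(self.request_start))
    }

    /// Total latency: request finish relative to request start.
    fn total_latency(&self) -> Option<Duration> {
        self.request_finished
            .map(|t| t.saturating_sub(self.request_start))
    }
}
```

Deriving TTFT and total latency from stored offsets (rather than storing them redundantly) keeps the record free of internally inconsistent values.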
Code touchpoints
- crates/cli/src/turn_runner/runner.rs
- crates/provider/src/types.rs
- crates/provider/src/streaming.rs
- crates/cli/src/session_info.rs
- new instrumentation module(s)
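A minimal sketch of what the new instrumentation module might do to fingerprint requests and estimate divergence, assuming consecutive requests are compared as serialized bytes. Note std's DefaultHasher is deterministic only within a process, which is enough for in-session comparison but not for cross-run fingerprints.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Fingerprint a serialized request. Within one process, equal bytes
/// always produce equal hashes, so consecutive requests are comparable.
fn fingerprint(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

/// First byte where `curr` diverges from `prev`.
/// `None` means `curr` is a pure extension of `prev` (or identical),
/// i.e. the cache-friendly append-only case.
fn first_divergence(prev: &[u8], curr: &[u8]) -> Option<usize> {
    let common = prev
        .iter()
        .zip(curr.iter())
        .take_while(|(a, b)| a == b)
        .count();
    if common == prev.len().min(curr.len()) && curr.len() >= prev.len() {
        None
    } else {
        Some(common)
    }
}
```

The common-prefix count doubles as the reused-prefix-length estimate; a truncated follow-up (curr shorter than prev) is reported as divergence at the truncation point.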
Acceptance criteria
- repeated turns emit comparable request fingerprints
- metrics clearly show warm vs cold behavior
- cache-hit proxy can be derived from cache_read_tokens and prefix reuse estimate
- TTFT is recorded and inspectable for follow-up turns
- compaction and hook-driven mutations are visible in telemetry
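The cache-hit proxy named in the criteria above could be derived roughly as follows. The normalization (cache_read_tokens over input tokens) and the 0.5 warm threshold are assumptions for illustration, not values from the plan:

```rust
// Hypothetical warm/cold proxy: fraction of input tokens the provider
// reports as served from cache.
fn cache_hit_ratio(cache_read_tokens: u64, input_tokens: u64) -> f64 {
    if input_tokens == 0 {
        return 0.0;
    }
    cache_read_tokens as f64 / input_tokens as f64
}

/// A turn is considered warm when most of its input was cache reads
/// AND the byte-level prefix-reuse estimate agrees, guarding against
/// provider-reported token counts disagreeing with what we sent.
fn is_warm(
    cache_read_tokens: u64,
    input_tokens: u64,
    reused_prefix_bytes: usize,
    request_bytes: usize,
) -> bool {
    let token_ratio = cache_hit_ratio(cache_read_tokens, input_tokens);
    let byte_ratio = if request_bytes == 0 {
        0.0
    } else {
        reused_prefix_bytes as f64 / request_bytes as f64
    };
    token_ratio >= 0.5 && byte_ratio >= 0.5
}
```

Requiring both signals to agree is one way to make "warm vs cold" robust against either metric alone being noisy.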
Reference
knowledge/internal/KV_CACHE_REFACTOR_MASTER_PLAN.md