feat(cachet): structured telemetry with spans, events, and handler API #460
schgoo wants to merge 27 commits into
Codecov Report: ✅ All modified and coverable lines are covered by tests.

Coverage diff (`main` vs. #460):

| | main | #460 | +/- |
|---|---|---|---|
| Coverage | 100.0% | 100.0% | |
| Files | 335 | 334 | -1 |
| Lines | 25586 | 26100 | +514 |
| Hits | 25586 | 26100 | +514 |
Pull request overview
This PR upgrades cachet telemetry from standalone events to a correlated span + event model, and introduces a callback-based handler API so consumers can build structured telemetry pipelines without tracing visitor boilerplate.
Changes:
- Adds per-operation and per-tier spans/events, including timing fields and flags (coalesced/fallback).
- Introduces `CacheEventHandler` with typed `CacheTierEvent`/`CacheOperationEvent` and request correlation via `request_id`.
- Removes the old timing extension (`telemetry/ext.rs`) and updates wrappers/builders/examples accordingly.
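To make the handler API concrete, here is a minimal, dependency-free sketch of a callback trait with the two callbacks this PR names (`on_tier_event`, `on_operation_complete`). The struct fields, method signatures, the `HitCounter` handler and the `dispatch_tier` helper are assumptions for illustration. They are not cachet's actual definitions.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::time::Duration;

/// Hypothetical per-tier event payload (field names are illustrative).
#[derive(Debug, Clone)]
pub struct CacheTierEvent {
    pub request_id: u64,
    pub cache_name: &'static str,
    /// Event kind, e.g. "cache.hit" or "cache.miss".
    pub event: &'static str,
    /// `None` models an untimed event (e.g. an eviction).
    pub duration: Option<Duration>,
    /// True when the event came from a fallback tier.
    pub fallback: bool,
}

/// Hypothetical operation-complete payload.
#[derive(Debug, Clone)]
pub struct CacheOperationEvent {
    pub request_id: u64,
    pub cache_name: &'static str,
    /// Operation name, e.g. "get" or "insert".
    pub operation: &'static str,
    pub duration: Duration,
    /// True when stampede protection merged this call into another one.
    pub coalesced: bool,
}

/// Callback trait with one method per event kind (signatures assumed).
pub trait CacheEventHandler: Send + Sync {
    fn on_tier_event(&self, event: &CacheTierEvent);
    fn on_operation_complete(&self, event: &CacheOperationEvent);
}

/// Example handler that counts hits and completed operations.
#[derive(Default)]
pub struct HitCounter {
    hits: AtomicUsize,
    operations: AtomicUsize,
}

impl HitCounter {
    pub fn hits(&self) -> usize {
        self.hits.load(Ordering::Relaxed)
    }
    pub fn operations(&self) -> usize {
        self.operations.load(Ordering::Relaxed)
    }
}

impl CacheEventHandler for HitCounter {
    fn on_tier_event(&self, event: &CacheTierEvent) {
        // Plain field access: no tracing visitor needed.
        if event.event == "cache.hit" {
            self.hits.fetch_add(1, Ordering::Relaxed);
        }
    }
    fn on_operation_complete(&self, _event: &CacheOperationEvent) {
        self.operations.fetch_add(1, Ordering::Relaxed);
    }
}

/// Hypothetical fan-out: a cache holding registered handlers as trait objects.
pub fn dispatch_tier(handlers: &[Arc<dyn CacheEventHandler>], event: &CacheTierEvent) {
    for handler in handlers {
        handler.on_tier_event(event);
    }
}
```

Because handlers receive plain structs, a consumer matches on fields directly instead of implementing `tracing::field::Visit`.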
Reviewed changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| crates/cachet/src/wrapper.rs | Wraps tier operations in tier spans; records hit/miss/expired/etc with new telemetry API; propagates fallback flag. |
| crates/cachet/src/telemetry/mod.rs | Exposes new handler module and updates module docs. |
| crates/cachet/src/telemetry/handler.rs | Adds public callback API types (CacheEventHandler, event structs, RequestId). |
| crates/cachet/src/telemetry/ext.rs | Removes old ClockExt/Timed timing utilities. |
| crates/cachet/src/telemetry/cache.rs | Implements request correlation, span factories, event recording, and handler emission; adds extensive tests. |
| crates/cachet/src/telemetry/attributes.rs | Updates public telemetry field constants and removes EVENT_FALLBACK in favor of FIELD_FALLBACK. |
| crates/cachet/src/refresh.rs | Updates background refresh flow to use new spans/recording helpers. |
| crates/cachet/src/lib.rs | Updates crate docs for span-based telemetry and re-exports handler types. |
| crates/cachet/src/fallback.rs | Updates fallback get/promotion path to set fallback flag and rely on wrapper tier telemetry. |
| crates/cachet/src/eviction.rs | Routes eviction/expiration causes to new telemetry helpers and adds a no-logging behavior test. |
| crates/cachet/src/cache.rs | Adds operation-level spans/events with request_id correlation and stampede-protection coalesced flag recording. |
| crates/cachet/src/builder/transform.rs | Threads fallback flag through transform tier construction and passes telemetry into Cache::new. |
| crates/cachet/src/builder/cache.rs | Updates enable_logs behavior and adds event_handler(...) registration API. |
| crates/cachet/src/builder/buildable.rs | Extends tier-building to propagate fallback metadata and passes telemetry into Cache::new. |
| crates/cachet/README.md | Updates generated README telemetry docs to match the new span model. |
| crates/cachet/examples/telemetry_subscriber.rs | Simplifies example to show default span+event output using fmt subscriber. |
| crates/cachet/examples/telemetry_accumulator.rs | Adds example demonstrating request-correlated accumulation using CacheEventHandler + DashMap. |
| crates/cachet/Cargo.toml | Adjusts logs feature and dependencies; adds dashmap dev-dependency; registers new example. |
| crates/cachet/benches/operations.rs | Adds a benchmark configuration to measure overhead with an active tracing subscriber. |
| crates/cachet_memory/README.md | Updates generated README dependency info. |
| Cargo.lock | Adds dashmap and removes thread_aware from cachet crate dependency set. |
| .spelling | Adds “moka’s” to dictionary. |
Have you had a look at …
@ralfbiedert that's the idea (it's actually in the PR description)! I'm not sure when it will be available, but I'm hoping this provides a good middle ground.
pub(crate) fn record_refresh_miss(&self, cache_name: CacheName, duration: Duration) {
    self.record_info_with_duration(cache_name, attributes::EVENT_REFRESH_MISS, duration);
}
});
});
// With telemetry + active subscriber (measures span processing overhead)
//! Uses `DashMap` (a sharded, per-shard-locked map) for concurrent accumulation, safe across
//! all async runtimes, including work-stealing (tokio) and thread-per-core
//! (oxidizer), even if a task migrates between cores mid-operation.
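The accumulation pattern this doc comment describes can be sketched without external crates. Tier events are buffered per `request_id` and drained into one summary when the matching operation completes. This sketch swaps `DashMap` for `std::sync::Mutex<HashMap<..>>`, and the event types, `OperationSummary`, and method names are hypothetical. `DashMap` itself spreads entries across independently locked shards, which lowers contention but is not lock-free.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::Duration;

/// Hypothetical, trimmed-down event shapes for this sketch.
pub struct TierEvent {
    pub request_id: u64,
    pub event: &'static str,
    pub duration: Option<Duration>, // None = untimed
}

pub struct OperationEvent {
    pub request_id: u64,
    pub operation: &'static str,
    pub duration: Duration,
}

/// One summary per completed operation.
#[derive(Debug, PartialEq)]
pub struct OperationSummary {
    pub operation: &'static str,
    pub total: Duration,
    pub tier_events: Vec<&'static str>,
    /// Sum of the durations of timed tier events only.
    pub timed_tier_time: Duration,
}

#[derive(Default)]
pub struct Accumulator {
    // Keyed by request_id; a std Mutex stands in for DashMap here.
    pending: Mutex<HashMap<u64, Vec<TierEvent>>>,
}

impl Accumulator {
    /// Buffer a tier event under its request ID.
    pub fn on_tier_event(&self, event: TierEvent) {
        self.pending
            .lock()
            .unwrap()
            .entry(event.request_id)
            .or_default()
            .push(event);
    }

    /// Drain the buffered tier events for this request into a summary.
    pub fn on_operation_complete(&self, op: OperationEvent) -> OperationSummary {
        let tiers = self
            .pending
            .lock()
            .unwrap()
            .remove(&op.request_id)
            .unwrap_or_default();
        let timed_tier_time: Duration = tiers.iter().filter_map(|t| t.duration).sum();
        OperationSummary {
            operation: op.operation,
            total: op.duration,
            tier_events: tiers.iter().map(|t| t.event).collect(),
            timed_tier_time,
        }
    }

    /// Number of requests that still have buffered tier events.
    pub fn pending_requests(&self) -> usize {
        self.pending.lock().unwrap().len()
    }
}
```

Removing the entry on completion keeps the map bounded to in-flight requests. Events that never get a matching completion (for example, background evictions with `request_id = 0` in this PR's scheme) would need separate handling.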
//! **Tier events** (hit, miss, expired, etc.) carry `FIELD_NAME`, `FIELD_EVENT`,
//! and `FIELD_DURATION_NS`. Some events intentionally omit `FIELD_DURATION_NS`
//! to indicate "not timed": `EVENT_INSERT_REJECTED`, `EVENT_EVICTION`, and
//! background `EVENT_EXPIRED` events emitted from eviction listeners.
//!
//! **Operation-complete events** carry `FIELD_NAME`, `FIELD_OPERATION`,
//! `FIELD_DURATION_NS`, and `FIELD_COALESCED`.
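The field layout above, including the "not timed" convention, can be sketched as plain key/value lists. The string values of the `FIELD_*` constants are assumed from the `cache.*` names used in the PR description. `tier_event_fields` and `operation_fields` are hypothetical helpers. The sketch shows that an untimed event omits `FIELD_DURATION_NS` entirely instead of reporting `0`.

```rust
use std::time::Duration;

// Assumed values, inferred from the `cache.*` field names in the PR description.
pub const FIELD_NAME: &str = "cache.name";
pub const FIELD_EVENT: &str = "cache.event";
pub const FIELD_DURATION_NS: &str = "cache.duration_ns";
pub const FIELD_OPERATION: &str = "cache.operation";
pub const FIELD_COALESCED: &str = "cache.coalesced";

/// Key/value pairs for a tier event. `None` for `duration` models an untimed
/// event: the duration field is left out rather than written as 0.
pub fn tier_event_fields(
    name: &str,
    event: &str,
    duration: Option<Duration>,
) -> Vec<(&'static str, String)> {
    let mut fields = vec![
        (FIELD_NAME, name.to_string()),
        (FIELD_EVENT, event.to_string()),
    ];
    if let Some(d) = duration {
        fields.push((FIELD_DURATION_NS, d.as_nanos().to_string()));
    }
    fields
}

/// Operation-complete events always carry all four fields.
pub fn operation_fields(
    name: &str,
    operation: &str,
    duration: Duration,
    coalesced: bool,
) -> Vec<(&'static str, String)> {
    vec![
        (FIELD_NAME, name.to_string()),
        (FIELD_OPERATION, operation.to_string()),
        (FIELD_DURATION_NS, duration.as_nanos().to_string()),
        (FIELD_COALESCED, coalesced.to_string()),
    ]
}
```

Omitting the field, rather than recording zero, lets a downstream aggregator tell "took no time" apart from "was never measured".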
use std::cell::Cell;
use std::pin::Pin;
use std::sync::Arc;
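This PR describes a request-correlation scheme built from a process-wide atomic counter and a `WithRequestId<F>` wrapper that saves and restores a thread-local on every `poll()`. It can be sketched with std only. The helper names, the reserved ID `0`, boxing the inner future (the real wrapper may use pin projection instead), and the toy `block_on` executor are all assumptions for illustration.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Process-wide counter; 0 is reserved here for "no request" (background work).
static NEXT_REQUEST_ID: AtomicU64 = AtomicU64::new(1);

thread_local! {
    /// Request ID of the future currently being polled on this thread.
    static CURRENT_REQUEST_ID: Cell<u64> = Cell::new(0);
}

pub fn next_request_id() -> u64 {
    NEXT_REQUEST_ID.fetch_add(1, Ordering::Relaxed)
}

pub fn current_request_id() -> u64 {
    CURRENT_REQUEST_ID.with(|id| id.get())
}

/// Restores the previous thread-local ID on drop, even if the inner poll panics.
struct RestoreGuard(u64);

impl Drop for RestoreGuard {
    fn drop(&mut self) {
        CURRENT_REQUEST_ID.with(|id| id.set(self.0));
    }
}

/// Installs `id` in the thread-local for the duration of each `poll`.
pub struct WithRequestId<F> {
    id: u64,
    // Boxing keeps the wrapper `Unpin`, so no unsafe pin projection is needed.
    inner: Pin<Box<F>>,
}

impl<F: Future> WithRequestId<F> {
    pub fn new(id: u64, inner: F) -> Self {
        Self { id, inner: Box::pin(inner) }
    }
}

impl<F: Future> Future for WithRequestId<F> {
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        let this = self.get_mut();
        // Save the active ID (an outer operation's, or 0) and install ours.
        let _restore = RestoreGuard(CURRENT_REQUEST_ID.with(|id| id.replace(this.id)));
        this.inner.as_mut().poll(cx)
    }
}

/// Toy single-threaded executor for the demo; real code would use a runtime.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        // Busy-poll: acceptable only for a demo whose futures never park.
        std::thread::yield_now();
    }
}
```

Because the ID is re-installed on every poll rather than once at spawn time, it follows the task if an executor moves it to another thread. The save/restore pair also lets a nested operation, such as one started inside a `get_or_insert` closure, temporarily shadow the outer ID.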
Summary
Replaces cachet's event-only telemetry with structured tracing events, request correlation, and a callback API for consumers to build custom telemetry pipelines.
Motivation
The previous telemetry emitted standalone tracing events with no correlation between tiers. Consumers couldn't distinguish which tier events belonged to which cache operation, and there was no way to subscribe to structured telemetry without parsing tracing fields via the visitor pattern.
Changes
Structured tracing events
- Each tier outcome (`cache.hit`, `cache.miss`, `cache.expired`, etc.) emits a tracing event with `cache.name`, `cache.event`, and `cache.duration_ns`
- Each completed operation emits a tracing event with `cache.name`, `cache.operation`, `cache.duration_ns`, and `cache.coalesced`
- Tracing output is gated on `self.logging_enabled`, so there is no ambient span pollution when logging is disabled

`CacheEventHandler` callback API
- `CacheEventHandler` trait with `on_tier_event` and `on_operation_complete` callbacks
- Registered via `CacheBuilder::event_handler(handler)`
- Handlers receive typed `CacheTierEvent` and `CacheOperationEvent` structs, so no tracing visitor boilerplate is needed
- … `logs` feature flag
- … `emit`

Request correlation
- Each cache operation is assigned a `request_id: u64` from a process-wide atomic counter
- A `WithRequestId<F>` future wrapper saves/restores the request ID in a thread-local on every `poll()`, surviving task migration across threads/cores and supporting nested cache operations (e.g., a `get_or_insert` closure calling another cache method)
- Both `CacheTierEvent` and `CacheOperationEvent` carry the `request_id` for grouping
- … `request_id`; background maintenance evictions have `request_id = 0`

Fallback as a flag
- `cache.fallback` is a boolean flag on tier events, not a separate event type
- No separate `EVENT_FALLBACK` event is emitted when a fallback path is taken

Eviction telemetry
- `record_eviction` and `record_background_expired` emit standalone events for moka eviction/expiry callbacks
- Enabled via the `memory_with(|b| b.with_eviction_telemetry())` builder

Removed
- `telemetry/ext.rs` (`ClockExt`, `Timed`, `TimedResult`), replaced by using `clock.stopwatch()` directly
- `EVENT_REQUEST_MERGED` attribute constant
- `CacheTelemetryInner`, with its emit logic moved into `CacheTelemetry` directly
- … `emit` when trace support is added

New attributes
- `FIELD_OPERATION`: the cache operation name on completion events
- `FIELD_COALESCED`: boolean flag for stampede protection
- `FIELD_FALLBACK`: boolean flag for fallback tier consultation
- `EVENT_EVICTION`: emitted on capacity evictions

Examples
- `telemetry_subscriber`: shows event output with `tracing_subscriber::fmt`
- `telemetry_accumulator`: demonstrates accumulating tier events into a single summary per operation using `CacheEventHandler` + `DashMap`, mirroring a TVS-style consumer pattern

Performance
Benchmarked in release mode (MockCache get, single tier):
Telemetry with no active subscriber adds ~10ns overhead.