feat(cachet): structured telemetry with spans, events, and handler API#460

Open
schgoo wants to merge 27 commits into main from u/schgoo/coalescemetrics

@schgoo
Collaborator

schgoo commented May 28, 2026

Summary

Replaces cachet's event-only telemetry with structured tracing events, request correlation, and a callback API for consumers to build custom telemetry pipelines.

Motivation

The previous telemetry emitted standalone tracing events with no correlation between tiers. Consumers couldn't distinguish which tier events belonged to which cache operation, and there was no way to subscribe to structured telemetry without parsing tracing fields via the visitor pattern.

Changes

Structured tracing events

  • Each tier outcome (cache.hit, cache.miss, cache.expired, etc.) emits a tracing event with cache.name, cache.event, and cache.duration_ns
  • Each cache operation emits a completion event with cache.name, cache.operation, cache.duration_ns, and cache.coalesced
  • Events are emitted at the appropriate severity level (debug for hits/misses, info for inserts/expirations, error for failures)
  • All tracing emission is gated on self.logging_enabled — no ambient span pollution when logging is disabled
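For a concrete picture of the bullets above, here is a minimal, std-only sketch of the severity mapping and the logging gate. It is not the code in this PR: `TierOutcome`, `Level`, `Telemetry::emit`, and the `cache.inserted` / `cache.error` event names are assumptions made for illustration, and the real implementation emits through `tracing` rather than returning a string.

```rust
use std::time::Duration;

/// Severity levels named in the description (debug, info, error).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Debug,
    Info,
    Error,
}

/// Hypothetical set of tier outcomes; the real set in cachet may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TierOutcome {
    Hit,
    Miss,
    Inserted,
    Expired,
    Failed,
}

impl TierOutcome {
    /// Maps an outcome to its severity: debug for hits/misses,
    /// info for inserts/expirations, error for failures.
    pub fn level(self) -> Level {
        match self {
            TierOutcome::Hit | TierOutcome::Miss => Level::Debug,
            TierOutcome::Inserted | TierOutcome::Expired => Level::Info,
            TierOutcome::Failed => Level::Error,
        }
    }

    /// Value of the `cache.event` field (the last two names are assumed).
    pub fn name(self) -> &'static str {
        match self {
            TierOutcome::Hit => "cache.hit",
            TierOutcome::Miss => "cache.miss",
            TierOutcome::Expired => "cache.expired",
            TierOutcome::Inserted => "cache.inserted",
            TierOutcome::Failed => "cache.error",
        }
    }
}

/// Stand-in for the telemetry object; only the gate is modeled.
pub struct Telemetry {
    logging_enabled: bool,
}

impl Telemetry {
    pub fn new(logging_enabled: bool) -> Self {
        Self { logging_enabled }
    }

    /// Returns the formatted record, or `None` when logging is disabled,
    /// so nothing is produced unless the gate is open.
    pub fn emit(&self, cache_name: &str, outcome: TierOutcome, duration: Duration) -> Option<String> {
        if !self.logging_enabled {
            return None;
        }
        Some(format!(
            "{:?} cache.name={} cache.event={} cache.duration_ns={}",
            outcome.level(),
            cache_name,
            outcome.name(),
            duration.as_nanos()
        ))
    }
}
```

Calling `Telemetry::new(false).emit(...)` yields `None`, which is the behavior the last bullet describes: the gate is checked before any record is built.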

CacheEventHandler callback API

  • New CacheEventHandler trait with on_tier_event and on_operation_complete callbacks
  • Registered via CacheBuilder::event_handler(handler)
  • Receives typed CacheTierEvent and CacheOperationEvent structs — no tracing visitor boilerplate
  • Works independently of the logs feature flag
  • Designed as the semi-stable consumer API that hopefully survives a future migration to emit
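A rough sketch of what consuming the callback API might look like. The trait and struct names come from this PR, but the field lists, the `&self` signatures, the `Send + Sync` bound, and the `simulate_get` driver are assumptions; check `telemetry/handler.rs` for the real definitions.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Assumed shape of a tier event; the real struct may carry other fields.
#[derive(Debug, Clone)]
pub struct CacheTierEvent {
    pub request_id: u64,
    pub cache_name: &'static str,
    pub event: &'static str,
    /// `None` models events that are intentionally "not timed".
    pub duration: Option<Duration>,
    pub fallback: bool,
}

/// Assumed shape of an operation-complete event.
#[derive(Debug, Clone)]
pub struct CacheOperationEvent {
    pub request_id: u64,
    pub cache_name: &'static str,
    pub operation: &'static str,
    pub duration: Duration,
    pub coalesced: bool,
}

/// Callback trait with the two methods named in the description.
pub trait CacheEventHandler: Send + Sync {
    fn on_tier_event(&self, event: &CacheTierEvent);
    fn on_operation_complete(&self, event: &CacheOperationEvent);
}

/// Example consumer: counts hits and completed operations.
#[derive(Default)]
pub struct Counters {
    hits: AtomicU64,
    operations: AtomicU64,
}

impl Counters {
    pub fn hits(&self) -> u64 {
        self.hits.load(Ordering::Relaxed)
    }

    pub fn operations(&self) -> u64 {
        self.operations.load(Ordering::Relaxed)
    }
}

impl CacheEventHandler for Counters {
    fn on_tier_event(&self, event: &CacheTierEvent) {
        if event.event == "cache.hit" {
            self.hits.fetch_add(1, Ordering::Relaxed);
        }
    }

    fn on_operation_complete(&self, _event: &CacheOperationEvent) {
        self.operations.fetch_add(1, Ordering::Relaxed);
    }
}

/// Hypothetical driver standing in for the cache: one tier event,
/// then one completion event, both tagged with the same request ID.
pub fn simulate_get(handler: &dyn CacheEventHandler, request_id: u64, hit: bool) {
    handler.on_tier_event(&CacheTierEvent {
        request_id,
        cache_name: "users",
        event: if hit { "cache.hit" } else { "cache.miss" },
        duration: Some(Duration::from_nanos(471)),
        fallback: false,
    });
    handler.on_operation_complete(&CacheOperationEvent {
        request_id,
        cache_name: "users",
        operation: "get",
        duration: Duration::from_nanos(481),
        coalesced: false,
    });
}
```

In the real crate the handler would be passed to `CacheBuilder::event_handler(handler)` instead of being driven by hand; the point here is only that a consumer reads typed fields and never touches a tracing visitor.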

Request correlation

  • Each cache operation gets a unique request_id: u64 from a process-wide atomic counter
  • WithRequestId<F> future wrapper saves/restores the request ID in a thread-local on every poll(), surviving task migration across threads/cores and supporting nested cache operations (e.g., a get_or_insert closure calling another cache method)
  • Both CacheTierEvent and CacheOperationEvent carry the request_id for grouping
  • Eviction events triggered synchronously during an insert inherit the insert's request_id; background maintenance evictions have request_id = 0
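The save/restore behavior is easier to see in code. Below is a simplified, std-only sketch of the idea, not the implementation in `telemetry/cache.rs`: it boxes the inner future to avoid pin projection, and `next_request_id`, `current_request_id`, `RestoreOnDrop`, `ReadRequestId`, and `poll_once` are names invented for this illustration.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Process-wide counter; 0 is left free to mean "no request".
static NEXT_REQUEST_ID: AtomicU64 = AtomicU64::new(1);

thread_local! {
    /// Request ID visible to code running on this thread right now.
    static CURRENT_REQUEST_ID: Cell<u64> = Cell::new(0);
}

pub fn next_request_id() -> u64 {
    NEXT_REQUEST_ID.fetch_add(1, Ordering::Relaxed)
}

pub fn current_request_id() -> u64 {
    CURRENT_REQUEST_ID.with(|id| id.get())
}

/// Puts the previous ID back when dropped, even if the inner poll panics.
struct RestoreOnDrop(u64);

impl Drop for RestoreOnDrop {
    fn drop(&mut self) {
        CURRENT_REQUEST_ID.with(|id| id.set(self.0));
    }
}

/// Future wrapper that installs its request ID for the duration of each poll.
pub struct WithRequestId<F> {
    request_id: u64,
    inner: Pin<Box<F>>,
}

impl<F: Future> WithRequestId<F> {
    pub fn new(request_id: u64, inner: F) -> Self {
        Self { request_id, inner: Box::pin(inner) }
    }
}

impl<F: Future> Future for WithRequestId<F> {
    type Output = F::Output;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let request_id = self.request_id;
        // Save whatever ID the current thread had, install ours, and restore
        // on exit. Doing this per poll is what makes thread migration and
        // nested operations safe.
        let previous = CURRENT_REQUEST_ID.with(|id| id.replace(request_id));
        let _restore = RestoreOnDrop(previous);
        self.inner.as_mut().poll(cx)
    }
}

/// Leaf future that reports the ID visible while it is being polled.
pub struct ReadRequestId;

impl Future for ReadRequestId {
    type Output = u64;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u64> {
        Poll::Ready(current_request_id())
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future once with a no-op waker; enough for a demo without a runtime.
pub fn poll_once<F: Future + Unpin>(future: &mut F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    Pin::new(future).poll(&mut cx)
}
```

Polling `WithRequestId::new(id, ReadRequestId)` resolves to `id`, and `current_request_id()` is back to its earlier value once `poll` returns. Because the ID is reinstalled on every poll, it does not matter which thread the executor resumes the task on, and an inner wrapper shadows the outer ID only while the inner future is being polled.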

Fallback as a flag

  • cache.fallback is a boolean flag on tier events, not a separate event type
  • Indicates whether a tier was consulted as a fallback
  • A separate EVENT_FALLBACK event is emitted when a fallback path is taken

Eviction telemetry

  • record_eviction and record_background_expired emit standalone events for moka eviction/expiry callbacks
  • Integrated with the eviction hook from main's memory_with(|b| b.with_eviction_telemetry()) builder

Removed

  • telemetry/ext.rs (ClockExt, Timed, TimedResult) — replaced by clock.stopwatch() directly
  • EVENT_REQUEST_MERGED attribute constant
  • CacheTelemetryInner — emit logic moved into CacheTelemetry directly
  • Tracing spans — replaced by event-only model with request ID correlation; spans will be reintroduced via emit when trace support is added

New attributes

  • FIELD_OPERATION — the cache operation name on completion events
  • FIELD_COALESCED — boolean flag for stampede protection
  • FIELD_FALLBACK — boolean flag for fallback tier consultation
  • EVENT_FALLBACK — emitted when a fallback path is taken
  • EVENT_EVICTION — emitted on capacity evictions

Examples

  • telemetry_subscriber — shows event output with tracing_subscriber::fmt
  • telemetry_accumulator — demonstrates accumulating tier events into a single summary per operation using CacheEventHandler + DashMap, mirroring a TVS-style consumer pattern
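The accumulation pattern can be sketched without the example's dependencies. This version swaps `DashMap` for a `Mutex<HashMap>` so it builds with the standard library alone, and it takes plain arguments instead of the typed event structs; `Accumulator`, `OperationSummary`, and `pending_len` are hypothetical names, and the shipped example may be organized differently.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// One summary per cache operation, built up from its tier events.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct OperationSummary {
    pub tier_events: Vec<&'static str>,
    pub total_tier_ns: u64,
}

/// Groups tier events by request ID until the operation completes.
#[derive(Default)]
pub struct Accumulator {
    pending: Mutex<HashMap<u64, OperationSummary>>,
}

impl Accumulator {
    /// Called for every tier event.
    pub fn on_tier_event(&self, request_id: u64, event: &'static str, duration_ns: u64) {
        // Assumption: request_id 0 marks background work with no owning
        // operation, so it is skipped rather than left in the map forever.
        if request_id == 0 {
            return;
        }
        let mut pending = self.pending.lock().unwrap();
        let summary = pending.entry(request_id).or_default();
        summary.tier_events.push(event);
        summary.total_tier_ns += duration_ns;
    }

    /// Called once per operation; removes and returns the finished summary.
    pub fn on_operation_complete(&self, request_id: u64) -> OperationSummary {
        self.pending
            .lock()
            .unwrap()
            .remove(&request_id)
            .unwrap_or_default()
    }

    /// Number of operations that have tier events but no completion yet.
    pub fn pending_len(&self) -> usize {
        self.pending.lock().unwrap().len()
    }
}
```

Removing the entry in `on_operation_complete` keeps the map bounded by the number of in-flight operations. A sharded map such as `DashMap` would reduce contention on the single lock used here.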

Performance

Benchmarked in release mode (MockCache get, single tier):

| Configuration | Time |
| --- | --- |
| No telemetry | 471 ns |
| Telemetry enabled, no subscriber | 481 ns |
| With stampede protection | 1,122 ns |

Telemetry with no active subscriber adds ~10 ns of overhead (481 ns vs. 471 ns, roughly 2%).

@codecov

codecov Bot commented May 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.0%. Comparing base (bf1ba72) to head (b9c44e8).

Additional details and impacted files
@@           Coverage Diff            @@
##             main     #460    +/-   ##
========================================
  Coverage   100.0%   100.0%            
========================================
  Files         335      334     -1     
  Lines       25586    26100   +514     
========================================
+ Hits        25586    26100   +514     

@schgoo schgoo marked this pull request as ready for review June 2, 2026 18:44
Copilot AI review requested due to automatic review settings June 2, 2026 18:45
Contributor

Copilot AI left a comment

Pull request overview

This PR upgrades cachet telemetry from standalone events to a correlated span + event model, and introduces a callback-based handler API so consumers can build structured telemetry pipelines without tracing visitor boilerplate.

Changes:

  • Adds per-operation and per-tier spans/events, including timing fields and flags (coalesced/fallback).
  • Introduces CacheEventHandler with typed CacheTierEvent / CacheOperationEvent and request correlation via request_id.
  • Removes the old timing extension (telemetry/ext.rs) and updates wrappers/builders/examples accordingly.

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| crates/cachet/src/wrapper.rs | Wraps tier operations in tier spans; records hit/miss/expired/etc with new telemetry API; propagates fallback flag. |
| crates/cachet/src/telemetry/mod.rs | Exposes new handler module and updates module docs. |
| crates/cachet/src/telemetry/handler.rs | Adds public callback API types (CacheEventHandler, event structs, RequestId). |
| crates/cachet/src/telemetry/ext.rs | Removes old ClockExt/Timed timing utilities. |
| crates/cachet/src/telemetry/cache.rs | Implements request correlation, span factories, event recording, and handler emission; adds extensive tests. |
| crates/cachet/src/telemetry/attributes.rs | Updates public telemetry field constants and removes EVENT_FALLBACK in favor of FIELD_FALLBACK. |
| crates/cachet/src/refresh.rs | Updates background refresh flow to use new spans/recording helpers. |
| crates/cachet/src/lib.rs | Updates crate docs for span-based telemetry and re-exports handler types. |
| crates/cachet/src/fallback.rs | Updates fallback get/promotion path to set fallback flag and rely on wrapper tier telemetry. |
| crates/cachet/src/eviction.rs | Routes eviction/expiration causes to new telemetry helpers and adds a no-logging behavior test. |
| crates/cachet/src/cache.rs | Adds operation-level spans/events with request_id correlation and stampede-protection coalesced flag recording. |
| crates/cachet/src/builder/transform.rs | Threads fallback flag through transform tier construction and passes telemetry into Cache::new. |
| crates/cachet/src/builder/cache.rs | Updates enable_logs behavior and adds event_handler(...) registration API. |
| crates/cachet/src/builder/buildable.rs | Extends tier-building to propagate fallback metadata and passes telemetry into Cache::new. |
| crates/cachet/README.md | Updates generated README telemetry docs to match the new span model. |
| crates/cachet/examples/telemetry_subscriber.rs | Simplifies example to show default span+event output using fmt subscriber. |
| crates/cachet/examples/telemetry_accumulator.rs | Adds example demonstrating request-correlated accumulation using CacheEventHandler + DashMap. |
| crates/cachet/Cargo.toml | Adjusts logs feature and dependencies; adds dashmap dev-dependency; registers new example. |
| crates/cachet/benches/operations.rs | Adds a benchmark configuration to measure overhead with an active tracing subscriber. |
| crates/cachet_memory/README.md | Updates generated README dependency info. |
| Cargo.lock | Adds dashmap and removes thread_aware from cachet crate dependency set. |
| .spelling | Adds “moka’s” to dictionary. |

Comment thread crates/cachet/src/telemetry/cache.rs
Comment thread crates/cachet/src/telemetry/cache.rs
Comment thread crates/cachet/src/eviction.rs Outdated
@ralfbiedert
Collaborator

Have you had a look at emit? Not sure about ETA, but once we release that I'd imagine we want to switch everything to that.

@schgoo
Collaborator Author

schgoo commented Jun 3, 2026

> Have you had a look at emit? Not sure about ETA, but once we release that I'd imagine we want to switch everything to that.

@ralfbiedert that's the idea (it's actually in the PR description)! I'm not sure when it will be available, but I'm hoping this provides a good middle ground.

Copilot AI review requested due to automatic review settings June 3, 2026 16:40
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 22 out of 23 changed files in this pull request and generated 7 comments.

Comment thread crates/cachet/src/telemetry/cache.rs
Comment thread crates/cachet/tests/cache.rs Outdated
Comment thread crates/cachet/tests/cache.rs Outdated
Comment thread crates/cachet/tests/cache.rs Outdated
Comment thread crates/cachet/src/telemetry/cache.rs
Comment thread crates/cachet/src/telemetry/cache.rs
Comment thread crates/cachet/Cargo.toml
Copilot AI review requested due to automatic review settings June 3, 2026 18:45
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 22 changed files in this pull request and generated 5 comments.

Comment thread crates/cachet/src/telemetry/cache.rs Outdated
Comment thread crates/cachet/src/telemetry/cache.rs Outdated
Comment thread crates/cachet/src/telemetry/cache.rs Outdated
Comment thread crates/cachet/src/telemetry/cache.rs Outdated
Comment thread crates/cachet/src/telemetry/cache.rs
Copilot AI review requested due to automatic review settings June 3, 2026 21:18
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 22 changed files in this pull request and generated 4 comments.

Comment thread crates/cachet/src/telemetry/cache.rs Outdated
Comment thread crates/cachet/src/telemetry/cache.rs Outdated
Comment thread crates/cachet/src/telemetry/cache.rs Outdated
Comment thread crates/cachet/src/telemetry/cache.rs Outdated
Copilot AI review requested due to automatic review settings June 4, 2026 17:15
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 22 out of 23 changed files in this pull request and generated 7 comments.

Comment thread crates/cachet/src/telemetry/cache.rs
Comment thread crates/cachet/src/telemetry/cache.rs
Comment thread crates/cachet/src/telemetry/cache.rs Outdated
Comment thread crates/cachet/src/telemetry/cache.rs Outdated
Comment thread crates/cachet/src/telemetry/attributes.rs
Comment thread crates/cachet/src/telemetry/attributes.rs
Comment thread crates/cachet/src/cache.rs
Copilot AI review requested due to automatic review settings June 5, 2026 16:35
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 22 out of 23 changed files in this pull request and generated 6 comments.

Comment thread crates/cachet/src/telemetry/cache.rs Outdated
Comment thread crates/cachet/src/telemetry/cache.rs Outdated
Comment thread crates/cachet/src/lib.rs
Comment thread crates/cachet/src/telemetry/attributes.rs Outdated
Comment thread crates/cachet/src/telemetry/handler.rs Outdated
Comment thread crates/cachet/src/telemetry/cache.rs
@github-actions

github-actions Bot commented Jun 5, 2026

⚠️ Breaking Changes Detected


--- failure pub_module_level_const_missing: pub module-level const is missing ---

Description:
A public const is missing or renamed
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.48.0/src/lints/pub_module_level_const_missing.ron

Failed in:
  EVENT_FALLBACK in file /home/runner/work/oxidizer/oxidizer/target/semver-checks/git-origin_main/2cae70143277a47f7fd01c2b353eeee72c22ffb7/crates/cachet/src/telemetry/attributes.rs:59

If the breaking changes are intentional then everything is fine - this message is merely informative.

Remember to apply a version number bump with the correct severity when publishing a version with breaking changes (1.x.x -> 2.x.x or 0.1.x -> 0.2.x).

Copilot AI review requested due to automatic review settings June 5, 2026 17:41
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 22 out of 23 changed files in this pull request and generated 8 comments.

Comment thread crates/cachet/src/wrapper.rs Outdated
Comment thread crates/cachet/src/wrapper.rs Outdated
Comment thread crates/cachet/src/wrapper.rs Outdated
Comment thread crates/cachet/src/wrapper.rs Outdated
Comment thread crates/cachet/src/refresh.rs Outdated
Comment thread crates/cachet/src/telemetry/attributes.rs
Comment thread crates/cachet/Cargo.toml
Comment thread crates/cachet/src/telemetry/cache.rs
Copilot AI review requested due to automatic review settings June 5, 2026 18:37
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 22 out of 23 changed files in this pull request and generated 6 comments.

Comment thread crates/cachet/src/telemetry/cache.rs
Comment on lines +286 to 288
pub(crate) fn record_refresh_miss(&self, cache_name: CacheName, duration: Duration) {
    self.record_info_with_duration(cache_name, attributes::EVENT_REFRESH_MISS, duration);
}
Comment thread crates/cachet/benches/operations.rs Outdated
});
});

// With telemetry + active subscriber (measures span processing overhead)
Comment on lines +10 to +12
//! Uses `DashMap` for lock-free concurrent accumulation — safe across
//! all async runtimes, including work-stealing (tokio) and thread-per-core
//! (oxidizer), even if a task migrates between cores mid-operation.
Comment on lines +9 to +15
//! **Tier events** (hit, miss, expired, etc.) carry `FIELD_NAME`, `FIELD_EVENT`,
//! and `FIELD_DURATION_NS`. Some events intentionally omit `FIELD_DURATION_NS`
//! to indicate "not timed": `EVENT_INSERT_REJECTED`, `EVENT_EVICTION`, and
//! background `EVENT_EXPIRED` events emitted from eviction listeners.
//!
//! **Operation-complete events** carry `FIELD_NAME`, `FIELD_OPERATION`,
//! `FIELD_DURATION_NS`, and `FIELD_COALESCED`.
Comment on lines +6 to +8
use std::cell::Cell;
use std::pin::Pin;
use std::sync::Arc;
3 participants