Problem
The cache currently has a single wrap-related stat: proxy.process.cache.wrap_count (and its per-volume variant proxy.process.cache.volume_N.wrap_count). This is a runtime counter that increments each time a stripe wraps around and resets to zero on process restart.
The problem is that this stat alone doesn't tell you much that's operationally useful:
- You can't determine total wrap history. After a restart, wrap_count is zero. You have no idea if the cache has wrapped 0 times or 500 times. The only way to find out is to run traffic_cache_tool and inspect the directory header, which requires stopping traffic or SSHing into the box.
- You can't determine cache age. There's no stat for when the cache was created. Combined with the lack of persistent wrap count, you can't compute wrap frequency (e.g., "this cache wraps every 4 hours" vs "every 4 days").
- You can't tell if a cache has ever wrapped. On a fresh deployment or after a cache clear, there's no way to know from stats alone whether the cache has filled up and started overwriting old content. This matters for capacity planning -- if your caches are wrapping frequently, you may need more disk.
- The Note() log message is minimal. When a wrap occurs, the log says:
    Cache volume 1 on disk '/dev/sda' wraps around
  No cycle count, no age information. To correlate wrap frequency you'd have to parse timestamps from syslog and count occurrences.
Real-world scenario
Consider a fleet of proxy servers with 500GB cache disks. You want to answer: "How often are our caches cycling through?" Today, the only way is to either:
- Watch wrap_count in real-time and hope ATS doesn't restart during your observation window
- SSH into each host and run traffic_cache_tool to read the directory header
Neither scales to a fleet. If cycle and create_time were exposed as stats, you could query your metrics system and immediately compute wrap frequency across every host.
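As a rough illustration of the computation this would enable, the sketch below derives an average wrap interval from the two values. The function name is hypothetical and not part of ATS; it assumes cycle starts at 0 when the stripe is created, so that cycle wraps have occurred since create_time.

```cpp
#include <cstdint>
#include <ctime>
#include <optional>

// Hypothetical helper (not an ATS API): average number of seconds between
// wraps, assuming `cycle` wraps have occurred since `create_time`.
std::optional<double>
average_wrap_interval_seconds(uint32_t cycle, time_t create_time, time_t now)
{
  if (cycle == 0 || now <= create_time) {
    // Never wrapped, or the timestamps are inconsistent: no meaningful rate.
    return std::nullopt;
  }
  return static_cast<double>(now - create_time) / cycle;
}
```

For example, a stripe created one day ago with cycle 6 gives 14400 seconds, i.e. "this cache wraps every 4 hours". This is only an average over the stripe's lifetime; it says nothing about recent changes in write rate, which is what the existing wrap_count counter is better suited for.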
The directory header already has the data
The StripeHeaderFooter structure persists all of this to disk:
struct StripeHeaderFooter {
// ...
time_t create_time; // when the stripe was initialized
uint32_t cycle; // total wrap count (incremented on each wrap, persisted)
uint32_t phase; // toggled on wrap
// ...
};
The cycle field is already used internally -- cache_bytes_used() checks cycle to determine whether the stripe has ever wrapped (cycle 0 means it hasn't filled up yet, so bytes used = write_pos - start; otherwise the stripe is full).
The create_time is set when the stripe is initialized and never changes.
Neither value is exposed as a metric.
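The cycle check described above can be sketched roughly as follows. This is a simplified illustration, not the actual cache_bytes_used() code: the StripeUsageView struct and its length field (the size of the stripe's data area) are assumptions made for the example.

```cpp
#include <cstdint>

// Hypothetical, simplified view of a stripe; field names other than `cycle`
// are illustrative and do not mirror the real ATS structures.
struct StripeUsageView {
  uint32_t cycle;     // persisted wrap count from the directory header
  int64_t  start;     // offset where the data area begins
  int64_t  write_pos; // current write position
  int64_t  length;    // total size of the data area
};

int64_t
bytes_used_sketch(const StripeUsageView &s)
{
  if (s.cycle == 0) {
    // Never wrapped: only the region behind the write position holds data.
    return s.write_pos - s.start;
  }
  // Wrapped at least once: the whole data area is considered in use.
  return s.length;
}
```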
Proposal
Add two new gauge stats
| Stat | Type | Source | Meaning |
|---|---|---|---|
| proxy.process.cache.directory.cycle | Gauge | header->cycle | Total historical wrap count (persists across restarts) |
| proxy.process.cache.directory.create_time | Gauge | header->create_time | Stripe creation time (epoch seconds) |
Per-volume variants:
- proxy.process.cache.volume_N.directory.cycle
- proxy.process.cache.volume_N.directory.create_time
Keep existing wrap_count unchanged
The existing wrap_count counter is still useful for rate-based alerting (wraps/hour). The new gauges serve a different purpose -- persistent state inspection.
Update in CachePeriodicMetricsUpdate()
The existing periodic update function already iterates all stripes every ~5 seconds to compute bytes_used. Adding reads of cycle and create_time from the directory header is trivial -- just a few more lines in the same loop.
For per-volume aggregation:
- cycle: sum across stripes in the volume
- create_time: minimum across stripes (oldest creation time)
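A minimal sketch of that per-volume aggregation, under stated assumptions: StripeHeaderView and VolumeDirectoryStats are hypothetical stand-ins for the real stripe and metrics types, and reporting a create_time of 0 for a volume with no stripes is a choice made for the example, not something the proposal specifies.

```cpp
#include <algorithm>
#include <cstdint>
#include <ctime>
#include <vector>

// Hypothetical stand-in for the two header fields the proposal reads.
struct StripeHeaderView {
  time_t   create_time;
  uint32_t cycle;
};

// Hypothetical holder for the two per-volume gauge values.
struct VolumeDirectoryStats {
  int64_t cycle       = 0; // sum of cycle across the volume's stripes
  int64_t create_time = 0; // oldest create_time; 0 if the volume has no stripes
};

VolumeDirectoryStats
aggregate_volume(const std::vector<StripeHeaderView> &stripes)
{
  VolumeDirectoryStats out;
  bool first = true;
  for (const auto &s : stripes) {
    out.cycle += s.cycle;
    const auto created = static_cast<int64_t>(s.create_time);
    // Keep the minimum (oldest) creation time seen so far.
    out.create_time = first ? created : std::min(out.create_time, created);
    first           = false;
  }
  return out;
}
```

Tracking the first element explicitly avoids seeding the minimum with 0, which would otherwise always win against real epoch timestamps.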
Improve the wrap log message
Enhance the Note() in agg_wrap() to include the cycle count and stripe age:
Cache volume 1 on disk '/dev/sda' wraps around (cycle 47, created 2025-01-15 08:30:00)
This makes log-based analysis much easier without needing to cross-reference metrics.
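One way the message text could be built is sketched below. format_wrap_message is a hypothetical helper, not the existing agg_wrap() code, and rendering the creation time in UTC is an assumption; the proposal does not say which time zone the log line should use.

```cpp
#include <cstdint>
#include <cstdio>
#include <ctime>
#include <string>
#include <time.h> // gmtime_r (POSIX)

// Hypothetical helper: builds the proposed wrap message as a string.
std::string
format_wrap_message(int volume, const std::string &disk_path, uint32_t cycle, time_t create_time)
{
  char    created[32] = "unknown";
  std::tm tm_buf{};
  // Assumption: format the creation time in UTC.
  if (gmtime_r(&create_time, &tm_buf) != nullptr) {
    std::strftime(created, sizeof(created), "%Y-%m-%d %H:%M:%S", &tm_buf);
  }
  char msg[512];
  std::snprintf(msg, sizeof(msg), "Cache volume %d on disk '%s' wraps around (cycle %u, created %s)", volume,
                disk_path.c_str(), static_cast<unsigned>(cycle), created);
  return msg;
}
```

The resulting string could then be passed to Note() in place of the current fixed message.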
Implementation scope
Three files need changes:
- src/iocore/cache/P_CacheStats.h -- add directory_cycle and directory_create_time gauge fields
- src/iocore/cache/CacheProcessor.cc -- register new stats, extend CachePeriodicMetricsUpdate()
- src/iocore/cache/StripeSM.cc -- improve Note() log message in agg_wrap()
No on-disk format changes. No new config knobs. No behavioral changes to the cache itself.