The telemetry module estimates P50/P95/P99 latency percentiles with an Exponential Moving Average (EMA). An EMA is lightweight, but it is order-dependent: processing the same samples in a different order produces different percentile estimates.
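The following is a minimal sketch of the problem. It uses the same 0.9/0.1 weights as `update_percentiles` and assumes the estimate is seeded at 0.0. Both the seed and the `ema` helper are hypothetical. Feeding the same two samples in opposite order yields different results.

```rust
/// Hypothetical helper mirroring the EMA update in update_percentiles:
/// estimate = estimate * 0.9 + sample * 0.1, seeded at 0.0 (assumption).
fn ema(samples: &[f64]) -> f64 {
    let mut estimate = 0.0;
    for &latency_ms in samples {
        estimate = estimate * 0.9 + latency_ms * 0.1;
    }
    estimate
}

fn main() {
    let forward = ema(&[10.0, 100.0]);
    let reversed = ema(&[100.0, 10.0]);
    // Same samples, different order: roughly 10.9 vs 10.0.
    println!("forward = {forward:.2}, reversed = {reversed:.2}");
    assert!((forward - reversed).abs() > 0.5);
}
```

The estimate that sees the large sample last gives it more weight, so arrival order leaks into the result.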
Proposed fix
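Replace the EMA with a bounded per-node sample buffer: keep the last 100-200 latency measurements and compute each percentile from a sorted copy of that buffer. Strictly speaking, keeping the most recent values is a sliding window; classic reservoir sampling instead keeps a uniform random sample of the whole stream. A sorted buffer depends only on which samples it holds, not on the order they arrived in. The percentiles are therefore order-invariant whenever a node's samples fit in the buffer, and memory per node stays bounded.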
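The following is a minimal sketch of the proposed buffer. It assumes a fixed capacity of 200 `f32` samples per node and nearest-rank percentiles. `LatencyWindow`, `NodeMeta`, and this `update_percentiles` signature are hypothetical stand-ins, not what src/telemetry.rs actually defines.

```rust
use std::collections::VecDeque;

/// Hypothetical per-node buffer holding the most recent `capacity` latencies.
struct LatencyWindow {
    samples: VecDeque<f32>,
    capacity: usize,
}

impl LatencyWindow {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { samples: VecDeque::with_capacity(capacity), capacity }
    }

    /// Append a sample, evicting the oldest one once the buffer is full.
    fn push(&mut self, latency_ms: f32) {
        if self.samples.len() == self.capacity {
            self.samples.pop_front();
        }
        self.samples.push_back(latency_ms);
    }

    /// Nearest-rank percentile for an integer `p` in 1..=100, computed on a
    /// sorted copy so the result ignores insertion order. None if empty.
    fn percentile(&self, p: usize) -> Option<f32> {
        let n = self.samples.len();
        if n == 0 {
            return None;
        }
        let mut sorted: Vec<f32> = self.samples.iter().copied().collect();
        sorted.sort_by(|a, b| a.total_cmp(b));
        // Rank = ceil(p * n / 100), clamped to a valid 1-based index.
        let rank = ((p * n + 99) / 100).clamp(1, n);
        Some(sorted[rank - 1])
    }
}

/// Hypothetical stand-in for the per-node metadata in src/telemetry.rs.
#[derive(Debug, Default, PartialEq)]
struct NodeMeta {
    p50_ms: f32,
    p95_ms: f32,
    p99_ms: f32,
}

/// Hypothetical replacement for the EMA update: record the sample, then
/// recompute each percentile from the buffer.
fn update_percentiles(meta: &mut NodeMeta, window: &mut LatencyWindow, latency_ms: f32) {
    window.push(latency_ms);
    // The buffer is non-empty right after a push, so these never fall back.
    meta.p50_ms = window.percentile(50).unwrap_or(latency_ms);
    meta.p95_ms = window.percentile(95).unwrap_or(latency_ms);
    meta.p99_ms = window.percentile(99).unwrap_or(latency_ms);
}

fn main() {
    let samples: Vec<f32> = (1..=100).map(|x| x as f32).collect();
    let (mut fwd_meta, mut fwd_win) = (NodeMeta::default(), LatencyWindow::new(200));
    let (mut rev_meta, mut rev_win) = (NodeMeta::default(), LatencyWindow::new(200));
    for &s in &samples {
        update_percentiles(&mut fwd_meta, &mut fwd_win, s);
    }
    for &s in samples.iter().rev() {
        update_percentiles(&mut rev_meta, &mut rev_win, s);
    }
    // Same samples in opposite order give identical percentiles.
    assert_eq!(fwd_meta, rev_meta);
    println!("{fwd_meta:?}"); // NodeMeta { p50_ms: 50.0, p95_ms: 95.0, p99_ms: 99.0 }
}
```

Sorting a copy on each update costs O(n log n) with n at most 200, which is likely negligible next to trace parsing. Once a node's trace exceeds the capacity, the oldest samples are evicted. Invariance then holds only for the retained window.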
Current code
In `src/telemetry.rs`, `update_percentiles`:

```rust
// EMA: the newest sample gets weight 0.1, the running estimate 0.9
meta.p50_ms = meta.p50_ms * 0.9 + latency_ms * 0.1;
```
Acceptance
- Percentile values are invariant to trace file ordering
- Memory overhead stays under ~1KB per active node
- No regression in existing telemetry tests
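The following is a quick check of the ~1 KB budget. It counts only the sample payload. The container's own header and any allocator slack are extra and are not measured here. `payload_bytes` is a hypothetical helper.

```rust
use std::mem::size_of;

/// Bytes needed for `capacity` samples of type T (payload only).
fn payload_bytes<T>(capacity: usize) -> usize {
    capacity * size_of::<T>()
}

fn main() {
    const BUDGET_BYTES: usize = 1024; // "~1KB per active node"
    for capacity in [100, 200] {
        let f32_bytes = payload_bytes::<f32>(capacity);
        let f64_bytes = payload_bytes::<f64>(capacity);
        let f32_fits = f32_bytes <= BUDGET_BYTES;
        let f64_fits = f64_bytes <= BUDGET_BYTES;
        println!("capacity {capacity}: f32 = {f32_bytes} B (fits: {f32_fits}), f64 = {f64_bytes} B (fits: {f64_fits})");
    }
}
```

At the upper end of the proposed range, 200 `f32` samples take 800 bytes, but 200 `f64` samples take 1600 bytes. With `f64` storage the buffer would need to stay at or below 128 samples to meet the budget.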