Adaptive hedged requests for Go. Cut your p99 latency with zero configuration. Based on Google's "The Tail at Scale" paper.
Updated Apr 16, 2026 · Go
[ACM SIGCOMM 2024] "m3: Accurate Flow-Level Performance Estimation using Machine Learning" by Chenning Li, Arash Nasr-Esfahany, Kevin Zhao, Kimia Noorbakhsh, Prateesh Goyal, Mohammad Alizadeh, Thomas Anderson.
[TBD] "m4: A Learned Flow-level Network Simulator" by Chenning Li, Anton A. Zabreyko, Om Chabra, Arash Nasr-Esfahany, Kevin Zhao, Prateesh Goyal, Mohammad Alizadeh, Thomas Anderson.
Proof-of-concept library for creating Finagle-like composable Services/Filters in Node.
Simulation of adversarial queueing (high packet loss) events in computer networks.
Reverse proxy that eliminates the tail latency caused by non-deterministic garbage collection.
Chasing the long tail of Clojure HTTP servers.
Linux kernel with CTS scheduler enabled.
TailBench (Updated version)
A deployment-oriented study of latency bias in real-time face recognition systems, showing that fairness violations emerge in tail inference latency rather than mean performance, with label-free auditing and mitigation analysis.
WASL: Multi-Module Coordination in Adaptive Multi-Tenant Clouds (ACM/SPEC ICPE'26)
Production-grade AI latency budgeting and reactive scaling framework for LLM inference systems. Covers p50/p95/p99 modeling, SLO design, Kubernetes (K8s) HPA patterns, and distributed AI infrastructure. By Vipin Kumar
Rust toolkit for Tokio tail-latency triage with evidence-ranked suspects and next checks.
Offline, fail-closed verifier for JSONL telemetry event logs. Emits deterministic audit certificates + human summaries with explicit claims/non-claims for bottleneck and integrity review.
A zero-overhead, transparent sidecar for UDP request hedging built with eBPF/BCC. Bypasses Python GC and scheduler jitter by handling retries directly in SoftIRQ context.
Request hedging for tail latency reduction in distributed systems
Simulation study of cache architecture tradeoffs under concurrency — partitioned vs LRU vs client affinity, with trace-driven evaluation on Twitter cache workloads