Skip to content

Heuristic Debounce Algorithm

ZjzMisaka edited this page Mar 23, 2026 · 2 revisions

After a worker finishes executing a task, if there is no new task, it will usually enter the Idle state and be woken up again when a new task arrives.
If tasks are submitted at a very high frequency and are highly fragmented, a "state ping-pong" phenomenon may occur. This leads to a large gap between the benchmark results of high-frequency, fine-grained tasks and those of the .NET ThreadPool.
The purpose of StatusPingPongChecker is to detect and eliminate this "state ping-pong" by using heuristic spinning to reduce the performance overhead caused by frequent state transitions.

Detection and Debounce

  1. Time recording: Each time a worker is about to enter the Idle state, it resets and starts a Stopwatch.
  2. Threshold: The system maintains a dynamic time threshold (initially 50 microseconds).
  3. Trigger condition: When the worker is woken up again and finishes executing the new task, just before it enters Idle again, if it finds that the time elapsed since the last Idle is less than this threshold, it determines that the system is currently in a ping-pong state.
  4. Debounce intervention: When a ping-pong state is detected, the worker does not immediately enter Idle. Instead, it enters a short spinning loop and repeatedly tries to fetch new tasks from the queue.

Heuristic Dynamic Adjustment

To avoid blind spinning when tasks are not dense, the algorithm introduces HitChecker for feedback control.
HitChecker is a ring bitmap with a capacity of 10, which records the results of the last 10 spins:

  • Hit (1): A task was successfully acquired during spinning.
  • Miss (0): Spinning timed out and the worker eventually entered sleep.

After each spin, the algorithm dynamically adjusts the threshold based on the historical hit rate:

  1. High failure rate (MissCount > 2): The algorithm shrinks the time threshold, making the condition for triggering spins more stringent, until it determines that "no ping-pong is occurring" and spinning is no longer triggered.
  2. High success rate (MissCount <= 1): The algorithm increases the time threshold (up to a maximum of 1/2000 second), encouraging workers to use spinning more often to take over tasks and improve throughput.

Most of the numbers that appear in this article actually come from repeatedly running benchmarks and tuning them based on experience, and the chosen values are quite conservative. In fact, if we were more aggressive, the benchmark results would look better. However, I dislike the idea of Workers eventually turning into "always spin for a while before going idle." That feels inelegant and undermines the heuristic’s purpose.

A large portion of tasks are coarse-grained, and for PTP, a thread pool that advertises "control" as its selling point—this is even more true for the tasks it receives. Once the lifetime of an instance is over, it should simply end; after that, it should be settled and retired. Keeping it in a half-alive limbo right before retirement is rather ugly and is also the source of some bugs.

Clone this wiki locally