Background and motivation
Add a managed, in-process, method-scoped API that lets a caller drive and observe a single method's progression through tiered compilation to its final optimization tier: deterministically, losslessly, and without any ETW/EventPipe session. The headline surface is an enumerable that yields the call schedule needed to promote the method. It is backed by lower-level primitives it can be composed from (a status poll, a remaining-count query, and a ready-state wait), plus a process-wide quiescence query for warming methods the caller can't name (callees).
The need arises because in-process diagnostic consumers (an EventListener on Microsoft-Windows-DotNETRuntime) have no prompt, method-specific, lossless signal of where a method is on the tiering ladder. The only per-method tier signals are events, and they fall short in four compounding ways.
Why today's events are insufficient
1. The decision point is silent. When a method's call count crosses the threshold, CallCountingManager::OnCallCountThresholdReached only flags it PendingCompletion and calls AsyncCompleteCallCounting() — it emits no ETW/EventPipe event. (src/coreclr/vm/callcounting.cpp) The only per-method tier events are:
| Event | Fires when | Carries tier? | Problem |
|---|---|---|---|
| MethodJittingStarted (V1) | tier-up compile begins | ❌ (no MethodFlags) | Can't tell a tier0 compile-start from a tier-up compile-start |
| MethodLoadVerbose | tier-up compile completes / is published | ✅ (MethodFlags bits [7..9]) | Latest possible point; gated behind the whole background queue |
2. Publication lags the decision by an unbounded amount that is set by unrelated work. Optimization is a single-threaded FIFO queue: methods are enqueued via InsertTail and drained one at a time on a single background worker (GetNextMethodToOptimize → RemoveHead → OptimizeMethod). (src/coreclr/vm/tieredcompilation.cpp) So a method's MethodLoadVerbose does not fire until every method ahead of it has compiled. The only events in that window are the batch-level TieredCompilationBackgroundJitStart/Stop, which carry just a count and no method identity.
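A toy model makes point 2 concrete. It rests on simplifying assumptions: the type and method names are hypothetical, the compile costs are invented, and no other runtime work is modeled, so it is not the runtime's queue. Because one worker dequeues head-first, the target's publication time is the sum of every compile ahead of it plus its own. That holds no matter how early the target crossed its own threshold.

```csharp
using System;
using System.Collections.Generic;

// Toy model of a single background worker draining a FIFO tier-up queue.
// Compile costs are made-up inputs, not measurements of the real JIT.
public static class FifoPublicationModel
{
    // Returns the time (arbitrary units, from when the worker starts draining) at which
    // `target` finishes compiling and would therefore be published.
    public static int PublicationTime(
        IEnumerable<(string Method, int CompileCost)> queueInFifoOrder, string target)
    {
        var queue = new Queue<(string Method, int CompileCost)>(queueInFifoOrder);
        int clock = 0;
        while (queue.Count > 0)
        {
            var (method, cost) = queue.Dequeue(); // strictly one method at a time, head first
            clock += cost;                        // the single worker is busy for the whole compile
            if (method == target)
                return clock;                     // only now would its publication event fire
        }
        throw new ArgumentException($"{target} was never enqueued", nameof(target));
    }
}
```

For example, if A (cost 5) and B (cost 7) are queued ahead of a target that costs 2, the target publishes at time 14.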
3. Event delivery is lossy by design. An in-proc EventListener consuming the native runtime events runs an EventPipe session: CLR threads write to bounded, circular buffers, which a dispatcher thread drains asynchronously. The writer never blocks (blocking app threads to deliver diagnostics is unacceptable), so under buffer pressure events are dropped, not backpressured. Delivery is best-effort, and dropped counts are recorded only at sequence points. The danger window is process startup. That is exactly when warmup runs, and also when MethodLoadVerbose (one per jitted method, thousands at startup) floods the same buffers in which the consumer's single relevant publication must survive.
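A deliberately simplified buffer illustrates the drop-instead-of-block behavior in point 3. This is not EventPipe's buffer manager. The class name, the fixed capacity, and the policy of discarding the incoming event when the buffer is full are all assumptions made for illustration.

```csharp
using System.Collections.Generic;

// Toy model of a bounded, non-blocking event buffer. It only illustrates
// "drop instead of backpressure"; real EventPipe buffering is more involved.
public sealed class DroppingEventBuffer
{
    private readonly Queue<string> _events = new();
    private readonly int _capacity;

    public DroppingEventBuffer(int capacity) => _capacity = capacity;

    // Number of events discarded because the buffer was full when they were written.
    public long DroppedCount { get; private set; }

    // Writer side (an app thread): never waits. A full buffer means the event is lost.
    public void Write(string evt)
    {
        if (_events.Count >= _capacity)
        {
            DroppedCount++;
            return;
        }
        _events.Enqueue(evt);
    }

    // Dispatcher side: hands the listener whatever survived, then empties the buffer.
    public List<string> Drain()
    {
        var drained = new List<string>(_events);
        _events.Clear();
        return drained;
    }
}
```

With a capacity of 3, a burst of four load events leaves the fourth event counted in DroppedCount and missing from what the dispatcher drains. In this model that fourth event stands for the one the consumer was waiting for.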
4. The whole mechanism can be turned off. It all depends on EventSource being enabled. The System.Diagnostics.Tracing.EventSource.IsSupported feature switch — set false for trimming/size, or to avoid EventSource overhead — makes EventSource a no-op, so an in-proc EventListener receives nothing at all and the consumer is left with no signal, falling back to the very fixed delay this is meant to replace. A direct runtime API has no such dependency on the diagnostics infrastructure being enabled.
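A consumer can at least detect the failure mode in point 4 up front by reading the feature switch through AppContext. This is a minimal sketch. The helper name is hypothetical, and treating an unset switch as "supported" reflects the switch's usual default, not anything stated above. The switch is typically set through the EventSourceSupport MSBuild property.

```csharp
using System;

public static class EventSourceAvailability
{
    // The feature switch named in point 4.
    private const string SwitchName = "System.Diagnostics.Tracing.EventSource.IsSupported";

    // Returns false only when the switch has been explicitly set to false; an unset switch
    // is assumed to mean EventSource (and therefore an in-proc EventListener) is available.
    public static bool IsEventSourceLikelySupported() =>
        !AppContext.TryGetSwitch(SwitchName, out bool isEnabled) || isEnabled;
}
```

When this returns false, the event-driven approach cannot work at all. A consumer restricted to today's surface has nothing left but the fixed delay.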
Net effect: a consumer that wants to know "has method M reached its final tier yet?" must await MethodLoadVerbose, whose arrival lags the runtime's decision by queue depth and unrelated compile times, and may simply never arrive — or, if EventSource is disabled, cannot observe at all. There is no signal at the decision point, and no way to poll ground truth.
Concrete use case: BenchmarkDotNet
BenchmarkDotNet is replacing its fixed ~250 ms per-tier Thread.Sleep warmup with event-driven tier-up detection (an in-process EventListener), so the JIT-warmup stage proceeds the instant the benchmark method reaches its final tier instead of sleeping a conservative fixed amount (dotnet/BenchmarkDotNet#3169). BenchmarkDotNet is the call driver — it invokes the workload in a loop, which is what accumulates the call count that drives promotion. But with only the events above, it must reconstruct the tiering state it is itself causing:
- read JitInfo.TieredCallCountThreshold and model the per-tier call budget, the MaxTierPromotions count, and the final-tier set {MinOptJitted, Optimized, OptimizedTier1} to know when it's done;
- detect that call counting is even active from the TieredCompilationPause/Resume bracket, because invocations issued during the call-counting delay aren't counted (there is no per-method "counting started" signal);
- watch MethodLoadVerbose for each tier publication, decoding MethodFlags and ignoring OptimizedTier1OSR (OSR fires off a loop back-edge counter, is never the entry-point version, and is never call-counted, so it's not a step on the call-count ladder);
- when a burst doesn't publish in time, issue extra nudge invocations one at a time, and absorb the ~10 ms async event-delivery lag after each;
- because the publication can lag arbitrarily behind unrelated background traffic, fall back to a quiescence proxy: treat TieredCompilationBackgroundJitStart/Stop with PendingMethodCount == 0 as "the queue drained, so if my method was going to publish, it has." This proxy is batch-level and unattributable (it can't tell whether a batch contains our method or unrelated methods), so it over-waits on unrelated traffic and races when the method enqueues just after a drain;
- and, because any of these waits can hang on a dropped event, cap each otherwise-correct infinite wait with a watchdog timeout, reintroducing the very non-determinism the effort set out to remove.
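For contrast, the flag-decoding step the list above requires can be sketched as follows. It extracts bits [7..9] of MethodFlags with the (flags >> 7) & 0x7 decode, then applies the final-tier and OSR checks. The type and method names are hypothetical. The raw tier numbers are left to the caller because this sketch does not assume CoreCLR's internal numbering.

```csharp
using System.Collections.Generic;

public static class MethodLoadVerboseDecoding
{
    // MethodFlags bits [7..9] carry the optimization tier.
    private const int TierShift = 7;
    private const uint TierMask = 0x7;

    public static uint DecodeTier(uint methodFlags) => (methodFlags >> TierShift) & TierMask;

    // finalTiers: the caller's raw values for {MinOptJitted, Optimized, OptimizedTier1}.
    // osrTier: the caller's raw value for OptimizedTier1OSR.
    // Returns null for OSR publications (not a step on the call-count ladder, so ignore them),
    // otherwise whether this publication reached a final tier.
    public static bool? IsFinalTierPublication(uint methodFlags, ISet<uint> finalTiers, uint osrTier)
    {
        uint tier = DecodeTier(methodFlags);
        if (tier == osrTier)
            return null;
        return finalTiers.Contains(tier);
    }
}
```

Each line here encodes layout and taxonomy knowledge that the consumer has to keep in sync with the runtime. That is part of the modeling burden the proposal aims to remove.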
Absent a decision signal, the tool also cannot distinguish "the promotion is still coming" from "it is never coming" (already final / not enough calls / counting paused) without a timeout.
The proposed API replaces all of it (see API Usage):
| Gap (with events only) | Closed by |
|---|---|
| Silent decision point; publication lags by unrelated queue depth | MoveNext awaits this method's transition, runtime-signaled |
| Lossy event delivery (drops under buffer pressure) | runtime-internal synchronization, no buffers to drop |
| "Is counting active?" inferred from Pause/Resume | the wait readies counting before yielding |
| OSR / instrumented tiers pollute the ladder | entry-point semantics; only call-count promotions are yielded |
| "Still coming vs never coming?" needs a timeout | ended sequence (and Optimizing vs Final) |
| Tier-set / MaxTierPromotions / threshold modeling | the yielded count + sequence termination |
| Untracked-callee warming | PendingTieredCompilationCount |
API Proposal
```csharp
namespace System.Runtime.CompilerServices;

public enum MethodCompilationStatus
{
    Cold,       // no native code for the active entry-point version yet
    Optimizing, // compiled at a non-terminal tier; a further promotion WILL come (tier0 / instrumented-PGO)
    Final,      // terminal tier for the active entry-point version; no further promotion is coming
}

public static partial class RuntimeHelpers
{
    // HEADLINE: the call schedule to drive a method to its final tier. Yields the ADDITIONAL invocations needed for
    // each successive promotion (a cold method yields 1 first, for cold -> tier0); the sequence ends when the method
    // is final. Lossless (runtime-signaled, not event-based) and side-effect-free apart from the awaiting.
    public static IAsyncEnumerable<int> EnumerateTierPromotionCallCountsAsync(
        RuntimeMethodHandle method, CancellationToken cancellationToken = default);

    // LOWER-LEVEL PRIMITIVES: the enumerable can be composed from these.

    // Lossless, side-effect-free poll of the active entry-point code version. Never triggers jitting or counting.
    public static MethodCompilationStatus GetMethodCompilationStatus(RuntimeMethodHandle method);

    // Additional invocations to reach the next code version: 1 if cold (cold -> tier0), the remaining count to the
    // next tier's threshold otherwise, or 0 if already final / not counted.
    public static int GetCallsToNextTierPromotion(RuntimeMethodHandle method);

    // Completes when the method is in a state the caller can act on: EITHER it has reached its final tier, OR it is
    // ready to count toward the next promotion (any tier-up already triggered has published AND the call-counting
    // stub for the current tier is installed). Waits for a QUIESCENT, READY state, never for a promotion to occur,
    // so a caller that under-invokes is never deadlocked. Bridges the background-compilation lag and the
    // call-counting delay (TC_CallCountingDelayMs); replaces the MethodLoadVerbose publication wait and
    // Pause/Resume inference.
    public static ValueTask WaitForFinalCompilationOrCallCountingReadyAsync(
        RuntimeMethodHandle method, CancellationToken cancellationToken = default);

    // COMPANION: process-wide count of methods queued for / undergoing background tiered (re)compilation. 0 == idle.
    // Lossless replacement for the batch-level BackgroundJitStart/Stop quiescence proxy; for warming untracked callees.
    public static int PendingTieredCompilationCount { get; }
}
```
Semantics of EnumerateTierPromotionCallCountsAsync:
- Element value: the number of additional invocations, counted from now, needed to cross the next tier's call-count threshold. The caller invokes that many (more is harmless; fewer just doesn't promote).
- No precondition; cold starts in the sequence. A method never called (no native code yet) yields 1 first (a single invoke compiles it cold → tier0), then proceeds with the higher-tier counts. The caller never has to pre-jit or special-case a cold method.
- Termination = final tier. The sequence ends when the active entry-point code version is at a tier from which no further promotion will occur. A method that never tiers (NoOptimization, AggressiveOptimization compiled straight to Optimized, R2R-pinned, a non-tiered runtime) yields just its single cold-compile count and then ends, so "nothing more to wait for" comes for free.
- Entry-point semantics. Counts and termination track the active entry-point version (what fresh calls dispatch to). OSR versions are never surfaced, since they aren't on the call-count ladder.
- Cancellation: via the token / [EnumeratorCancellation].
- Generics: per instantiation. The handle identifies the instantiated method (same shape as RuntimeHelpers.PrepareMethod's instantiation overloads).
A status enum (rather than the raw tier or MethodFlags bitfield) is deliberate: a stable abstraction giving the only distinction a consumer needs — transitional vs terminal — without ossifying CoreCLR's internal tier numbering, and Optimizing vs Final directly answers "is a promotion still coming?".
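To illustrate the kind of abstraction meant here, the sketch below maps a set of made-up tier names onto the proposed three-valued status. It follows the classification described above: tier0 and instrumented tiers are transitional, terminal tiers (including minopt) are Final, and OSR is never an entry-point version. The tier enum is hypothetical and is not CoreCLR's; a real mapping would live inside the runtime.

```csharp
using System;

public enum MethodCompilationStatus { Cold, Optimizing, Final }

// Hypothetical tier names for illustration only; NOT CoreCLR's internal enum.
public enum HypotheticalEntryPointTier
{
    None,              // no native code yet
    Tier0,
    Tier0Instrumented,
    Tier1Instrumented, // still call-counted, so not terminal
    Tier1,
    FullyOptimized,    // e.g. AggressiveOptimization compiled straight to optimized code
    MinOpt,            // terminal-but-minopt
    Tier1Osr,          // never an entry-point version
}

public static class StatusMapping
{
    public static MethodCompilationStatus ToStatus(HypotheticalEntryPointTier tier) => tier switch
    {
        HypotheticalEntryPointTier.None => MethodCompilationStatus.Cold,
        HypotheticalEntryPointTier.Tier0
            or HypotheticalEntryPointTier.Tier0Instrumented
            or HypotheticalEntryPointTier.Tier1Instrumented => MethodCompilationStatus.Optimizing,
        HypotheticalEntryPointTier.Tier1
            or HypotheticalEntryPointTier.FullyOptimized
            or HypotheticalEntryPointTier.MinOpt => MethodCompilationStatus.Final,
        HypotheticalEntryPointTier.Tier1Osr => throw new ArgumentException(
            "OSR code is never the active entry-point version.", nameof(tier)),
        _ => throw new ArgumentOutOfRangeException(nameof(tier)),
    };
}
```

Internal tiers can be added, renamed, or renumbered behind such a switch without changing the public answer to "is a promotion still coming?".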
API Usage
```csharp
await foreach (int callsToNextTier in RuntimeHelpers.EnumerateTierPromotionCallCountsAsync(method, cancellationToken))
{
    for (int i = 0; i < callsToNextTier; i++)
        InvokeWorkload();
}
// the loop ending IS "the method reached its final tier"
```
No keyword firehose paid during measurement, no tier arithmetic, no Pause/Resume gating, no OSR filtering, no nudge heuristic, no quiescence proxy for the method, and no infinite waits to cap — all of it folds into MoveNext, losslessly. The yielded count is also better than the threshold knob: it's computed fresh against the calls already accumulated on the persistent per-tier counter.
The enumerable is exactly the lower-level primitives composed:
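```csharp
public static async IAsyncEnumerable<int> EnumerateTierPromotionCallCountsAsync(
    RuntimeMethodHandle m, [EnumeratorCancellation] CancellationToken ct = default)
{
    while (true)
    {
        await RuntimeHelpers.WaitForFinalCompilationOrCallCountingReadyAsync(m, ct);
        if (RuntimeHelpers.GetMethodCompilationStatus(m) == MethodCompilationStatus.Final)
            break;
        yield return RuntimeHelpers.GetCallsToNextTierPromotion(m); // caller invokes the yielded count
    }
}
```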
A cold method drains nothing, isn't Final, arms nothing, and yields 1; a caller that under-invokes leaves nothing pending and just gets the recomputed count again next iteration instead of hanging — the wait settles into a ready state rather than awaiting a promotion, so it can't deadlock.
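The composition can be exercised without the proposed runtime APIs. The sketch puts the three primitives behind an interface and drives it with a fake. The interface, the fake, and its synchronous promotion are all hypothetical. The fake has no background-compile lag and no call-counting delay, so its wait always completes immediately. What the sketch does show is the no-deadlock property: under-invoking just produces a smaller, recomputed count on the next element.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the three proposed primitives on one method; not a BCL type.
public interface ITieringPrimitives
{
    ValueTask WaitForFinalCompilationOrCallCountingReadyAsync(CancellationToken ct);
    bool IsFinal { get; } // stands in for GetMethodCompilationStatus(m) == Final
    int GetCallsToNextTierPromotion();
}

// Fake method: one call compiles it cold -> first tier, then it promotes synchronously
// whenever the current tier's threshold of counted calls is reached.
public sealed class FakeTieredMethod : ITieringPrimitives
{
    private readonly int[] _thresholds; // counted calls needed at each non-final tier
    private int _tier = -1;             // -1 == cold (no native code yet)
    private int _calls;                 // per-tier counter; persists across enumeration steps

    public FakeTieredMethod(params int[] thresholds) => _thresholds = thresholds;

    public bool IsFinal => _tier == _thresholds.Length;

    public ValueTask WaitForFinalCompilationOrCallCountingReadyAsync(CancellationToken ct)
    {
        ct.ThrowIfCancellationRequested();
        return default; // nothing is ever in flight in this fake, so it is always "ready"
    }

    public int GetCallsToNextTierPromotion() =>
        _tier < 0 ? 1 : IsFinal ? 0 : _thresholds[_tier] - _calls;

    public void Invoke()
    {
        if (_tier < 0) { _tier = 0; return; } // cold compile; not counted
        if (!IsFinal && ++_calls >= _thresholds[_tier]) { _tier++; _calls = 0; }
    }
}

public static class TierPromotionSchedule
{
    // The same composition as above, written against the hypothetical interface.
    public static async IAsyncEnumerable<int> EnumerateAsync(
        ITieringPrimitives m, [EnumeratorCancellation] CancellationToken ct = default)
    {
        while (true)
        {
            await m.WaitForFinalCompilationOrCallCountingReadyAsync(ct);
            if (m.IsFinal)
                yield break;
            yield return m.GetCallsToNextTierPromotion();
        }
    }

    // Drives the fake, making callsToMake(n) invocations per yielded n; returns the yielded counts.
    public static async Task<List<int>> DriveAsync(FakeTieredMethod m, Func<int, int> callsToMake)
    {
        var yielded = new List<int>();
        await foreach (int n in EnumerateAsync(m))
        {
            yielded.Add(n);
            int calls = callsToMake(n);
            for (int i = 0; i < calls; i++)
                m.Invoke();
        }
        return yielded;
    }
}
```

Suppose the fake has a single threshold of 30 and the driver makes at most 10 calls per element. The yielded counts are then 1, 30, 20, 10. A fake with no non-final tiers yields only 1.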
Warming callees the caller can't name, after the driven method is final:
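One possible shape for this step is sketched below under stated assumptions. The proposed PendingTieredCompilationCount property does not exist yet, so a Func<int> stands in for it. The method name, the round structure, and the polling interval are all hypothetical. Each round invokes the workload and then waits for the background queue to drain, and the loop stops after a round that leaves nothing queued. Call-counting completion is asynchronous, so a tier-up enqueued just after the check can still be missed. The loop is therefore a best-effort heuristic, not a guarantee.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CalleeWarmup
{
    // Returns the number of rounds performed. pendingTieredCompilationCount stands in for the
    // proposed RuntimeHelpers.PendingTieredCompilationCount (0 == background queue idle).
    public static async Task<int> WarmUntilIdleAsync(
        Action invokeWorkload,
        int callsPerRound,
        Func<int> pendingTieredCompilationCount,
        int maxRounds,
        TimeSpan pollInterval,
        CancellationToken ct = default)
    {
        for (int round = 1; round <= maxRounds; round++)
        {
            // Callees only accumulate calls while the workload runs.
            for (int i = 0; i < callsPerRound; i++)
                invokeWorkload();

            // Nothing queued after this round: treat the process as warm (best effort).
            if (pendingTieredCompilationCount() == 0)
                return round;

            // Otherwise let the background worker drain, then run another round, since a
            // promoted callee may still have further tiers to climb (e.g. an instrumented tier).
            while (pendingTieredCompilationCount() != 0)
                await Task.Delay(pollInterval, ct);
        }
        return maxRounds;
    }
}
```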
Alternative Designs
MethodJittingQueued event (carrying method identity + target tier). Emit an event at enqueue time (e.g. at OnCallCountThresholdReached / AsyncPromoteToTier1) with a payload mirroring MethodLoadVerbose plus the tier in MethodFlags. This is the smallest event-shaped fix and would close both the silent-decision gap and the still-coming-vs-never ambiguity. But it is still lossy (EventPipe drop), still requires the consumer to model the tier ladder and counting activation, and still leaves callee warming to the batch proxy. It's the right thing to do if the API route is rejected, and the enumerable/primitives could even be implemented partly on top of it. On its own, though, it leaves the consumer stitching lossy pieces together.
Add MethodFlags (tier) to MethodJittingStarted. Even smaller; disambiguates tier0 compile-start from a tier-up compile-start. But MethodJittingStarted fires only when the worker dequeues the method — after everything ahead of it compiled — so it doesn't close the latency gap, and it's equally lossy.
Use TieredCompilationBackgroundJitStart/Stop. Insufficient as a per-method signal: batch-level, count-only, no method identity, coalesced. (Proposed above only as a global quiescence query, which is all it can honestly support.)
Poll MethodLoadVerbose, bounded by background-queue quiescence (status quo). Works, but late and non-deterministic: the quiescence proxy is batch-level and unattributable, so it over-waits on unrelated traffic, races on late enqueue, and still needs a timeout to conclude a method will never tier — and a dropped publication degrades it silently.
bool RuntimeHelpers.IsMethodFullyOptimized(...) (the existing proposal #99870, "[API Proposal]: RuntimeHelpers.IsMethodFullyOptimized for benchmarking tools"). This is a point-in-time poll for the same benchmarking use case. The status enum here is a strict generalization. A bool collapses "still tier0," "in-flight," "instrumented-intermediate," and "terminal-but-minopt" into one value and bakes the runtime's definition of "fully optimized" into the answer, whereas a consumer often needs its own predicate (e.g. is OptimizedTier1Instrumented "done"? It is still call-counted). A poll alone also can't tell the caller when to look, so it forces a spin, which WaitForFinalCompilationOrCallCountingReadyAsync / EnumerateTierPromotionCallCountsAsync remove.
GetCurrentMethodFlags / a raw tier value. Returning the MethodLoadVerbose MethodFlags bitfield (or the raw JitOptimizationTier) would be maximally consistent with the events, and the extra flags (Jitted vs R2R, generic) are independently useful. But a consumer never uses the packed field raw (it decodes (flags >> 7) & 0x7 and tests membership), so the bit-identical "consistency" buys little. Meanwhile, exposing a raw ETW diagnostic bitfield (or raw tier numbers) as a supported API ossifies the entire layout and taxonomy, which is "subject to change" by design. The stable MethodCompilationStatus enum captures the information the consumer actually needs without that commitment.
A force-to-tier API (PrepareMethod-style), deliberately not proposed. A "promote this method to its final tier synchronously" call would be more direct than driving invocations, but it changes semantics — it skips call-count/PGO accumulation, doesn't warm callees, and may compile without the profile natural execution would gather — so it's a deliberate non-goal.
Risks
The call-count model is CoreCLR-tiering-specific. "Promote a method by invoking it N times, in stages" is the shape of the headline API. It degrades cleanly where inapplicable (NativeAOT → empty enumeration / Final; Mono's interp→JIT tiering maps onto the same enum/sequence), and the abstraction is a plus for cross-runtime portability versus raw CoreCLR tier values — but it is a more opinionated surface than a neutral poll, and API review should weigh that.
Ossification. A MethodCompilationStatus enum and a freshly-computed call count are deliberately chosen to avoid pinning internal taxonomy: the enum exposes only transitional-vs-terminal, and the count never commits to a fixed TieredCallCountThreshold constant ("invoke this many now, measured against the current counter"). Exposing raw tiers/MethodFlags instead (see Alternatives) would ossify the layout.
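Related:
- dotnet/BenchmarkDotNet#3169: the consuming implementation, i.e. the event-listener-based JIT-warmup stage this API would replace.
- #99870 ([API Proposal]: RuntimeHelpers.IsMethodFullyOptimized for benchmarking tools): same use case, subsumed here as the MethodCompilationStatus poll primitive.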
Disclaimer: preparation of this issue (drafting, structuring, and refining the API shape) was assisted by AI (Claude). The design direction and decisions are the author's; the AI helped articulate and organize them.