[API Proposal]: APIs to drive a method's tiered-compilation promotion to its final tier #129020

@timcassell

Background and motivation

Add a managed, in-process, method-scoped API that lets a caller drive and observe a single method's progression through tiered compilation to its final optimization tier — deterministically, losslessly, and without any ETW/EventPipe session. The headline surface is an enumerable that yields the call schedule needed to promote the method; it is backed by lower-level primitives (a status poll, a remaining-count query, and a ready-state wait) it can be composed from, plus a process-wide quiescence query for warming methods the caller can't name (callees).

The need arises because in-process diagnostic consumers (an EventListener on Microsoft-Windows-DotNETRuntime) have no prompt, method-specific, lossless signal of where a method is on the tiering ladder. The only per-method tier signals are events, and they are insufficient in four compounding ways.

Why today's events are insufficient

1. The decision point is silent. When a method's call count crosses the threshold, CallCountingManager::OnCallCountThresholdReached only flags it PendingCompletion and calls AsyncCompleteCallCounting() — it emits no ETW/EventPipe event. (src/coreclr/vm/callcounting.cpp) The only per-method tier events are:

| Event | Fires when | Carries tier? | Problem |
| --- | --- | --- | --- |
| MethodJittingStarted (V1) | tier-up compile begins | ❌ (no MethodFlags) | Can't tell a tier0 compile-start from a tier-up compile-start |
| MethodLoadVerbose | tier-up compile completes / is published | ✅ (MethodFlags bits [7..9]) | Latest possible point; gated behind the whole background queue |

2. Publication lags the decision by an unbounded, unrelated amount. Optimization is a single-threaded FIFO queue: methods are enqueued via InsertTail and drained one at a time on a single background worker (GetNextMethodToOptimize → RemoveHead → OptimizeMethod). (src/coreclr/vm/tieredcompilation.cpp) So a method's MethodLoadVerbose does not fire until every method ahead of it has compiled. The only events in that window are the batch-level TieredCompilationBackgroundJitStart/Stop, which carry just a count and no method identity.

3. Event delivery is lossy by design. An in-proc EventListener consuming the native runtime events runs an EventPipe session: CLR threads write to bounded, circular buffers, drained asynchronously by a dispatcher thread. The writer never blocks (blocking app threads to deliver diagnostics is unacceptable), so under buffer pressure events are dropped, not backpressured — best-effort delivery, with dropped counts recorded only at sequence points. The danger window is process startup, exactly when warmup runs and when MethodLoadVerbose (one per jitted method, thousands at startup) floods the same buffers a consumer needs its one publication to survive in.

4. The whole mechanism can be turned off. It all depends on EventSource being enabled. The System.Diagnostics.Tracing.EventSource.IsSupported feature switch — set false for trimming/size, or to avoid EventSource overhead — makes EventSource a no-op, so an in-proc EventListener receives nothing at all and the consumer is left with no signal, falling back to the very fixed delay this is meant to replace. A direct runtime API has no such dependency on the diagnostics infrastructure being enabled.

Net effect: a consumer that wants to know "has method M reached its final tier yet?" must await MethodLoadVerbose, whose arrival lags the runtime's decision by queue depth and unrelated compile times, and may simply never arrive — or, if EventSource is disabled, cannot observe at all. There is no signal at the decision point, and no way to poll ground truth.
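
The lossy-delivery behaviour in point 3 can be illustrated with an analogy. The sketch below is not how EventPipe is implemented; it uses a bounded System.Threading.Channels channel in drop-oldest mode as a stand-in for a circular buffer whose writer never blocks. The class and method names (LossyBufferDemo, WriteWithoutBackpressure) are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;

public static class LossyBufferDemo
{
    // Writes `total` items into a bounded buffer of `capacity` slots without ever
    // blocking the writer, then returns how many items were evicted to make room.
    public static int WriteWithoutBackpressure(int capacity, int total, out int[] survivors)
    {
        int dropped = 0;
        var options = new BoundedChannelOptions(capacity)
        {
            // Analogy for a circular buffer: when full, evict the oldest item
            // instead of making the producer wait.
            FullMode = BoundedChannelFullMode.DropOldest,
        };
        Channel<int> buffer = Channel.CreateBounded<int>(options, _ => dropped++);

        for (int i = 0; i < total; i++)
            buffer.Writer.TryWrite(i); // succeeds even when full; an older item is lost

        buffer.Writer.Complete();

        // Drain whatever survived, as a slow consumer would see it.
        var remaining = new List<int>();
        while (buffer.Reader.TryRead(out int item))
            remaining.Add(item);

        survivors = remaining.ToArray();
        return dropped;
    }
}
```

If the one item the consumer cares about is among the evicted ones, nothing on the reading side reveals that it ever existed, which is the failure mode a per-method publication wait has to guard against.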

Concrete use case: BenchmarkDotNet

BenchmarkDotNet is replacing its fixed ~250 ms per-tier Thread.Sleep warmup with event-driven tier-up detection (an in-process EventListener), so the JIT-warmup stage proceeds the instant the benchmark method reaches its final tier instead of sleeping a conservative fixed amount (dotnet/BenchmarkDotNet#3169). BenchmarkDotNet is the call driver — it invokes the workload in a loop, which is what accumulates the call count that drives promotion. But with only the events above, it must reconstruct the tiering state it is itself causing:

  • read JitInfo.TieredCallCountThreshold and model the per-tier call budget, the MaxTierPromotions count, and the final-tier set {MinOptJitted, Optimized, OptimizedTier1} to know when it's done;
  • detect that call-counting is even active from the TieredCompilationPause/Resume bracket, because invocations issued during the call-counting delay aren't counted (there is no per-method "counting started" signal);
  • watch MethodLoadVerbose for each tier publication, decoding MethodFlags and ignoring OptimizedTier1OSR (OSR fires off a loop back-edge counter, is never the entry-point version, and is never call-counted — so it's not a step on the call-count ladder);
  • when a burst doesn't publish in time, issue extra nudge invocations one at a time, and absorb the ~10 ms async event-delivery lag after each;
  • because the publication can lag arbitrarily behind unrelated background traffic, fall back to a quiescence proxy: treat TieredCompilationBackgroundJitStart/Stop with PendingMethodCount == 0 as "the queue drained, so if my method was going to publish, it has." This proxy is batch-level and unattributable — it can't tell whether a batch contains our method or unrelated methods — so it over-waits on unrelated traffic and races when the method enqueues just after a drain;
  • and, because any of these waits can hang on a dropped event, cap each otherwise-correct infinite wait with a watchdog timeout — reintroducing the very non-determinism the effort set out to remove.

Absent a decision signal, the tool also cannot distinguish "the promotion is still coming" from "it is never coming" (already final / not enough calls / counting paused) without a timeout.
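
The watchdog-capped wait described in the last bullet has roughly the following shape. This is a minimal sketch, not BenchmarkDotNet's code: WatchdogWait and WaitForPublicationAsync are hypothetical names, and the `publication` task is assumed to be completed by the consumer's own event callback (for example through a TaskCompletionSource).

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class WatchdogWait
{
    // Returns true if the publication signal arrived, false if the watchdog fired first.
    public static async Task<bool> WaitForPublicationAsync(Task publication, TimeSpan watchdog)
    {
        using var cts = new CancellationTokenSource();
        Task delay = Task.Delay(watchdog, cts.Token);

        Task winner = await Task.WhenAny(publication, delay);
        cts.Cancel(); // stop the timer if the publication won the race

        if (winner == publication)
        {
            await publication; // propagate any exception from the signal itself
            return true;
        }
        return false; // ambiguous: dropped event, slow queue, or a method that will never tier
    }
}
```

A false result cannot distinguish a dropped event from a long background queue or from a method that is already final, so the choice of timeout directly trades wasted time against false conclusions.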

The proposed API replaces all of it (see API Usage):

| Gap (with events only) | Closed by |
| --- | --- |
| Silent decision point; publication lags by unrelated queue depth | MoveNext awaits this method's transition, runtime-signaled |
| Lossy delivery (dropped publication ⇒ missed/hung) | runtime-internal synchronization; no buffers to drop |
| "Is counting active?" inferred from Pause/Resume | the wait readies counting before yielding |
| OSR / instrumented tiers pollute the ladder | entry-point semantics; only call-count promotions are yielded |
| "Still coming vs never coming?" needs a timeout | ended sequence (and Optimizing vs Final) |
| Tier-set / MaxTierPromotions / threshold modeling | the yielded count + sequence termination |
| Untracked-callee warming | PendingTieredCompilationCount |

API Proposal

namespace System.Runtime.CompilerServices;

public enum MethodCompilationStatus
{
    Cold,       // no native code for the active entry-point version yet
    Optimizing, // compiled at a non-terminal tier; a further promotion WILL come (tier0 / instrumented-PGO)
    Final,      // terminal tier for the active entry-point version; no further promotion is coming
}

public static partial class RuntimeHelpers
{
    // HEADLINE: the call schedule to drive a method to its final tier. Yields the ADDITIONAL invocations needed for
    // each successive promotion (a cold method yields 1 first, for cold -> tier0); the sequence ends when the method
    // is final. Lossless (runtime-signaled, not event-based) and side-effect-free apart from the awaiting.
    public static IAsyncEnumerable<int> EnumerateTierPromotionCallCountsAsync(RuntimeMethodHandle method, CancellationToken cancellationToken = default);

    // LOWER-LEVEL PRIMITIVES — the enumerable can be composed from these.

    // Lossless, side-effect-free poll of the active entry-point code version. Never triggers jitting or counting.
    public static MethodCompilationStatus GetMethodCompilationStatus(RuntimeMethodHandle method);

    // Additional invocations to reach the next code version: 1 if cold (cold -> tier0), the remaining count to the
    // next tier's threshold otherwise, or 0 if already final / not counted.
    public static int GetCallsToNextTierPromotion(RuntimeMethodHandle method);

    // Completes when the method is in a state the caller can act on: EITHER it has reached its final tier, OR it is
    // ready to count toward the next promotion (any tier-up already triggered has published AND the call-counting
    // stub for the current tier is installed). Waits for a QUIESCENT, READY state — never for a promotion to occur —
    // so a caller that under-invokes is never deadlocked. Bridges the background-compilation lag and the call-counting
    // delay (TC_CallCountingDelayMs); replaces the MethodLoadVerbose publication wait and Pause/Resume inference.
    public static ValueTask WaitForFinalCompilationOrCallCountingReadyAsync(RuntimeMethodHandle method, CancellationToken cancellationToken = default);

    // COMPANION: process-wide count of methods queued for / undergoing background tiered (re)compilation. 0 == idle.
    // Lossless replacement for the batch-level BackgroundJitStart/Stop quiescence proxy; for warming untracked callees.
    public static int PendingTieredCompilationCount { get; }
}

Semantics of EnumerateTierPromotionCallCountsAsync:

  • Element value: the number of additional invocations from now needed to cross the next tier's call-count threshold. The caller invokes that many (more is harmless; fewer just doesn't promote).
  • No precondition; cold starts in the sequence. A method never called (no native code yet) yields 1 first — a single invoke compiles it cold → tier0 — then proceeds with the higher-tier counts. The caller never has to pre-jit or special-case a cold method.
  • Termination = final tier. The sequence ends when the active entry-point code version is at a tier from which no further promotion will occur. A method that never tiers (NoOptimization, AggressiveOptimization compiled straight to Optimized, R2R-pinned, a non-tiered runtime) yields just its single cold-compile count and then ends — "nothing more to wait for," for free.
  • Entry-point semantics. Counts and termination track the active entry-point version (what fresh calls dispatch to). OSR versions are never surfaced — they aren't on the call-count ladder.
  • Cancellation: via the token argument, or via WithCancellation on the returned enumerable ([EnumeratorCancellation]).
  • Generics: per instantiation — the handle identifies the instantiated method (same shape as RuntimeHelpers.PrepareMethod's instantiation overloads).
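
For reference, a caller could obtain the RuntimeMethodHandle argument with existing reflection APIs, as sketched below. HandleDemo, Workload, and Identity are hypothetical names used only for illustration; the proposed APIs themselves are not called because they do not exist yet.

```csharp
using System;
using System.Reflection;

public static class HandleDemo
{
    public static int Workload(int x) => x + 1;
    public static T Identity<T>(T value) => value;

    // Handle for an ordinary, non-generic method.
    public static RuntimeMethodHandle GetWorkloadHandle() =>
        typeof(HandleDemo).GetMethod(nameof(Workload))!.MethodHandle;

    // Handle for one closed instantiation, Identity<int>, rather than the open definition.
    public static RuntimeMethodHandle GetIdentityOfIntHandle() =>
        typeof(HandleDemo).GetMethod(nameof(Identity))!
            .MakeGenericMethod(typeof(int))
            .MethodHandle;
}
```

Under the per-instantiation rule above, a tool driving Identity<int> and Identity<long> would pass two different handles and would enumerate each one separately.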

A status enum (rather than the raw tier or MethodFlags bitfield) is deliberate: a stable abstraction giving the only distinction a consumer needs — transitional vs terminal — without ossifying CoreCLR's internal tier numbering, and Optimizing vs Final directly answers "is a promotion still coming?".

API Usage

await foreach (int callsToNextTier in RuntimeHelpers.EnumerateTierPromotionCallCountsAsync(method, cancellationToken))
{
    for (int i = 0; i < callsToNextTier; i++)
        InvokeWorkload();
}
// the loop ending IS "the method reached its final tier"

No keyword firehose paid during measurement, no tier arithmetic, no Pause/Resume gating, no OSR filtering, no nudge heuristic, no quiescence proxy for the method, and no infinite waits to cap — all of it folds into MoveNext, losslessly. The yielded count is also better than the threshold knob: it's computed fresh against the calls already accumulated on the persistent per-tier counter.

The enumerable is exactly the lower-level primitives composed:

while (true)
{
    await RuntimeHelpers.WaitForFinalCompilationOrCallCountingReadyAsync(m, ct);
    if (RuntimeHelpers.GetMethodCompilationStatus(m) == MethodCompilationStatus.Final) break;
    yield return RuntimeHelpers.GetCallsToNextTierPromotion(m);
    // caller invokes the yielded count
}

A cold method drains nothing, isn't Final, arms nothing, and yields 1; a caller that under-invokes leaves nothing pending and just gets the recomputed count again next iteration instead of hanging — the wait settles into a ready state rather than awaiting a promotion, so it can't deadlock.
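
To make the under-invocation behaviour concrete, here is a sketch that compiles today by running the same composition against a toy model instead of the runtime. Everything in it is an assumption for illustration: ITieringProbe stands in for the three proposed primitives, FakeMethod is a simplified in-memory model in which a promotion "publishes" instantly, and the threshold of 30 is just a sample value.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public enum Status { Cold, Optimizing, Final } // stand-in for the proposed MethodCompilationStatus

// Hypothetical stand-in for the proposed primitives.
public interface ITieringProbe
{
    Status GetStatus();
    int GetCallsToNextPromotion();
    ValueTask WaitForFinalOrReadyAsync(CancellationToken ct);
}

// Toy model: one cold compile, then `promotions` promotions of `threshold` counted calls each.
public sealed class FakeMethod : ITieringProbe
{
    private readonly int _threshold;
    private int _promotionsLeft, _counted;
    private bool _compiled;

    public FakeMethod(int threshold, int promotions)
    {
        _threshold = threshold;
        _promotionsLeft = promotions;
    }

    public void Invoke()
    {
        if (!_compiled) { _compiled = true; return; }  // cold -> first tier
        if (_promotionsLeft == 0) return;              // already final, nothing is counted
        if (++_counted >= _threshold) { _counted = 0; _promotionsLeft--; }
    }

    public Status GetStatus() =>
        !_compiled ? Status.Cold : _promotionsLeft == 0 ? Status.Final : Status.Optimizing;

    public int GetCallsToNextPromotion() =>
        !_compiled ? 1 : _promotionsLeft == 0 ? 0 : _threshold - _counted;

    // The model is always quiescent, so the wait completes immediately unless cancelled.
    public ValueTask WaitForFinalOrReadyAsync(CancellationToken ct) =>
        ct.IsCancellationRequested ? ValueTask.FromCanceled(ct) : ValueTask.CompletedTask;
}

public static class Composition
{
    public static async IAsyncEnumerable<int> EnumerateAsync(
        ITieringProbe probe, [EnumeratorCancellation] CancellationToken ct = default)
    {
        while (true)
        {
            await probe.WaitForFinalOrReadyAsync(ct);
            if (probe.GetStatus() == Status.Final) yield break;
            yield return probe.GetCallsToNextPromotion();
        }
    }

    // Drives the model, deliberately under-invoking once by 10 calls, and records what was yielded.
    public static async Task<List<int>> DriveAsync(FakeMethod method)
    {
        var yielded = new List<int>();
        bool underInvoked = false;
        await foreach (int calls in EnumerateAsync(method))
        {
            yielded.Add(calls);
            int toInvoke = calls;
            if (!underInvoked && calls > 10) { toInvoke = calls - 10; underInvoked = true; }
            for (int i = 0; i < toInvoke; i++) method.Invoke();
        }
        return yielded;
    }
}
```

With new FakeMethod(threshold: 30, promotions: 1), the driver records 1, 30, 10: the cold compile, the full budget, and then the recomputed remainder after the short burst. With promotions: 0 it records only the single cold-compile count, mirroring a method that never tiers.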

Warming callees the caller can't name, after the driven method is final:

for (int i = 0; i < JitInfo.TieredCallCountThreshold; i++)
{
    InvokeWorkload();
}
while (RuntimeHelpers.PendingTieredCompilationCount > 0)
    await Task.Delay(TimeSpan.FromMilliseconds(10), ct);

Alternative Designs

  • MethodJittingQueued event (carrying method identity + target tier). Emit an event at enqueue time (e.g. at OnCallCountThresholdReached / AsyncPromoteToTier1) with a payload mirroring MethodLoadVerbose plus the tier in MethodFlags. This is the smallest event-shaped fix and would close the silent-decision gap and the still-coming-vs-never ambiguity. But it is still lossy (EventPipe drop), still requires the consumer to model the tier ladder and counting activation, and still leaves callee warming to the batch proxy. It's the right thing to do if the API route is rejected, and the enumerable/primitives could even be implemented partly on top of it — but on its own it leaves the consumer stitching lossy pieces together.

  • Add MethodFlags (tier) to MethodJittingStarted. Even smaller; disambiguates tier0 compile-start from a tier-up compile-start. But MethodJittingStarted fires only when the worker dequeues the method — after everything ahead of it compiled — so it doesn't close the latency gap, and it's equally lossy.

  • Use TieredCompilationBackgroundJitStart/Stop. Insufficient as a per-method signal: batch-level, count-only, no method identity, coalesced. (Proposed above only as a global quiescence query, which is all it can honestly support.)

  • Poll MethodLoadVerbose, bounded by background-queue quiescence (status quo). Works, but late and non-deterministic: the quiescence proxy is batch-level and unattributable, so it over-waits on unrelated traffic, races on late enqueue, and still needs a timeout to conclude a method will never tier — and a dropped publication degrades it silently.

  • bool RuntimeHelpers.IsMethodFullyOptimized(...) (the existing proposal #99870, "[API Proposal]: RuntimeHelpers.IsMethodFullyOptimized for benchmarking tools"). A point-in-time poll for the same benchmarking use case. The status enum here is a strict generalization: a bool collapses "still tier0," "in-flight," "instrumented-intermediate," and "terminal-but-minopt" into one value and bakes the runtime's definition of "fully optimized" into the answer, where a consumer often needs its own predicate (e.g. is OptimizedTier1Instrumented "done"? it's still call-counted). A poll alone also can't tell the caller when to look, so it forces a spin, which WaitForFinalCompilationOrCallCountingReadyAsync / EnumerateTierPromotionCallCountsAsync remove.

  • GetCurrentMethodFlags / a raw tier value. Returning the MethodLoadVerbose MethodFlags bitfield (or the raw JitOptimizationTier) would be maximally consistent with the events, and the extra flags (Jitted vs R2R, generic) are independently useful. But a consumer never uses the packed field raw — it decodes (flags >> 7) & 0x7 and tests membership — so the bit-identical "consistency" buys little, while exposing a raw ETW diagnostic bitfield (or raw tier numbers) as a supported API ossifies the entire layout/taxonomy, which is "subject to change" by design. The stable MethodCompilationStatus enum captures the information the consumer actually needs without that commitment.

  • A force-to-tier API (PrepareMethod-style), deliberately not proposed. A "promote this method to its final tier synchronously" call would be more direct than driving invocations, but it changes semantics — it skips call-count/PGO accumulation, doesn't warm callees, and may compile without the profile natural execution would gather — so it's a deliberate non-goal.
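
The decode step mentioned under the GetCurrentMethodFlags alternative is small enough to show in full. This sketch relies only on what the issue states (the tier sits in bits [7..9] of MethodFlags); MethodFlagsDecoder and its members are hypothetical names, and the numeric tier values are deliberately left to the caller because they are defined by the runtime, not by this example.

```csharp
using System;
using System.Collections.Generic;

public static class MethodFlagsDecoder
{
    // Extracts the 3-bit tier field from bits [7..9] of a MethodLoadVerbose MethodFlags value.
    public static int ExtractTier(uint methodFlags) => (int)((methodFlags >> 7) & 0x7);

    // Membership test against a caller-supplied set of tier values (for example,
    // whatever the caller considers terminal for the runtime it targets).
    public static bool IsInTierSet(uint methodFlags, HashSet<int> tiers) =>
        tiers.Contains(ExtractTier(methodFlags));
}
```

Every consumer ends up writing this shift, mask, and membership test, which is why exposing the packed field directly adds little over an enum that already answers the transitional-versus-terminal question.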

Risks

  • The call-count model is CoreCLR-tiering-specific. "Promote a method by invoking it N times, in stages" is the shape of the headline API. It degrades cleanly where inapplicable (NativeAOT → empty enumeration / Final; Mono's interp→JIT tiering maps onto the same enum/sequence), and the abstraction is a plus for cross-runtime portability versus raw CoreCLR tier values — but it is a more opinionated surface than a neutral poll, and API review should weigh that.
  • Ossification. A MethodCompilationStatus enum and a freshly-computed call count are deliberately chosen to avoid pinning internal taxonomy: the enum exposes only transitional-vs-terminal, and the count never commits to a fixed TieredCallCountThreshold constant ("invoke this many now, measured against the current counter"). Exposing raw tiers/MethodFlags instead (see Alternatives) would ossify the layout.

Disclaimer: preparation of this issue (drafting, structuring, and refining the API shape) was assisted by AI (Claude). The design direction and decisions are the author's; the AI helped articulate and organize them.
