[API Proposal]: APIs to drive a method's tiered-compilation promotion to its final tier #129020

### Background and motivation

Add a managed, **in-process, method-scoped** API that lets a caller drive and observe a single method's progression through tiered compilation to its final optimization tier, **deterministically, losslessly, and without any ETW/EventPipe session**. The headline surface is an enumerable that yields the *call schedule* needed to promote the method. It is backed by lower-level primitives from which it can be composed (a status poll, a remaining-count query, and a ready-state wait), plus a process-wide quiescence query for warming methods the caller can't name (callees).

The need arises because in-process diagnostic consumers (an `EventListener` on `Microsoft-Windows-DotNETRuntime`) have **no prompt, method-specific, lossless signal of where a method is on the tiering ladder**. The only per-method tier signals are runtime events, and they fall short in four compounding ways.

### Why today's events are insufficient

**1. The decision point is silent.** When a method's call count crosses the threshold, `CallCountingManager::OnCallCountThresholdReached` only flags it `PendingCompletion` and calls `AsyncCompleteCallCounting()` — it emits no ETW/EventPipe event. (`src/coreclr/vm/callcounting.cpp`) The only per-method tier events are:

| Event | Fires when | Carries tier? | Problem |
|---|---|---|---|
| `MethodJittingStarted` (V1) | tier-up compile **begins** | ❌ (no `MethodFlags`) | Can't tell a tier0 compile-start from a tier-up compile-start |
| `MethodLoadVerbose` | tier-up compile **completes / is published** | ✅ (`MethodFlags` bits [7..9]) | Latest possible point; gated behind the whole background queue |

**2. Publication lags the decision by an unbounded, unrelated amount.** Optimization is a single-threaded **FIFO queue**: methods are enqueued via `InsertTail` and drained one at a time on a single background worker (`GetNextMethodToOptimize` → `RemoveHead` → `OptimizeMethod`). (`src/coreclr/vm/tieredcompilation.cpp`) So a method's `MethodLoadVerbose` does not fire until every method ahead of it has compiled. The only events in that window are batch-level `TieredCompilationBackgroundJitStart`/`Stop`, which carry just a **count** — no method identity.
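A toy model makes the latency arithmetic concrete. This is not CoreCLR code: `FifoPublicationModel` is a hypothetical helper, and the method names and compile durations are made-up inputs. It replays one worker draining a FIFO, so each method's publication time is the sum of every compile queued ahead of it plus its own.

```csharp
using System.Collections.Generic;

// Toy model of a single background worker draining a FIFO queue (InsertTail / RemoveHead).
// Hypothetical helper with made-up inputs; this is not CoreCLR code.
static class FifoPublicationModel
{
    // Returns, for each queued method, the time (ms after the drain starts) at which its
    // optimized code would be published, i.e. the earliest its MethodLoadVerbose could fire.
    public static Dictionary<string, double> PublicationTimes(
        IEnumerable<(string Method, double CompileMs)> queue)
    {
        var published = new Dictionary<string, double>();
        var fifo = new Queue<(string Method, double CompileMs)>(queue);
        double clock = 0;
        while (fifo.Count > 0)
        {
            var (method, compileMs) = fifo.Dequeue(); // strictly one at a time, in enqueue order
            clock += compileMs;                       // the worker is busy for the whole compile
            published[method] = clock;
        }
        return published;
    }
}
```

With hypothetical inputs `A` (40 ms), `B` (15 ms), and the benchmark method (2 ms), the benchmark's publication lands at 57 ms, of which only 2 ms is its own compile.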

**3. Event delivery is lossy by design.** An in-proc `EventListener` consuming the native runtime events runs an EventPipe session: CLR threads write to **bounded, circular** buffers, which a dispatcher thread drains asynchronously. The writer never blocks (blocking app threads to deliver diagnostics is unacceptable), so under buffer pressure events are **dropped**, not backpressured. Delivery is best-effort, and dropped counts are recorded only at sequence points. The danger window is process **startup**: exactly when warmup runs, and exactly when `MethodLoadVerbose` (one per jitted method, thousands at startup) floods the same buffers in which the consumer's single publication event must survive.
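To illustrate what "dropped, not backpressured" means for a consumer, here is a minimal model of a non-blocking bounded buffer. It sketches the delivery semantics only; `DroppingEventBuffer` is a hypothetical type and does not reproduce EventPipe's actual buffer layout or drop policy.

```csharp
// Illustrative model only (not EventPipe's implementation): a bounded ring buffer whose writer
// never waits for space. When the buffer is full, the new event is dropped and a counter is
// bumped; the reader cannot tell which event was lost.
sealed class DroppingEventBuffer<T>
{
    private readonly T[] _slots;
    private readonly object _gate = new();
    private int _head;     // index of the oldest buffered event
    private int _count;    // number of buffered events
    private long _dropped; // events rejected because the buffer was full

    public DroppingEventBuffer(int capacity) => _slots = new T[capacity];

    public long DroppedCount { get { lock (_gate) return _dropped; } }

    // Writer side ("runtime thread"): returns immediately; false means the event was dropped.
    public bool TryWrite(T evt)
    {
        lock (_gate)
        {
            if (_count == _slots.Length) { _dropped++; return false; }
            _slots[(_head + _count) % _slots.Length] = evt;
            _count++;
            return true;
        }
    }

    // Reader side ("dispatcher thread"): drains events in write order.
    public bool TryRead(out T evt)
    {
        lock (_gate)
        {
            if (_count == 0) { evt = default!; return false; }
            evt = _slots[_head];
            _head = (_head + 1) % _slots.Length;
            _count--;
            return true;
        }
    }
}
```

Filling the buffer with unrelated startup events leaves no room for the one event the consumer is waiting for, and the reader side sees only a drop count, not which event was lost.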

**4. The whole mechanism can be turned off.** It all depends on `EventSource` being enabled. The `System.Diagnostics.Tracing.EventSource.IsSupported` feature switch — set `false` for trimming/size, or to avoid `EventSource` overhead — makes `EventSource` a no-op, so an in-proc `EventListener` receives nothing at all and the consumer is left with no signal, falling back to the very fixed delay this is meant to replace. A direct runtime API has no such dependency on the diagnostics infrastructure being enabled.

Net effect: a consumer that wants to know "**has method M reached its final tier yet?**" must await `MethodLoadVerbose`, whose arrival lags the runtime's *decision* by queue depth and unrelated compile times, **and** may simply never arrive — or, if `EventSource` is disabled, cannot observe at all. There is no signal at the decision point, and no way to poll ground truth.

### Concrete use case: BenchmarkDotNet

BenchmarkDotNet is replacing its fixed `~250 ms` per-tier `Thread.Sleep` warmup with **event-driven tier-up detection** (an in-process `EventListener`), so the JIT-warmup stage proceeds the instant the benchmark method reaches its final tier instead of sleeping a conservative fixed amount ([dotnet/BenchmarkDotNet#3169](https://github.com/dotnet/BenchmarkDotNet/pull/3169)). BenchmarkDotNet *is the call driver* — it invokes the workload in a loop, which is what accumulates the call count that drives promotion. But with only the events above, it must **reconstruct** the tiering state it is itself causing:

- read `JitInfo.TieredCallCountThreshold` and model the per-tier call budget, the `MaxTierPromotions` count, and the final-tier set `{MinOptJitted, Optimized, OptimizedTier1}` to know when it's done;
- detect that call-counting is even active from the `TieredCompilationPause`/`Resume` bracket, because invocations issued during the call-counting delay aren't counted (there is no per-method "counting started" signal);
- watch `MethodLoadVerbose` for each tier publication, decoding `MethodFlags` and **ignoring `OptimizedTier1OSR`** (OSR fires off a loop back-edge counter, is never the entry-point version, and is never call-counted — so it's not a step on the call-count ladder);
- when a burst doesn't publish in time, issue extra **nudge** invocations one at a time, and absorb the ~10 ms async event-delivery lag after each;
- because the publication can lag arbitrarily behind unrelated background traffic, fall back to a **quiescence proxy**: treat `TieredCompilationBackgroundJitStart`/`Stop` with `PendingMethodCount == 0` as "the queue drained, so if my method was going to publish, it has." This proxy is **batch-level and unattributable** — it can't tell whether a batch contains *our* method or unrelated methods — so it over-waits on unrelated traffic and races when the method enqueues just after a drain;
- and, because any of these waits can hang on a *dropped* event, cap each otherwise-correct infinite wait with a watchdog timeout — reintroducing the very non-determinism the effort set out to remove.
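For reference, the tier-decoding step in that bookkeeping might look like the following sketch. The bit position and mask follow the `MethodFlags` layout described in this issue (bits [7..9], i.e. `(flags >> 7) & 0x7`). The numeric values in `EtwOptimizationTier` are an **assumption** about CoreCLR's tier ordering in the event payload and should be checked against the runtime's event-tracing sources; `MethodFlagsDecoder` is a hypothetical helper name.

```csharp
// ASSUMPTION: these values mirror CoreCLR's JitOptimizationTier ordering as emitted in
// MethodLoadVerbose's MethodFlags; verify against the runtime before relying on them.
enum EtwOptimizationTier
{
    Unknown = 0,
    MinOptJitted = 1,
    Optimized = 2,
    QuickJitted = 3,
    OptimizedTier1 = 4,
    OptimizedTier1OSR = 5,
    QuickJittedInstrumented = 6,
    OptimizedTier1Instrumented = 7,
}

// Hypothetical helper showing the decoding a consumer does today.
static class MethodFlagsDecoder
{
    // Extracts the 3-bit tier field from bits [7..9].
    public static EtwOptimizationTier GetTier(uint methodFlags) =>
        (EtwOptimizationTier)((methodFlags >> 7) & 0x7);

    // null: not a step on the call-count ladder (OSR), so ignore the publication.
    // true: the published code is in the final-tier set {MinOptJitted, Optimized, OptimizedTier1}.
    // false: a transitional tier; another promotion is still expected.
    public static bool? IsFinalTier(uint methodFlags) => GetTier(methodFlags) switch
    {
        EtwOptimizationTier.OptimizedTier1OSR => (bool?)null,
        EtwOptimizationTier.MinOptJitted or EtwOptimizationTier.Optimized or EtwOptimizationTier.OptimizedTier1 => true,
        _ => false,
    };
}
```

This is exactly the tier taxonomy a consumer has to track and keep in sync with the runtime; the proposed `MethodCompilationStatus` enum removes the need for it.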

Absent a decision signal, the tool also cannot distinguish **"the promotion is still coming" from "it is never coming"** (already final / not enough calls / counting paused) without a timeout.

The proposed API replaces all of it (see **API Usage**):

| Gap (with events only) | Closed by |
|---|---|
| Silent decision point; publication lags by unrelated queue depth | `MoveNext` awaits *this method's* transition, runtime-signaled |
| Lossy delivery (dropped publication ⇒ missed/hung) | runtime-internal synchronization — no buffers to drop |
| "Is counting active?" inferred from Pause/Resume | the wait readies counting before yielding |
| OSR / instrumented tiers pollute the ladder | entry-point semantics; only call-count promotions are yielded |
| "Still coming vs never coming?" needs a timeout | ended sequence (and `Optimizing` vs `Final`) |
| Tier-set / `MaxTierPromotions` / threshold modeling | the yielded count + sequence termination |
| Untracked-callee warming | `PendingTieredCompilationCount` |

### API Proposal

```cs
namespace System.Runtime.CompilerServices;

public enum MethodCompilationStatus
{
    Cold,       // no native code for the active entry-point version yet
    Optimizing, // compiled at a non-terminal tier; a further promotion WILL come (tier0 / instrumented-PGO)
    Final,      // terminal tier for the active entry-point version; no further promotion is coming
}

public static partial class RuntimeHelpers
{
    // HEADLINE: the call schedule to drive a method to its final tier. Yields the ADDITIONAL invocations needed for
    // each successive promotion (a cold method yields 1 first, for cold -> tier0); the sequence ends when the method
    // is final. Lossless (runtime-signaled, not event-based) and side-effect-free apart from the awaiting.
    public static IAsyncEnumerable<int> EnumerateTierPromotionCallCountsAsync(RuntimeMethodHandle method, CancellationToken cancellationToken = default);

    // LOWER-LEVEL PRIMITIVES — the enumerable can be composed from these.

    // Lossless, side-effect-free poll of the active entry-point code version. Never triggers jitting or counting.
    public static MethodCompilationStatus GetMethodCompilationStatus(RuntimeMethodHandle method);

    // Additional invocations to reach the next code version: 1 if cold (cold -> tier0), the remaining count to the
    // next tier's threshold otherwise, or 0 if already final / not counted.
    public static int GetCallsToNextTierPromotion(RuntimeMethodHandle method);

    // Completes when the method is in a state the caller can act on: EITHER it has reached its final tier, OR it is
    // ready to count toward the next promotion (any tier-up already triggered has published AND the call-counting
    // stub for the current tier is installed). Waits for a QUIESCENT, READY state — never for a promotion to occur —
    // so a caller that under-invokes is never deadlocked. Bridges the background-compilation lag and the call-counting
    // delay (TC_CallCountingDelayMs); replaces the MethodLoadVerbose publication wait and Pause/Resume inference.
    public static ValueTask WaitForFinalCompilationOrCallCountingReadyAsync(RuntimeMethodHandle method, CancellationToken cancellationToken = default);

    // COMPANION: process-wide count of methods queued for / undergoing background tiered (re)compilation. 0 == idle.
    // Lossless replacement for the batch-level BackgroundJitStart/Stop quiescence proxy; for warming untracked callees.
    public static int PendingTieredCompilationCount { get; }
}
```

**Semantics of `EnumerateTierPromotionCallCountsAsync`:**

- **Element value:** the number of *additional* invocations from now needed to cross the next tier's call-count threshold. The caller invokes that many (more is harmless; fewer just doesn't promote).
- **No precondition; cold starts in the sequence.** A method never called (no native code yet) yields **1** first — a single invoke compiles it cold → tier0 — then proceeds with the higher-tier counts. The caller never has to pre-jit or special-case a cold method.
- **Termination = final tier.** The sequence ends when the active entry-point code version is at a tier from which no further promotion will occur. A method that never tiers (NoOptimization, AggressiveOptimization compiled straight to Optimized, R2R-pinned, a non-tiered runtime) yields just its single cold-compile count and then ends — "nothing more to wait for," for free.
- **Entry-point semantics.** Counts and termination track the active entry-point version (what fresh calls dispatch to). OSR versions are never surfaced — they aren't on the call-count ladder.
- **Cancellation** via the token / `[EnumeratorCancellation]`.
- **Generics:** per instantiation — the handle identifies the instantiated method (same shape as `RuntimeHelpers.PrepareMethod`'s instantiation overloads).
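The cancellation bullet relies on standard C# async-iterator plumbing: with `[EnumeratorCancellation]` on the token parameter, a token supplied via `WithCancellation` reaches the iterator just as if it had been passed as an argument. The stand-in iterator here (`CancellationShapeDemo.FakeSchedule`, hypothetical, yielding placeholder values rather than real call counts) demonstrates that shape without the proposed API.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in with the same shape as EnumerateTierPromotionCallCountsAsync.
// The yielded values are placeholders, not real call counts.
static class CancellationShapeDemo
{
    public static async IAsyncEnumerable<int> FakeSchedule(
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        int[] placeholderCounts = { 1, 2, 3 };
        foreach (int count in placeholderCounts)
        {
            // Stand-in for awaiting the runtime's ready-state signal; honors the token.
            await Task.Delay(1, cancellationToken);
            yield return count;
        }
    }
}
```

Cancelling after the first element makes the next `MoveNextAsync` throw an `OperationCanceledException` (here a `TaskCanceledException` from the stand-in wait), so a harness can bound the whole warmup with one token.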

A status **enum** (rather than the raw tier or `MethodFlags` bitfield) is deliberate: a stable abstraction giving the only distinction a consumer needs — *transitional vs terminal* — without ossifying CoreCLR's internal tier numbering, and `Optimizing` vs `Final` directly answers "is a promotion still coming?".

### API Usage

```csharp
await foreach (int callsToNextTier in RuntimeHelpers.EnumerateTierPromotionCallCountsAsync(method, cancellationToken))
{
    for (int i = 0; i < callsToNextTier; i++)
        InvokeWorkload();
}
// the loop ending IS "the method reached its final tier"
```

There is no event-keyword firehose to pay for during measurement, no tier arithmetic, no Pause/Resume gating, no OSR filtering, no nudge heuristic, no quiescence proxy for the method, and no infinite waits to cap: all of it folds into `MoveNext`, losslessly. The yielded count is also *better* than the threshold knob, because it is computed fresh against the calls already accumulated on the persistent per-tier counter.

The enumerable is exactly the lower-level primitives composed:

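```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
public static async IAsyncEnumerable<int> EnumerateTierPromotionCallCountsAsync(
    System.RuntimeMethodHandle method, [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    while (true)
    {
        await RuntimeHelpers.WaitForFinalCompilationOrCallCountingReadyAsync(method, cancellationToken);
        if (RuntimeHelpers.GetMethodCompilationStatus(method) == MethodCompilationStatus.Final)
            yield break; // the sequence ends at the final tier
        yield return RuntimeHelpers.GetCallsToNextTierPromotion(method);
        // the caller invokes the yielded count before requesting the next element
    }
}
```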

For a cold method, the wait has nothing to drain or arm and the status is not `Final`, so the loop yields `1`. A caller that under-invokes leaves no promotion pending; it simply receives the recomputed remaining count on the next iteration instead of hanging. Because the wait settles into a ready state rather than awaiting a promotion, it cannot deadlock.
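The no-deadlock argument can be checked against a small deterministic model. `SimulatedTieredMethod` is hypothetical: promotions publish instantly (so the ready-state wait is a no-op), every call-counted tier uses the same threshold, and the threshold and number of promotions are illustrative parameters rather than runtime defaults. Its `Schedule()` iterator follows the composed loop: end when final, otherwise yield the freshly computed remaining count.

```csharp
using System.Collections.Generic;

// Hypothetical, single-threaded model of the tier ladder (not runtime code). Promotions publish
// instantly, so the ready-state wait is a no-op; the threshold and promotion count are
// illustrative parameters, and every call-counted tier is assumed to use the same threshold.
sealed class SimulatedTieredMethod
{
    private readonly int _threshold;  // calls needed to leave each call-counted tier
    private int _promotionsRemaining; // call-counted promotions still ahead of the final tier
    private bool _compiled;           // false while the method is cold
    private int _callsAtCurrentTier;  // persistent counter for the current tier

    public SimulatedTieredMethod(int threshold, int promotions)
    {
        _threshold = threshold;
        _promotionsRemaining = promotions;
    }

    // Stand-in for GetMethodCompilationStatus(...) == Final.
    public bool IsFinal => _compiled && _promotionsRemaining == 0;

    // Stand-in for GetCallsToNextTierPromotion: 1 if cold, the remaining count otherwise, 0 if final.
    public int CallsToNextPromotion =>
        !_compiled ? 1 : _promotionsRemaining == 0 ? 0 : _threshold - _callsAtCurrentTier;

    public void Invoke()
    {
        if (!_compiled) { _compiled = true; return; } // first call compiles cold -> tier0
        if (_promotionsRemaining == 0) return;        // final tier: no call counting
        if (++_callsAtCurrentTier >= _threshold)
        {
            _promotionsRemaining--;  // next tier is published immediately in this model
            _callsAtCurrentTier = 0; // counting restarts at the new tier
        }
    }

    // Mirrors the composed loop: end when final, otherwise yield the freshly computed count.
    public IEnumerable<int> Schedule()
    {
        while (!IsFinal)
            yield return CallsToNextPromotion;
    }
}
```

Driving it with a threshold of 30 and one promotion, but deliberately invoking only 20 calls in the second round, yields `1, 30, 10`: the short round is not lost, it just shows up as a smaller recomputed count. With zero promotions (a method that never tiers), the schedule is `1` and then ends.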

Warming callees the caller can't name, after the driven method is final:

```csharp
for (int i = 0; i < JitInfo.TieredCallCountThreshold; i++)
{
    InvokeWorkload();
}
while (RuntimeHelpers.PendingTieredCompilationCount > 0)
    await Task.Delay(TimeSpan.FromMilliseconds(10), ct);
```

### Alternative Designs

- **`MethodJittingQueued` event (carrying method identity + target tier).** Emit an event at enqueue time (e.g. at `OnCallCountThresholdReached` / `AsyncPromoteToTier1`) with a payload mirroring `MethodLoadVerbose` plus the tier in `MethodFlags`. This is the smallest *event-shaped* fix and would close the silent-decision gap and the still-coming-vs-never ambiguity. But it is still **lossy** (EventPipe drop), still requires the consumer to model the tier ladder and counting activation, and still leaves callee warming to the batch proxy. It's the right thing to do *if* the API route is rejected, and the enumerable/primitives could even be implemented partly on top of it — but on its own it leaves the consumer stitching lossy pieces together.

- **Add `MethodFlags` (tier) to `MethodJittingStarted`.** Even smaller; disambiguates tier0 compile-start from a tier-up compile-start. But `MethodJittingStarted` fires only when the worker *dequeues* the method — after everything ahead of it compiled — so it doesn't close the latency gap, and it's equally lossy.

- **Use `TieredCompilationBackgroundJitStart`/`Stop`.** Insufficient as a per-method signal: batch-level, count-only, no method identity, coalesced. (Proposed above only as a *global* quiescence query, which is all it can honestly support.)

- **Poll `MethodLoadVerbose`, bounded by background-queue quiescence (status quo).** Works, but late and non-deterministic: the quiescence proxy is batch-level and unattributable, so it over-waits on unrelated traffic, races on late enqueue, and still needs a timeout to conclude a method will never tier — and a dropped publication degrades it silently.

- **`bool RuntimeHelpers.IsMethodFullyOptimized(...)` (the existing #99870 proposal).** A point-in-time poll for the same benchmarking use case. The status enum here is a strict generalization: a bool collapses "still tier0," "in-flight," "instrumented-intermediate," and "terminal-but-minopt" into one value and bakes the runtime's definition of "fully optimized" into the answer, where a consumer often needs its own predicate (e.g. is `OptimizedTier1Instrumented` "done"? it's still call-counted). A poll alone also can't tell the caller *when* to look, so it forces a spin — which `WaitForFinalCompilationOrCallCountingReadyAsync` / `EnumerateTierPromotionCallCountsAsync` remove.

- **`GetCurrentMethodFlags` / a raw tier value.** Returning the `MethodLoadVerbose` `MethodFlags` bitfield (or the raw `JitOptimizationTier`) would be maximally consistent with the events, and the extra flags (Jitted vs R2R, generic) are independently useful. But a consumer never uses the packed field raw — it decodes `(flags >> 7) & 0x7` and tests membership — so the bit-identical "consistency" buys little, while exposing a raw ETW diagnostic bitfield (or raw tier numbers) as a supported API **ossifies the entire layout/taxonomy**, which is "subject to change" by design. The stable `MethodCompilationStatus` enum captures the information the consumer actually needs without that commitment.

- **A force-to-tier API (`PrepareMethod`-style), deliberately not proposed.** A "promote this method to its final tier synchronously" call would be more direct than driving invocations, but it changes semantics — it skips call-count/PGO accumulation, doesn't warm callees, and may compile without the profile natural execution would gather — so it's a deliberate non-goal.

### Risks

- **The call-count model is CoreCLR-tiering-specific.** "Promote a method by invoking it N times, in stages" is the shape of the headline API. It degrades cleanly where inapplicable (NativeAOT → empty enumeration / `Final`; Mono's interp→JIT tiering maps onto the same enum/sequence), and the abstraction is a *plus* for cross-runtime portability versus raw CoreCLR tier values — but it is a more opinionated surface than a neutral poll, and API review should weigh that.
- **Ossification.** A `MethodCompilationStatus` enum and a freshly-computed call count are deliberately chosen to *avoid* pinning internal taxonomy: the enum exposes only transitional-vs-terminal, and the count never commits to a fixed `TieredCallCountThreshold` constant ("invoke this many *now*, measured against the current counter"). Exposing raw tiers/`MethodFlags` instead (see Alternatives) would ossify the layout.

---

Related:
- #99870 (`RuntimeHelpers.IsMethodFullyOptimized`, same use case — subsumed here as the `MethodCompilationStatus` poll primitive).
- [dotnet/BenchmarkDotNet#3169](https://github.com/dotnet/BenchmarkDotNet/pull/3169) — the consuming implementation: the event-listener-based JIT-warmup stage this API would replace.

*Disclaimer: preparation of this issue (drafting, structuring, and refining the API shape) was assisted by AI (Claude). The design direction and decisions are the author's; the AI helped articulate and organize them.*
