Description
I found this problem while trying to optimize the PureHDF package to reduce reflection overhead. I've used AI tooling to try to capture the failure modes and build a standalone repo to demonstrate the problem.
When a single instance of a generic delegate built via reflection is
held alive by a cache and called millions of times, the JIT
intermittently produces bad code for the call path. The symptom is not
constant — across runs of the same binary on the same machine I've
observed at least six distinct failure modes from what is almost
certainly one underlying codegen bug:
| Symptom | Where |
| --- | --- |
| System.OverflowException at MemoryMarshal.AsBytes | checked(span.Length * sizeof(T)) with Length = 1, sizeof(T) = 12 |
| System.Exception: total file element count != total memory element count | An upstream ulong[1].Aggregate(...) returns a wrong value |
| System.InvalidOperationException from the wrong branch of is null check | (buffer is null \|\| buffer.Equals(default)) is false when buffer is default(T) |
| System.NullReferenceException inside System.RuntimeType.ListBuilder<T>.Add(T) | Runtime/reflection internals stepped on |
| Unhandled System.NullReferenceException at a plain { get; } auto-property | get_Message() returns null for a non-null readonly field, i.e. this is corrupted |
| System.EntryPointNotFoundException at System.IDisposable.Dispose() during teardown | Method-table corruption visible during GC/finalization |
| Process abort with exit code 0xC0000005 during BDN AfterActualRun | Heap corruption that GC walks into |
All of them disappear with DOTNET_TieredCompilation=0, which strongly
suggests this is a tier-up codegen issue.
Reproduction Steps
Self-contained repro: https://github.com/marklam/tierproblems
A copy is in this comment for convenience (also see the full repro at
the URL above for the multi-target csproj, ~250 LOC of inlined helper
classes, and the README):
git clone https://github.com/marklam/TierProblems.git tier-problems
cd tier-problems
dotnet build -c Release
dotnet run -c Release --no-build --framework net8.0 # frequent hit
dotnet run -c Release --no-build --framework net10.0 # rare hit
The program builds a chain that mirrors PureHDF's read path:
- An outer cached Reader<TResult> delegate per (TResult, TElement)
  pair, built once via MethodInfo.MakeGenericMethod(...).CreateDelegate(...)
  pointing at an instance generic method on the receiver class.
- An inner cached DecodeDelegate<TElement> per (TElement, isRawMode)
  pair on the message instance, built once via
  MakeGenericMethod(...).Invoke(...) of a static helper that returns a
  static-local-function delegate of the form

      static void decode(IH5ReadStream source, Span<T> target)
          => source.ReadDataset(MemoryMarshal.AsBytes(target));

- A hot loop that calls the outer cached delegate ~200M times on the
  same receiver, which routes through ReadCoreLevel1_generic<TResult, TElement>
  → ReadCoreLevel2<TElement> → the cached inner decoder → the static
  local function → MemoryMarshal.AsBytes.
The try { } catch (Exception) { … } around each hot-loop call
captures most symptoms; some manifestations (mid-warmup
NullReferenceException from a corrupted this, the AV) skip the
catch and tear the process down.
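For readers who want the shape of that chain without opening the repo, here is a minimal sketch. It is not the code from the linked repro: every type and member name in it (IReadStream, CountingStream, Receiver, Point3, HotLoop) is a hypothetical stand-in, the isRawMode half of the inner cache key is omitted, and it carries far less per-call work than the real classes, so it should not be expected to reproduce the bug on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.InteropServices;

// Hypothetical stand-ins for the repro's stream and delegate types.
public interface IReadStream { void ReadDataset(Span<byte> buffer); }

public sealed class CountingStream : IReadStream
{
    public long BytesRead;
    public void ReadDataset(Span<byte> buffer) { buffer.Fill(0x2A); BytesRead += buffer.Length; }
}

public delegate void DecodeDelegate<T>(IReadStream source, Span<T> target);
public delegate int Reader<TResult>(TResult buffer, IReadStream source);

[StructLayout(LayoutKind.Sequential)]
public struct Point3 { public int X, Y, Z; } // 12 bytes, like sizeof(T) = 12 in the report

public sealed class Receiver
{
    private readonly Dictionary<(Type, Type), Delegate> _outer = new();
    private readonly Dictionary<Type, Delegate> _inner = new();

    // Outer cache: one Reader<TResult> per (TResult, TElement), bound to this instance.
    public Reader<TResult> GetReader<TResult>(Type elementType)
    {
        var key = (typeof(TResult), elementType);
        if (!_outer.TryGetValue(key, out var cached))
        {
            cached = typeof(Receiver)
                .GetMethod(nameof(ReadCore), BindingFlags.Instance | BindingFlags.NonPublic)!
                .MakeGenericMethod(typeof(TResult), elementType)
                .CreateDelegate(typeof(Reader<TResult>), this);
            _outer[key] = cached;
        }
        return (Reader<TResult>)cached;
    }

    // Instance generic method the outer delegate points at.
    private int ReadCore<TResult, TElement>(TResult buffer, IReadStream source)
        where TElement : unmanaged
    {
        var target = new TElement[1];
        GetDecoder<TElement>()(source, target); // TElement[] converts implicitly to Span<TElement>
        return target.Length;
    }

    // Inner cache: one DecodeDelegate<TElement> per element type, built via Invoke.
    private DecodeDelegate<TElement> GetDecoder<TElement>() where TElement : unmanaged
    {
        if (!_inner.TryGetValue(typeof(TElement), out var cached))
        {
            cached = (Delegate)typeof(Receiver)
                .GetMethod(nameof(MakeDecoder), BindingFlags.Static | BindingFlags.NonPublic)!
                .MakeGenericMethod(typeof(TElement))
                .Invoke(null, null)!;
            _inner[typeof(TElement)] = cached;
        }
        return (DecodeDelegate<TElement>)cached;
    }

    // Static helper returning a static-local-function delegate.
    private static DecodeDelegate<T> MakeDecoder<T>() where T : unmanaged
    {
        static void decode(IReadStream source, Span<T> target)
            => source.ReadDataset(MemoryMarshal.AsBytes(target));
        return decode;
    }
}

public static class HotLoop
{
    // Primes several cache entries, then hammers one of them; returns bytes decoded.
    public static long Run(int calls)
    {
        var receiver = new Receiver();
        var stream = new CountingStream();
        receiver.GetReader<int[]>(typeof(int))(Array.Empty<int>(), stream);
        receiver.GetReader<long>(typeof(long))(0L, stream);
        var hot = receiver.GetReader<Point3>(typeof(Point3));
        stream.BytesRead = 0;
        for (int i = 0; i < calls; i++)
        {
            try { hot(default, stream); }
            catch (Exception ex) { Console.WriteLine($"HIT after {i:N0} calls: {ex}"); break; }
        }
        return stream.BytesRead;
    }
}
```

With correct codegen, HotLoop.Run(n) returns 12 * n, because each call decodes one 12-byte element. Any other value, or a caught exception, would correspond to the kind of corruption described above.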
Expected behavior
The hot loop should complete deterministically. MemoryMarshal.AsBytes
on a Span<T> of length 1 cannot overflow. A field-backed { get; }
auto-property on a non-null instance cannot return null. Equals on
default(T) against default(T) for a sequential struct cannot
return false.
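The first and third invariants can be stated as a small standalone check. This is an illustrative sketch only: Sample here is a hypothetical 12-byte sequential struct chosen to match the sizeof(T) = 12 figure above, not the Sample type from the repro, and Invariants is a made-up helper class.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical 12-byte struct; three ints laid out sequentially with no padding.
[StructLayout(LayoutKind.Sequential)]
public struct Sample { public int A; public int B; public int C; }

public static class Invariants
{
    // AsBytes computes checked(span.Length * sizeof(T)); here that is 1 * 12.
    public static int AsBytesLength()
    {
        Span<Sample> one = new Sample[1];
        return MemoryMarshal.AsBytes(one).Length;
    }

    // Same shape as the guard quoted in the symptom table.
    public static bool IsDefault<T>(T buffer)
        => buffer is null || buffer.Equals(default(T));
}
```

AsBytesLength() should always return 12 and IsDefault(default(Sample)) should always return true; neither computation has any input that varies between calls.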
Actual behavior
After a variable number of calls (anywhere from <1M to ~200M
observed), one of the symptoms above fires.
Repro hit-rate (15+ runs each, default tier-up)
13th Gen Intel Core i9-13900KS / Windows 11 (10.0.26200) / x64 / .NET SDK 10.0.300 / Server GC:
| Runtime | Sample | Failures | Rate |
| --- | --- | --- | --- |
| 8.0.27 (8.0.2726.22922) | 15 | 10 | ~67% |
| 10.0.8 | 45 | 3 | ~7% |
| 8.0.27 + DOTNET_TieredCompilation=0 | 6 | 0 | 0% |
So the bug is present in both .NET 8 and .NET 10 — .NET 10 just
misses the bad codegen path much more often (and is meaningfully faster
overall on the same loop, ~17s vs ~30s per 200M calls). Disabling
tier-up consistently eliminates it.
Sample stack traces
InvalidOperationException — buffer.Equals(default(TResult))
returning false for default(Sample):
HIT after 22,646,862 calls in 4.0s
outer: System.InvalidOperationException: JIT corruption: the 'buffer is default' check returned false even though the caller passed default(TResult).
at TierProblems.Inlined.NativeAttribute.ReadCoreLevel1_generic[TResult,TElement](TResult buffer, IH5ReadStream source, UInt64[] memoryDims)
at TierProblems.Inlined.NativeAttribute.Read[T](UInt64[] memoryDims)
NullReferenceException in runtime reflection internals:
HIT after 156,419,186 calls in 24.0s
outer: System.NullReferenceException: Object reference not set to an instance of an object.
at System.RuntimeType.ListBuilder`1.Add(T item)
at TierProblems.Inlined.NativeAttribute.ReadCoreLevel1_generic[TResult,TElement](TResult buffer, IH5ReadStream source, UInt64[] memoryDims)
NullReferenceException from a field-backed auto-property (on .NET 10
during the warmup loop, before the catch):
Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
at TierProblems.Inlined.NativeAttribute.get_Message()
at TierProblems.Inlined.NativeAttribute.GetDecoderAndFileElementCount[TElement]()
at TierProblems.Inlined.NativeAttribute.ReadCoreLevel1_generic[TResult,TElement](TResult buffer, IH5ReadStream source, UInt64[] memoryDims)
OverflowException at MemoryMarshal.AsBytes (the original symptom
that motivated this repro; observed against the unmodified PureHDF
codebase before the workaround):
System.OverflowException: Arithmetic operation resulted in an overflow.
at PureHDF.VOL.Native.DatatypeMessage.<GetDecodeInfoForUnmanagedMemory>g__decode|46_0[T](IH5ReadStream source, Span`1 target)
at PureHDF.VOL.Native.NativeAttribute.ReadCoreLevel1_generic[TResult,TElement](...)
Notes
- A "shape-only" repro that mirrored the call structure but didn't
carry PureHDF's per-call work (allocation of SystemMemoryStream,
MemoryManager<T> virtual GetSpan(), ulong[1], new T[1], the
reflection chain into RuntimeHelpers.IsReferenceOrContainsReferences<T>)
ran ~7× faster per call and never reproduced the bug at 1B+ calls.
Inlining the actual classes verbatim (which slows the loop to ~6M
calls/sec) brings the bug back. The bug appears to be sensitive to
how much work the JIT compiles into the tier-up target, not just
the abstract call shape — small/cheap targets may get folded enough
to bypass it.
- The bug fires while iterating only one cache entry, but reliably
reproducing requires priming several (TResult, TElement) pairs in
the outer cache during warm-up first. With a single entry primed it
never reproduced in 200M+ calls.
- A ProjectReference to the affected library (PureHDF) without
calling any of its methods does not reproduce the bug — so this
isn't a module-initialiser / .cctor / metadata-load issue. The
corrupt codegen requires the affected methods to actually run and
reach tier 1.
Regression?
No response
Known Workarounds
Set the environment variable DOTNET_TieredCompilation=0 before launching the process.
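When comparing runs with and without the workaround, it can help to set the variable per child process rather than machine-wide. The following is a sketch under assumptions: ReproLauncher is a hypothetical helper, the dotnet run arguments are copied from the reproduction steps, and it assumes the launched app inherits the environment of the dotnet process and that the working directory is the repro checkout.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for launching the repro with or without tiered compilation.
public static class ReproLauncher
{
    public static ProcessStartInfo CreateStartInfo(string framework, bool disableTiering)
    {
        // UseShellExecute must be false for the Environment dictionary to apply.
        var psi = new ProcessStartInfo("dotnet") { UseShellExecute = false };
        foreach (var arg in new[] { "run", "-c", "Release", "--no-build", "--framework", framework })
            psi.ArgumentList.Add(arg);
        if (disableTiering)
            psi.Environment["DOTNET_TieredCompilation"] = "0";
        return psi;
    }

    // Runs the process to completion and returns its exit code.
    public static int RunOnce(ProcessStartInfo psi)
    {
        using var process = Process.Start(psi)
            ?? throw new InvalidOperationException("Failed to start process.");
        process.WaitForExit();
        return process.ExitCode;
    }
}
```

Calling RunOnce(CreateStartInfo("net8.0", disableTiering: true)) in a loop gives a repeatable way to gather more samples for the 0% row; how a failing run signals itself (exit code or console output) depends on the repro program and is not assumed here.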
Configuration
- Windows 11 Pro (10.0.26200.8524), x64
- 13th Gen Intel Core i9-13900KS, 24c/24t
- .NET SDK 10.0.300
- Runtimes tested: 8.0.27, 10.0.8
- Server GC enabled (<ServerGarbageCollection>true</ServerGarbageCollection>); behaviour unchanged with workstation GC on a brief check.
Other information
No response