[NativeAOT] Print OOM message before Abort() on Linux #125311
Co-authored-by: agocke <515774+agocke@users.noreply.github.com>
Please add a unit test to src/tests
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
// Minimal OOM fail-fast path: avoid heap allocations as much as possible, but still
// report that OOM is the reason for the crash.
try
{
    // Try to print the same short message CoreCLR prints.
    Internal.Console.Error.Write("Out of memory.");
    Internal.Console.Error.WriteLine();
}
catch { }
We may want to address this by changing Internal.Console.Error.Write to avoid managed allocations for small strings on Unix. Note that the Windows implementation already works this way.
I suspect that the test may be flaky otherwise, given that it tries to allocate every last bit of managed memory.
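As an illustration of what an allocation-free write of a short, fixed message could look like on Linux, here is a minimal sketch. It is not how Internal.Console is implemented: the `MinimalStderr` class and its method are hypothetical names, and the direct P/Invoke to libc's `write` is an assumption made to keep the example self-contained (the runtime goes through its own native interop layer).

```csharp
using System;
using System.Runtime.InteropServices;

static class MinimalStderr
{
    private const int StderrFd = 2;

    // Assumption: call libc's write(2) directly. Hypothetical binding for illustration only.
    [DllImport("libc", EntryPoint = "write")]
    private static extern nint Write(int fd, ref byte buffer, nuint count);

    // Returns the number of bytes written, or -1 if write(2) failed.
    public static nint WriteOutOfMemory()
    {
        // A UTF-8 literal is backed by static data in the assembly,
        // so no string or byte[] is allocated on the managed heap here.
        ReadOnlySpan<byte> message = "Out of memory.\n"u8;
        return Write(StderrFd, ref MemoryMarshal.GetReference(message), (nuint)message.Length);
    }
}
```

The sketch ignores short writes and EINTR, which a real implementation would likely need to handle. Whether the first call through the P/Invoke stub is itself allocation-free depends on how the stub is generated, so that part should be treated as unverified.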
The test was indeed flaky, but I think the problem was that it used a list that tried to resize from 2048 to 4096 elements and failed, leaving ~32 KB of memory still free, too much to then fill with very tiny allocations. I think the GC was just thrashing, so I updated the test.
I also added this change, so I expect the test not to fail; I will watch the CI.
The test consistently times out only on osx x64. At this point I'm wondering whether there's a problem with the test or whether the runtime itself is thrashing when it should OOM.
I would not be surprised if there are runtime bugs that lead to a hang on OOM, like the one this test is exercising.
object[] storage = new object[8192];
int idx = 0;
// We expect ~2048 iterations in the first loop and ~64 iterations in the second.
try { while (idx < storage.Length) storage[idx++] = GC.AllocateArray<byte>(16 * 1024, pinned: true); } catch (OutOfMemoryException) { }
Can an OOM caused by a pinned: true allocation hit the minimalFailFast path?
Locally I see that it sometimes hits the minimalFailFast path and sometimes outputs the full stack trace. In any case, the test still doesn't work on osx x64. Should we exclude the test from that config and file an issue?
> Should we exclude the test from that config and file an issue?
Sounds good
// Even in the small allocations case, the runtime might still have enough memory to construct
// an OutOfMemoryException and print the full diagnostic.
// Either token is acceptable, but at least one should be present to confirm that OOM was the reason for termination.
if (!(stderr.Contains(ExpectedOomToken) || stderr.Contains(ExpectedMinimalOomToken)))
{
On Linux, NativeAOT processes terminating due to OutOfMemoryException (e.g. with DOTNET_GCHeapHardLimit set) printed only Aborted with no diagnostic context.

Root cause

RuntimeExceptionHelpers.FailFast detects the preallocated OOM exception via minimalFailFast = (exception == PreallocatedOutOfMemoryException.Instance) and skips all stderr output to avoid heap allocations, leaving the user with no indication of why the process died.

Fix

In the minimalFailFast path, print a hardcoded OOM message to stderr before calling Abort(). The write is wrapped in try/catch {} so a secondary allocation failure silently falls through to the existing abort path.

Before:
Aborted

After:
Out of memory.

Test
Added a new OomHandling smoke test in src/tests/baseservices/exceptions/OutOfMemoryException/. The test spawns itself as a subprocess with DOTNET_GCHeapHardLimit=2000000 (hexadecimal, 32 MB) set, waits for the subprocess to run out of memory, and verifies that OutOfMemoryException appears in stderr. This covers both the preallocated OOM path (the fix) and the existing unhandled-exception path. The test is skipped on mobile and browser platforms that do not support process spawning.
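The parent side of a self-spawning test like this can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the test added in this PR: the `OomHarnessSketch` class, its method names, the `child` argument, and the timeout handling are all hypothetical, and the two token values are assumed from the messages quoted in this conversation. GC settings supplied through DOTNET_ environment variables are read as hexadecimal, so the sketch formats the byte count as a hex string.

```csharp
using System;
using System.Diagnostics;

static class OomHarnessSketch
{
    // Assumed token values, based on the messages quoted in this PR.
    private const string ExpectedOomToken = "OutOfMemoryException";
    private const string ExpectedMinimalOomToken = "Out of memory.";

    // DOTNET_GCHeapHardLimit is parsed as hexadecimal, so emit hex digits.
    public static string ToHeapHardLimit(long bytes) => bytes.ToString("x");

    // Either the full diagnostic or the minimal message counts as an OOM report.
    public static bool ReportsOom(string stderr) =>
        stderr.Contains(ExpectedOomToken) || stderr.Contains(ExpectedMinimalOomToken);

    // Hypothetical parent side: relaunch the current executable with a "child" argument.
    public static bool RunChildAndCheck(long heapLimitBytes, TimeSpan timeout)
    {
        string path = Environment.ProcessPath
            ?? throw new InvalidOperationException("Process path is unavailable.");
        var psi = new ProcessStartInfo(path, "child")
        {
            RedirectStandardError = true,
            UseShellExecute = false,
        };
        psi.Environment["DOTNET_GCHeapHardLimit"] = ToHeapHardLimit(heapLimitBytes);

        using Process child = Process.Start(psi)
            ?? throw new InvalidOperationException("Failed to start the child process.");
        // Start reading stderr before waiting so a full pipe cannot block the child.
        var stderrTask = child.StandardError.ReadToEndAsync();
        if (!child.WaitForExit((int)timeout.TotalMilliseconds))
        {
            child.Kill();
            return false; // treat a hang as a failure rather than waiting forever
        }
        return ReportsOom(stderrTask.Result);
    }
}
```

For example, `OomHarnessSketch.ToHeapHardLimit(32L * 1024 * 1024)` yields `2000000`. With a 32 MB limit, filling the heap with 16 KB blocks takes about 32 MB / 16 KB = 2048 allocations, which lines up with the "~2048 iterations" comment in the test code. The explicit timeout reflects the hang reported on osx x64: a child that thrashes instead of failing should fail the test rather than block it.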