Fix PerfMap lock ordering violation by deferring LogStubs operations#128918
Fix PerfMap lock ordering violation by deferring LogStubs operations#128918davidwrighton wants to merge 1 commit into
Conversation
PerfMap::LogStubs was taking s_csPerfMap (a CRST_DEFAULT lock) while callers could be holding CRST_UNSAFE_ANYMODE locks, creating a potential three-way deadlock scenario (bug dotnet#128401). The fix introduces a deferred queue mechanism: - Add a new CRST_UNSAFE_ANYMODE leaf lock (s_csPerfMapDeferred) for queue access - LogStubs now captures all needed data (name, line, timestamp, code buffer) into a queue under the ANYMODE lock instead of taking s_csPerfMap - Queue is replayed in order at every site that takes s_csPerfMap - Finalizer thread drains the queue via DoExtraWorkForFinalizer to ensure timely flushing even when no JIT activity is occurring - Timestamp and code buffer are only captured when JitDump is active, avoiding unnecessary allocations in perfmap-only mode Also changes ds_rt_enable_perfmap and PerfMap::Enable contracts to STANDARD_VM_CONTRACT for correctness. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Tagging subscribers to this area: @agocke |
There was a problem hiding this comment.
Pull request overview
This PR changes CoreCLR perfmap/jitdump stub logging to avoid taking the PerfMap default Crst (s_csPerfMap) from paths that may already hold CRST_UNSAFE_ANYMODE locks, by deferring stub log work into a queue guarded by a new ANYMODE leaf lock and replaying that queue when s_csPerfMap is held (plus draining from the finalizer thread).
Changes:
- Introduces a deferred-entry queue protected by
s_csPerfMapDeferredand replays it unders_csPerfMapat existing perfmap/jitdump logging sites. - Adds finalizer-thread “extra work” integration to drain deferred entries even when no other perfmap activity occurs.
- Extends PAL jitdump support with “timestamp + code buffer” replay helpers to preserve ordering without touching JIT memory later.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/coreclr/vm/perfmap.h | Adds deferred entry struct and new queue/lock declarations. |
| src/coreclr/vm/perfmap.cpp | Implements deferral + replay, hooks replay into perfmap/jitdump logging, adds drain/check helpers. |
| src/coreclr/vm/finalizerthread.cpp | Drains deferred perfmap work from finalizer extra-work loop. |
| src/coreclr/vm/eventing/eventpipe/ds-rt-coreclr.h | Updates perfmap-enable IPC handler contract to allow GC-triggering work. |
| src/coreclr/pal/src/misc/perfjitdump.cpp | Adds PAL entrypoints to replay jitdump method records with captured timestamp/buffer. |
| src/coreclr/pal/inc/pal.h | Declares new PAL jitdump APIs. |
| src/coreclr/inc/CrstTypes.def | Adds new Crst type and lock-order relationship. |
| src/coreclr/inc/crsttypes_generated.h | Regenerates Crst type enum/level/name maps to include the new Crst. |
| void PerfMap::Initialize() | ||
| { | ||
| LIMITED_METHOD_CONTRACT; | ||
|
|
||
| s_csPerfMap.Init(CrstPerfMap); | ||
| s_csPerfMapDeferred.Init(CrstPerfMapDeferredActions, CRST_UNSAFE_ANYMODE); | ||
|
|
||
| PerfMapType perfMapType = (PerfMapType)CLRConfig::GetConfigValue(CLRConfig::EXTERNAL_PerfMapEnabled); | ||
| PerfMap::Enable(perfMapType, false); |
| void PerfMap::Disable() | ||
| { | ||
| LIMITED_METHOD_CONTRACT; | ||
|
|
||
| if (s_enabled) | ||
| { | ||
| CrstHolder ch(&(s_csPerfMap)); | ||
| ReplayDeferredEntries(); | ||
|
|
||
| s_enabled = false; | ||
| if (s_Current != nullptr) | ||
| { | ||
| delete s_Current; | ||
| s_Current = nullptr; | ||
| } | ||
|
|
||
| // PAL_PerfJitDump_Finish is lock protected and can safely be called multiple times | ||
| PAL_PerfJitDump_Finish(); | ||
| } |
| bool PerfMap::HasDeferredEntries() | ||
| { | ||
| LIMITED_METHOD_CONTRACT; | ||
|
|
||
| return s_pDeferredHead != nullptr; | ||
| } |
| // Queue the operation under s_csPerfMapDeferred instead of taking s_csPerfMap | ||
| // directly. LogStubs may be called while an ANYMODE lock is held, and s_csPerfMap | ||
| // is a DEFAULT lock that may trigger a GC mode transition. | ||
| PerfMapDeferredEntry * pEntry = new PerfMapDeferredEntry(); |
| PerfMapDeferredEntry * pEntry = new PerfMapDeferredEntry(); | ||
| pEntry->name.Set(name); | ||
| pEntry->line.Set(line); | ||
| pEntry->pCode = pCode; | ||
| pEntry->codeSize = codeSize; |
| int LogMethodWithTimestamp(void* pCode, size_t codeSize, const char* symbol, void* debugInfo, void* unwindInfo, uint64_t timestamp, void* codeBuffer, size_t codeBufferSize) | ||
| { | ||
| int result = 0; | ||
|
|
||
| if (enabled) | ||
| { | ||
| size_t symbolLen = strlen(symbol); | ||
|
|
||
| JitCodeLoadRecord record; | ||
|
|
||
| size_t bytesRemaining = sizeof(JitCodeLoadRecord) + symbolLen + 1 + codeBufferSize; | ||
|
|
||
| record.header.timestamp = timestamp; | ||
| record.vma = (uint64_t) pCode; | ||
| record.code_addr = (uint64_t) pCode; | ||
| record.code_size = codeSize; | ||
| record.header.total_size = bytesRemaining; |
| if (PerfMap::HasDeferredEntries()) | ||
| { | ||
| GCX_PREEMP(); | ||
| PerfMap::DrainDeferredEntries(); |
There was a problem hiding this comment.
It is not clear to me why we need to introduce the deferred logging to fix the log ordering violation. Is the deferred logging required part of the fix for the deadlock or is it trying to be an additional optimization?
If it is required for some reason, are there negative side-effect? Is it going to introduce a window where the tools that use perfmap can produce bad stacktraces?
There was a problem hiding this comment.
If we are working around code like
runtime/src/coreclr/vm/virtualcallstub.cpp
Line 2813 in db87610
I am not sure the pMayHaveReenteredCooperativeGCMode tricks in this code are worth it.
There was a problem hiding this comment.
Is it going to introduce a window where the tools that use perfmap can produce bad stacktraces?
In theory I think this could give us missing symbolication of particular stub stack frames, but I don't expect it to impact the overall unwind. While having 100% correct stacks would be the ideal, pragmatically I imagine the experience would be pretty good most of the time as long as the timing delays are short. If we can get the ideal scenario while still staying low enough risk for a servicing fix of course I'd have no complaints :)
Note
This PR description was AI/Copilot-generated.
Fixes #128401
Summary
PerfMap::LogStubs was taking
s_csPerfMap(aCRST_DEFAULTlock) while callers could be holdingCRST_UNSAFE_ANYMODElocks, creating a potential three-way deadlock scenario.Changes
The fix introduces a deferred queue mechanism:
s_csPerfMapDeferred/CrstPerfMapDeferredActions) for thread-safe queue access without GC mode transitionss_csPerfMap, it captures all needed data (name, perfmap line, timestamp, code buffer) into a queue under the ANYMODE locks_csPerfMapnow also replay the deferred queue in order, preserving correctness of jitdump/perfmap file outputDoExtraWorkForFinalizerpath drains the queue to ensure timely flushing even when no JIT activity is occurringPAL_PerfJitDump_IsStarted()), avoiding unnecessary allocations in perfmap-only modePAL additions
PAL_PerfJitDump_GetTimeStamp()— captures a timestamp at deferral timePAL_PerfJitDump_LogMethodWithTimestamp()— replays a jitdump entry with a previously captured timestamp and code bufferContract fixes
ds_rt_enable_perfmapandPerfMap::Enablechanged toSTANDARD_VM_CONTRACTfor correctness (they can trigger GC)