Fix PerfMap lock ordering violation by deferring LogStubs operations #128918

Open
davidwrighton wants to merge 1 commit into dotnet:main from davidwrighton:perfmap-deferred-queue

@davidwrighton
Member

Note

This PR description was AI/Copilot-generated.

Fixes #128401

Summary

PerfMap::LogStubs was taking s_csPerfMap (a CRST_DEFAULT lock) while callers could be holding CRST_UNSAFE_ANYMODE locks, creating a potential three-way deadlock scenario.

Changes

The fix introduces a deferred queue mechanism:

  • New ANYMODE leaf lock (s_csPerfMapDeferred/CrstPerfMapDeferredActions) for thread-safe queue access without GC mode transitions
  • LogStubs defers work: Instead of taking s_csPerfMap, it captures all needed data (name, perfmap line, timestamp, code buffer) into a queue under the ANYMODE lock
  • Queue replay: All sites that take s_csPerfMap now also replay the deferred queue in order, preserving correctness of jitdump/perfmap file output
  • Finalizer thread draining: The finalizer's DoExtraWorkForFinalizer path drains the queue to ensure timely flushing even when no JIT activity is occurring
  • Conditional capture: Timestamp and code buffer are only captured when JitDump is active (PAL_PerfJitDump_IsStarted()), avoiding unnecessary allocations in perfmap-only mode
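
The deferral-and-replay idea in the list above can be sketched in portable C++. This is a minimal illustration, not the PR's code: `std::mutex` stands in for the runtime's Crst locks, and `DeferredLog`, `DeferredEntry`, `Defer`, `Log`, and `Drain` are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a deferred entry; the field is illustrative only.
struct DeferredEntry {
    std::string line;  // pre-formatted output line captured at deferral time
};

class DeferredLog {
public:
    // Called from paths that may already hold other low-level locks.
    // Only the small queue lock is taken; the main lock is never touched here.
    void Defer(std::string line) {
        std::lock_guard<std::mutex> hold(queueLock_);
        queue_.push_back(DeferredEntry{std::move(line)});
    }

    // Called from paths that are allowed to take the main lock.
    // Deferred entries are replayed first so output keeps arrival order.
    void Log(const std::string& line) {
        std::lock_guard<std::mutex> hold(mainLock_);
        ReplayLocked();
        output_.push_back(line);
    }

    // Flushes the queue even when no direct logging is happening.
    void Drain() {
        std::lock_guard<std::mutex> hold(mainLock_);
        ReplayLocked();
    }

    // Returns a copy of everything written so far.
    std::vector<std::string> Output() {
        std::lock_guard<std::mutex> hold(mainLock_);
        return output_;
    }

private:
    // Requires mainLock_ to be held. The queue is detached under queueLock_
    // and written after that lock is released, so queueLock_ is held briefly.
    void ReplayLocked() {
        std::deque<DeferredEntry> pending;
        {
            std::lock_guard<std::mutex> hold(queueLock_);
            pending.swap(queue_);
        }
        for (DeferredEntry& entry : pending) {
            output_.push_back(std::move(entry.line));
        }
    }

    std::mutex mainLock_;   // plays the role of s_csPerfMap
    std::mutex queueLock_;  // plays the role of s_csPerfMapDeferred
    std::deque<DeferredEntry> queue_;
    std::vector<std::string> output_;
};
```

In this sketch the only lock order is main lock, then queue lock, and `Defer` never takes the main lock, which is the property the description relies on to break the cycle.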

PAL additions

  • PAL_PerfJitDump_GetTimeStamp() — captures a timestamp at deferral time
  • PAL_PerfJitDump_LogMethodWithTimestamp() — replays a jitdump entry with a previously captured timestamp and code buffer
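
To illustrate why a timestamp and a code copy are captured at deferral time, here is a hedged sketch. It does not use the PAL functions named above; `CapturedStub`, `CaptureStub`, and `NowNanoseconds` are hypothetical, and the choice of `std::chrono::steady_clock` is an assumption, since the clock used by the PAL is not stated in this PR description.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record; the real jitdump record layout is not reproduced here.
struct CapturedStub {
    std::string symbol;
    std::uintptr_t codeAddress = 0;      // original address, kept only as a number
    std::uint64_t timestamp = 0;         // taken when the work is deferred
    std::vector<std::uint8_t> codeCopy;  // left empty when the dump is inactive
};

// Monotonic nanosecond timestamp (assumed clock, see the note above).
inline std::uint64_t NowNanoseconds() {
    using namespace std::chrono;
    return static_cast<std::uint64_t>(
        duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count());
}

// Captures everything a later replay needs, so the replay never has to read
// the original code memory or ask for the time again.
inline CapturedStub CaptureStub(const std::string& symbol, const void* code,
                                std::size_t codeSize, bool dumpActive) {
    CapturedStub entry;
    entry.symbol = symbol;
    entry.codeAddress = reinterpret_cast<std::uintptr_t>(code);
    if (dumpActive) {
        // Only pay for the timestamp and the byte copy when they will be used.
        entry.timestamp = NowNanoseconds();
        const std::uint8_t* bytes = static_cast<const std::uint8_t*>(code);
        entry.codeCopy.assign(bytes, bytes + codeSize);
    }
    return entry;
}
```

Copying the bytes up front means a delayed replay records the code as it was when the stub was logged, and the `dumpActive` flag mirrors the conditional capture described under Changes.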

Contract fixes

  • ds_rt_enable_perfmap and PerfMap::Enable changed to STANDARD_VM_CONTRACT for correctness (they can trigger GC)

PerfMap::LogStubs was taking s_csPerfMap (a CRST_DEFAULT lock) while
callers could be holding CRST_UNSAFE_ANYMODE locks, creating a potential
three-way deadlock scenario (bug dotnet#128401).

The fix introduces a deferred queue mechanism:
- Add a new CRST_UNSAFE_ANYMODE leaf lock (s_csPerfMapDeferred) for
  queue access
- LogStubs now captures all needed data (name, line, timestamp, code
  buffer) into a queue under the ANYMODE lock instead of taking s_csPerfMap
- Queue is replayed in order at every site that takes s_csPerfMap
- Finalizer thread drains the queue via DoExtraWorkForFinalizer to ensure
  timely flushing even when no JIT activity is occurring
- Timestamp and code buffer are only captured when JitDump is active,
  avoiding unnecessary allocations in perfmap-only mode

Also changes ds_rt_enable_perfmap and PerfMap::Enable contracts to
STANDARD_VM_CONTRACT for correctness.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @agocke
See info in area-owners.md if you want to be subscribed.

Contributor

Copilot AI left a comment

Pull request overview

This PR changes CoreCLR perfmap/jitdump stub logging to avoid taking the PerfMap default Crst (s_csPerfMap) from paths that may already hold CRST_UNSAFE_ANYMODE locks, by deferring stub log work into a queue guarded by a new ANYMODE leaf lock and replaying that queue when s_csPerfMap is held (plus draining from the finalizer thread).

Changes:

  • Introduces a deferred-entry queue protected by s_csPerfMapDeferred and replays it under s_csPerfMap at existing perfmap/jitdump logging sites.
  • Adds finalizer-thread “extra work” integration to drain deferred entries even when no other perfmap activity occurs.
  • Extends PAL jitdump support with “timestamp + code buffer” replay helpers to preserve ordering without touching JIT memory later.
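
The finalizer-thread drain mentioned above follows a common pattern: a cheap check for pending work, then a locked drain only when needed. The sketch below is an assumption-laden illustration, not the runtime's code. `PendingWork` and `PollOnce` are hypothetical names, and the `std::atomic<bool>` flag is a substitution made for this example rather than a description of how the PR tracks pending entries.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

class PendingWork {
public:
    void Add(std::string item) {
        std::lock_guard<std::mutex> hold(lock_);
        items_.push_back(std::move(item));
        hasWork_.store(true, std::memory_order_release);
    }

    // Lock-free hint. A stale "false" only delays draining until the next poll.
    bool HasWork() const { return hasWork_.load(std::memory_order_acquire); }

    // Detaches and returns all queued items, clearing the hint under the lock.
    std::vector<std::string> TakeAll() {
        std::lock_guard<std::mutex> hold(lock_);
        hasWork_.store(false, std::memory_order_relaxed);
        std::vector<std::string> taken;
        taken.swap(items_);
        return taken;
    }

private:
    std::mutex lock_;
    std::atomic<bool> hasWork_{false};
    std::vector<std::string> items_;
};

// One iteration of a hypothetical background loop: skip the lock entirely
// when nothing is pending, otherwise move every queued item into the sink.
inline std::size_t PollOnce(PendingWork& work, std::vector<std::string>& sink) {
    if (!work.HasWork()) {
        return 0;
    }
    std::vector<std::string> taken = work.TakeAll();
    for (std::string& item : taken) {
        sink.push_back(std::move(item));
    }
    return taken.size();
}
```

Because the flag is set and cleared while the lock is held, a poller that sees `true` will find the items once it takes the lock, and the common idle case costs a single atomic load.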

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| src/coreclr/vm/perfmap.h | Adds deferred entry struct and new queue/lock declarations. |
| src/coreclr/vm/perfmap.cpp | Implements deferral + replay, hooks replay into perfmap/jitdump logging, adds drain/check helpers. |
| src/coreclr/vm/finalizerthread.cpp | Drains deferred perfmap work from finalizer extra-work loop. |
| src/coreclr/vm/eventing/eventpipe/ds-rt-coreclr.h | Updates perfmap-enable IPC handler contract to allow GC-triggering work. |
| src/coreclr/pal/src/misc/perfjitdump.cpp | Adds PAL entrypoints to replay jitdump method records with captured timestamp/buffer. |
| src/coreclr/pal/inc/pal.h | Declares new PAL jitdump APIs. |
| src/coreclr/inc/CrstTypes.def | Adds new Crst type and lock-order relationship. |
| src/coreclr/inc/crsttypes_generated.h | Regenerates Crst type enum/level/name maps to include the new Crst. |

Comment on lines 49 to 57
void PerfMap::Initialize()
{
    LIMITED_METHOD_CONTRACT;

    s_csPerfMap.Init(CrstPerfMap);
    s_csPerfMapDeferred.Init(CrstPerfMapDeferredActions, CRST_UNSAFE_ANYMODE);

    PerfMapType perfMapType = (PerfMapType)CLRConfig::GetConfigValue(CLRConfig::EXTERNAL_PerfMapEnabled);
    PerfMap::Enable(perfMapType, false);
Comment on lines 248 to 266
void PerfMap::Disable()
{
    LIMITED_METHOD_CONTRACT;

    if (s_enabled)
    {
        CrstHolder ch(&(s_csPerfMap));
        ReplayDeferredEntries();

        s_enabled = false;
        if (s_Current != nullptr)
        {
            delete s_Current;
            s_Current = nullptr;
        }

        // PAL_PerfJitDump_Finish is lock protected and can safely be called multiple times
        PAL_PerfJitDump_Finish();
    }
Comment on lines +269 to +274
bool PerfMap::HasDeferredEntries()
{
    LIMITED_METHOD_CONTRACT;

    return s_pDeferredHead != nullptr;
}
Comment on lines +500 to +503
// Queue the operation under s_csPerfMapDeferred instead of taking s_csPerfMap
// directly. LogStubs may be called while an ANYMODE lock is held, and s_csPerfMap
// is a DEFAULT lock that may trigger a GC mode transition.
PerfMapDeferredEntry * pEntry = new PerfMapDeferredEntry();
Comment on lines +503 to +507
PerfMapDeferredEntry * pEntry = new PerfMapDeferredEntry();
pEntry->name.Set(name);
pEntry->line.Set(line);
pEntry->pCode = pCode;
pEntry->codeSize = codeSize;
Comment on lines +333 to +349
int LogMethodWithTimestamp(void* pCode, size_t codeSize, const char* symbol, void* debugInfo, void* unwindInfo, uint64_t timestamp, void* codeBuffer, size_t codeBufferSize)
{
    int result = 0;

    if (enabled)
    {
        size_t symbolLen = strlen(symbol);

        JitCodeLoadRecord record;

        size_t bytesRemaining = sizeof(JitCodeLoadRecord) + symbolLen + 1 + codeBufferSize;

        record.header.timestamp = timestamp;
        record.vma = (uint64_t) pCode;
        record.code_addr = (uint64_t) pCode;
        record.code_size = codeSize;
        record.header.total_size = bytesRemaining;

if (PerfMap::HasDeferredEntries())
{
    GCX_PREEMP();
    PerfMap::DrainDeferredEntries();
Member

It is not clear to me why we need to introduce deferred logging to fix the lock ordering violation. Is the deferred logging a required part of the fix for the deadlock, or is it an additional optimization?

If it is required for some reason, are there negative side effects? Is it going to introduce a window where the tools that use perfmap can produce bad stacktraces?

Member

If we are working around code like

RETURN GenerateDispatchStubLong(addrOfCode,
that does heavy lifting in cooperative mode, I think it would be better to just switch to preemptive mode.

I am not sure the pMayHaveReenteredCooperativeGCMode tricks in this code are worth it.

Member

> Is it going to introduce a window where the tools that use perfmap can produce bad stacktraces?

In theory I think this could give us missing symbolication of particular stub stack frames, but I don't expect it to impact the overall unwind. While having 100% correct stacks would be the ideal, pragmatically I imagine the experience would be pretty good most of the time as long as the timing delays are short. If we can get the ideal scenario while still staying low enough risk for a servicing fix of course I'd have no complaints :)

Development

Successfully merging this pull request may close these issues.

PerfMap / CodeFragmentHeap lock-ordering deadlock during GC suspension

4 participants