Testing a graceful failure of fetching codeInfo to see how much this mitigates test failures. #124089

Open
rcj1 wants to merge 3 commits into dotnet:main from rcj1:testing-codeinfo-failure

Conversation

@rcj1
Contributor

@rcj1 rcj1 commented Feb 6, 2026

No description provided.

Copilot AI review requested due to automatic review settings February 6, 2026 14:24
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @steveisok, @thaystg, @dotnet/dotnet-diag
See info in area-owners.md if you want to be subscribed.

Contributor

Copilot AI left a comment

Pull request overview

This PR changes CoreCLR’s runtime-async exception stack trace augmentation to handle cases where the native diagnosticIP cannot be resolved to valid code information without asserting/failing.

Changes:

  • Replace the _ASSERTE(codeInfo.IsValid()) and unconditional append with an append performed only when EECodeInfo(diagnosticIP) is valid.
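
The shape of that change can be sketched in isolation. This is a minimal illustration, not the CoreCLR code: CodeInfo, lookupCode, StackTrace, appendFrameAsserting, and appendFrameIfResolvable are hypothetical stand-ins for EECodeInfo and the stack trace append, and the set of known IPs is invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for EECodeInfo: valid only if the IP maps to known code.
struct CodeInfo {
    bool valid;
    bool IsValid() const { return valid; }
};

// Hypothetical lookup: an IP resolves only if it is in the set of known code addresses.
CodeInfo lookupCode(const std::set<std::uintptr_t>& knownIps, std::uintptr_t ip) {
    return CodeInfo{knownIps.count(ip) != 0};
}

using StackTrace = std::vector<std::uintptr_t>;

// Old shape (assumed): assert that the IP resolves, then append unconditionally.
void appendFrameAsserting(const std::set<std::uintptr_t>& knownIps,
                          StackTrace& trace, std::uintptr_t diagnosticIp) {
    CodeInfo info = lookupCode(knownIps, diagnosticIp);
    assert(info.IsValid());
    trace.push_back(diagnosticIp);
}

// New shape (assumed): append only when the IP resolves; report whether a frame was added.
bool appendFrameIfResolvable(const std::set<std::uintptr_t>& knownIps,
                             StackTrace& trace, std::uintptr_t diagnosticIp) {
    CodeInfo info = lookupCode(knownIps, diagnosticIp);
    if (!info.IsValid()) {
        return false;  // unresolvable IP: skip the frame instead of failing
    }
    trace.push_back(diagnosticIp);
    return true;
}
```

With the conditional version, an unresolvable IP costs one missing frame in the augmented stack trace rather than an assertion failure.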

@rcj1 rcj1 marked this pull request as ready for review February 6, 2026 15:54
@rcj1
Contributor Author

rcj1 commented Feb 6, 2026

@jkotas @janvorli

@janvorli
Member

janvorli commented Feb 6, 2026

It seems we should rather ensure that the diagnosticIP is always valid. How is it possible that the code got unloaded and we still attempt to use the ip of such code?

@janvorli
Member

janvorli commented Feb 6, 2026

There is a similar need to hold code alive in ExceptionDispatchInfo. The DispatchState.StackTrace is responsible for holding the code alive as long as the ExceptionDispatchInfo exists. It seems we need something similar for the continuation resume info.
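
The keep-alive idea can be illustrated with a small analogy. This sketch assumes a reference-counted model of unloadable code; CodeRegion, PinnedResumeInfo, and makePinnedResumeInfo are hypothetical names, and the runtime's actual mechanism (GC references held through the stack trace) is not modeled here.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical model of a unit of code that can be unloaded.
struct CodeRegion {
    std::uintptr_t start;
    std::uintptr_t size;
    bool contains(std::uintptr_t ip) const {
        return ip >= start && ip - start < size;
    }
};

// Resume info that stores the diagnostic IP together with an owning
// reference to the code it points into (assumption: shared ownership
// stands in for whatever keeps the code alive in the real runtime).
struct PinnedResumeInfo {
    std::uintptr_t diagnosticIp;
    std::shared_ptr<const CodeRegion> keepAlive;
};

PinnedResumeInfo makePinnedResumeInfo(std::shared_ptr<const CodeRegion> code,
                                      std::uintptr_t ip) {
    // The IP must point into the region that is being pinned.
    assert(code != nullptr && code->contains(ip));
    return PinnedResumeInfo{ip, std::move(code)};
}
```

As long as a PinnedResumeInfo exists, the region it references cannot be freed, so the stored IP always points into live code.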

@rcj1
Contributor Author

rcj1 commented Feb 6, 2026

> It seems we should rather ensure that the diagnosticIP is always valid. How is it possible that the code got unloaded and we still attempt to use the ip of such code?

The diagnostic IP is populated here, and is not the stub IP at which we resume:

((InterpAsyncSuspendData*)GetDataItemAtIndex(ins->data[0]))->resumeInfo.DiagnosticIP = (size_t)startIp;

DavidWr has indicated he is not surprised there is an issue involving this as certain asyncv2 debug info is still a work in progress.

@rcj1
Contributor Author

rcj1 commented Feb 6, 2026

Let us choose between this PR and #124076 - if we merge this, we should be able to test non-diagnostics related interpreter tests with runtime-async.

@janvorli
Member

janvorli commented Feb 6, 2026

I would be fine with this as a temporary workaround until we fix the real problem. I think we should leave the original issue open, link to it from the long comment you have added, and mention there that this is a temporary measure.

rcj1 added 2 commits February 6, 2026 11:49
Add a comment regarding the temporary measure for testing.
Copilot AI review requested due to automatic review settings February 6, 2026 19:59
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

methodDesc,
NULL);
// Interpreter diagnostic IP is not recognized by codeInfo, so this does not work with interpreted code.
// This is a temporary measure to enable testing and once the issue is fixed this condition should be replaced by an assert.
Member

@jkotas jkotas Feb 6, 2026

What happens if we get a random pointer here that happens to be valid and maps to some completely unrelated method? I doubt we are robust against mapping debug info onto a completely unrelated method. I think this is just going to make it crash less often, it is not a reliable workaround for the bug.

I think stable CI is more important than having a bit more of async testing enabled ASAP. I would take the other PR to disable the runtime async with interpreter instead.
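
The concern about a pointer that "happens to be valid" can be shown with a toy address-range lookup. This is a sketch under stated assumptions: CodeMap and its methods are hypothetical, and the addresses and method names are invented. It only shows that a validity check on a stale address proves that some code lives there, not that it is the code the address originally referred to.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical address-range table mapping code ranges to method names.
class CodeMap {
public:
    void add(std::uintptr_t start, std::uintptr_t size, std::string method) {
        assert(size > 0);
        ranges_[start] = Range{start + size, std::move(method)};
    }

    void remove(std::uintptr_t start) { ranges_.erase(start); }

    // Returns the method whose range contains ip, or nothing if none does.
    std::optional<std::string> resolve(std::uintptr_t ip) const {
        auto it = ranges_.upper_bound(ip);  // first range starting after ip
        if (it == ranges_.begin()) {
            return std::nullopt;            // ip lies before every range
        }
        --it;                               // last range starting at or before ip
        if (ip >= it->second.end) {
            return std::nullopt;            // ip falls in a gap between ranges
        }
        return it->second.method;
    }

private:
    struct Range {
        std::uintptr_t end;  // one past the last address of the range
        std::string method;
    };
    std::map<std::uintptr_t, Range> ranges_;
};
```

If a range is removed and the same addresses are later reused by a different method, resolving the old IP succeeds again but names the new, unrelated method, which is the failure mode a validity check alone cannot detect.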

Contributor Author

Another approach would be to omit the append if the interpreter is enabled.


3 participants