[release/10.0] Fix token memory ordering issue reading from the MethodDef and FieldDef token tables (#123557)#124110
Conversation
…ef token tables (dotnet#123557) We have logic which loads a typedef, and if the typedef is loaded then we assume the method can be loaded via lookup in the MethodDef/FieldDef token map tables. HOWEVER, what this boils down to is we do a load from the TypeDef table, and if that succeeds we will do a load from the FieldDef/MethodDef token map tables, but that second load is done without an explicit memory barrier. Unfortunately, through the wonder of memory ordering behavior, its legal (and not actually all that uncommon) for the CPU to do the load from the FieldDef/MethodDef tables BEFORE it actually loads from the TypeDef table. So what can happen is that the fielddef table read can happen, then the typedef table read can happen, then the CPU does the if check to see if its ok to do the fielddef table read, then we use the value from the fielddef table read which happened earlier. This fix should fix that problem by using a VolatileLoad for all reads from the lookup maps. This is slightly slower but avoids the extremely easy to mess up dependency ordering rules. Fixes dotnet#120754 --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Aaron R Robinson <arobins@microsoft.com>
There was a problem hiding this comment.
Pull request overview
Backport to release/10.0 of the CoreCLR VM fix that addresses a memory-ordering hazard when reading from token lookup maps (TypeDef → MethodTable followed by MethodDef/FieldDef → Desc), which could lead to incorrect early reads and crashes on weakly-ordered CPUs (notably Arm64).
Changes:
- Replace
VolatileLoadWithoutBarrierwithVolatileLoadwhen reading values fromLookupMaptables to enforce proper ordering. - Expand the inline comment to document why a data dependency is insufficient for the observed access pattern.
| // LookupMap's hold pointers to data which is immutable, so normally we could use a data | ||
| // dependency instead of an explicit barrier here. However, the access pattern between |
There was a problem hiding this comment.
Grammar: “LookupMap's hold …” reads as a possessive and is ungrammatical. Consider rephrasing to “LookupMaps hold …” (or “LookupMap holds …”) to avoid confusion in this explanatory comment.
| // LookupMap's hold pointers to data which is immutable, so normally we could use a data | ||
| // dependency instead of an explicit barrier here. However, the access pattern between |
There was a problem hiding this comment.
Grammar: “LookupMap's hold …” is ungrammatical (possessive). Consider changing to “LookupMaps hold …” (or “LookupMap holds …”) for clarity.
|
Tagging subscribers to this area: @agocke |
/cc @copilot
Backport of #123557 to release/10.0
Customer Impact
We have logic which loads a typedef, and if the typedef is loaded then we assume the method can be loaded via lookup in the MethodDef/FieldDef token map tables. HOWEVER, what this boils down to is we do a load from the TypeDef table, and if that succeeds we will do a load from the FieldDef/MethodDef token map tables, but that second load is done without an explicit memory barrier. Unfortunately, through the wonder of memory ordering behavior, its legal (and not actually all that uncommon) for the CPU to do the load from the FieldDef/MethodDef tables BEFORE it actually loads from the TypeDef table. So what can happen is that the fielddef table read can happen, then the typedef table read can happen, then the CPU does the if check to see if its ok to do the fielddef table read, then we use the value from the fielddef table read which happened earlier.
This fix should fix that problem by using a VolatileLoad for all reads from the lookup maps. This is slightly slower but avoids the extremely easy to mess up dependency ordering rules.
Without this fix, builds using the C# compiler are known to crash on startup some percentage of the time.
Regression
There is an increase in usage of Arm64 hardware which is more vulnerable to memory ordering effects. So, this bug is becoming more prevalent with time.
Testing
Tested in main with no visible regressions.
Risk
Low, the largest risk is a small performance regression.
IMPORTANT: If this backport is for a servicing release, please verify that:
release/X.0-staging, notrelease/X.0.release/X.0(no-stagingsuffix).Package authoring no longer needed in .NET 9
IMPORTANT: Starting with .NET 9, you no longer need to edit a NuGet package's csproj to enable building and bump the version.
Keep in mind that we still need package authoring in .NET 8 and older versions.
…