Disconnect stale automation peers from UIA to fix memory leak in virtualized ItemsControls#11657
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses a memory leak in WPF UI Automation (UIA) for virtualized ItemsControl scenarios by explicitly disconnecting UIA providers for removed automation peers, allowing COM references held by UIA Core to be released and enabling GC of stale peers and their visual subtrees.
Changes:
- Added a helper to disconnect an
AutomationPeer’sElementProxyfrom UIA viaUiaDisconnectProvider. - Updated
UpdateChildrenInternalto disconnect removed children even when noStructureChangedlisteners are registered. - Added the required interop import to call into
UIAutomationCore.dll.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (proxyWeakRef?.Target is ElementProxy proxy) | ||
| { | ||
| UiaDisconnectProvider(proxy); | ||
| } | ||
|
|
||
| peer._elementProxyWeakReference = null; |
…ualized ItemsControls When a UIA client enumerates automation peers via FindAll, ElementProxy CCWs are created but never released, pinning managed peers and entire visual sub-trees in memory. This change disconnects removed peers in both UpdateChildrenInternal and EnsureChildren, breaking the reference chain so containers can be GC'd. Key changes: - DisconnectPeerFromUia defers UiaDisconnectProvider via Dispatcher.BeginInvoke to avoid calling it during UIA callbacks (re-entrancy unsafe). - proxy.ClearPeer() severs the ElementProxy->peer strong reference immediately, allowing the peer to become GC-eligible before the deferred call executes. - peer._children is nulled to break the chain from disconnected item peers to cached cell peers that root DataGridCell/DataGridRow containers. - EnsureChildren disconnects stale peers gated on _elementProxyWeakReference, making it allocation-free when no UIA client has walked the subtree. Fixes dotnet#11337 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
03cdc2d to
65afa49
Compare
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
h3xds1nz
left a comment
There was a problem hiding this comment.
@amarinov-msft I'm really worried about perf with these changes. UIA implementation itself is already a really heavy mess (things like #10676 and more), and this will further increase the weight (an async DispatcherOperation for every disconnected peer), further enumeration over the intermediate collections.
Now, I don't have an exact solution here since in our wpf fork I've reworked this extensively but I think it's worth a further investigation and profiling to fix this.
| if (dispatcher != null && !dispatcher.HasShutdownStarted) | ||
| { | ||
| dispatcher.BeginInvoke(DispatcherPriority.Background, | ||
| new Action(() => UiaDisconnectProvider(proxy))); |
There was a problem hiding this comment.
This creates a closure unnecessarily and a new action everytime. Ideally use the parameter variant, it's an object you're passing so no boxing here.
| // UiaDisconnectProvider must NOT be called during a UIA callback (e.g. during Navigate/FindAll handling). | ||
| // UpdateChildrenInternal is invoked from within UIA callbacks, so defer the actual COM disconnect to a separate dispatcher operation. | ||
| Dispatcher dispatcher = peer.Dispatcher; | ||
| if (dispatcher != null && !dispatcher.HasShutdownStarted) |
There was a problem hiding this comment.
I'm not sure this is even required/true; the UIA callbacks should already run on the Dispatcher synchronously iirc, so that check seems redundant. If invoked from proxy directly, this would already throw if it was the case.
| /// </remarks> | ||
| private static void DisconnectPeerFromUia(AutomationPeer peer) | ||
| { | ||
| if (peer == null) |
There was a problem hiding this comment.
This check is redundant since you're already checking in the calling function.
| } | ||
|
|
||
| [DllImport("UIAutomationCore.dll", EntryPoint = "UiaDisconnectProvider", CharSet = CharSet.Unicode)] | ||
| private static extern int UiaDisconnectProvider(IRawElementProviderSimple provider); |
There was a problem hiding this comment.
This shouldn't be defined in this file.
| // This must happen regardless of StructureChanged event registration. | ||
| if (removedCount > 0) | ||
| { | ||
| foreach (AutomationPeer removedChild in hs) |
There was a problem hiding this comment.
This is just a sigh for another PR but this whole thing should be turned into a simple merge walk to differentiate old and new and avoid the HashSet allocations altogether. Sometimes this is tens of MBs for something that could have been a simple loop.
The least that can be done here is to repeat this loop over the HashSet only once; by either doing all the StructureChanged stuff or not.
| if (old._elementProxyWeakReference == null) | ||
| continue; | ||
|
|
||
| if (_children == null || !_children.Contains(old)) |
There was a problem hiding this comment.
This basically O(n*m), if this is a top automation peer, it's gonna be pretty heavy.
- Cache static DispatcherOperationCallback; drop closure/Action allocation per call. - Remove redundant Dispatcher null/HasShutdownStarted guard (matches InvalidatePeer pattern). - Remove redundant peer null check inside DisconnectPeerFromUia (callers guarantee non-null). - Move UiaDisconnectProvider P/Invoke onto ElementProxy as a single Disconnect() method; collapse ClearPeer() into it. Use DllImport.UIAutomationCore string constant per codebase convention. - Build HashSet of new children lazily in EnsureChildren to avoid O(n*m) Contains scan; common path stays allocation-free. - Drop redundant disconnect loop from UpdateChildrenInternal (EnsureChildren already handles it on this path); revert the merged loop back to events-only. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
@h3xds1nz thank you for your review. I tried addressing your comments. While the fundamental question about this issue is resolved (performance vs behavior), please check the updates when you have the opportunity and let me know if they are in the direction that you had in mind. |
Fixes #11337
Description
When UIA clients enumerate a virtualized ItemsControl (e.g. DataGrid), ElementProxy CCWs are created for each automation peer. Previously, when children were replaced (e.g. due to collection rebinding), the old peers were never disconnected from UIA Core. The COM references held by UIA kept the CCW ref count > 0, pinning the managed peers and their entire visual sub-trees in memory indefinitely.
This fix calls
UiaDisconnectProvideron removed children's ElementProxy CCWs duringUpdateChildrenInternal, causing UIA Core to release its COM references. This allows the CCW ref count to drop to zero so the managed peers can be garbage collected.Note: UIA clients may observe
ElementNotAvailableExceptionwhen accessing properties of stale elements after disconnection. This is standard UIA behavior that well-behaved clients already handle.Customer Impact
WPF applications using virtualized ItemsControl (e.g., DataGrid, ListView) that are exposed to UI Automation clients (screen readers, automated testing tools, accessibility inspectors) experience an unbounded memory leak of approximately 260 MB/min in the customer's repro scenario (200k-row DataGrid with periodic rebinding). The leak occurs because stale AutomationPeer instances are never disconnected from UIA Core - their ElementProxy CCW ref counts never reach zero, permanently pinning the managed peers and their visual sub-trees in memory. This leads to
OutOfMemoryExceptionand application crashes in long-running scenarios. Any application with accessibility enabled (which is the default when assistive technology or test automation is present) is affected.Regression
No.
Testing
Tested using the customer`s repro scenario:
Base line memory leak rate: +260 MB/min
With fix: +15.3 MB/min (94% reduction)
Risk
Low overall.
UpdateChildrenInternalinAutomationPeer.cs- a single code path that runs when automation children are replaced.UiaDisconnectProvideris the documented Windows API for this exact purpose — telling UIA Core to release COM references to providers that are no longer valid. This is the same mechanism used by other frameworks (e.g., WinForms) to manage automation peer lifetimes.UpdateChildrenInternaldetects removed children, which only happens when automation is active and the tree structure changes.FindAllcall may now receiveElementNotAvailableExceptionwhen accessing properties of disconnected elements. This is standard UIA contract behavior that well-behaved clients (including Narrator, NVDA, and JAWS) already handle. Poorly written test automation tools that do not catch this exception may surface errors they previously did not encounter.Microsoft Reviewers: Open in CodeFlow