Skip to content

Disconnect stale automation peers from UIA to fix memory leak in virtualized ItemsControls#11657

Open
amarinov-msft wants to merge 3 commits into
dotnet:mainfrom
amarinov-msft:amarinov-fix/automation-peer-disconnect-stale
Open

Disconnect stale automation peers from UIA to fix memory leak in virtualized ItemsControls#11657
amarinov-msft wants to merge 3 commits into
dotnet:mainfrom
amarinov-msft:amarinov-fix/automation-peer-disconnect-stale

Conversation

@amarinov-msft

@amarinov-msft amarinov-msft commented May 21, 2026

Copy link
Copy Markdown

Fixes #11337

Description

When UIA clients enumerate a virtualized ItemsControl (e.g. DataGrid), ElementProxy CCWs are created for each automation peer. Previously, when children were replaced (e.g. due to collection rebinding), the old peers were never disconnected from UIA Core. The COM references held by UIA kept the CCW ref count > 0, pinning the managed peers and their entire visual sub-trees in memory indefinitely.

This fix calls UiaDisconnectProvider on removed children's ElementProxy CCWs during UpdateChildrenInternal, causing UIA Core to release its COM references. This allows the CCW ref count to drop to zero so the managed peers can be garbage collected.

Note: UIA clients may observe ElementNotAvailableException when accessing properties of stale elements after disconnection. This is standard UIA behavior that well-behaved clients already handle.

Customer Impact

WPF applications using virtualized ItemsControl (e.g., DataGrid, ListView) that are exposed to UI Automation clients (screen readers, automated testing tools, accessibility inspectors) experience an unbounded memory leak of approximately 260 MB/min in the customer's repro scenario (200k-row DataGrid with periodic rebinding). The leak occurs because stale AutomationPeer instances are never disconnected from UIA Core - their ElementProxy CCW ref counts never reach zero, permanently pinning the managed peers and their visual sub-trees in memory. This leads to OutOfMemoryException and application crashes in long-running scenarios. Any application with accessibility enabled (which is the default when assistive technology or test automation is present) is affected.

Regression

No.

Testing

Tested using the customer`s repro scenario:
Base line memory leak rate: +260 MB/min
With fix: +15.3 MB/min (94% reduction)

Risk

Low overall.

  1. Narrow scope: The change affects only UpdateChildrenInternal in AutomationPeer.cs - a single code path that runs when automation children are replaced.
  2. Standard API usage: UiaDisconnectProvider is the documented Windows API for this exact purpose — telling UIA Core to release COM references to providers that are no longer valid. This is the same mechanism used by other frameworks (e.g., WinForms) to manage automation peer lifetimes.
  3. Non-recursive by design: The disconnect is intentionally non-recursive to avoid disconnecting shared container peers in virtualized controls that may have been recycled to serve new items.
  4. No behavioral change for apps without UIA clients: The disconnect logic only executes when UpdateChildrenInternal detects removed children, which only happens when automation is active and the tree structure changes.
  5. Expected UIA client-side behavior change: UIA clients that hold stale element references from a previous FindAll call may now receive ElementNotAvailableException when accessing properties of disconnected elements. This is standard UIA contract behavior that well-behaved clients (including Narrator, NVDA, and JAWS) already handle. Poorly written test automation tools that do not catch this exception may surface errors they previously did not encounter.
Microsoft Reviewers: Open in CodeFlow

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a memory leak in WPF UI Automation (UIA) for virtualized ItemsControl scenarios by explicitly disconnecting UIA providers for removed automation peers, allowing COM references held by UIA Core to be released and enabling GC of stale peers and their visual subtrees.

Changes:

  • Added a helper to disconnect an AutomationPeer’s ElementProxy from UIA via UiaDisconnectProvider.
  • Updated UpdateChildrenInternal to disconnect removed children even when no StructureChanged listeners are registered.
  • Added the required interop import to call into UIAutomationCore.dll.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1838 to +1843
if (proxyWeakRef?.Target is ElementProxy proxy)
{
UiaDisconnectProvider(proxy);
}

peer._elementProxyWeakReference = null;
…ualized ItemsControls

When a UIA client enumerates automation peers via FindAll, ElementProxy CCWs
are created but never released, pinning managed peers and entire visual
sub-trees in memory.  This change disconnects removed peers in both
UpdateChildrenInternal and EnsureChildren, breaking the reference chain so
containers can be GC'd.

Key changes:
- DisconnectPeerFromUia defers UiaDisconnectProvider via Dispatcher.BeginInvoke
  to avoid calling it during UIA callbacks (re-entrancy unsafe).
- proxy.ClearPeer() severs the ElementProxy->peer strong reference immediately,
  allowing the peer to become GC-eligible before the deferred call executes.
- peer._children is nulled to break the chain from disconnected item peers to
  cached cell peers that root DataGridCell/DataGridRow containers.
- EnsureChildren disconnects stale peers gated on _elementProxyWeakReference,
  making it allocation-free when no UIA client has walked the subtree.

Fixes dotnet#11337

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@amarinov-msft amarinov-msft force-pushed the amarinov-fix/automation-peer-disconnect-stale branch from 03cdc2d to 65afa49 Compare June 16, 2026 19:24
@amarinov-msft amarinov-msft requested a review from a team as a code owner June 16, 2026 19:24
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@h3xds1nz h3xds1nz left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amarinov-msft I'm really worried about perf with these changes. UIA implementation itself is already a really heavy mess (things like #10676 and more), and this will further increase the weight (an async DispatcherOperation for every disconnected peer), further enumeration over the intermediate collections.

Now, I don't have an exact solution here since in our wpf fork I've reworked this extensively but I think it's worth a further investigation and profiling to fix this.

if (dispatcher != null && !dispatcher.HasShutdownStarted)
{
dispatcher.BeginInvoke(DispatcherPriority.Background,
new Action(() => UiaDisconnectProvider(proxy)));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This creates a closure unnecessarily and a new action everytime. Ideally use the parameter variant, it's an object you're passing so no boxing here.

// UiaDisconnectProvider must NOT be called during a UIA callback (e.g. during Navigate/FindAll handling).
// UpdateChildrenInternal is invoked from within UIA callbacks, so defer the actual COM disconnect to a separate dispatcher operation.
Dispatcher dispatcher = peer.Dispatcher;
if (dispatcher != null && !dispatcher.HasShutdownStarted)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is even required/true; the UIA callbacks should already run on the Dispatcher synchronously iirc, so that check seems redundant. If invoked from proxy directly, this would already throw if it was the case.

/// </remarks>
private static void DisconnectPeerFromUia(AutomationPeer peer)
{
if (peer == null)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is redundant since you're already checking in the calling function.

}

[DllImport("UIAutomationCore.dll", EntryPoint = "UiaDisconnectProvider", CharSet = CharSet.Unicode)]
private static extern int UiaDisconnectProvider(IRawElementProviderSimple provider);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be defined in this file.

// This must happen regardless of StructureChanged event registration.
if (removedCount > 0)
{
foreach (AutomationPeer removedChild in hs)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a sigh for another PR but this whole thing should be turned into a simple merge walk to differentiate old and new and avoid the HashSet allocations altogether. Sometimes this is tens of MBs for something that could have been a simple loop.

The least that can be done here is to repeat this loop over the HashSet only once; by either doing all the StructureChanged stuff or not.

if (old._elementProxyWeakReference == null)
continue;

if (_children == null || !_children.Contains(old))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This basically O(n*m), if this is a top automation peer, it's gonna be pretty heavy.

- Cache static DispatcherOperationCallback; drop closure/Action allocation per call.
- Remove redundant Dispatcher null/HasShutdownStarted guard (matches InvalidatePeer pattern).
- Remove redundant peer null check inside DisconnectPeerFromUia (callers guarantee non-null).
- Move UiaDisconnectProvider P/Invoke onto ElementProxy as a single Disconnect() method; collapse ClearPeer() into it. Use DllImport.UIAutomationCore string constant per codebase convention.
- Build HashSet of new children lazily in EnsureChildren to avoid O(n*m) Contains scan; common path stays allocation-free.
- Drop redundant disconnect loop from UpdateChildrenInternal (EnsureChildren already handles it on this path); revert the merged loop back to events-only.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@amarinov-msft

Copy link
Copy Markdown
Author

@h3xds1nz thank you for your review. I tried addressing your comments. While the fundamental question about this issue is resolved (performance vs behavior), please check the updates when you have the opportunity and let me know if they are in the direction that you had in mind.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Included in test pass PR metadata: Label to tag PRs, to facilitate with triage Status:Completed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

WPF leaks ElementProxy instances when UI Automation is used

3 participants