Fix flaky CanRunOnIdleTask by polling instead of sleeping#2314
Open
andyleejordan wants to merge 1 commit into
Open
Fix flaky CanRunOnIdleTask by polling instead of sleeping#2314andyleejordan wants to merge 1 commit into
CanRunOnIdleTask by polling instead of sleeping#2314andyleejordan wants to merge 1 commit into
Conversation
`CanRunOnIdleTask` (and its twin `CanRunOnIdleInProfileTask`) were flaky on the net462 (Windows PowerShell 5.1) CI leg — the former was just caught failing on PR #2298's Windows job. The root cause is that `PsesInternalHost.OnPowerShellIdle` calls `Events.GenerateEvent(PSEngineEvent.OnIdle, ...)`, which only *enqueues* the event. For a subscriber registered with `-Action {...}`, PowerShell doesn't run the action scriptblock inline; it becomes a pending action that the engine dispatches asynchronously on the pipeline thread, around subsequent pipeline invocations. So the action's execution was never synchronized with the test's `$handled` read, and the fixed `Thread.Sleep(2000)` was just a timing guess — sometimes too short on the slower WinPS leg, leaving `$global:handled` still `$false` at the assertion. The key realization is that each *additional* pipeline execution gives the engine another chance to drain the pending action, so re-reading the handler variable in a loop both waits for *and* drives completion. I replaced the sleep with a shared `WaitForHandledAsync` helper that polls the variable (~200ms apart, ~15s ceiling) until it reports `$true`, returning the last observed value on timeout so the assertion still fails loudly. This keeps the tests' intent intact and isn't merely a longer sleep. I validated both tests on net8.0 (green across repeated runs, ~0.4s each vs. the old fixed 2s); net462 can't run on macOS, but the mechanism is identical across targets and the 15s ceiling self-terminates on success, so it's strictly safer on the slow leg without slowing the fast one. Drafted by Copilot (Claude Opus 4.8). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR fixes flaky CanRunOnIdleTask and CanRunOnIdleInProfileTask tests that intermittently failed on the net462 (Windows PowerShell 5.1) CI leg. The root cause was a hard-coded Thread.Sleep(2000) that was insufficiently long for the asynchronous event dispatch on slower runners. The fix replaces both sleeps with a shared polling helper that actively drives event processing while waiting.
Changes:
- Adds a new
OnIdleTestHelpers.WaitForHandledAsyncstatic helper that polls a PowerShell handler variable viaExecutePSCommandAsync<bool>in a loop (200 ms between polls, 15 s timeout ceiling), returning the last observed value on timeout so assertions still fail loudly. - Replaces the
Thread.Sleep(2000)+ manualExecutePSCommandAsyncread in bothCanRunOnIdleTaskandCanRunOnIdleInProfileTaskwith calls to the new helper, also removing the// TODO: Why is this racy?comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
CanRunOnIdleTask(and its twinCanRunOnIdleInProfileTask) inPsesInternalHostTests.cswere intermittently failing on the net462 (WindowsPowerShell 5.1) CI leg.
CanRunOnIdleTaskwas just caught failing on PR #2298'sWindows job (
Assert.Collection() … Assert.False/Assert.Trueat line ~200). Bothtests carried a
// TODO: Why is this racy?above a hard-codedThread.Sleep(2000).Root cause
PsesInternalHost.OnPowerShellIdlecalls_mainRunspaceEngineIntrinsics.Events.GenerateEvent(PSEngineEvent.OnIdle, ...),which only enqueues the OnIdle event. For a subscriber registered with
-Action { ... }(exactly what these tests register), PowerShell does not runthe action scriptblock inline at
GenerateEventtime — it becomes a pendingaction that the engine's event manager dispatches asynchronously, on the
pipeline thread, at the next process-pending-actions point (around subsequent
pipeline invocations).
OnPowerShellIdlethen runs a tiny artificial pipeline(
…param() 0) to nudge event processing, but the action's actual execution isnever synchronized with that pipeline returning or with the test's later
$handledread.So the fixed
Thread.Sleep(2000)was only a timing guess. On a fast runner theaction finishes first; on the slower WinPS leg 2 seconds is sometimes not enough,
leaving
$global:handledstill$falseat the assertion — hence the intermittentfailure.
The key realization: each additional pipeline execution gives the engine
another chance to drain the pending action, so re-reading the handler variable in
a loop both waits for and drives completion. That makes polling a real
deterministic fix rather than just a longer sleep.
Change
WaitForHandledAsync(psesHost, variableName)helper that polls thehandler variable via
ExecutePSCommandAsync<bool>until it reports$true(~200 ms between polls, ~15 s ceiling via
CancellationTokenSource). On timeoutit returns the last observed value so the existing
Assert.Collection(handled, Assert.True)still fails loudly.CanRunOnIdleTask($handled) andCanRunOnIdleInProfileTask(
$handledInProfile), replacing bothThread.Sleep(2000)calls and removing the// TODO: Why is this racy?comments. Registration, the pre-assert(
Assert.False), and theOnPowerShellIdledelegate call are unchanged.No production code or test-profile script changes — test file only.
Validation
Invoke-Build TestPS74 -TestFilter 'FullyQualifiedName~CanRunOnIdle'): green across repeated runs, ~0.4 s each vs.the old fixed 2 s.
targets — only latency differs. The 15 s ceiling is ~7.5× the old 2 s window and
self-terminates on success, so it's strictly safer on the slow leg without
slowing the fast one.