[fix] Fix race condition in BlameCollector: skip hang dump when testhost hasn't launched yet#16065
[fix] Fix race condition in BlameCollector: skip hang dump when testhost hasn't launched yet#16065nohwnd wants to merge 9 commits into
Conversation
When the inactivity timer fires before the testhost process has launched, _testHostProcessId is 0 (default int). Previously this caused ProcessDumpUtility.StartHangBasedProcessDump to attempt to dump PID 0 (the Idle process on Windows / Swapper on Linux), resulting in an empty or incorrect dump file. The fix adds an early-return guard in CollectDumpAndAbortTesthost: if _testHostProcessId == 0, log a warning and skip the dump/kill. Also updates three existing hang dump unit tests to properly simulate the happy-path scenario (testhost launches before the timer fires) by: - Using a 50 ms timeout instead of 0 ms so the TestHostLaunched event can be raised before the timer callback runs - Raising TestHostLaunched with PID 1234 before the timer fires Adds a new test that verifies StartHangBasedProcessDump is NOT called when the timer fires before TestHostLaunched. Fixes #15588 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Fixes a race in BlameCollector hang-dump collection where the inactivity timer can fire before TestHostLaunched sets the testhost PID, causing the collector to attempt a dump/kill against PID 0. The PR adds a guard to skip hang-dump collection when the testhost hasn’t launched yet, and updates unit tests/localized resources accordingly.
Changes:
- Add early-return in
CollectDumpAndAbortTesthostwhen_testHostProcessId == 0, with warning logs. - Make hang-dump unit tests deterministic by avoiding
TimeSpan.Zeroin several tests and explicitly raisingTestHostLaunched. - Add localized resource string
TestHostNotLaunchedCannotCollectHangDumpacross.resx, designer, and all 13.xlffiles.
Reviewed changes
Copilot reviewed 16 out of 17 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| test/Microsoft.TestPlatform.Extensions.BlameDataCollector.UnitTests/BlameCollectorTests.cs | Updates hang-dump timer tests and adds a regression test for “timer fires before testhost launched”. |
| src/Microsoft.TestPlatform.Extensions.BlameDataCollector/BlameCollector.cs | Skips hang-dump collection when testhost PID is not yet known (PID 0). |
| src/Microsoft.TestPlatform.Extensions.BlameDataCollector/Resources/Resources.resx | Adds new resource string for the “testhost not launched” warning. |
| src/Microsoft.TestPlatform.Extensions.BlameDataCollector/Resources/Resources.Designer.cs | Adds strongly-typed accessor for the new resource string. |
| src/Microsoft.TestPlatform.Extensions.BlameDataCollector/Resources/xlf/Resources.cs.xlf | Adds TestHostNotLaunchedCannotCollectHangDump localization entry (state=new). |
| src/Microsoft.TestPlatform.Extensions.BlameDataCollector/Resources/xlf/Resources.de.xlf | Adds TestHostNotLaunchedCannotCollectHangDump localization entry (state=new). |
| src/Microsoft.TestPlatform.Extensions.BlameDataCollector/Resources/xlf/Resources.es.xlf | Adds TestHostNotLaunchedCannotCollectHangDump localization entry (state=new). |
| src/Microsoft.TestPlatform.Extensions.BlameDataCollector/Resources/xlf/Resources.fr.xlf | Adds TestHostNotLaunchedCannotCollectHangDump localization entry (state=new). |
| src/Microsoft.TestPlatform.Extensions.BlameDataCollector/Resources/xlf/Resources.it.xlf | Adds TestHostNotLaunchedCannotCollectHangDump localization entry (state=new). |
| src/Microsoft.TestPlatform.Extensions.BlameDataCollector/Resources/xlf/Resources.ja.xlf | Adds TestHostNotLaunchedCannotCollectHangDump localization entry (state=new). |
| src/Microsoft.TestPlatform.Extensions.BlameDataCollector/Resources/xlf/Resources.ko.xlf | Adds TestHostNotLaunchedCannotCollectHangDump localization entry (state=new). |
| src/Microsoft.TestPlatform.Extensions.BlameDataCollector/Resources/xlf/Resources.pl.xlf | Adds TestHostNotLaunchedCannotCollectHangDump localization entry (state=new). |
| src/Microsoft.TestPlatform.Extensions.BlameDataCollector/Resources/xlf/Resources.pt-BR.xlf | Adds TestHostNotLaunchedCannotCollectHangDump localization entry (state=new). |
| src/Microsoft.TestPlatform.Extensions.BlameDataCollector/Resources/xlf/Resources.ru.xlf | Adds TestHostNotLaunchedCannotCollectHangDump localization entry (state=new). |
| src/Microsoft.TestPlatform.Extensions.BlameDataCollector/Resources/xlf/Resources.tr.xlf | Adds TestHostNotLaunchedCannotCollectHangDump localization entry (state=new). |
| src/Microsoft.TestPlatform.Extensions.BlameDataCollector/Resources/xlf/Resources.zh-Hans.xlf | Adds TestHostNotLaunchedCannotCollectHangDump localization entry (state=new). |
| src/Microsoft.TestPlatform.Extensions.BlameDataCollector/Resources/xlf/Resources.zh-Hant.xlf | Adds TestHostNotLaunchedCannotCollectHangDump localization entry (state=new). |
Files not reviewed (1)
- src/Microsoft.TestPlatform.Extensions.BlameDataCollector/Resources/Resources.Designer.cs: Language not supported
| // Do NOT raise TestHostLaunched — _testHostProcessId stays 0. | ||
| warningLogged.Wait(1000, TestContext.CancellationToken); | ||
|
|
There was a problem hiding this comment.
Moot — reworked the fix to not start the timer until testhost launches. The PID==0 guard and warning log are gone; the test now verifies the timer never fires when testhost hasn't launched (no signal to wait for).
|
|
||
| _blameDataCollector.Initialize( | ||
| GetDumpConfigurationElement(false, false, true, 0), | ||
| GetDumpConfigurationElement(false, false, true, 50), |
There was a problem hiding this comment.
Moot — with the timer starting only after TestHostLaunched, there's no race between the timer and the PID assignment. The 50ms timeout is fine.
|
|
||
| _blameDataCollector.Initialize( | ||
| GetDumpConfigurationElement(false, false, true, 0), | ||
| GetDumpConfigurationElement(false, false, true, 50), |
There was a problem hiding this comment.
Same as above — no race with the new approach.
|
|
||
| _blameDataCollector.Initialize( | ||
| GetDumpConfigurationElement(false, false, true, 0), | ||
| GetDumpConfigurationElement(false, false, true, 50), |
There was a problem hiding this comment.
Same as above — no race with the new approach.
nohwnd
left a comment
There was a problem hiding this comment.
Review: Fix race condition in BlameCollector
Dimensions activated: Crash & Hang Dump Reliability · Error Reporting & Diagnostic Clarity · Parallel Execution & Scheduling Safety
Summary
The fix is correct and well-placed. Setting _inactivityTimerAlreadyFired = true before the guard ensures the timer is permanently defused regardless of the early-return path, and the test coverage faithfully exercises the new code path with TimeSpan.Zero.
One finding: _testHostProcessId is not volatile. The new guard makes this field load-bearing across a thread boundary for the first time — the timer thread now returns early based on reading 0. On x86/x64 this is unlikely to matter in practice, but volatile int is a one-liner fix that makes the cross-thread contract explicit and covers ARM64. See the inline comment for details.
Description alignment
PR description matches the diff exactly. Localization coverage (13 XLF files + resx + Resources.Designer.cs) is complete.
🧠 Reviewed by Expert Code Reviewer
🧠 Reviewed by Expert Code Reviewer 🧠
| } | ||
|
|
||
| // If testhost has not launched yet, we cannot dump or kill it. | ||
| if (_testHostProcessId == 0) |
There was a problem hiding this comment.
[Crash & Hang Dump Reliability / Parallel Execution & Scheduling Safety] — _testHostProcessId is not volatile
_testHostProcessId is written on the data-collection events thread in TestHostLaunchedHandler and now load-bearing read here on the timer-callback thread. The C# memory model does not guarantee that a plain int write on one thread is visible as a non-stale read on another without a memory barrier.
Without volatile (or Interlocked), the JIT is allowed to cache the 0 in a register and satisfy this guard with a stale value, causing the guard to fire and silently skip the hang dump even when testhost has already launched. On x86/x64 (TSO) this is unlikely to manifest, but it is technically a data race and can bite on ARM64 CI agents.
The field was already read without synchronization at lines 261, 271, 319, and 627 (pre-existing), but this new guard is the first place where the value of 0 has a meaningful semantic effect (early return vs. proceeding). Marking the field volatile would make the cross-thread visibility guarantee explicit and eliminate the theoretical false-positive:
// before
private int _testHostProcessId;
// after
private volatile int _testHostProcessId;volatile int is safe here — the field is only ever assigned once (in TestHostLaunchedHandler) and only read elsewhere, so no compare-and-swap semantics are needed.
There was a problem hiding this comment.
Done — marked _testHostProcessId as volatile to make the cross-thread visibility guarantee explicit and eliminate the theoretical false-positive on weakly-ordered CPUs (ARM64).
🔧 Iterated by PR Iteration Agent 🔧
There was a problem hiding this comment.
would use interlocked instead as that is more common, and less scary.
There was a problem hiding this comment.
Went a step further — eliminated the cross-thread concern entirely. Timer now starts only after TestHostLaunchedHandler sets the PID, so the field is always written before any timer callback reads it. Reverted to plain int.
Ensures the hang-dump guard added in the parent commit sees a fresh value on the timer-callback thread (ARM64 / weakly-ordered CPUs). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Commit pushed:
|
Instead of guarding against PID 0 in the timer callback, don't start the inactivity timer until TestHostLaunched fires. This eliminates the race at the source — the timer can never fire before we know which process to dump. - Remove ResetInactivityTimer() call from Initialize; timer is created but not started until TestHostLaunchedHandler sets the PID and calls ResetInactivityTimer. - Swap order in TestHostLaunchedHandler: set PID before starting timer. - Remove PID==0 guard and resource string (no longer needed). - Revert volatile — no cross-thread visibility concern with proper ordering. - Update tests to reflect that timer starts on TestHostLaunched, not Initialize. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Poisons runtimeconfig.json with net9999.0 so testhost can't launch, then runs with CollectHangDump. Verifies vstest exits cleanly without hanging or producing a dump against PID 0. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| if (_collectProcessDumpOnHang) | ||
| { | ||
| _inactivityTimer ??= new InactivityTimer(CollectDumpAndAbortTesthost); | ||
| ResetInactivityTimer(); | ||
| // Don't start the timer here — wait until TestHostLaunched so we know | ||
| // which process to dump. ResetInactivityTimer is called from TestHostLaunchedHandler. | ||
| } |
| [NetCoreTargetFrameworkDataSource] | ||
| public void HangDumpShouldNotHangWhenTestHostFailsToStart(RunnerInfo runnerInfo) | ||
| { | ||
| // When testhost can't start (e.g. wrong runtime version), the inactivity timer | ||
| // must not try to dump PID 0. It should exit cleanly because the timer only starts | ||
| // after TestHostLaunched, which never fires here. | ||
| SetTestEnvironment(_testEnvironment, runnerInfo); | ||
| const string testAssetProjectName = "SimpleTestProjectMessedUpTargetFramework"; | ||
| var assemblyPath = GetTestDllForFramework(testAssetProjectName + ".dll", Core11TargetFramework); | ||
|
|
||
| // Make testhost fail immediately by targeting a non-existent runtime. | ||
| var runtimeConfigJson = Path.Combine(Path.GetDirectoryName(assemblyPath)!, testAssetProjectName + ".runtimeconfig.json"); | ||
| var fileContent = File.ReadAllText(runtimeConfigJson); | ||
| var updatedContent = fileContent.Replace("\"version\": \"11.0.0", "\"version\": \"9999.0.0"); | ||
| File.WriteAllText(runtimeConfigJson, updatedContent); | ||
|
|
||
| var arguments = PrepareArguments(assemblyPath, GetTestAdapterPath(), string.Empty, string.Empty, runnerInfo.InIsolationValue); | ||
| arguments = string.Concat(arguments, $" /ResultsDirectory:{TempDirectory.Path}"); | ||
| arguments = string.Concat(arguments, $@" /Blame:""CollectHangDump;HangDumpType=mini;TestTimeout=30s"""); | ||
| InvokeVsTest(arguments); | ||
|
|
||
| // vstest should exit with failure (testhost didn't start), but not hang and not crash. | ||
| ExitCodeEquals(1); | ||
| Assert.DoesNotContain(".dmp", StdOut, "no dump should be collected when testhost never launched"); | ||
| } |
| // Do NOT raise TestHostLaunched — timer should never start. | ||
| // Wait long enough that a started timer with timeout 0 would have fired. | ||
| Thread.Sleep(100); |
| @@ -1,4 +1,4 @@ | |||
| <?xml version="1.0" encoding="utf-8"?> | |||
| <?xml version="1.0" encoding="utf-8"?> | |||
| @@ -1,4 +1,4 @@ | |||
| <?xml version="1.0" encoding="utf-8"?> | |||
| <?xml version="1.0" encoding="utf-8"?> | |||
| @@ -1,4 +1,4 @@ | |||
| <?xml version="1.0" encoding="utf-8"?> | |||
| <?xml version="1.0" encoding="utf-8"?> | |||
| @@ -1,4 +1,4 @@ | |||
| <?xml version="1.0" encoding="utf-8"?> | |||
| <?xml version="1.0" encoding="utf-8"?> | |||
| @@ -1,4 +1,4 @@ | |||
| <?xml version="1.0" encoding="utf-8"?> | |||
| <?xml version="1.0" encoding="utf-8"?> | |||
| @@ -1,4 +1,4 @@ | |||
| <?xml version="1.0" encoding="utf-8"?> | |||
| <?xml version="1.0" encoding="utf-8"?> | |||
| @@ -1,4 +1,4 @@ | |||
| <?xml version="1.0" encoding="utf-8"?> | |||
| <?xml version="1.0" encoding="utf-8"?> | |||
nohwnd
left a comment
There was a problem hiding this comment.
Review: Fix race condition in BlameCollector (rework)
Dimensions checked: Crash & Hang Dump Reliability · Parallel Execution & Scheduling Safety · Algorithmic Correctness · Resource Management
Summary
The rework is a clear improvement over the previous iteration. The structural fix — don't start the timer until TestHostLaunchedHandler fires — completely eliminates the data-race concern that motivated the earlier volatile suggestion. Once the timer can only fire after _testHostProcessId is written (on the same thread), the System.Threading.Timer ThreadPool dispatch provides an implicit memory barrier, so plain int is correct here.
Specific code changes:
Initialize: Timer is created (??=) but no longer started — right fix, right place.TestHostLaunchedHandler: PID assigned beforeResetInactivityTimer()— the invariant is now enforced by construction.CollectDumpAndAbortTesthost:TPDebug.Assert(_testHostProcessId != 0, ...)documents the invariant in debug builds. Any path that somehow bypassed the architectural guarantee would still hit the try/catch aroundStartHangBasedProcessDumpin release, so the fallback handling is adequate.- XLF files: The 13 xlf changes are a BOM/XML-declaration normalisation — no new resource strings were added, consistent with the absence of a
.resxdiff.
Previous finding — resolved ✅
The volatile int _testHostProcessId finding from the prior review is fully resolved. The architectural fix is the right call — Interlocked/volatile would have been patch-over-the-symptom; not starting the timer early removes the cross-thread read at the source.
Description alignment
PR description matches the diff exactly. Localization coverage (13 XLF files, BOM-only change with no new strings) is consistent with the resx being unmodified.
🧠 Reviewed by Expert Code Reviewer 🧠
🧠 Reviewed by Expert Code Reviewer 🧠
The test mutates SimpleTestProjectMessedUpTargetFramework's runtimeconfig.json, which is shared with ProcessesInteractionTests. Wrap in try/finally to restore the original content. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
nohwnd
left a comment
There was a problem hiding this comment.
Review: Fix race condition in BlameCollector (follow-up — test commits)
Dimensions checked: Crash & Hang Dump Reliability · Algorithmic Correctness · Resource Management
New commits since last review (14:02 UTC)
Two test-only commits were pushed after the previous review:
-
Unit test additions (
InitializeWithDumpForHangShouldNotStartTimerIfTestHostHasNotLaunchedYet): Directly verifies the new invariant — timer does not fire whenTestHostLaunchedis never raised. Timeout changed from0→50msin the three existing timing-sensitive tests; this provides sufficient margin forThread.Sleep(100)to distinguish a correctly dormant timer from one that actually fires. ✅ -
Integration test (
HangDumpShouldNotHangWhenTestHostFailsToStart): Poisonsruntimeconfig.jsonwith"9999.0.0"to prevent testhost from launching, then asserts vstest exits with code 1 and emits no.dmpfile. The final commit wraps the file mutation intry/finallyto restore the original content — correct and necessary sinceSimpleTestProjectMessedUpTargetFrameworkis shared withProcessesInteractionTests. TheAssert.DoesNotContain(".dmp", StdOut, ...)parameter order is correct (needle first). ✅
Previously reviewed findings — still resolved ✅
The architectural fix (timer only starts in TestHostLaunchedHandler after PID is set) and the TPDebug.Assert(_testHostProcessId != 0, ...) invariant guard were both confirmed in the prior review. Nothing in the new commits regresses those findings.
🧠 Reviewed by Expert Code Reviewer 🧠
🧠 Reviewed by Expert Code Reviewer 🧠
|
macOS leg timed out after 1h (infrastructure). Windows, Ubuntu, and Source-Build all passed. Ready to retry or merge — the fix is Windows-specific (hang dump with PID 0). |
Reviewer preference: use Interlocked for cross-thread visibility instead of plain int assignment, as it is more common and explicit. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Commit pushed:
|
| if (_collectProcessDumpOnHang) | ||
| { | ||
| _inactivityTimer ??= new InactivityTimer(CollectDumpAndAbortTesthost); | ||
| ResetInactivityTimer(); | ||
| // Don't start the timer here — wait until TestHostLaunched so we know | ||
| // which process to dump. ResetInactivityTimer is called from TestHostLaunchedHandler. | ||
| } |
| // If testhost has not launched yet, the timer should not have been started, | ||
| // so this callback should not fire. Assert to catch unexpected paths. | ||
| TPDebug.Assert(_testHostProcessId != 0, "CollectDumpAndAbortTesthost called but testhost has not launched yet."); | ||
|
|
| [TestMethod] | ||
| [NetCoreTargetFrameworkDataSource] | ||
| public void HangDumpShouldNotHangWhenTestHostFailsToStart(RunnerInfo runnerInfo) |
| @@ -1,4 +1,4 @@ | |||
| <?xml version="1.0" encoding="utf-8"?> | |||
| <?xml version="1.0" encoding="utf-8"?> | |||
nohwnd
left a comment
There was a problem hiding this comment.
Review: Fix race condition in BlameCollector (follow-up — Interlocked.Exchange commit)
Dimensions checked: Parallel Execution & Scheduling Safety · Algorithmic Correctness
New commit since last review (2026-05-26T14:37Z)
Interlocked.Exchange(ref _testHostProcessId, args.TestHostProcessId) — replaces the plain int assignment in TestHostLaunchedHandler.
This is correct and clean. The architectural guarantee (timer cannot fire until ResetInactivityTimer() is called, which happens on the next line after the Exchange) already ensures the timer thread will always observe a non-zero PID. Interlocked.Exchange makes the full memory fence explicit and is a reasonable stylistic choice for cross-thread writes, consistent with the reviewer's preference.
No new issues introduced. All previously reviewed findings remain resolved. ✅
🧠 Reviewed by Expert Code Reviewer 🧠
🧠 Reviewed by Expert Code Reviewer 🧠
| var originalContent = File.ReadAllText(runtimeConfigJson); | ||
| try | ||
| { | ||
| var updatedContent = originalContent.Replace("\"version\": \"11.0.0", "\"version\": \"9999.0.0"); |
There was a problem hiding this comment.
regex, so we don't have to update every time we update runtime.
Make the test do-not-parallelize, just so we avoid running in conflict with other possible test that edits the project.
There was a problem hiding this comment.
Done — the test now uses Regex.Replace to match any "version": "x.y.z" pattern (so the test won't need updating as the runtime version changes), and the [DoNotParallelize] attribute was added to avoid conflicts with other tests that might read the same runtimeconfig.json file on disk.
🔧 Iterated by PR Iteration Agent 🔧
| InvokeVsTest(arguments); | ||
|
|
||
| // vstest should exit with failure (testhost didn't start), but not hang and not crash. | ||
| ExitCodeEquals(1); |
There was a problem hiding this comment.
try ensuring harder the crash was because we did not find the runtime.
There was a problem hiding this comment.
Done — added Assert.MatchesRegex after ExitCodeEquals(1) that checks the combined stdout+stderr for the runtime-not-found message (matches "framework.*9999.0.0", "9999.0.0.*not found", or "compatible framework version"). This verifies the failure was specifically due to the missing runtime, not some other error.
🔧 Iterated by PR Iteration Agent 🔧
… runtime-not-found error Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Commit pushed:
|
nohwnd
left a comment
There was a problem hiding this comment.
Review: Fix race condition in BlameCollector (follow-up — test hardening commit)
Dimensions checked: Algorithmic Correctness · Crash & Hang Dump Reliability · Resource Management
New commit since last review (2026-05-28T07:19Z)
test: use regex for version replacement, add DoNotParallelize, assert runtime-not-found error (a30e0c2) — test-only change to the integration test.
Three improvements to HangDumpShouldNotHangWhenTestHostFailsToStart:
-
Regex.Replacefor version patching — replaces the prior literal string substitution withRegex.Replace(content, @"""version""\s*:\s*""[\d.]+""", ...). More robust against whitespace variations in the JSON. ✅ -
[DoNotParallelize]— ensures the runtimeconfig.json mutation is not interleaved with other tests that use the sameSimpleTestProjectMessedUpTargetFrameworkasset. ✅ -
Runtime-not-found assertion —
Assert.MatchesRegex(new Regex(@"framework.*9999\.0\.0|...", IgnoreCase), StdOut + StdErr, ...)verifies the failure is specifically "runtime not found", not some other exit-code-1 scenario. Parameter order (pattern first, haystack second) is correct. ✅
Assert.DoesNotContain(".dmp", StdOut, ...) parameter order (needle first) remains correct from the prior commit. ✅
Previously reviewed findings — still resolved ✅
No regressions to prior findings. All architecture invariants (timer only starts in TestHostLaunchedHandler, Interlocked.Exchange for PID write) remain intact.
🧠 Reviewed by Expert Code Reviewer 🧠
🧠 Reviewed by Expert Code Reviewer 🧠
🧠 Reviewed by Expert Code Reviewer 🧠
Use the same assertion approach as ProcessesInteractionTests which is proven to work on Windows. The previous Assert.MatchesRegex on combined StdOut+StdErr may fail due to multiline regex behavior differences. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Commit pushed:
|
|
Fixed the CI assertion in The previous The multiline hostfxr error message (containing
|
nohwnd
left a comment
There was a problem hiding this comment.
Review: Fix race condition in BlameCollector (follow-up — StdError assertion commit)
Dimensions checked: Algorithmic Correctness · Crash & Hang Dump Reliability
New commit since last review (2026-05-29T15:59Z)
test: use StdErrorRegexIsMatch for runtime-not-found assertion (5895ebd) — test-only, one-line change.
Replaces Assert.MatchesRegex(new Regex(@"framework.*9999\.0\.0|...", IgnoreCase), StdOut + StdErr, ...) with StdErrorRegexIsMatch("9999\\.0\\.0"). This aligns with the pattern already used in ProcessesInteractionTests, which is proven to work on Windows. The multiline regex over a combined string was a potential source of cross-platform regex behavior differences; the targeted StdErrorRegexIsMatch helper is cleaner and more reliable. ✅
Previously reviewed findings — still resolved ✅
All prior findings (architectural timer fix, Interlocked.Exchange, try/finally restore, [DoNotParallelize], Regex.Replace version patching) remain intact and unaffected.
🧠 Reviewed by Expert Code Reviewer 🧠
🧠 Reviewed by Expert Code Reviewer 🧠
| // If testhost has not launched yet, the timer should not have been started, | ||
| // so this callback should not fire. Assert to catch unexpected paths. | ||
| TPDebug.Assert(_testHostProcessId != 0, "CollectDumpAndAbortTesthost called but testhost has not launched yet."); | ||
|
|
| var updatedContent = Regex.Replace(originalContent, @"""version""\s*:\s*""[\d.]+""", @"""version"": ""9999.0.0"""); | ||
| File.WriteAllText(runtimeConfigJson, updatedContent); | ||
|
|
| // Don't start the timer here — wait until TestHostLaunched so we know | ||
| // which process to dump. ResetInactivityTimer is called from TestHostLaunchedHandler. | ||
| } |
The .NET runtime-not-found error message can appear in either stdout or stderr depending on the platform and vstest host configuration. Check both streams to make the assertion robust across environments. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Commit pushed:
|
nohwnd
left a comment
There was a problem hiding this comment.
Review: Fix race condition in BlameCollector (follow-up — check both stdout and stderr)
Dimensions checked: Algorithmic Correctness · Crash & Hang Dump Reliability
New commit since last review (2026-05-31T06:17Z)
test: check both stdout and stderr for runtime-not-found assertion (ecb07c8) — test-only, replaces StdErrorRegexIsMatch("9999\\.0\\.0") with:
Assert.IsTrue(
Regex.IsMatch(StdErr, "9999\\.0\\.0") || Regex.IsMatch(StdOut, "9999\\.0\\.0"),
$"Expected '9999.0.0' in output.\nStdErr: {StdErr}\nStdOut: {StdOut}");This is correct and a genuine improvement for cross-platform robustness. On some platforms/host configurations the .NET runtime-not-found error surfaces in stdout rather than stderr. The || guards both cases. The regex pattern is properly escaped (\\.), the parameter orders to Regex.IsMatch are correct (pattern second), and the diagnostic message includes both streams for debugging failed CI runs. ✅
Previously reviewed findings — still resolved ✅
All prior findings (architectural timer fix, Interlocked.Exchange, try/finally restore, [DoNotParallelize], Regex.Replace version patching) remain intact and unaffected.
🧠 Reviewed by Expert Code Reviewer 🧠
🧠 Reviewed by Expert Code Reviewer 🧠
Fix #15588
The inactivity timer in
BlameCollectorstarts duringInitialize, but_testHostProcessIdis only set later whenTestHostLaunchedHandlerfires. If the timer fires before testhost launches (short timeout, slow start, or in unit tests withTimeSpan.Zero), we try to dump PID 0 — the Windows Idle process.The fix: don't start the timer until testhost actually launches.
Initializecreates the timer,TestHostLaunchedHandlersets the PID and then starts it. The timer can never fire before we know who to dump.ResetInactivityTimer()fromInitializeTestHostLaunchedHandler: set PID before starting timerruntimeconfig.jsonwithnet9999.0so testhost fails to start, then runs withCollectHangDump— verifies vstest exits cleanly without hanging or dumping PID 0TestHostLaunched, notInitialize