Handle ServerNotAvailableException in CollectLinuxCommand process probing#5705
Handle ServerNotAvailableException in CollectLinuxCommand process probing#5705
Conversation
8fe26f7 to
fe76ee6
Compare
…lpers Add UserEventsProbeResult enum (Supported/NotSupported) to replace boolean return. Introduce ProbeProcess helper for probing a single process. Add GetAndProbeAllProcesses helper that enumerates and probes all published processes. Update callers in CollectLinux and SupportsCollectLinux to use new helpers. Update BuildProcessSupportCsv to use UserEventsProbeResult enum.
…cess probing Add ProcessNotFound and ConnectionFailed values to UserEventsProbeResult enum. Update ProbeProcess to catch DiagnosticToolException (process resolution failed) and ServerNotAvailableException (diagnostic endpoint not accessible) separately. Add FormatProcessIdentifier helper for clean display of process ID/name. Add unknownProcesses/unknownCsv tracking for processes that could not be probed. Update probe mode output to show 'Processes that could not be probed' section. Include 'unknown' value in CSV output for unprobed processes. Update non-probe mode to show distinct errors for each failure type. Change '.NET process' to 'Process' in messages since arbitrary PIDs may not be .NET. Fixes #5694
Document that results are categorized as supported, not supported, or unknown. Clarify that unknown status occurs when diagnostic endpoint is not accessible.
…iled handling Update test expectations to match new behavior: - Add FormatProcessNotFoundError and FormatProcessIdentifier helpers - Update ResolveProcessExceptions test data for ProcessNotFound handling - Update probe error test cases for process resolution errors - Tests now expect ReturnCode.TracingError for failures in non-probe mode - Tests expect ReturnCode.Ok for probe mode with informational output
fe76ee6 to
0cbcf44
Compare
There was a problem hiding this comment.
Pull request overview
This PR improves dotnet-trace collect-linux resilience by handling process-resolution and diagnostics-connection failures during “process probing” so the command no longer crashes when a target process can’t be resolved or connected to (e.g., exits between enumeration and probing, cross-container endpoint issues).
Changes:
- Replaced boolean “supports” probing with a 4-state probe result (Supported/NotSupported/ProcessNotFound/ConnectionFailed) and updated user-facing output.
- Updated machine-wide probe to track and report “unknown/unprobed” processes and emit
unknownin CSV. - Adjusted functional tests to match new probe behaviors/messages (partially—some existing expectations still appear outdated).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
src/Tools/dotnet-trace/CommandLine/Commands/CollectLinuxCommand.cs |
Introduces multi-state probing, catches DiagnosticToolException/ServerNotAvailableException, updates probe messaging and CSV output. |
src/tests/dotnet-trace/CollectLinuxCommandFunctionalTests.cs |
Updates/extends tests for new probe outcomes and adds helpers for the new process identifier/message formatting. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| return (int)ReturnCode.TracingError; | ||
| case UserEventsProbeResult.ProcessNotFound: | ||
| Console.Error.WriteLine($"[ERROR] Could not resolve process '{FormatProcessIdentifier(resolvedProcessId, resolvedProcessName)}'."); | ||
| return (int)ReturnCode.TracingError; |
There was a problem hiding this comment.
CollectLinux now maps ResolveProcess failures (invalid PID/name, both PID and name, etc.) to ReturnCode.TracingError and a generic "Could not resolve process ..." message. This is a behavioral change from the previous DiagnosticToolException flow (which returned ArgumentError with a more specific message) and may break callers/scripts relying on ArgumentError for invalid input. Consider preserving DiagnosticToolException.ReturnCode (or at least returning ArgumentError for invalid arguments like negative PID / both options) while still handling ServerNotAvailableException gracefully.
| return (int)ReturnCode.TracingError; | |
| return (int)ReturnCode.ArgumentError; |
| catch (DiagnosticToolException) | ||
| { | ||
| resolvedPid = originalPid; | ||
| resolvedName = originalName; |
There was a problem hiding this comment.
ProbeProcess catches DiagnosticToolException from CommandUtils.ResolveProcess and always returns ProcessNotFound, which replaces specific, actionable error messages like "Only one of the --name or --process-id options may be specified." with the generic "Could not resolve process ...". Consider distinguishing argument-validation failures from true "process not found" (e.g., by surfacing the original DiagnosticToolException message/ReturnCode for invalid option combinations).
| catch (DiagnosticToolException) | |
| { | |
| resolvedPid = originalPid; | |
| resolvedName = originalName; | |
| catch (DiagnosticToolException ex) | |
| { | |
| // Restore original values in case ResolveProcess mutated the out parameters | |
| resolvedPid = originalPid; | |
| resolvedName = originalName; | |
| // Distinguish argument-validation failures from true "process not found" cases. | |
| // For argument/command-line errors, rethrow so the caller can surface the | |
| // specific, actionable message and return code. | |
| if (ex.ReturnCode == ReturnCode.InvalidArgument || | |
| ex.ReturnCode == ReturnCode.CommandLineError) | |
| { | |
| throw; | |
| } | |
| // For all other failures from ResolveProcess, treat as "process not found". |
| Console.WriteLine($"Probing processes for support of the EventPipe UserEvents IPC command used by collect-linux. Requires runtime '{minRuntimeSupportingUserEventsIPCCommand}' or later."); | ||
| } | ||
| StringBuilder supportedProcesses = new(); | ||
| StringBuilder unsupportedProcesses = new(); | ||
| StringBuilder unknownProcesses = new(); | ||
|
|
||
| IEnumerable<int> pids = DiagnosticsClient.GetPublishedProcesses(); | ||
| foreach (int pid in pids) | ||
| { | ||
| if (pid == Environment.ProcessId) | ||
| { | ||
| continue; | ||
| } | ||
|
|
||
| bool supports = ProcessSupportsUserEventsIpcCommand(pid, string.Empty, out int resolvedPid, out string resolvedName, out string detectedRuntimeVersion); | ||
| BuildProcessSupportCsv(resolvedPid, resolvedName, supports, supportedCsv, unsupportedCsv); | ||
| if (supports) | ||
| { | ||
| supportedProcesses.AppendLine($"{resolvedPid} {resolvedName}"); | ||
| } | ||
| else | ||
| { | ||
| unsupportedProcesses.AppendLine($"{resolvedPid} {resolvedName} - Detected runtime: '{detectedRuntimeVersion}'"); | ||
| } | ||
| } | ||
| GetAndProbeAllProcesses(supportedProcesses, unsupportedProcesses, unknownProcesses, supportedCsv, unsupportedCsv, unknownCsv); | ||
|
|
||
| if (mode == ProbeOutputMode.Console) | ||
| { | ||
| Console.WriteLine($".NET processes that support the command:"); | ||
| Console.WriteLine($"Processes that support the command:"); | ||
| Console.WriteLine(supportedProcesses.ToString()); | ||
| Console.WriteLine($".NET processes that do NOT support the command:"); | ||
| Console.WriteLine($"Processes that do NOT support the command:"); | ||
| Console.WriteLine(unsupportedProcesses.ToString()); |
There was a problem hiding this comment.
SupportsCollectLinux console output strings were changed from ".NET process(es)" to "Process(es)", but existing functional tests still assert the old ".NET processes" headings (e.g., CollectLinuxCommand_Probe_ListsProcesses_WhenNoArgs). This will cause test failures unless the test expectations are updated (or the output wording is kept compatible).
| [ConditionalTheory(nameof(IsCollectLinuxSupported))] | ||
| [MemberData(nameof(ResolveProcessExceptions))] | ||
| public void CollectLinuxCommand_ResolveProcessExceptions(object testArgs, string[] expectedError) | ||
| public void CollectLinuxCommand_ResolveProcessExceptions(object testArgs, string[] expectedError, int expectedExitCode) | ||
| { | ||
| MockConsole console = new(200, 30, _outputHelper); | ||
|
|
||
| int exitCode = Run(testArgs, console); | ||
|
|
||
| Assert.Equal((int)ReturnCode.ArgumentError, exitCode); | ||
| Assert.Equal(expectedExitCode, exitCode); | ||
| console.AssertSanitizedLinesEqual(null, expectedError); |
There was a problem hiding this comment.
The functional tests for non-probe mode (CollectLinuxCommand_ReportsResolveProcessErrors / _ReportsResolveProcessNameErrors) still assert ReturnCode.ArgumentError and the raw ResolveProcess exception text, but CollectLinux now routes these cases through ProbeProcess and returns TracingError with a different message. Update those tests to match the new behavior, otherwise the test suite will fail.
| break; | ||
| case UserEventsProbeResult.ProcessNotFound: | ||
| case UserEventsProbeResult.ConnectionFailed: | ||
| unknownProcesses?.AppendLine($"{resolvedPid} {resolvedName} - Unable to connect"); |
There was a problem hiding this comment.
GetAndProbeAllProcesses appends "- Unable to connect" for both ProcessNotFound and ConnectionFailed results. For ProcessNotFound (e.g., process exited between enumeration and probe) this message is misleading; consider wording that covers both cases (e.g., "Exited or diagnostic endpoint not accessible") or differentiating the two outcomes.
| unknownProcesses?.AppendLine($"{resolvedPid} {resolvedName} - Unable to connect"); | |
| unknownProcesses?.AppendLine($"{resolvedPid} {resolvedName} - Exited or diagnostic endpoint not accessible"); |
| new("--probe") | ||
| { | ||
| Description = "Probe .NET processes for support of the EventPipe UserEvents IPC command used by collect-linux, without collecting a trace. Results list supported processes first. Use '-o stdout' to print CSV (pid,processName,supportsCollectLinux) to the console, or '-o <file>' to write the CSV. Probe a single process with -n|--name or -p|--process-id.", | ||
| Description = "Probe processes for support of the EventPipe UserEvents IPC command used by collect-linux, without collecting a trace. Results are categorized as supported, not supported, or unknown (when the process's diagnostic endpoint is not accessible). Use '-o stdout' to print CSV (pid,processName,supportsCollectLinux) to the console, or '-o <file>' to write the CSV. Probe a single process with -n|--name or -p|--process-id.", |
There was a problem hiding this comment.
--probe help text describes results as "supported, not supported, or unknown", but single-process probe mode also emits a distinct "Could not resolve process ..." case (ProcessNotFound). Consider updating the option description to mention the ProcessNotFound outcome (or clarifying that it is included under "unknown") so the CLI help matches observed behavior.
| Description = "Probe processes for support of the EventPipe UserEvents IPC command used by collect-linux, without collecting a trace. Results are categorized as supported, not supported, or unknown (when the process's diagnostic endpoint is not accessible). Use '-o stdout' to print CSV (pid,processName,supportsCollectLinux) to the console, or '-o <file>' to write the CSV. Probe a single process with -n|--name or -p|--process-id.", | |
| Description = "Probe processes for support of the EventPipe UserEvents IPC command used by collect-linux, without collecting a trace. Results are categorized as supported, not supported, or unknown (for example, when the process's diagnostic endpoint is not accessible or the process cannot be found). Use '-o stdout' to print CSV (pid,processName,supportsCollectLinux) to the console, or '-o <file>' to write the CSV. Probe a single process with -n|--name or -p|--process-id.", |
| Console.WriteLine("pid,processName,supportsCollectLinux"); | ||
| Console.Write(supportedCsv?.ToString()); | ||
| Console.Write(unsupportedCsv?.ToString()); | ||
| Console.Write(unknownCsv?.ToString()); |
There was a problem hiding this comment.
Condition is always not null because of ... == ....
| writer.Write(supportedCsv?.ToString()); | ||
| writer.Write(unsupportedCsv?.ToString()); | ||
| writer.Write(unknownCsv?.ToString()); |
There was a problem hiding this comment.
Condition is always not null because of ... == ....
| writer.Write(supportedCsv?.ToString()); | |
| writer.Write(unsupportedCsv?.ToString()); | |
| writer.Write(unknownCsv?.ToString()); | |
| writer.Write(supportedCsv.ToString()); | |
| writer.Write(unsupportedCsv.ToString()); | |
| writer.Write(unknownCsv.ToString()); |
| { | ||
| MockConsole console = new(200, 30, _outputHelper); | ||
| var args = TestArgs(processId: -1, probe: true); | ||
| var args = TestArgs(processId: -1, probe: true, output: new FileInfo(CommonOptions.DefaultTraceName)); |
There was a problem hiding this comment.
Does the argument to the test need to change? I would expect the args to match what we expect the user to pass in at the command-line and I don't imagine users will be adding a trace name to their probe commands.
| // ProcessNotFound shows just PID when no name is provided | ||
| string[] expected = ExpectPreviewWithMessages( | ||
| new[] { | ||
| $"Could not resolve process '{FormatProcessIdentifier(-1, string.Empty)}'.", |
There was a problem hiding this comment.
The error text seems less clear than it was previously. "Resolving a process" sounds more like our app's internal terminology rather than terminology users will universally understand.
Also I think its easier to review these UI test cases if the test includes the literal error text rather than calls into other formatting functions that have to be reasoned about. If changing app behavior causes 5 or 10 test cases to all have a similar looking edit that is useful to see.
| // ProcessNotFound shows "name (pid)" when name is provided - pid is 0 from input | ||
| string[] expected = ExpectPreviewWithMessages( | ||
| new[] { | ||
| $"Could not resolve process '{FormatProcessIdentifier(0, "process-that-should-not-exist")}'.", |
There was a problem hiding this comment.
Previous error text seems clearer again
| // ProcessNotFound shows "name (pid)" when both are provided | ||
| string[] expected = ExpectPreviewWithMessages( | ||
| new[] { | ||
| $"Could not resolve process '{FormatProcessIdentifier(1, "dummy")}'.", |
There was a problem hiding this comment.
Previous error text seems clearer.
| { | ||
| TestArgs(processId: -1, name: string.Empty), | ||
| FormatException("-1 is not a valid process ID") | ||
| FormatProcessNotFoundError(FormatProcessIdentifier(-1, string.Empty)), |
There was a problem hiding this comment.
This is another spot where it would be useful to inline the exact error text rather than needing to read through multiple layers of formatting helpers to figure it out. I'm also guessing like above the new error messages are less precise than before.
| [ConditionalTheory(nameof(IsCollectLinuxSupported))] | ||
| [MemberData(nameof(ResolveProcessExceptions))] | ||
| public void CollectLinuxCommand_ResolveProcessExceptions(object testArgs, string[] expectedError) | ||
| public void CollectLinuxCommand_ResolveProcessExceptions(object testArgs, string[] expectedError, int expectedExitCode) |
There was a problem hiding this comment.
I don't think I noticed any test case for the output when handling a ServerNotAvailableException. That seems important to add.
There was a problem hiding this comment.
Is there an easy instrumentable way to have a .NET process that isn't ready to accept a diagnostics IPC command, or is shutting down by the time the probe tries to send the command? I haven't brainstormed anything simple yet. My big assumption is the tests are racing to probe a .NET process that is in the middle of shutting down, hence the flakey failure on CI
Summary
Handle
ServerNotAvailableExceptionandDiagnosticToolExceptioninCollectLinuxCommandprocess probing to gracefully handle processes that cannot be resolved or connected to.Fixes #5694
Problem
DiagnosticsClient.GetProcessInfo()throwsServerNotAvailableExceptionwhen the diagnostics server is unavailable (process exits between enumeration and probing, connection failures, etc.). Additionally,CommandUtils.ResolveProcess()throwsDiagnosticToolExceptionfor invalid process IDs or names. These unhandled exceptions caused probe operations to crash.Solution
Introduce a four-state probe result to handle all outcomes:
name)
Behavior Changes
Non-probe mode (
dotnet-trace collect-linux -p <pid>):ProcessNotFound:[ERROR] Could not resolve process '<id>'.ConnectionFailed:[ERROR] Unable to connect to process '<id>'. The process may have exited or its diagnostic endpoint is not accessible.TracingErrorSingle-process probe mode (
dotnet-trace collect-linux --probe -p <pid>):ProcessNotFound:Could not resolve process '<id>'.ConnectionFailed:Process '<id>' could not be probed. Unable to connect to the process's diagnostic endpoint.Ok(informational output)Machine-wide probe mode (
dotnet-trace collect-linux --probe):unknownvalue for unprobed processesOther Changes
FormatProcessIdentifierhelper - showsname (pid)when name is provided, justpidotherwise--probeoption help text to document result categories