Handle ServerNotAvailableException in CollectLinuxCommand process probing#5705

Open
Copilot wants to merge 4 commits into main from copilot/fix-diagnostics-client-exception

Conversation

Contributor

Copilot AI commented Feb 5, 2026

Summary

Handle ServerNotAvailableException and DiagnosticToolException in CollectLinuxCommand process probing to gracefully handle processes that cannot be resolved or connected to.

Fixes #5694

Problem

DiagnosticsClient.GetProcessInfo() throws ServerNotAvailableException when the diagnostics server is unavailable (process exits between enumeration and probing, connection failures, etc.). Additionally, CommandUtils.ResolveProcess() throws DiagnosticToolException for invalid process IDs or names. These unhandled exceptions caused probe operations to crash.

Solution

Introduce a four-state probe result to handle all outcomes:

  • Supported: Process supports UserEvents IPC command
  • NotSupported: Process does not support UserEvents IPC command (runtime too old)
  • ProcessNotFound: Process could not be resolved (invalid PID, no process with given
    name)
  • ConnectionFailed: Process resolved but unable to connect to diagnostic endpoint
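
A minimal sketch of how the four states could be derived, assuming resolution and probing are two separate steps that fail with different exceptions. The `Stub*` exception classes are local stand-ins for DiagnosticToolException and ServerNotAvailableException so the sample compiles on its own, and the `Probe` method with its two delegates is hypothetical; it is not the signature of ProbeProcess in this PR.

```csharp
using System;

// Stand-ins (assumptions) for the real exception types, so the sketch is self-contained.
class StubDiagnosticToolException : Exception
{
    public StubDiagnosticToolException(string message) : base(message) { }
}

class StubServerNotAvailableException : Exception
{
    public StubServerNotAvailableException(string message) : base(message) { }
}

enum UserEventsProbeResult { Supported, NotSupported, ProcessNotFound, ConnectionFailed }

static class ProbeSketch
{
    // resolve: stands in for process resolution (e.g. CommandUtils.ResolveProcess).
    // supportsUserEvents: stands in for querying the process and checking its runtime.
    public static UserEventsProbeResult Probe(Action resolve, Func<bool> supportsUserEvents)
    {
        try
        {
            resolve();
        }
        catch (StubDiagnosticToolException)
        {
            // Resolution failed: invalid PID or no process with the given name.
            return UserEventsProbeResult.ProcessNotFound;
        }

        try
        {
            return supportsUserEvents()
                ? UserEventsProbeResult.Supported
                : UserEventsProbeResult.NotSupported;
        }
        catch (StubServerNotAvailableException)
        {
            // Process resolved, but its diagnostic endpoint could not be reached.
            return UserEventsProbeResult.ConnectionFailed;
        }
    }
}
```

Keeping the two try blocks separate is what lets the caller tell "could not resolve" apart from "resolved but unreachable"; a single catch-all would collapse them into one state.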

Behavior Changes

Non-probe mode (dotnet-trace collect-linux -p <pid>):

  • ProcessNotFound: [ERROR] Could not resolve process '<id>'.
  • ConnectionFailed: [ERROR] Unable to connect to process '<id>'. The process may have exited or its diagnostic endpoint is not accessible.
  • Both return TracingError

Single-process probe mode (dotnet-trace collect-linux --probe -p <pid>):

  • ProcessNotFound: Could not resolve process '<id>'.
  • ConnectionFailed: Process '<id>' could not be probed. Unable to connect to the process's diagnostic endpoint.
  • Returns Ok (informational output)

Machine-wide probe mode (dotnet-trace collect-linux --probe):

  • Shows "Processes that could not be probed" section when applicable
  • CSV output includes unknown value for unprobed processes
  • Processes that exit between enumeration and probing are handled gracefully
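
As a rough illustration of the CSV behavior, the sketch below maps each probe result to a value for the supportsCollectLinux column. Only the `unknown` value is stated in this description; the `true`/`false` values and the `ProbeCsvSketch` helper names are assumptions and may not match BuildProcessSupportCsv.

```csharp
using System.Globalization;
using System.Text;

enum UserEventsProbeResult { Supported, NotSupported, ProcessNotFound, ConnectionFailed }

static class ProbeCsvSketch
{
    // Assumption: probed processes report "true"/"false"; unprobed ones report "unknown".
    public static string SupportValue(UserEventsProbeResult result) => result switch
    {
        UserEventsProbeResult.Supported => "true",
        UserEventsProbeResult.NotSupported => "false",
        _ => "unknown", // ProcessNotFound and ConnectionFailed
    };

    // Appends one "pid,processName,supportsCollectLinux" row.
    // Note: the process name is not escaped here, so names containing commas would need quoting.
    public static void AppendRow(StringBuilder csv, int pid, string name, UserEventsProbeResult result)
    {
        csv.Append(pid.ToString(CultureInfo.InvariantCulture))
           .Append(',')
           .Append(name)
           .Append(',')
           .Append(SupportValue(result))
           .Append('\n');
    }
}
```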

Other Changes

  • Added FormatProcessIdentifier helper - shows name (pid) when name is provided, just pid otherwise
  • Changed ".NET process" to "Process" in messages (probe accepts arbitrary PIDs)
  • Updated --probe option help text to document result categories
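
For reference, the FormatProcessIdentifier behavior described above could look roughly like the following. This is a sketch written from the description ("name (pid)" when a name is provided, just the pid otherwise), not a copy of the helper in CollectLinuxCommand.cs; the containing class name is hypothetical.

```csharp
using System.Globalization;

static class ProcessDisplaySketch
{
    // Returns "name (pid)" when a name is available, otherwise just the pid.
    public static string FormatProcessIdentifier(int pid, string name)
    {
        string pidText = pid.ToString(CultureInfo.InvariantCulture);
        return string.IsNullOrEmpty(name) ? pidText : $"{name} ({pidText})";
    }
}
```

Under this sketch, `FormatProcessIdentifier(1, "dummy")` yields `dummy (1)` and `FormatProcessIdentifier(-1, string.Empty)` yields `-1`.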

Copilot AI changed the title from "[WIP] Investigate DiagnosticsClient.GetProcessInfo exception" to "Handle ServerNotAvailableException in CollectLinuxCommand process probing" Feb 5, 2026
Copilot AI requested a review from mdh1418 February 5, 2026 19:03
mdh1418 force-pushed the copilot/fix-diagnostics-client-exception branch from 8fe26f7 to fe76ee6 Feb 6, 2026 17:21
…lpers

Add UserEventsProbeResult enum (Supported/NotSupported) to replace boolean return.
Introduce ProbeProcess helper for probing a single process.
Add GetAndProbeAllProcesses helper that enumerates and probes all published processes.
Update callers in CollectLinux and SupportsCollectLinux to use new helpers.
Update BuildProcessSupportCsv to use UserEventsProbeResult enum.
…cess probing

Add ProcessNotFound and ConnectionFailed values to UserEventsProbeResult enum.
Update ProbeProcess to catch DiagnosticToolException (process resolution failed) and
ServerNotAvailableException (diagnostic endpoint not accessible) separately.
Add FormatProcessIdentifier helper for clean display of process ID/name.
Add unknownProcesses/unknownCsv tracking for processes that could not be probed.
Update probe mode output to show 'Processes that could not be probed' section.
Include 'unknown' value in CSV output for unprobed processes.
Update non-probe mode to show distinct errors for each failure type.
Change '.NET process' to 'Process' in messages since arbitrary PIDs may not be .NET.

Fixes #5694

Document that results are categorized as supported, not supported, or unknown.
Clarify that unknown status occurs when diagnostic endpoint is not accessible.
…iled handling

Update test expectations to match new behavior:
- Add FormatProcessNotFoundError and FormatProcessIdentifier helpers
- Update ResolveProcessExceptions test data for ProcessNotFound handling
- Update probe error test cases for process resolution errors
- Tests now expect ReturnCode.TracingError for failures in non-probe mode
- Tests expect ReturnCode.Ok for probe mode with informational output
mdh1418 force-pushed the copilot/fix-diagnostics-client-exception branch from fe76ee6 to 0cbcf44 Feb 6, 2026 19:54
mdh1418 marked this pull request as ready for review Feb 6, 2026 20:02
mdh1418 requested a review from a team as a code owner Feb 6, 2026 20:02
Contributor

Copilot AI left a comment

Pull request overview

This PR improves dotnet-trace collect-linux resilience by handling process-resolution and diagnostics-connection failures during “process probing” so the command no longer crashes when a target process can’t be resolved or connected to (e.g., exits between enumeration and probing, cross-container endpoint issues).

Changes:

  • Replaced boolean “supports” probing with a 4-state probe result (Supported/NotSupported/ProcessNotFound/ConnectionFailed) and updated user-facing output.
  • Updated machine-wide probe to track and report “unknown/unprobed” processes and emit unknown in CSV.
  • Adjusted functional tests to match new probe behaviors/messages (partially—some existing expectations still appear outdated).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| src/Tools/dotnet-trace/CommandLine/Commands/CollectLinuxCommand.cs | Introduces multi-state probing, catches DiagnosticToolException/ServerNotAvailableException, updates probe messaging and CSV output. |
| src/tests/dotnet-trace/CollectLinuxCommandFunctionalTests.cs | Updates/extends tests for new probe outcomes and adds helpers for the new process identifier/message formatting. |

return (int)ReturnCode.TracingError;
case UserEventsProbeResult.ProcessNotFound:
Console.Error.WriteLine($"[ERROR] Could not resolve process '{FormatProcessIdentifier(resolvedProcessId, resolvedProcessName)}'.");
return (int)ReturnCode.TracingError;

Copilot AI Feb 6, 2026

CollectLinux now maps ResolveProcess failures (invalid PID/name, both PID and name, etc.) to ReturnCode.TracingError and a generic "Could not resolve process ..." message. This is a behavioral change from the previous DiagnosticToolException flow (which returned ArgumentError with a more specific message) and may break callers/scripts relying on ArgumentError for invalid input. Consider preserving DiagnosticToolException.ReturnCode (or at least returning ArgumentError for invalid arguments like negative PID / both options) while still handling ServerNotAvailableException gracefully.

Suggested change
- return (int)ReturnCode.TracingError;
+ return (int)ReturnCode.ArgumentError;

Copilot uses AI. Check for mistakes.
Comment on lines +327 to +330
catch (DiagnosticToolException)
{
resolvedPid = originalPid;
resolvedName = originalName;

Copilot AI Feb 6, 2026

ProbeProcess catches DiagnosticToolException from CommandUtils.ResolveProcess and always returns ProcessNotFound, which replaces specific, actionable error messages like "Only one of the --name or --process-id options may be specified." with the generic "Could not resolve process ...". Consider distinguishing argument-validation failures from true "process not found" (e.g., by surfacing the original DiagnosticToolException message/ReturnCode for invalid option combinations).

Suggested change
- catch (DiagnosticToolException)
- {
-     resolvedPid = originalPid;
-     resolvedName = originalName;
+ catch (DiagnosticToolException ex)
+ {
+     // Restore original values in case ResolveProcess mutated the out parameters
+     resolvedPid = originalPid;
+     resolvedName = originalName;
+     // Distinguish argument-validation failures from true "process not found" cases.
+     // For argument/command-line errors, rethrow so the caller can surface the
+     // specific, actionable message and return code.
+     if (ex.ReturnCode == ReturnCode.InvalidArgument ||
+         ex.ReturnCode == ReturnCode.CommandLineError)
+     {
+         throw;
+     }
+     // For all other failures from ResolveProcess, treat as "process not found".

Comment on lines +243 to 256
+     Console.WriteLine($"Probing processes for support of the EventPipe UserEvents IPC command used by collect-linux. Requires runtime '{minRuntimeSupportingUserEventsIPCCommand}' or later.");
  }
  StringBuilder supportedProcesses = new();
  StringBuilder unsupportedProcesses = new();
+ StringBuilder unknownProcesses = new();

- IEnumerable<int> pids = DiagnosticsClient.GetPublishedProcesses();
- foreach (int pid in pids)
- {
-     if (pid == Environment.ProcessId)
-     {
-         continue;
-     }
-
-     bool supports = ProcessSupportsUserEventsIpcCommand(pid, string.Empty, out int resolvedPid, out string resolvedName, out string detectedRuntimeVersion);
-     BuildProcessSupportCsv(resolvedPid, resolvedName, supports, supportedCsv, unsupportedCsv);
-     if (supports)
-     {
-         supportedProcesses.AppendLine($"{resolvedPid} {resolvedName}");
-     }
-     else
-     {
-         unsupportedProcesses.AppendLine($"{resolvedPid} {resolvedName} - Detected runtime: '{detectedRuntimeVersion}'");
-     }
- }
+ GetAndProbeAllProcesses(supportedProcesses, unsupportedProcesses, unknownProcesses, supportedCsv, unsupportedCsv, unknownCsv);

  if (mode == ProbeOutputMode.Console)
  {
-     Console.WriteLine($".NET processes that support the command:");
+     Console.WriteLine($"Processes that support the command:");
      Console.WriteLine(supportedProcesses.ToString());
-     Console.WriteLine($".NET processes that do NOT support the command:");
+     Console.WriteLine($"Processes that do NOT support the command:");
      Console.WriteLine(unsupportedProcesses.ToString());

Copilot AI Feb 6, 2026

SupportsCollectLinux console output strings were changed from ".NET process(es)" to "Process(es)", but existing functional tests still assert the old ".NET processes" headings (e.g., CollectLinuxCommand_Probe_ListsProcesses_WhenNoArgs). This will cause test failures unless the test expectations are updated (or the output wording is kept compatible).

Comment on lines 101 to 110
  [ConditionalTheory(nameof(IsCollectLinuxSupported))]
  [MemberData(nameof(ResolveProcessExceptions))]
- public void CollectLinuxCommand_ResolveProcessExceptions(object testArgs, string[] expectedError)
+ public void CollectLinuxCommand_ResolveProcessExceptions(object testArgs, string[] expectedError, int expectedExitCode)
  {
      MockConsole console = new(200, 30, _outputHelper);

      int exitCode = Run(testArgs, console);

-     Assert.Equal((int)ReturnCode.ArgumentError, exitCode);
+     Assert.Equal(expectedExitCode, exitCode);
      console.AssertSanitizedLinesEqual(null, expectedError);

Copilot AI Feb 6, 2026

The functional tests for non-probe mode (CollectLinuxCommand_ReportsResolveProcessErrors / _ReportsResolveProcessNameErrors) still assert ReturnCode.ArgumentError and the raw ResolveProcess exception text, but CollectLinux now routes these cases through ProbeProcess and returns TracingError with a different message. Update those tests to match the new behavior, otherwise the test suite will fail.

break;
case UserEventsProbeResult.ProcessNotFound:
case UserEventsProbeResult.ConnectionFailed:
unknownProcesses?.AppendLine($"{resolvedPid} {resolvedName} - Unable to connect");

Copilot AI Feb 6, 2026

GetAndProbeAllProcesses appends "- Unable to connect" for both ProcessNotFound and ConnectionFailed results. For ProcessNotFound (e.g., process exited between enumeration and probe) this message is misleading; consider wording that covers both cases (e.g., "Exited or diagnostic endpoint not accessible") or differentiating the two outcomes.

Suggested change
- unknownProcesses?.AppendLine($"{resolvedPid} {resolvedName} - Unable to connect");
+ unknownProcesses?.AppendLine($"{resolvedPid} {resolvedName} - Exited or diagnostic endpoint not accessible");

  new("--probe")
  {
-     Description = "Probe .NET processes for support of the EventPipe UserEvents IPC command used by collect-linux, without collecting a trace. Results list supported processes first. Use '-o stdout' to print CSV (pid,processName,supportsCollectLinux) to the console, or '-o <file>' to write the CSV. Probe a single process with -n|--name or -p|--process-id.",
+     Description = "Probe processes for support of the EventPipe UserEvents IPC command used by collect-linux, without collecting a trace. Results are categorized as supported, not supported, or unknown (when the process's diagnostic endpoint is not accessible). Use '-o stdout' to print CSV (pid,processName,supportsCollectLinux) to the console, or '-o <file>' to write the CSV. Probe a single process with -n|--name or -p|--process-id.",

Copilot AI Feb 6, 2026

--probe help text describes results as "supported, not supported, or unknown", but single-process probe mode also emits a distinct "Could not resolve process ..." case (ProcessNotFound). Consider updating the option description to mention the ProcessNotFound outcome (or clarifying that it is included under "unknown") so the CLI help matches observed behavior.

Suggested change
- Description = "Probe processes for support of the EventPipe UserEvents IPC command used by collect-linux, without collecting a trace. Results are categorized as supported, not supported, or unknown (when the process's diagnostic endpoint is not accessible). Use '-o stdout' to print CSV (pid,processName,supportsCollectLinux) to the console, or '-o <file>' to write the CSV. Probe a single process with -n|--name or -p|--process-id.",
+ Description = "Probe processes for support of the EventPipe UserEvents IPC command used by collect-linux, without collecting a trace. Results are categorized as supported, not supported, or unknown (for example, when the process's diagnostic endpoint is not accessible or the process cannot be found). Use '-o stdout' to print CSV (pid,processName,supportsCollectLinux) to the console, or '-o <file>' to write the CSV. Probe a single process with -n|--name or -p|--process-id.",

Console.WriteLine("pid,processName,supportsCollectLinux");
Console.Write(supportedCsv?.ToString());
Console.Write(unsupportedCsv?.ToString());
Console.Write(unknownCsv?.ToString());

Copilot AI Feb 6, 2026

Condition is always not null because of ... == ....

Comment on lines 277 to +279
  writer.Write(supportedCsv?.ToString());
  writer.Write(unsupportedCsv?.ToString());
+ writer.Write(unknownCsv?.ToString());

Copilot AI Feb 6, 2026

Condition is always not null because of ... == ....

Suggested change
- writer.Write(supportedCsv?.ToString());
- writer.Write(unsupportedCsv?.ToString());
- writer.Write(unknownCsv?.ToString());
+ writer.Write(supportedCsv.ToString());
+ writer.Write(unsupportedCsv.ToString());
+ writer.Write(unknownCsv.ToString());

  {
      MockConsole console = new(200, 30, _outputHelper);
-     var args = TestArgs(processId: -1, probe: true);
+     var args = TestArgs(processId: -1, probe: true, output: new FileInfo(CommonOptions.DefaultTraceName));

Member

Does the argument to the test need to change? I would expect the args to match what we expect the user to pass in at the command-line and I don't imagine users will be adding a trace name to their probe commands.

// ProcessNotFound shows just PID when no name is provided
string[] expected = ExpectPreviewWithMessages(
new[] {
$"Could not resolve process '{FormatProcessIdentifier(-1, string.Empty)}'.",

Member

The error text seems less clear than it was previously. "Resolving a process" sounds more like our app's internal terminology rather than terminology users will universally understand.

Also, I think it's easier to review these UI test cases if the test includes the literal error text rather than calling into other formatting functions that have to be reasoned about. If changing app behavior causes 5 or 10 test cases to all have a similar-looking edit, that is useful to see.

// ProcessNotFound shows "name (pid)" when name is provided - pid is 0 from input
string[] expected = ExpectPreviewWithMessages(
new[] {
$"Could not resolve process '{FormatProcessIdentifier(0, "process-that-should-not-exist")}'.",

Member

Previous error text seems clearer again

// ProcessNotFound shows "name (pid)" when both are provided
string[] expected = ExpectPreviewWithMessages(
new[] {
$"Could not resolve process '{FormatProcessIdentifier(1, "dummy")}'.",

Member

Previous error text seems clearer.

  {
      TestArgs(processId: -1, name: string.Empty),
-     FormatException("-1 is not a valid process ID")
+     FormatProcessNotFoundError(FormatProcessIdentifier(-1, string.Empty)),

Member

This is another spot where it would be useful to inline the exact error text rather than needing to read through multiple layers of formatting helpers to figure it out. I'm also guessing like above the new error messages are less precise than before.

  [ConditionalTheory(nameof(IsCollectLinuxSupported))]
  [MemberData(nameof(ResolveProcessExceptions))]
- public void CollectLinuxCommand_ResolveProcessExceptions(object testArgs, string[] expectedError)
+ public void CollectLinuxCommand_ResolveProcessExceptions(object testArgs, string[] expectedError, int expectedExitCode)

Member

I don't think I noticed any test case for the output when handling a ServerNotAvailableException. That seems important to add.

Member

Is there an easy, instrumentable way to have a .NET process that isn't ready to accept a diagnostics IPC command, or that is shutting down by the time the probe tries to send the command? I haven't brainstormed anything simple yet. My big assumption is that the tests are racing to probe a .NET process that is in the middle of shutting down, hence the flaky failure on CI.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[dotnet-trace][collect-linux] Unable to specifically trace a cross-container process

3 participants