Add centralized OpenTelemetry tracing for MCP tool execution#3111
Add centralized OpenTelemetry tracing for MCP tool execution#3111
Conversation
Co-authored-by: souvikghosh04 <210500244+souvikghosh04@users.noreply.github.com>
Co-authored-by: souvikghosh04 <210500244+souvikghosh04@users.noreply.github.com>
…EntityName property Co-authored-by: souvikghosh04 <210500244+souvikghosh04@users.noreply.github.com>
Co-authored-by: souvikghosh04 <210500244+souvikghosh04@users.noreply.github.com>
Co-authored-by: souvikghosh04 <210500244+souvikghosh04@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adds centralized OpenTelemetry tracing around MCP tool execution so both built-in and dynamic custom MCP tools emit consistent spans across stdio and HTTP transports.
Changes:
- Added
McpTelemetryHelper.ExecuteWithTelemetryAsyncto start/finish spans, set tags, and record exceptions with MCP-specific error codes. - Wired telemetry wrapper into both MCP dispatch entry points (
McpStdioServerand HTTPCallToolHandler). - Added unit tests for telemetry helpers and exposed
DynamicCustomTool.EntityNameto avoid reflection.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/Azure.DataApiBuilder.Mcp/Utils/McpTelemetryHelper.cs | Central telemetry wrapper + operation inference + error-code mapping + custom-tool metadata extraction |
| src/Azure.DataApiBuilder.Mcp/Utils/McpTelemetryErrorCodes.cs | Defines MCP telemetry error-code constants |
| src/Core/Telemetry/TelemetryTracesHelper.cs | Adds Activity extension methods for MCP tool tags/status/exception recording |
| src/Azure.DataApiBuilder.Mcp/Core/McpStdioServer.cs | Wraps stdio tool execution with telemetry helper |
| src/Azure.DataApiBuilder.Mcp/Core/McpServerConfiguration.cs | Wraps HTTP tool execution with telemetry helper |
| src/Azure.DataApiBuilder.Mcp/Core/DynamicCustomTool.cs | Exposes EntityName property and updates internal usage |
| src/Service.Tests/UnitTests/McpTelemetryTests.cs | Adds unit tests validating MCP telemetry tagging/status/error mapping |
| src/Service.Tests/Azure.DataApiBuilder.Service.Tests.csproj | Adds project reference to MCP project for unit testing |
| src/Azure.DataApiBuilder.Mcp/Azure.DataApiBuilder.Mcp.csproj | Adds InternalsVisibleTo for Azure.DataApiBuilder.Service.Tests |
| // Check if the tool returned an error result (tools catch exceptions internally | ||
| // and return CallToolResult with IsError=true instead of throwing) | ||
| if (result.IsError == true) | ||
| { | ||
| activity?.SetStatus(ActivityStatusCode.Error, "Tool returned an error result"); | ||
| activity?.SetTag("mcp.tool.error", true); | ||
| } |
There was a problem hiding this comment.
When CallToolResult.IsError == true, the span is marked as error but no error.code / error.message tags are set. Since MCP tools standardize errors via McpResponseBuilder.BuildErrorResult (type/message in the JSON payload), consider extracting those values and setting error.code (and optionally error.message) so non-exception failures (auth/validation/etc.) are diagnosable in traces.
| try | ||
| { | ||
| // Access public properties instead of reflection | ||
| string? entityName = customTool.EntityName; | ||
|
|
||
| if (entityName != null) | ||
| { | ||
| // Try to get the stored procedure name from the runtime configuration | ||
| RuntimeConfigProvider? runtimeConfigProvider = serviceProvider.GetService<RuntimeConfigProvider>(); | ||
| if (runtimeConfigProvider != null) | ||
| { | ||
| RuntimeConfig config = runtimeConfigProvider.GetConfig(); | ||
| if (config.Entities.TryGetValue(entityName, out Entity? entityConfig)) | ||
| { | ||
| string? dbProcedure = entityConfig.Source.Object; | ||
| return (entityName, dbProcedure); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| return (entityName, null); | ||
| } | ||
| catch (Exception ex) when (ex is InvalidOperationException || ex is ArgumentException) | ||
| { | ||
| // If configuration access fails due to invalid state or arguments, return null values | ||
| // This is expected during startup or configuration changes | ||
| return (null, null); | ||
| } |
There was a problem hiding this comment.
ExtractCustomToolMetadata calls runtimeConfigProvider.GetConfig(), which can throw DataApiBuilderException when the runtime config isn't set up. Because this helper runs inside the telemetry wrapper, an exception here would fail the tool invocation even though metadata extraction should be best-effort. Consider catching broader exceptions (at least DataApiBuilderException) and returning (customTool.EntityName, null) so telemetry can't break tool execution.
| try | |
| { | |
| // Access public properties instead of reflection | |
| string? entityName = customTool.EntityName; | |
| if (entityName != null) | |
| { | |
| // Try to get the stored procedure name from the runtime configuration | |
| RuntimeConfigProvider? runtimeConfigProvider = serviceProvider.GetService<RuntimeConfigProvider>(); | |
| if (runtimeConfigProvider != null) | |
| { | |
| RuntimeConfig config = runtimeConfigProvider.GetConfig(); | |
| if (config.Entities.TryGetValue(entityName, out Entity? entityConfig)) | |
| { | |
| string? dbProcedure = entityConfig.Source.Object; | |
| return (entityName, dbProcedure); | |
| } | |
| } | |
| } | |
| return (entityName, null); | |
| } | |
| catch (Exception ex) when (ex is InvalidOperationException || ex is ArgumentException) | |
| { | |
| // If configuration access fails due to invalid state or arguments, return null values | |
| // This is expected during startup or configuration changes | |
| return (null, null); | |
| } | |
| // Access public properties instead of reflection | |
| string? entityName = customTool.EntityName; | |
| if (entityName == null) | |
| { | |
| return (null, null); | |
| } | |
| try | |
| { | |
| // Try to get the stored procedure name from the runtime configuration | |
| RuntimeConfigProvider? runtimeConfigProvider = serviceProvider.GetService<RuntimeConfigProvider>(); | |
| if (runtimeConfigProvider != null) | |
| { | |
| RuntimeConfig config = runtimeConfigProvider.GetConfig(); | |
| if (config.Entities.TryGetValue(entityName, out Entity? entityConfig)) | |
| { | |
| string? dbProcedure = entityConfig.Source.Object; | |
| return (entityName, dbProcedure); | |
| } | |
| } | |
| } | |
| catch (Exception) | |
| { | |
| // If configuration access fails for any reason, fall back to returning only the entity name. | |
| // Telemetry must not prevent tool execution, and metadata extraction is best-effort. | |
| } | |
| return (entityName, null); |
| activity.TrackMcpToolExecutionFinishedWithException(testException, errorCode: "ExecutionFailed"); | ||
|
|
||
| Activity recorded = StopAndGetRecordedActivity(activity); | ||
| Assert.AreEqual(ActivityStatusCode.Error, recorded.Status); | ||
| Assert.AreEqual("Test exception", recorded.StatusDescription); | ||
| Assert.AreEqual("InvalidOperationException", recorded.GetTagItem("error.type")); | ||
| Assert.AreEqual("Test exception", recorded.GetTagItem("error.message")); | ||
| Assert.AreEqual("ExecutionFailed", recorded.GetTagItem("error.code")); |
There was a problem hiding this comment.
Use McpTelemetryErrorCodes.EXECUTION_FAILED instead of the string literal "ExecutionFailed" here to avoid test drift if the constant value ever changes.
| activity.TrackMcpToolExecutionFinishedWithException(testException, errorCode: "ExecutionFailed"); | |
| Activity recorded = StopAndGetRecordedActivity(activity); | |
| Assert.AreEqual(ActivityStatusCode.Error, recorded.Status); | |
| Assert.AreEqual("Test exception", recorded.StatusDescription); | |
| Assert.AreEqual("InvalidOperationException", recorded.GetTagItem("error.type")); | |
| Assert.AreEqual("Test exception", recorded.GetTagItem("error.message")); | |
| Assert.AreEqual("ExecutionFailed", recorded.GetTagItem("error.code")); | |
| activity.TrackMcpToolExecutionFinishedWithException(testException, errorCode: McpTelemetryErrorCodes.EXECUTION_FAILED); | |
| Activity recorded = StopAndGetRecordedActivity(activity); | |
| Assert.AreEqual(ActivityStatusCode.Error, recorded.Status); | |
| Assert.AreEqual("Test exception", recorded.StatusDescription); | |
| Assert.AreEqual("InvalidOperationException", recorded.GetTagItem("error.type")); | |
| Assert.AreEqual("Test exception", recorded.GetTagItem("error.message")); | |
| Assert.AreEqual(McpTelemetryErrorCodes.EXECUTION_FAILED, recorded.GetTagItem("error.code")); |
| return toolName.ToLowerInvariant() switch | ||
| { | ||
| string s when s.Contains("read") || s.Contains("get") || s.Contains("list") || s.Contains("describe") => "read", | ||
| string s when s.Contains("create") || s.Contains("insert") => "create", | ||
| string s when s.Contains("update") || s.Contains("modify") => "update", | ||
| string s when s.Contains("delete") || s.Contains("remove") => "delete", | ||
| string s when s.Contains("execute") => "execute", | ||
| _ => "execute" | ||
| }; |
There was a problem hiding this comment.
InferOperationFromToolName uses substring matching (e.g., Contains("get")), which can misclassify tool names where those sequences occur inside other words (e.g., target_entity contains get). Since this value is emitted as dab.operation, consider tokenizing on _ (or checking prefixes) so only verb tokens like get, list, create, etc. drive the inference.
| return toolName.ToLowerInvariant() switch | |
| { | |
| string s when s.Contains("read") || s.Contains("get") || s.Contains("list") || s.Contains("describe") => "read", | |
| string s when s.Contains("create") || s.Contains("insert") => "create", | |
| string s when s.Contains("update") || s.Contains("modify") => "update", | |
| string s when s.Contains("delete") || s.Contains("remove") => "delete", | |
| string s when s.Contains("execute") => "execute", | |
| _ => "execute" | |
| }; | |
| if (string.IsNullOrWhiteSpace(toolName)) | |
| { | |
| return "execute"; | |
| } | |
| string normalizedName = toolName.ToLowerInvariant(); | |
| string[] tokens = normalizedName.Split( | |
| new[] { '_', '-', ' ' }, | |
| StringSplitOptions.RemoveEmptyEntries); | |
| static bool HasVerbToken(string[] tokens, params string[] verbs) | |
| { | |
| foreach (string token in tokens) | |
| { | |
| foreach (string verb in verbs) | |
| { | |
| if (token == verb) | |
| { | |
| return true; | |
| } | |
| } | |
| } | |
| return false; | |
| } | |
| if (HasVerbToken(tokens, "read", "get", "list", "describe")) | |
| { | |
| return "read"; | |
| } | |
| if (HasVerbToken(tokens, "create", "insert")) | |
| { | |
| return "create"; | |
| } | |
| if (HasVerbToken(tokens, "update", "modify")) | |
| { | |
| return "update"; | |
| } | |
| if (HasVerbToken(tokens, "delete", "remove")) | |
| { | |
| return "delete"; | |
| } | |
| if (HasVerbToken(tokens, "execute")) | |
| { | |
| return "execute"; | |
| } | |
| return "execute"; |
| public class McpTelemetryTests | ||
| { | ||
| private static ActivityListener? _activityListener; | ||
| private static List<Activity> _recordedActivities = new(); |
There was a problem hiding this comment.
Field '_recordedActivities' can be 'readonly'.
| private static List<Activity> _recordedActivities = new(); | |
| private static readonly List<Activity> _recordedActivities = new(); |
| public const string EXECUTION_FAILED = "ExecutionFailed"; | ||
|
|
||
| /// <summary> | ||
| /// Authentication or authorization failure error code. |
There was a problem hiding this comment.
There should be a separate failure code for authorization like
AUTHORIZATION_FAILED
| { | ||
| // Extract telemetry metadata | ||
| string? entityName = ExtractEntityNameFromArguments(arguments); | ||
| string? operation = InferOperationFromToolName(toolName); |
There was a problem hiding this comment.
Operation should be inferred only for built in tools.
For dynamic custom tools, it should always be "execute"
| /// <returns>The inferred operation type.</returns> | ||
| public static string InferOperationFromToolName(string toolName) | ||
| { | ||
| return toolName.ToLowerInvariant() switch |
There was a problem hiding this comment.
The operation shouldn't be inferred from toolname,
All the built in tools have types associated with them, so could check the tool type and infer the action. create_record tool is create action. String comparison should not be needed.
Aniruddh25
left a comment
There was a problem hiding this comment.
Need to address changes related to inferring tool names, and unit tests.
| { | ||
| return toolName.ToLowerInvariant() switch | ||
| { | ||
| string s when s.Contains("read") || s.Contains("get") || s.Contains("list") || s.Contains("describe") => "read", |
There was a problem hiding this comment.
list and describe_entites tools are not typical dab operations - crude. the operation for these tools could simply be list/describe
| return ex switch | ||
| { | ||
| OperationCanceledException => McpTelemetryErrorCodes.OPERATION_CANCELLED, | ||
| UnauthorizedAccessException => McpTelemetryErrorCodes.AUTHENTICATION_FAILED, |
There was a problem hiding this comment.
UnthorizedAccess should be AUTHORIZATION_FAILED
Isnt there an exception for Unauthenticated Access?
| RuntimeConfig config = runtimeConfigProvider.GetConfig(); | ||
| if (config.Entities.TryGetValue(entityName, out Entity? entityConfig)) | ||
| { | ||
| string? dbProcedure = entityConfig.Source.Object; |
There was a problem hiding this comment.
Does Source.Object give the procedure name including the schema?
| /// including when optional parameters are null. | ||
| /// </summary> | ||
| [DataTestMethod] | ||
| [DataRow("read_records", "books", "read", null, DisplayName = "Sets entity, operation; no procedure")] |
There was a problem hiding this comment.
should test all dml tools, not just read
| [DataTestMethod] | ||
| // Built-in tool names | ||
| [DataRow("read_records", "read")] | ||
| [DataRow("get_book", "read")] |
There was a problem hiding this comment.
get_book should be execute - since its likely a stored procedure that has been called - its not a built-in tool name
| { | ||
| // Arrange & Act | ||
| using Activity activity = CreateActivity(); | ||
| activity.TrackMcpToolExecutionStarted( |
There was a problem hiding this comment.
can we unit test ExecuteWithTelemetryAsync? Since, here we are simply testing whether the Activity's TrackXX function is accurately implemented or not - which is not what this PR is implementing
|
|
||
| #endregion | ||
|
|
||
| #region Test Fakes |
There was a problem hiding this comment.
Call it Mock instead of Fake.
Why make this change?
MCP tool executions (both built-in DML tools and dynamic custom stored-procedure tools) currently lack explicit OpenTelemetry tracing. While
ILoggerlogging exists, there are no tracing spans emitted for MCP tool invocations. This makes it difficult to observe and diagnose tool execution in distributed tracing systems.What is this change?
Added centralized OpenTelemetry tracing at the MCP dispatcher level (where tools are resolved and
ExecuteAsyncis invoked), ensuring consistent spans for all MCP tools without code duplication.Span Attributes
mcp.tool.namedab.entitydab.operationexecute)db.procedureOutcome Recording
ActivityStatusCode.OkActivityStatusCode.ErrorwithRecordException()anderror.code/error.messagetagsCoverage
Spans cover all paths:
Implementation Details
TelemetryTracesHelper.DABActivitySource("DataApiBuilder")ExecutionHelperwhereDABActivitySource.StartActivity()wraps query executionHow was this tested?