fix: correct mock MCP tool names to match real M365 server contracts #301

Open
pratapladhani wants to merge 1 commit into microsoft:main from pratapladhani:fix/mock-tool-fidelity-contract

@pratapladhani
Contributor

Summary

Fixes #300 — All four mock tool definition files had drifted from the tool catalogs exposed by real M365 MCP servers. Agents developed against the mocks would encounter tool-not-found errors when switched to real servers. There was no mechanism to detect or prevent this drift.

This PR:

  • Corrects all four mock files to match the real M365 MCP server tool catalogs
  • Introduces a snapshot-based fidelity contract that prevents future drift via CI-enforced tests

Changes

Mock file corrections

| Server | Changes |
| --- | --- |
| CalendarTools | Renamed 9 tools camelCase → PascalCase (e.g. createEvent → CreateEvent, deleteEvent → DeleteEventById). Removed 3 phantom tools not on real server (getEvent, getOrganization, getSchedule). Added 4 missing tools (TentativelyAcceptEvent, ForwardEvent, GetUserDateAndTimeZoneSettings, GetRooms). |
| MailTools | Stripped incorrect Async suffix from all 20 tool names (e.g. SendEmailWithAttachmentsAsync → SendEmailWithAttachments). Added missing FlagEmail tool. |
| MeServer | Renamed all 5 tools to match real server (e.g. getMyProfile → GetMyDetails, listUsers → GetMultipleUsersDetails). |
| KnowledgeTools | Replaced 3 disabled placeholder entries with 5 real snake_case tools from the live server. |

Fidelity infrastructure (new)

| Component | Purpose |
| --- | --- |
| snapshots/*.snapshot.json (4 files) | Authoritative tool catalogs captured from live M365 MCP servers; the source of truth. |
| MockToolFidelityTests.cs | CI gate: asserts bidirectional coverage between mocks and snapshots. Runs on every PR with no credentials needed. 8 test cases (4 servers x 2 directions). |
| MockToolSnapshotCaptureTests.cs | Developer tool: queries live M365 servers to detect drift or refresh snapshots. Requires MCP_BEARER_TOKEN env var (skips gracefully when absent). |
| snapshots/README.md | Schema reference and update instructions. |
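
The two directions checked by MockToolFidelityTests amount to a set comparison of tool names. The following is a minimal Python sketch of that idea, not the PR's C# implementation: the function name `coverage_gaps` and the sample lists are hypothetical, and the comparison is on names only and case-sensitive.

```python
from typing import Iterable


def coverage_gaps(snapshot_names: Iterable[str], mock_names: Iterable[str]) -> dict[str, list[str]]:
    """Return tool names present on one side but not the other (case-sensitive)."""
    snapshot = set(snapshot_names)
    mock = set(mock_names)
    return {
        # Direction 1: every snapshot tool must exist in the mock.
        "missing_from_mock": sorted(snapshot - mock),
        # Direction 2: every mock tool must exist in the snapshot (no phantom tools).
        "missing_from_snapshot": sorted(mock - snapshot),
    }


if __name__ == "__main__":
    # Hypothetical sample data built from tool names mentioned in this PR.
    snapshot = ["CreateEvent", "ForwardEvent", "GetRooms"]
    mock = ["createEvent", "getSchedule", "GetRooms"]
    print(coverage_gaps(snapshot, mock))
```

Both lists come back empty only when the two catalogs agree exactly, which is why a casing difference such as createEvent versus CreateEvent shows up in both directions.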

Documentation updates

  • MockToolingServer/README.md — Updated server table with correct tool names, added Fidelity Contract and Keeping Mocks Current sections
  • MockToolingServer/design.md — Added fidelity architecture to flowchart and design description

How the fidelity contract works

Real M365 servers ──(capture tests)──▶ Snapshots ──(fidelity tests)──▶ Mock files
                   (developer-run)                 (CI-enforced)
  • Fidelity tests run in CI on every PR — no credentials needed. If mocks drift from snapshots, CI fails.
  • Capture tests are developer-run when real servers change — requires MCP_BEARER_TOKEN. Detects drift or writes updated snapshots:
# Detect drift (read-only)
MCP_BEARER_TOKEN=<token> dotnet test --filter "FullyQualifiedName~MockToolSnapshotCaptureTests"

# Refresh snapshot files
MCP_BEARER_TOKEN=<token> MCP_UPDATE_SNAPSHOTS=true dotnet test --filter "FullyQualifiedName~MockToolSnapshotCaptureTests"
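
The three outcomes of a capture run (skip, detect drift, refresh) follow from the two environment variables used in the commands above. A minimal Python sketch of that decision, assuming the update flag is enabled only by the value `true` in any casing (the PR does not state how the C# tests parse it); `capture_mode` is a hypothetical helper name.

```python
from typing import Mapping


def capture_mode(env: Mapping[str, str]) -> str:
    """Decide what a capture run should do based on environment variables."""
    token = env.get("MCP_BEARER_TOKEN", "").strip()
    if not token:
        # No credentials: skip rather than fail, so runs without a token stay green.
        return "skip"
    # Assumption: only the value "true" (any casing) enables snapshot writes.
    if env.get("MCP_UPDATE_SNAPSHOTS", "").strip().lower() == "true":
        return "refresh"
    # Token present, no update flag: read-only drift detection.
    return "detect"


if __name__ == "__main__":
    import os

    print(capture_mode(os.environ))
```

Passing the environment in as a mapping keeps the decision testable without touching the real process environment.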

Test plan

  • MockToolFidelityTests — 8/8 passing (4 servers x 2 directions)
  • MockToolSnapshotCaptureTests — 4/4 passing (skip gracefully with no token)
  • Full test suite — 1121 passed, 0 failed, 17 skipped (pre-existing skips)
  • No existing tests reference old tool names (zero impact verified)
  • CLI built from source and tested with mock server + MeetingAssistant agent (PlaygroundMock profile, all automated tests pass)
  • Code review completed — 10 findings, 0 critical, all high/medium items resolved

🤖 Generated with Claude Code

All four mock tool definition files had drifted from the tool catalogs
exposed by real M365 MCP servers, causing agents developed against the
mocks to encounter tool-not-found errors when switched to real servers.

Mock corrections:
- CalendarTools: rename 9 tools camelCase->PascalCase, remove 3 phantom
  tools (getEvent, getOrganization, getSchedule), add 4 missing tools
  (TentativelyAcceptEvent, ForwardEvent, GetUserDateAndTimeZoneSettings,
  GetRooms)
- MailTools: strip Async suffix from all 20 tool names, add FlagEmail
- MeServer: rename all 5 tools to match real server names
- KnowledgeTools: replace 3 disabled placeholders with 5 real tools

Fidelity infrastructure to prevent future drift:
- snapshots/*.snapshot.json: authoritative tool catalogs captured from
  live M365 MCP servers (contract layer)
- MockToolFidelityTests: CI gate asserting bidirectional coverage
  between mocks and snapshots (no credentials needed)
- MockToolSnapshotCaptureTests: developer tool for drift detection and
  snapshot refresh (requires MCP_BEARER_TOKEN env var)

Fixes microsoft#300

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@pratapladhani requested a review from a team as a code owner February 28, 2026 22:54
Copilot AI review requested due to automatic review settings February 28, 2026 22:54

Copilot AI left a comment

Pull request overview

Updates the Mock Tooling Server’s MCP mock catalogs to better align with the live M365 MCP tool catalogs, and adds snapshot-driven tests intended to prevent future drift.

Changes:

  • Added snapshot capture + fidelity test suites for mock-vs-live tool catalog alignment.
  • Added checked-in snapshot JSON catalogs for four MCP servers.
  • Updated mock JSON tool definitions and Mock Tooling Server documentation to reflect the new contract approach.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| src/Tests/Microsoft.Agents.A365.DevTools.Cli.Tests/MockTools/MockToolSnapshotCaptureTests.cs | New developer-run integration tests to query live MCP servers and refresh/verify snapshots. |
| src/Tests/Microsoft.Agents.A365.DevTools.Cli.Tests/MockTools/MockToolFidelityTests.cs | New CI-facing tests to check mock/snapshot coverage. |
| src/Microsoft.Agents.A365.DevTools.MockToolingServer/snapshots/*.snapshot.json | New authoritative tool catalogs captured from live servers (4 files). |
| src/Microsoft.Agents.A365.DevTools.MockToolingServer/snapshots/README.md | New documentation describing snapshot schema and update flow. |
| src/Microsoft.Agents.A365.DevTools.MockToolingServer/mocks/mcp_CalendarTools.json | Renamed/adjusted CalendarTools tool entries. |
| src/Microsoft.Agents.A365.DevTools.MockToolingServer/mocks/mcp_MailTools.json | Renamed MailTools tool entries and added FlagEmail. |
| src/Microsoft.Agents.A365.DevTools.MockToolingServer/mocks/mcp_MeServer.json | Renamed MeServer tools to match live names. |
| src/Microsoft.Agents.A365.DevTools.MockToolingServer/mocks/mcp_KnowledgeTools.json | Replaced placeholder tools with live server snake_case tools. |
| src/Microsoft.Agents.A365.DevTools.MockToolingServer/README.md | Updated server/tool documentation and described the fidelity contract. |
| src/Microsoft.Agents.A365.DevTools.MockToolingServer/design.md | Updated design doc/diagram to include snapshots + fidelity testing. |
Comments suppressed due to low confidence (2)

src/Microsoft.Agents.A365.DevTools.MockToolingServer/mocks/mcp_MailTools.json:489

  • The mock SearchMessages tool schema/description still reflects a KQL-based Graph search (queryString, optional fields), but the real-server snapshot defines a natural-language message field that is required (plus conversationId). To prevent schema-mismatch when switching from mock to real servers, update the mock’s inputSchema/description to match the snapshot contract.
    "name": "SearchMessages",
    "description": "Search Outlook messages using Microsoft Graph Search API with KQL-style queries.",
    "inputSchema": {
      "type": "object",
      "properties": {
        "queryString": {
          "type": "string",
          "description": "KQL-style search string (e.g., 'contoso OR from:user@example.com')"
        },

src/Microsoft.Agents.A365.DevTools.MockToolingServer/mocks/mcp_MeServer.json:72

  • The mock GetUserDetails tool uses userId as the required parameter, but the snapshot contract for mcp_MeServer requires userIdentifier (name/email/id) and includes additional optional fields (select, expand). This mismatch can cause agents built against the mock to send the wrong argument name to the real server. Update the mock inputSchema to match the snapshot.
    "name": "GetUserDetails",
    "description": "Get the profile of a specific user by their ID or userPrincipalName.",
    "inputSchema": {
      "type": "object",
      "properties": {
        "userId": {
          "type": "string",
          "description": "The unique identifier (GUID) or userPrincipalName (email) of the user to retrieve."
        },
        "select": {
          "type": "string",
          "description": "Comma-separated list of properties to return."
        }
      },
      "required": ["userId"]
    },

- "name": "cancelEvent",
- "description": "Cancel an event in a specified user's calendar and notify attendees.",
+ "name": "ListCalendarView",
+ "description": "Retrieve events from a user's calendar view. Use this tool whenever you need to retrieve one meeting instance of a recurrening event(not master series) occurring in a window (e.g., 'tomorrow morning' or 'between 2 PM and 4 PM') before calling any tool that modifies, updates, or cancels a meeting.",

Copilot AI Feb 28, 2026

ListCalendarView’s description has a typo: “recurrening” should be “recurring”.

Suggested change
- "description": "Retrieve events from a user's calendar view. Use this tool whenever you need to retrieve one meeting instance of a recurrening event(not master series) occurring in a window (e.g., 'tomorrow morning' or 'between 2 PM and 4 PM') before calling any tool that modifies, updates, or cancels a meeting.",
+ "description": "Retrieve events from a user's calendar view. Use this tool whenever you need to retrieve one meeting instance of a recurring event(not master series) occurring in a window (e.g., 'tomorrow morning' or 'between 2 PM and 4 PM') before calling any tool that modifies, updates, or cancels a meeting.",

Copilot uses AI. Check for mistakes.
Comment on lines +53 to +76
+ "name": "ListCalendarView",
+ "description": "Retrieve events from a user's calendar view. Use this tool whenever you need to retrieve one meeting instance of a recurrening event(not master series) occurring in a window (e.g., 'tomorrow morning' or 'between 2 PM and 4 PM') before calling any tool that modifies, updates, or cancels a meeting.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "userId": {
        "type": "string",
-       "description": "The ID or userPrincipalName of the user who owns the event.",
+       "description": "",
        "x-ms-location": "path",
        "x-ms-path": "userId"
      },
-     "eventId": {
+     "startDateTime": {
        "type": "string",
-       "description": "The unique identifier of the event to cancel.",
-       "x-ms-location": "path",
-       "x-ms-path": "eventId"
+       "description": "Start of the time range (ISO 8601). Should be today / after today.",
+       "x-ms-location": "query",
+       "x-ms-path": "startDateTime"
      },
-     "comment": {
+     "endDateTime": {
        "type": "string",
-       "description": "Optional message to include in the cancellation notification to attendees.",
-       "x-ms-location": "body",
-       "x-ms-path": "comment"
+       "description": "End of the time range (ISO 8601).should be after startDateTime.",
+       "x-ms-location": "query",
+       "x-ms-path": "endDateTime"
      },
+     "top": {

Copilot AI Feb 28, 2026

The mock ListCalendarView inputSchema uses userId and requires startDateTime/endDateTime, but the snapshot contract expects a required userIdentifier and different optional fields (timeZone, subject, select, etc.). This mismatch can still produce schema-mismatch errors when an agent switches from mock to the real MCP server; align the mock schema with the snapshot.
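
The mismatch described here can be made concrete by diffing the property names and required fields of the two schemas. A minimal Python sketch follows. `schema_diff` is a hypothetical helper, and the sample schemas contain only the field names cited in this comment, so they are illustrative rather than copies of the real mock or snapshot.

```python
from typing import Any


def schema_diff(snapshot_schema: dict[str, Any], mock_schema: dict[str, Any]) -> dict[str, list[str]]:
    """Compare property names and required fields of two JSON Schema objects."""
    snap_props = set(snapshot_schema.get("properties", {}))
    mock_props = set(mock_schema.get("properties", {}))
    snap_req = set(snapshot_schema.get("required", []))
    mock_req = set(mock_schema.get("required", []))
    return {
        "properties_missing_from_mock": sorted(snap_props - mock_props),
        "properties_only_in_mock": sorted(mock_props - snap_props),
        "required_missing_from_mock": sorted(snap_req - mock_req),
        "required_only_in_mock": sorted(mock_req - snap_req),
    }


if __name__ == "__main__":
    # Illustrative schemas using only field names cited in the review comment;
    # the real files contain more fields than shown here.
    snapshot = {
        "type": "object",
        "properties": {k: {"type": "string"} for k in ("userIdentifier", "timeZone", "subject", "select")},
        "required": ["userIdentifier"],
    }
    mock = {
        "type": "object",
        "properties": {k: {"type": "string"} for k in ("userId", "startDateTime", "endDateTime")},
        "required": ["startDateTime", "endDateTime"],
    }
    for key, names in schema_diff(snapshot, mock).items():
        print(f"{key}: {names}")
```

An agent built against the mock would send userId, so the required_missing_from_mock entry for userIdentifier is the one that would surface as a failure against the real server.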

Comment on lines +15 to +24
  "$schema": "mock-snapshot-schema",
  "capturedAt": "<ISO 8601 UTC timestamp, or \"UNPOPULATED\">",
  "serverName": "<MCP server name, e.g. mcp_CalendarTools>",
  "sourceNote": "Run MockToolSnapshotCaptureTests with MCP_BEARER_TOKEN set to populate this file.",
  "tools": [
    {
      "name": "<tool name>",
      "description": "<tool description>",
      "inputSchema": { <JSON Schema object> }
    }

Copilot AI Feb 28, 2026

The documented snapshot schema includes a sourceNote field, but the checked-in snapshot JSON files (and WriteSnapshot in MockToolSnapshotCaptureTests) don’t write or include it. Either update the schema documentation to match reality or add sourceNote to the snapshot writer and files so the README stays accurate.
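
One way to catch this kind of disagreement between README and files is to check each snapshot's top-level keys against the documented list. A minimal Python sketch, assuming the documented keys are exactly those in the README excerpt quoted above; `DOCUMENTED_KEYS` and `key_drift` are hypothetical names, and the sample document is invented for illustration.

```python
import json

# Top-level keys listed in the README schema excerpt quoted above.
DOCUMENTED_KEYS = {"$schema", "capturedAt", "serverName", "sourceNote", "tools"}


def key_drift(snapshot_text: str) -> dict[str, list[str]]:
    """Report documented keys absent from a snapshot and undocumented keys present in it."""
    actual = set(json.loads(snapshot_text))  # iterating a dict yields its keys
    return {
        "documented_but_absent": sorted(DOCUMENTED_KEYS - actual),
        "present_but_undocumented": sorted(actual - DOCUMENTED_KEYS),
    }


if __name__ == "__main__":
    # Invented sample mirroring the reviewer's finding: the file has no sourceNote.
    sample = json.dumps({
        "$schema": "mock-snapshot-schema",
        "capturedAt": "UNPOPULATED",
        "serverName": "mcp_CalendarTools",
        "tools": [],
    })
    print(key_drift(sample))
```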

Comment on lines +3 to +20
+ "name": "ListEvents",
+ "description": "Retrieve a list of events in a user's calendar.For recurring meetings, only return one / first record with full recurrence details (pattern, start, end) to the agent.For searching by meeting title, filter using contains(subject,'X'); avoid 'eq' or startswith(subject,'X') filter for this case.Use this tool to find existing meetings whenever the user refers to a meeting by day, date, time , or title (e.g., \"add someone to the architecture review at 2 PM\"), before calling any tool that modifies, updates, or cancels a meeting.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "userId": {
        "type": "string",
-       "description": "The ID or userPrincipalName of the user.",
+       "description": "If no organizer is specified, use current user. If organizer is explicitly mentioned - retrieve their user principal name and use that value.",
        "x-ms-location": "path",
        "x-ms-path": "userId"
      },
-     "eventId": {
+     "startDateTime": {
        "type": "string",
-       "description": "The ID of the event to accept.",
-       "x-ms-location": "path",
-       "x-ms-path": "eventId"
+       "description": "The start of the time range for the events (ISO 8601 format). Should be today / after today.",
+       "x-ms-location": "query",
+       "x-ms-path": "startDateTime"
      },
-     "comment": {
+     "endDateTime": {

Copilot AI Feb 28, 2026

The mock ListEvents schema doesn’t match the real-server snapshot (it includes userId, OData $filter/$orderby, and omits snapshot fields like meetingTitle, attendeeEmails, timeZone, select). If the goal is to prevent schema drift/tool call failures when switching to live MCP servers, the mock inputSchema here should be aligned with the snapshot contract.

using System.Text.Json;
using System.Text.Json.Serialization;
using FluentAssertions;
using Microsoft.Agents.A365.DevTools.MockToolingServer.MockTools;

Copilot AI Feb 28, 2026

The Microsoft.Agents.A365.DevTools.MockToolingServer.MockTools using appears to be unused in this test file. Because the repo builds with TreatWarningsAsErrors=true, the unnecessary using directive will fail the build (CS8019). Remove the unused using or reference a type from it if intended.

Suggested change
- using Microsoft.Agents.A365.DevTools.MockToolingServer.MockTools;

Comment on lines +132 to +133
var content = new StringContent(requestBody, System.Text.Encoding.UTF8, "application/json");
var response = await client.PostAsync($"{McpBaseUrl}/agents/servers/{serverName}", content);

Copilot AI Feb 28, 2026

HttpResponseMessage (and the StringContent request body) are not disposed in FetchLiveToolsAsync. With analyzers and TreatWarningsAsErrors, this can trigger disposal warnings (and it can leak resources when iterating servers). Use using var content = ...; and using var response = await client.PostAsync(...); (or await using) so both are disposed after reading the body.

Suggested change
- var content = new StringContent(requestBody, System.Text.Encoding.UTF8, "application/json");
- var response = await client.PostAsync($"{McpBaseUrl}/agents/servers/{serverName}", content);
+ using var content = new StringContent(requestBody, System.Text.Encoding.UTF8, "application/json");
+ using var response = await client.PostAsync($"{McpBaseUrl}/agents/servers/{serverName}", content);

Comment on lines +58 to +70
    var mockTools = LoadEnabledMockTools(snapshot.ServerName);
    var mockToolNames = new HashSet<string>(mockTools.Select(t => t.Name));

    // Act & Assert - every snapshot tool must exist in the mock
    foreach (var snapshotTool in snapshot.Tools)
    {
        mockToolNames.Should().Contain(
            snapshotTool.Name,
            $"Snapshot tool '{snapshotTool.Name}' for server '{snapshot.ServerName}' " +
            $"is missing from the mock definition. Add it to mocks/{snapshot.ServerName}.json.");
    }
}


Copilot AI Feb 28, 2026

The fidelity tests currently only assert tool name coverage. They do not validate that mock inputSchema (especially required fields / property names) matches the snapshot, even though the PR/docs describe preventing schema drift. This leaves real schema-mismatch regressions undetected (e.g., MailTools SearchMessages snapshot requires message, but the mock schema uses queryString). Consider extending these tests to compare required fields (and ideally the full inputSchema) for each tool.

Suggested change
-     var mockTools = LoadEnabledMockTools(snapshot.ServerName);
-     var mockToolNames = new HashSet<string>(mockTools.Select(t => t.Name));
-     // Act & Assert - every snapshot tool must exist in the mock
-     foreach (var snapshotTool in snapshot.Tools)
-     {
-         mockToolNames.Should().Contain(
-             snapshotTool.Name,
-             $"Snapshot tool '{snapshotTool.Name}' for server '{snapshot.ServerName}' " +
-             $"is missing from the mock definition. Add it to mocks/{snapshot.ServerName}.json.");
-     }
- }
+     var mockTools = LoadEnabledMockTools(snapshot.ServerName).ToList();
+     var mockToolsByName = mockTools.ToDictionary(t => t.Name, StringComparer.OrdinalIgnoreCase);
+     // Act & Assert - every snapshot tool must exist in the mock and have a compatible inputSchema
+     foreach (var snapshotTool in snapshot.Tools)
+     {
+         mockToolsByName.Should().ContainKey(
+             snapshotTool.Name,
+             $"Snapshot tool '{snapshotTool.Name}' for server '{snapshot.ServerName}' " +
+             $"is missing from the mock definition. Add it to mocks/{snapshot.ServerName}.json.");
+         var mockTool = mockToolsByName[snapshotTool.Name];
+         if (TryGetInputSchema(snapshotTool, out var snapshotSchema))
+         {
+             TryGetInputSchema(mockTool, out var mockSchema).Should().BeTrue(
+                 $"Mock tool '{snapshotTool.Name}' for server '{snapshot.ServerName}' " +
+                 "must define an inputSchema when the snapshot tool does.");
+             GetSchemaRequiredAndPropertyNames(snapshotSchema, out var snapshotRequired, out var snapshotPropertyNames);
+             GetSchemaRequiredAndPropertyNames(mockSchema, out var mockRequired, out var mockPropertyNames);
+             mockRequired.Should().BeEquivalentTo(
+                 snapshotRequired,
+                 $"Required fields for tool '{snapshotTool.Name}' on server '{snapshot.ServerName}' " +
+                 "must match between snapshot and mock inputSchema.");
+             mockPropertyNames.Should().BeEquivalentTo(
+                 snapshotPropertyNames,
+                 $"Property names for tool '{snapshotTool.Name}' on server '{snapshot.ServerName}' " +
+                 "must match between snapshot and mock inputSchema.");
+         }
+     }
+ }
+ private static bool TryGetInputSchema(object tool, out JsonElement inputSchema)
+ {
+     inputSchema = default;
+     if (tool is null)
+     {
+         return false;
+     }
+     var inputSchemaProperty = tool.GetType().GetProperty("InputSchema");
+     if (inputSchemaProperty is null)
+     {
+         return false;
+     }
+     var value = inputSchemaProperty.GetValue(tool);
+     if (value is JsonElement element)
+     {
+         if (element.ValueKind == JsonValueKind.Undefined || element.ValueKind == JsonValueKind.Null)
+         {
+             return false;
+         }
+         inputSchema = element.Clone();
+         return true;
+     }
+     if (value is string json && !string.IsNullOrWhiteSpace(json))
+     {
+         using var document = JsonDocument.Parse(json, JsonOptions);
+         inputSchema = document.RootElement.Clone();
+         return true;
+     }
+     return false;
+ }
+ private static void GetSchemaRequiredAndPropertyNames(
+     JsonElement schema,
+     out HashSet<string> required,
+     out HashSet<string> propertyNames)
+ {
+     required = new HashSet<string>(StringComparer.Ordinal);
+     propertyNames = new HashSet<string>(StringComparer.Ordinal);
+     if (schema.ValueKind != JsonValueKind.Object)
+     {
+         return;
+     }
+     if (schema.TryGetProperty("required", out var requiredElement) &&
+         requiredElement.ValueKind == JsonValueKind.Array)
+     {
+         foreach (var item in requiredElement.EnumerateArray())
+         {
+             if (item.ValueKind == JsonValueKind.String)
+             {
+                 var name = item.GetString();
+                 if (!string.IsNullOrEmpty(name))
+                 {
+                     required.Add(name);
+                 }
+             }
+         }
+     }
+     if (schema.TryGetProperty("properties", out var propertiesElement) &&
+         propertiesElement.ValueKind == JsonValueKind.Object)
+     {
+         foreach (var property in propertiesElement.EnumerateObject())
+         {
+             if (!string.IsNullOrEmpty(property.Name))
+             {
+                 propertyNames.Add(property.Name);
+             }
+         }
+     }
+ }

Comment on lines +40 to +43
### What the mock guarantees

Every tool exposed by a real M365 MCP server is present in the corresponding mock with the same name, same casing, and same required input fields. This ensures that agents developed against the mock will not encounter missing-tool or schema-mismatch errors when switched to a real server.


Copilot AI Feb 28, 2026

This section claims the mock matches the real server’s “required input fields”, but the current CI fidelity tests only check tool names and several mock inputSchema definitions differ from snapshots (e.g., MailTools SearchMessages, MeServer GetUserDetails). Either tighten the fidelity contract/tests to validate schemas/required fields, or adjust this documentation to reflect what is actually enforced.

Development

Successfully merging this pull request may close these issues.

bug: mock MCP tool names have drifted from real M365 MCP server contracts

2 participants