.NET: [BREAKING] Support archive-type skills in AgentMcpSkillsSource #6631

Open
SergeyMenshykh wants to merge 7 commits into microsoft:main from SergeyMenshykh:sergeymenshykh/mcp-archive-skills

Conversation

@SergeyMenshykh SergeyMenshykh commented Jun 19, 2026

Contributor

Motivation & Context

The MCP skills source previously discovered only skill-md index entries, where a skill's SKILL.md and sibling resources are fetched on demand from the server. The Agent Skills discovery spec also defines archive entries, where a skill is distributed as a single packaged archive (ZIP / TAR / gzip-compressed TAR) that unpacks into the skill's namespace. Without archive support, skills published in that format are silently unusable.

This change adds archive-type skill support so an MCP server can advertise packaged skills, which are downloaded, safely unpacked to a local directory, and served like file-based skills - while keeping the strict guarantee that MCP-delivered scripts are never executed.

Description & Review Guide

  • What are the major changes?

    • AgentMcpSkillsSource now dispatches index entries to per-type loaders via a new IMcpSkillEntryLoader strategy: SkillMdEntryLoader (existing) and ArchiveEntryLoader (new).
    • ArchiveEntryLoader downloads, extracts, and serves each archive skill through an internal AgentFileSkillsSource, pruning directories the server no longer advertises.
    • AgentMcpSkillArchiveExtractor does the hardened unpacking: path-traversal guards, link skipping, and size/count limits.
    • New AgentMcpSkillsSourceOptions for the extraction directory, resource extensions/depth, and limits.
  • What is the impact of these changes?

    • Additive only. skill-md discovery is unchanged; entry types with no registered loader are skipped as before. Archive-bundled scripts are surfaced as readable resources only - the inner file source is created with no allowed script extensions and no script runner, so they are never discovered as runnable scripts.
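
The per-type dispatch described above can be pictured with a minimal sketch. Every type and member name below (`IndexEntrySketch`, `IEntryLoaderSketch`, `PrefixLoaderSketch`, `DispatchSketch`) is hypothetical and chosen for illustration; the PR's actual `IMcpSkillEntryLoader` signature is not shown in this conversation and may differ. The sketch only shows the idea that entries are grouped by type, routed to the matching loader, and skipped when no loader is registered.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an index entry; not the PR's actual type.
public sealed record IndexEntrySketch(string Name, string Type, string? Url);

// Hypothetical loader contract; the real IMcpSkillEntryLoader may look different.
public interface IEntryLoaderSketch
{
    // The index entry type this loader handles, e.g. "skill-md" or "archive".
    string EntryType { get; }

    // Materializes entries of its own type (a real loader would return skill objects).
    IReadOnlyList<string> Load(IReadOnlyList<IndexEntrySketch> entries);
}

// Trivial loader used only to make the sketch runnable.
public sealed class PrefixLoaderSketch : IEntryLoaderSketch
{
    public PrefixLoaderSketch(string entryType) => EntryType = entryType;

    public string EntryType { get; }

    public IReadOnlyList<string> Load(IReadOnlyList<IndexEntrySketch> entries) =>
        entries.Select(e => $"{EntryType}:{e.Name}").ToList();
}

public static class DispatchSketch
{
    public static List<string> LoadAll(
        IEnumerable<IndexEntrySketch> index,
        IEnumerable<IEntryLoaderSketch> loaders)
    {
        var byType = loaders.ToDictionary(l => l.EntryType, StringComparer.Ordinal);
        var skills = new List<string>();

        foreach (var group in index.GroupBy(e => e.Type, StringComparer.Ordinal))
        {
            // Entry types with no registered loader are skipped.
            if (!byType.TryGetValue(group.Key, out var loader))
            {
                continue;
            }

            skills.AddRange(loader.Load(group.ToList()));
        }

        return skills;
    }
}
```

Keeping the routing in one place means a new entry type only needs a new loader registration, which is presumably why the PR describes this as a strategy.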

Contribution Checklist

  • The code builds clean without any errors or warnings
  • All unit tests pass, and I have added new tests where possible
  • The PR follows the Contribution Guidelines
  • This PR is linked to an issue and there is no other open PR for this issue (see Related Issue above).
  • This is not a breaking change. If it is a breaking change, add the breaking change label (or add "[BREAKING]" to the title prefix, before or after any language prefix) - a workflow keeps the label and title prefix in sync automatically.

Closes: #6077

Add archive-type skill discovery to the MCP skills source. Index entries
are dispatched to per-type loaders (skill-md and archive) via a new
IMcpSkillEntryLoader strategy. The archive loader downloads, safely
unpacks, and serves packaged skills through an internal file skills
source, while ensuring MCP-delivered scripts are never executed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 19, 2026 15:44
@moonbox3 moonbox3 added the .NET Issues related to the .NET codebase label Jun 19, 2026
@SergeyMenshykh SergeyMenshykh self-assigned this Jun 19, 2026
@SergeyMenshykh SergeyMenshykh added the skills Issues related to skills label Jun 19, 2026
@SergeyMenshykh SergeyMenshykh moved this to In Review in Agent Framework Jun 19, 2026
@github-actions

Contributor

Flagged issue

ArchiveEntryLoader.cs:81-84 delegates each extracted archive root to AgentFileSkillsSource, which recursively treats nested SKILL.md files as separate skills (AgentFileSkillsSource.cs:160-176). This conflicts with the archive loader's single-namespace contract (lines 17-21) and can cause one archive entry to surface multiple unadvertised skills.


Source: automated DevFlow PR review

@github-actions github-actions Bot left a comment

Contributor

Automated Code Review

Reviewers: 4 | Confidence: 82%

✓ Security Reliability

The archive extraction security hardening is comprehensive and well-implemented. Zip-slip protection (path containment checks), symlink/hardlink skipping in TAR, decompression-bomb mitigation (streaming byte budget via CopyWithLimit), file-count caps, archive-size limits, and script-execution prevention (empty AllowedScriptExtensions + null scriptRunner) are all correctly applied. The concurrent-refresh pattern using CAS is sound. One low-severity reliability concern exists in the cancellation token handling of the coalesced-refresh pattern.
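
For readers unfamiliar with the two guards named here, the following is a minimal sketch of a path containment check and a streaming byte budget. It is not the PR's code: `ExtractionGuardsSketch`, `ResolveInsideRoot`, and `CopyWithLimitSketch` are hypothetical names, and the real `CopyWithLimit` signature and exception type are assumptions.

```csharp
using System;
using System.IO;

public static class ExtractionGuardsSketch
{
    // Returns the full destination path for an archive entry,
    // or null when the entry would land outside rootDir (zip-slip).
    public static string? ResolveInsideRoot(string rootDir, string entryName)
    {
        string root = Path.GetFullPath(rootDir);
        if (!root.EndsWith(Path.DirectorySeparatorChar))
        {
            root += Path.DirectorySeparatorChar;
        }

        // GetFullPath collapses ".." segments; a rooted entryName replaces root entirely.
        string candidate = Path.GetFullPath(Path.Combine(root, entryName));
        return candidate.StartsWith(root, StringComparison.Ordinal) ? candidate : null;
    }

    // Copies source to destination and throws once more than maxBytes have been read.
    // Counting bytes as they are decompressed is what bounds a decompression bomb;
    // sizes declared in archive headers cannot be trusted for that purpose.
    public static long CopyWithLimitSketch(Stream source, Stream destination, long maxBytes)
    {
        var buffer = new byte[81920];
        long total = 0;
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            total += read;
            if (total > maxBytes)
            {
                throw new InvalidDataException($"Extracted data exceeds the {maxBytes}-byte limit.");
            }

            destination.Write(buffer, 0, read);
        }

        return total;
    }
}
```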

✓ Test Coverage

The test suite is comprehensive for the core archive extraction and integration scenarios (zip-slip, decompression bombs, link skipping, pruning, size limits). However, there are notable gaps: (1) DetectFormat has no dedicated unit tests despite having ~50 lines of branching logic covering magic bytes, MIME types, and URL fallback; (2) the new RefreshInterval caching mechanism introduced in AgentMcpSkillsSource has zero test coverage; (3) TAR path-traversal is not explicitly tested (only ZIP zip-slip and TAR link-skipping are covered).
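
To make the `DetectFormat` gap concrete, here is a rough sketch of what magic-byte detection with a URL fallback can look like. It is an illustration only: `ArchiveFormatSketch` and `FormatSniffSketch` are hypothetical names, the MIME-type branch is omitted, and treating any gzip stream as a compressed TAR is an assumption. The byte signatures themselves are the standard ones (ZIP local file header `50 4B 03 04`, gzip `1F 8B`, and the `ustar` magic at offset 257 of a TAR header).

```csharp
using System;

public enum ArchiveFormatSketch { Unknown, Zip, Tar, TarGz }

public static class FormatSniffSketch
{
    // ASCII "ustar", found at offset 257 of a POSIX TAR header block.
    private static ReadOnlySpan<byte> UstarMagic => new byte[] { 0x75, 0x73, 0x74, 0x61, 0x72 };

    public static ArchiveFormatSketch Detect(ReadOnlySpan<byte> header, string? url = null)
    {
        // ZIP local file header: "PK" 03 04.
        if (header.Length >= 4 && header[0] == 0x50 && header[1] == 0x4B && header[2] == 0x03 && header[3] == 0x04)
        {
            return ArchiveFormatSketch.Zip;
        }

        // gzip member header: 1F 8B. Assumed here to wrap a TAR.
        if (header.Length >= 2 && header[0] == 0x1F && header[1] == 0x8B)
        {
            return ArchiveFormatSketch.TarGz;
        }

        if (header.Length >= 262 && header.Slice(257, 5).SequenceEqual(UstarMagic))
        {
            return ArchiveFormatSketch.Tar;
        }

        // Fallback: guess from the URL suffix (ignores query strings for brevity).
        if (url is null)
        {
            return ArchiveFormatSketch.Unknown;
        }

        if (url.EndsWith(".zip", StringComparison.OrdinalIgnoreCase)) return ArchiveFormatSketch.Zip;
        if (url.EndsWith(".tar.gz", StringComparison.OrdinalIgnoreCase)
            || url.EndsWith(".tgz", StringComparison.OrdinalIgnoreCase)) return ArchiveFormatSketch.TarGz;
        if (url.EndsWith(".tar", StringComparison.OrdinalIgnoreCase)) return ArchiveFormatSketch.Tar;
        return ArchiveFormatSketch.Unknown;
    }
}
```

Each branch maps naturally to one small table-driven unit test, which is the kind of coverage the review is asking for.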

✓ Failure Modes

The PR is well-structured with thorough security hardening (zip-slip guards, link skipping, decompression-bomb limits) and comprehensive test coverage. The main operational concern is the CAS-based refresh coalescing pattern in GetSkillsAsync, which propagates one caller's OperationCanceledException to all concurrent waiters as a faulted (not cancelled) exception. This is a known trade-off in coalescing patterns and is mitigated by the fact that callers of GetSkillsAsync that catch OperationCanceledException will still handle it correctly in most scenarios. No blocking issues found.

✗ Design Approach

I found one design-level issue in the archive flow: a single archive index entry is supposed to map to one extracted skill namespace, but the new implementation hands each extracted root to AgentFileSkillsSource, which recursively discovers every nested SKILL.md under that tree. That means some valid archives will silently materialize extra skills that were never advertised by the MCP index.

Flagged Issues

  • ArchiveEntryLoader.cs:81-84 delegates each extracted archive root to AgentFileSkillsSource, which recursively treats nested SKILL.md files as separate skills (AgentFileSkillsSource.cs:160-176). This conflicts with the archive loader's single-namespace contract (lines 17-21) and can cause one archive entry to surface multiple unadvertised skills.

Suggestions

  • In GetSkillsAsync (AgentMcpSkillsSource.cs:106-109), when the winning thread's CancellationToken fires, the OperationCanceledException is propagated to all coalesced waiters via tcs.TrySetException, producing a faulted task rather than a cancelled one. Consider using tcs.TrySetCanceled() for OperationCanceledException so coalesced waiters see a properly-cancelled Task. This matters for callers that distinguish task.IsCanceled from task.IsFaulted.
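
The difference this suggestion points at can be shown with a small sketch. `CoalescedCompletionSketch.Complete` is a hypothetical helper, not code from the PR; it only illustrates routing `OperationCanceledException` to `TrySetCanceled` so the shared task ends up cancelled rather than faulted.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CoalescedCompletionSketch
{
    // Completes the shared TaskCompletionSource from a caught exception.
    // Cancellation becomes a Canceled task; anything else becomes a Faulted task.
    public static void Complete<T>(TaskCompletionSource<T> tcs, Exception error)
    {
        if (error is OperationCanceledException oce)
        {
            tcs.TrySetCanceled(oce.CancellationToken);
        }
        else
        {
            tcs.TrySetException(error);
        }
    }
}
```

With `TrySetException`, awaiting the task still throws `OperationCanceledException`, but `task.IsFaulted` is true and `task.IsCanceled` is false, which is the distinction the suggestion mentions.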

Automated review by SergeyMenshykh's agents

Copilot AI left a comment

Contributor

Pull request overview

Adds support for MCP “archive”-distributed Agent Skills in the .NET MCP skills source, including hardened extraction, on-disk reconciliation, and new tests/options for controlling extraction and resource discovery.

Changes:

  • Introduces an IMcpSkillEntryLoader strategy model and dispatches index entries by type (skill-md, archive).
  • Adds archive download + safe extraction (zip, tar, tar.gz) with path-traversal/link skipping and size/count limits, backed by an internal file-skill source.
  • Adds options and unit tests covering archive discovery, pruning behavior, and extraction hardening.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

Show a summary per file
| File | Description |
| --- | --- |
| dotnet/src/Microsoft.Agents.AI.Mcp/Skills/AgentMcpSkillsSource.cs | Dispatches index entries to per-type loaders; adds refresh/caching logic. |
| dotnet/src/Microsoft.Agents.AI.Mcp/Skills/AgentMcpSkillsSourceOptions.cs | Adds configuration for archive extraction directory, resource filtering, limits, and refresh interval. |
| dotnet/src/Microsoft.Agents.AI.Mcp/Skills/Loaders/IMcpSkillEntryLoader.cs | Defines the loader strategy interface for index entry materialization. |
| dotnet/src/Microsoft.Agents.AI.Mcp/Skills/Loaders/SkillMdEntryLoader.cs | Implements skill-md loading using existing on-demand MCP resource reads. |
| dotnet/src/Microsoft.Agents.AI.Mcp/Skills/Loaders/ArchiveEntryLoader.cs | Implements archive loading: download, extract, prune, and discover via file skills source. |
| dotnet/src/Microsoft.Agents.AI.Mcp/Skills/Loaders/ArchiveFormat.cs | Adds supported archive format enum used by extraction/detection. |
| dotnet/src/Microsoft.Agents.AI.Mcp/Skills/Loaders/AgentMcpSkillArchiveExtractor.cs | Implements hardened extraction with zip-slip guards and size/count limits. |
| dotnet/src/Microsoft.Agents.AI.Mcp/Skills/AgentSkillsProviderBuilderMcpExtensions.cs | Extends builder integration to pass MCP skills source options. |
| dotnet/src/Microsoft.Agents.AI/Skills/AgentSkillsProviderBuilder.cs | Adds a UseSource overload that can receive the builder logger factory at build time. |
| dotnet/src/Microsoft.Agents.AI/Skills/File/AgentFileSkillsSource.cs | Changes visibility to allow MCP archive loader to reuse file-based discovery. |
| dotnet/tests/Microsoft.Agents.AI.Mcp.UnitTests/Skills/AgentMcpSkillsSourceTests.cs | Updates/adjusts MCP skills source unit tests for new archive behavior. |
| dotnet/tests/Microsoft.Agents.AI.Mcp.UnitTests/Skills/AgentMcpSkillsSourceArchiveTests.cs | Adds comprehensive tests for archive discovery, pruning, and extraction hardening. |

Comment thread dotnet/src/Microsoft.Agents.AI.Mcp/Skills/AgentMcpSkillsSource.cs
Cast null! to AgentSkillsSource to disambiguate from the new
Func<ILoggerFactory?, AgentSkillsSource> overload.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@SergeyMenshykh SergeyMenshykh changed the title .NET: Support archive-type skills in AgentMcpSkillsSource .NET: [Breaking] Support archive-type skills in AgentMcpSkillsSource Jun 19, 2026
@moonbox3 moonbox3 added the breaking change Introduces changes that are not backward compatible and may require updates to dependent code. label Jun 19, 2026
@github-actions github-actions Bot changed the title .NET: [Breaking] Support archive-type skills in AgentMcpSkillsSource .NET: [BREAKING] Support archive-type skills in AgentMcpSkillsSource Jun 19, 2026
…sException in Dispose

- Remove hardcoded '50' from test comment; it now says 'default cap'
  without citing a specific number that can drift from the constant.
- Catch UnauthorizedAccessException alongside IOException in test
  Dispose for robust cleanup.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
semenshi and others added 2 commits June 19, 2026 17:32
Use CancellationToken.None for the shared refresh so one caller's
cancellation does not abort work for all concurrent waiters. Waiters
use WaitAsync(cancellationToken) to cancel independently. The refresh
owner checks its own token after publishing the result.
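
The pattern this commit message describes can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `CoalescedRefreshSketch<T>` is a hypothetical class, result caching and the refresh interval are left out, and `Task.WaitAsync(CancellationToken)` requires .NET 6 or later.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class CoalescedRefreshSketch<T>
{
    private readonly Func<CancellationToken, Task<T>> _refresh;
    private Task<T>? _inFlight;

    public CoalescedRefreshSketch(Func<CancellationToken, Task<T>> refresh) => _refresh = refresh;

    public async Task<T> GetAsync(CancellationToken cancellationToken = default)
    {
        var tcs = new TaskCompletionSource<T>(TaskCreationOptions.RunContinuationsAsynchronously);

        // CAS: only one caller publishes its task and becomes the refresh owner.
        var existing = Interlocked.CompareExchange(ref _inFlight, tcs.Task, null);
        if (existing is not null)
        {
            // Waiters can give up on their own without touching the shared work.
            return await existing.WaitAsync(cancellationToken).ConfigureAwait(false);
        }

        try
        {
            // The shared refresh ignores the owner's token so that one caller's
            // cancellation does not abort work for every concurrent waiter.
            T result = await _refresh(CancellationToken.None).ConfigureAwait(false);
            tcs.TrySetResult(result);
        }
        catch (Exception ex)
        {
            tcs.TrySetException(ex);
        }
        finally
        {
            Volatile.Write(ref _inFlight, null);
        }

        // The owner honors its own token only after the result is published.
        cancellationToken.ThrowIfCancellationRequested();
        return await tcs.Task.ConfigureAwait(false);
    }
}
```

One consequence of this design is that a cancelled owner still pays for the full refresh before observing its cancellation; that is the price of not failing the other waiters.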

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@SergeyMenshykh

Contributor Author

ArchiveEntryLoader.cs:81-84 delegates each extracted archive root to AgentFileSkillsSource, which recursively treats nested SKILL.md files as separate skills (AgentFileSkillsSource.cs:160-176). This conflicts with the archive loader's single-namespace contract (lines 17-21) and can cause one archive entry to surface multiple unadvertised skills.

The fix for this issue is outside the scope of this PR and will be addressed separately.

@SergeyMenshykh SergeyMenshykh marked this pull request as ready for review June 19, 2026 17:43

@github-actions github-actions Bot left a comment

Contributor

Automated Code Review

Reviewers: 4 | Confidence: 84%

✓ Correctness

The PR is well-implemented with strong security hardening (zip-slip guards, link skipping, size/count limits, CopyWithLimit as authoritative decompression bomb defense). The concurrency design in GetSkillsAsync properly decouples refresh execution from per-caller cancellation. The resolved review comments (breaking change acknowledgment, UnauthorizedAccessException in Dispose, cancellation decoupling) have all been addressed. No significant correctness bugs found.

✓ Test Coverage

The archive extraction and skill discovery paths are thoroughly tested with good security coverage (zip-slip, link skipping, decompression bombs, size limits). However, the new caching/refresh-interval logic and the concurrent-refresh CAS pattern in AgentMcpSkillsSource have no test coverage at all — this is non-trivial stateful concurrency code that could regress silently. The DetectFormat method's many branches and the UseSource factory overload also lack direct tests.

✓ Failure Modes

The PR is well-implemented overall with solid security hardening (path traversal, link skipping, decompression bomb limits) and proper cancellation handling (resolved review comment). The main structural concern is that the loader dispatch loop in GetCoreSkillsAsync lacks fault isolation: an unhandled exception from one loader (e.g., ArchiveEntryLoader failing at Directory.CreateDirectory) propagates and discards skills already loaded by other loaders.

✗ Design Approach

I found one design-level correctness issue in the archive reconciliation flow: pruning is driven by the post-validation archive list, so a still-advertised archive skill with transiently malformed metadata can be treated as "no longer advertised" and have its extracted directory deleted. I also have one non-blocking API-surface concern: the change makes AgentFileSkillsSource public to satisfy an internal cross-assembly dependency, while the repo already uses friend-assembly access for that pattern elsewhere.

Flagged Issues

  • ArchiveEntryLoader.LoadAsync reconciles/prunes on-disk directories from the filtered archiveEntries set, so an archive skill that is still present in skill://index.json but temporarily fails validation (e.g., missing url) is treated as unadvertised and its extracted directory is deleted. That conflicts with the documented ArchiveSkillsDirectory contract in AgentMcpSkillsSourceOptions.cs:25-28, which says pruning applies only to subdirectories the MCP server no longer advertises.

Suggestions

  • Consider keeping AgentFileSkillsSource internal and granting Microsoft.Agents.AI.Mcp friend access instead of widening the public API. The repo already uses that pattern for cross-assembly implementation sharing (e.g., Microsoft.Agents.AI.Workflows.csproj:28-31).

Automated review by SergeyMenshykh's agents

@github-actions

Contributor

Flagged issue

ArchiveEntryLoader.LoadAsync reconciles/prunes on-disk directories from the filtered archiveEntries set, so an archive skill that is still present in skill://index.json but temporarily fails validation (e.g., missing url) is treated as unadvertised and its extracted directory is deleted. That conflicts with the documented ArchiveSkillsDirectory contract in AgentMcpSkillsSourceOptions.cs:25-28, which says pruning applies only to subdirectories the MCP server no longer advertises.


Source: automated DevFlow PR review

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@SergeyMenshykh

Contributor Author

ArchiveEntryLoader.LoadAsync reconciles/prunes on-disk directories from the filtered archiveEntries set, so an archive skill that is still present in skill://index.json but temporarily fails validation (e.g., missing url) is treated as unadvertised and its extracted directory is deleted. That conflicts with the documented ArchiveSkillsDirectory contract in AgentMcpSkillsSourceOptions.cs:25-28, which says pruning applies only to subdirectories the MCP server no longer advertises.

An entry with a missing url or invalid name can't be materialized - keeping its stale directory around would mean serving potentially outdated content for a skill the server can no longer provide correctly. Pruning it and re-downloading when the entry is fixed again is the safer default. I'll update the doc comment to clarify that pruning applies to entries that aren't actionable, not just unadvertised ones.
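
The behavior defended in this reply can be illustrated with a minimal sketch. It assumes one subdirectory per skill, named after the skill, directly under the extraction directory; that layout, and the names `PruneSketch` and `Prune`, are assumptions rather than details confirmed in this conversation. The point is that the keep set is built from entries that passed validation, so an advertised entry with a missing url is absent from it and its directory is removed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class PruneSketch
{
    // Deletes every immediate subdirectory of rootDir whose name is not in keepNames
    // and returns the names that were deleted.
    public static List<string> Prune(string rootDir, IEnumerable<string> keepNames)
    {
        var keep = new HashSet<string>(keepNames, StringComparer.OrdinalIgnoreCase);
        var deleted = new List<string>();

        // GetDirectories returns a snapshot array, so deleting while iterating is safe.
        foreach (string dir in Directory.GetDirectories(rootDir))
        {
            string name = Path.GetFileName(dir);
            if (!keep.Contains(name))
            {
                Directory.Delete(dir, recursive: true);
                deleted.Add(name);
            }
        }

        return deleted;
    }
}
```

Under the alternative the reviewer proposed, `keepNames` would instead be built from every name present in the index, valid or not, which would preserve the stale directory until the entry disappears from the index.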

Labels

breaking change Introduces changes that are not backward compatible and may require updates to dependent code. .NET Issues related to the .NET codebase skills Issues related to skills

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

.NET: Support MCP skills of archive type

4 participants