.NET: [BREAKING] Support archive-type skills in AgentMcpSkillsSource#6631
.NET: [BREAKING] Support archive-type skills in AgentMcpSkillsSource#6631SergeyMenshykh wants to merge 7 commits into
Conversation
Add archive-type skill discovery to the MCP skills source. Index entries are dispatched to per-type loaders (skill-md and archive) via a new IMcpSkillEntryLoader strategy. The archive loader downloads, safely unpacks, and serves packaged skills through an internal file skills source, while ensuring MCP-delivered scripts are never executed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Flagged issue
Source: automated DevFlow PR review |
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 82%
✓ Security Reliability
The archive extraction security hardening is comprehensive and well-implemented. Zip-slip protection (path containment checks), symlink/hardlink skipping in TAR, decompression-bomb mitigation (streaming byte budget via CopyWithLimit), file-count caps, archive-size limits, and script-execution prevention (empty AllowedScriptExtensions + null scriptRunner) are all correctly applied. The concurrent-refresh pattern using CAS is sound. One low-severity reliability concern exists in the cancellation token handling of the coalesced-refresh pattern.
✓ Test Coverage
The test suite is comprehensive for the core archive extraction and integration scenarios (zip-slip, decompression bombs, link skipping, pruning, size limits). However, there are notable gaps: (1)
DetectFormathas no dedicated unit tests despite having ~50 lines of branching logic covering magic bytes, MIME types, and URL fallback; (2) the newRefreshIntervalcaching mechanism introduced inAgentMcpSkillsSourcehas zero test coverage; (3) TAR path-traversal is not explicitly tested (only ZIP zip-slip and TAR link-skipping are covered).
✓ Failure Modes
The PR is well-structured with thorough security hardening (zip-slip guards, link skipping, decompression-bomb limits) and comprehensive test coverage. The main operational concern is the CAS-based refresh coalescing pattern in GetSkillsAsync, which propagates one caller's OperationCanceledException to all concurrent waiters as a faulted (not cancelled) exception. This is a known trade-off in coalescing patterns and is mitigated by the fact that calers of GetSkillsAsync catching OperationCanceledException will still handle it correctly in most scenarios. No blocking issues found.
✗ Design Approach
I found one design-level issue in the archive flow: a single
archiveindex entry is supposed to map to one extracted skill namespace, but the new implementation hands each extracted root toAgentFileSkillsSource, which recursively discovers every nestedSKILL.mdunder that tree. That means some valid archives will silently materialize extra skills that were never advertised by the MCP index.
Flagged Issues
-
ArchiveEntryLoader.cs:81-84delegates each extracted archive root toAgentFileSkillsSource, which recursively treats nestedSKILL.mdfiles as separate skills (AgentFileSkillsSource.cs:160-176). This conflicts with the archive loader's single-namespace contract (lines 17-21) and can cause one archive entry to surface multiple unadvertised skills.
Suggestions
- In
GetSkillsAsync(AgentMcpSkillsSource.cs:106-109), when the winning thread's CancellationToken fires, theOperationCanceledExceptionis propagated to all coalesced waiters viatcs.TrySetException, producing a faulted task rather than a cancelled one. Consider usingtcs.TrySetCanceled()forOperationCanceledExceptionso coalesced waiters see a properly-cancelled Task. This matters for callers that distinguishtask.IsCanceledfromtask.IsFaulted.
Automated review by SergeyMenshykh's agents
There was a problem hiding this comment.
Pull request overview
Adds support for MCP “archive”-distributed Agent Skills in the .NET MCP skills source, including hardened extraction, on-disk reconciliation, and new tests/options for controlling extraction and resource discovery.
Changes:
- Introduces an
IMcpSkillEntryLoaderstrategy model and dispatches index entries bytype(skill-md,archive). - Adds archive download + safe extraction (
zip,tar,tar.gz) with path-traversal/link skipping and size/count limits, backed by an internal file-skill source. - Adds options and unit tests covering archive discovery, pruning behavior, and extraction hardening.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| dotnet/src/Microsoft.Agents.AI.Mcp/Skills/AgentMcpSkillsSource.cs | Dispatches index entries to per-type loaders; adds refresh/caching logic. |
| dotnet/src/Microsoft.Agents.AI.Mcp/Skills/AgentMcpSkillsSourceOptions.cs | Adds configuration for archive extraction directory, resource filtering, limits, and refresh interval. |
| dotnet/src/Microsoft.Agents.AI.Mcp/Skills/Loaders/IMcpSkillEntryLoader.cs | Defines the loader strategy interface for index entry materialization. |
| dotnet/src/Microsoft.Agents.AI.Mcp/Skills/Loaders/SkillMdEntryLoader.cs | Implements skill-md loading using existing on-demand MCP resource reads. |
| dotnet/src/Microsoft.Agents.AI.Mcp/Skills/Loaders/ArchiveEntryLoader.cs | Implements archive loading: download, extract, prune, and discover via file skills source. |
| dotnet/src/Microsoft.Agents.AI.Mcp/Skills/Loaders/ArchiveFormat.cs | Adds supported archive format enum used by extraction/detection. |
| dotnet/src/Microsoft.Agents.AI.Mcp/Skills/Loaders/AgentMcpSkillArchiveExtractor.cs | Implements hardened extraction with zip-slip guards and size/count limits. |
| dotnet/src/Microsoft.Agents.AI.Mcp/Skills/AgentSkillsProviderBuilderMcpExtensions.cs | Extends builder integration to pass MCP skills source options. |
| dotnet/src/Microsoft.Agents.AI/Skills/AgentSkillsProviderBuilder.cs | Adds a UseSource overload that can receive the builder logger factory at build time. |
| dotnet/src/Microsoft.Agents.AI/Skills/File/AgentFileSkillsSource.cs | Changes visibility to allow MCP archive loader to reuse file-based discovery. |
| dotnet/tests/Microsoft.Agents.AI.Mcp.UnitTests/Skills/AgentMcpSkillsSourceTests.cs | Updates/adjusts MCP skills source unit tests for new archive behavior. |
| dotnet/tests/Microsoft.Agents.AI.Mcp.UnitTests/Skills/AgentMcpSkillsSourceArchiveTests.cs | Adds comprehensive tests for archive discovery, pruning, and extraction hardening. |
Cast null! to AgentSkillsSource to disambiguate from the new Func<ILoggerFactory?, AgentSkillsSource> overload. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…sException in Dispose - Remove hardcoded '50' from test comment; it now says 'default cap' without citing a specific number that can drift from the constant. - Catch UnauthorizedAccessException alongside IOException in test Dispose for robust cleanup. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use CancellationToken.None for the shared refresh so one caller's cancellation does not abort work for all concurrent waiters. Waiters use WaitAsync(cancellationToken) to cancel independently. The refresh owner checks its own token after publishing the result. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The fix for this issue is outside the scope of this PR and will be addressed separately. |
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 84%
✓ Correctness
The PR is well-implemented with strong security hardening (zip-slip guards, link skipping, size/count limits, CopyWithLimit as authoritative decompression bomb defense). The concurrency design in GetSkillsAsync properly decouples refresh execution from per-caller cancellation. The resolved review comments (breaking change acknowledgment, UnauthorizedAccessException in Dispose, cancellation decoupling) have all been addressed. No significant correctness bugs found.
✓ Test Coverage
The archive extraction and skill discovery paths are thoroughly tested with good security coverage (zip-slip, link skipping, decompression bombs, size limits). However, the new caching/refresh-interval logic and the concurrent-refresh CAS pattern in AgentMcpSkillsSource have no test coverage at all — this is non-trivial stateful concurrency code that could regress silently. The DetectFormat method's many branches and the UseSource factory overload also lack direct tests.
✓ Failure Modes
The PR is well-implemented overall with solid security hardening (path traversal, link skipping, decompression bomb limits) and proper cancellation handling (resolved review comment). The main structural concern is that the loader dispatch loop in GetCoreSkillsAsync lacks fault isolation: an unhandled exception from one loader (e.g., ArchiveEntryLoader failing at Directory.CreateDirectory) propagates and discards skills already loaded by other loaders.
✗ Design Approach
I found one design-level correctness issue in the archive reconciliation flow: pruning is driven by the post-validation archive list, so a still-advertised archive skill with transiently malformed metadata can be treated as "no longer advertised" and have its extracted directory deleted. I also have one non-blocking API-surface concern: the change makes
AgentFileSkillsSourcepublic to satisfy an internal cross-assembly dependency, while the repo already uses friend-assembly access for that pattern elsewhere.
Flagged Issues
-
ArchiveEntryLoader.LoadAsyncreconciles/prunes on-disk directories from the filteredarchiveEntriesset, so an archive skill that is still present inskill://index.jsonbut temporarily fails validation (e.g., missingurl) is treated as unadvertised and its extracted directory is deleted. That conflicts with the documentedArchiveSkillsDirectorycontract inAgentMcpSkillsSourceOptions.cs:25-28, which says pruning applies only to subdirectories the MCP server no longer advertises.
Suggestions
- Consider keeping
AgentFileSkillsSourceinternal and grantingMicrosoft.Agents.AI.Mcpfriend access instead of widening the public API. The repo already uses that pattern for cross-assembly implementation sharing (e.g.,Microsoft.Agents.AI.Workflows.csproj:28-31).
Automated review by SergeyMenshykh's agents
|
Flagged issue
Source: automated DevFlow PR review |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
An entry with a missing url or invalid name can't be materialized - keeping its stale directory around would mean serving potentially outdated content for a skill the server can no longer provide correctly. Pruning it and re-downloading when the entry is fixed again is the safer default. I'll update the doc comment to clarify that pruning applies to entries that aren't actionable, not just unadvertised ones. |
Motivation & Context
The MCP skills source previously discovered only
skill-mdindex entries, where a skill'sSKILL.mdand sibling resources are fetched on demand from the server. The Agent Skills discovery spec also definesarchiveentries, where a skill is distributed as a single packaged archive (ZIP / TAR / gzip-compressed TAR) that unpacks into the skill's namespace. Without archive support, skills published in that format are silently unusable.This change adds
archive-type skill support so an MCP server can advertise packaged skills, which are downloaded, safely unpacked to a local directory, and served like file-based skills - while keeping the strict guarantee that MCP-delivered scripts are never executed.Description & Review Guide
What are the major changes?
AgentMcpSkillsSourcenow dispatches index entries to per-type loaders via a newIMcpSkillEntryLoaderstrategy:SkillMdEntryLoader(existing) andArchiveEntryLoader(new).ArchiveEntryLoaderdownloads, extracts, and serves each archive skill through an internalAgentFileSkillsSource, pruning directories the server no longer advertises.AgentMcpSkillArchiveExtractordoes the hardened unpacking: path-traversal guards, link skipping, and size/count limits.AgentMcpSkillsSourceOptionsfor the extraction directory, resource extensions/depth, and limits.What is the impact of these changes?
skill-mddiscovery is unchanged; entry types with no registered loader are skipped as before. Archive-bundled scripts are surfaced as readable resources only - the inner file source is created with no allowed script extensions and no script runner, so they are never discovered as runnable scripts.Contribution Checklist
breaking changelabel (or add "[BREAKING]" to the title prefix, before or after any language prefix) - a workflow keeps the label and title prefix in sync automatically.Closes: #6077