Pass json-schema-ref-no-deref conformance scenario (SEP-2106)#2924
Conversation
SEP-2106 permits the full JSON Schema 2020-12 vocabulary in tool schemas, including $ref. A $ref resolving to a network URI is an SSRF / fetch-DoS vector, so per the spec implementations MUST NOT auto-dereference any $ref that is not a same-document reference. Add a shared ref-walker that rejects external $refs with ExternalSchemaRefError. The server now rejects them at tool registration; the client rejects them before validating a tool result instead of attempting to resolve them. Burn down the json-schema-ref-no-deref conformance scenario in both expected-failures files and add a driver handler for it.
Move the json-schema-ref-no-deref driver handler, both expected-failures updates, and the migration guide entry out of this PR; they will land separately. This PR is now scoped to the core SEP-2106 code and tests.
Move external-$ref rejection into StrictJsonSchema (now in its own _schema_generator.py) so it happens during schema generation, driven by the schema_generator parameter of model_json_schema, rather than walking the schema afterwards. The output-schema path re-raises ExternalSchemaRefError instead of swallowing it as an unserializable-type ValueError. Drop the standalone mcp.shared.json_schema_ref helper and the client-side check (the client receives schemas, it never generates them, and already never fetches network refs).
func_metadata already defined StrictJsonSchema; keep the SEP-2106 external-$ref rejection there instead of a separate _schema_generator module.
Subclassing Exception (not ValueError) keeps it from being swallowed by the schema-generation fallback that degrades unserializable types, so the explicit re-raise is no longer needed. Drop em-dashes from the docstrings.
Add a driver handler for the scenario (the missing handler was why it failed, not actual dereferencing) and remove it from both expected-failures baselines. Document the server-side external-$ref rejection in the migration guide.
The SEP requires implementations to never auto-dereference network $refs (a MUST NOT, already satisfied: the SDK has no $ref-fetching code) and to reject a schema only when validation hits an unresolved external $ref (a conditional SHOULD). Drop the eager registration-time rejection added earlier - it forbade legitimate, never-fetched external $refs, going beyond the spec. Keep the conformance burn-down and add a client test confirming an unexercised network $ref in an output schema is not dereferenced.
$ref in tool schemas (SEP-2106)The scenario only requires the client to call tools/list against a server advertising a network $ref; ClientSession never resolves $refs, so the existing initialize handler (initialize + list_tools) already satisfies it. Register it under both names instead of duplicating it.
One @register per function: revert the stacked decorators and add a dedicated handler. It just initializes and lists tools (ClientSession never resolves $refs), with a docstring explaining why that is sufficient.
There was a problem hiding this comment.
Additional findings (outside current diff — PR may have been updated during review):
-
🟡
src/mcp/server/mcpserver/utilities/func_metadata.py:33-40— BecauseExternalSchemaRefErrordeliberately doesn't subclassValueError, it escapes the graceful-degradationexcepttuple in_try_create_model_and_schemaand now also fails registration of prompts and resource templates whose return type carries an external$refviajson_schema_extra— even though those output schemas are never advertised to clients (Prompt.from_function/ResourceTemplate.from_functiononly use the arg model). Consider having those call sites passstructured_output=False, catchingExternalSchemaRefErrorin_try_create_model_and_schemafor non-tool callers, or enforcing the rejection only on actual tool schema paths.Extended reasoning...
What the bug is. SEP-2106 governs tool schemas, and the migration entry only documents tool registration failing. But
ExternalSchemaRefErrornow also breaks registration of prompts and resource templates whose function return type contains an external$ref, even though those output schemas are never sent on the wire — neitherPromptnorResourceTemplatestores theFuncMetadataoutput schema; they only usearg_modelfor parameter listing.The code path.
Prompt.from_function(src/mcp/server/mcpserver/prompts/base.py:106) andResourceTemplate.from_function(src/mcp/server/mcpserver/resources/templates.py:68) callfunc_metadata(fn, skip_names=...)with the defaultstructured_output=None.func_metadatathen auto-detects structured output from the return annotation and renders the output model viamodel.model_json_schema(schema_generator=StrictJsonSchema)inside_try_create_model_and_schema. After this PR,StrictJsonSchema.generate()raisesExternalSchemaRefErrorwhen the rendered schema contains an external$ref.Why existing code doesn't prevent it. The graceful-degradation
excepttuple in_try_create_model_and_schemais(PydanticUserError, TypeError, ValueError, SchemaError, ValidationError).ExternalSchemaRefErrordeliberately subclassesException(notValueError) precisely so it escapes that fallback on the tool path — but that decision means the only containment for non-tool callers offunc_metadatano longer applies either, so the exception propagates out offunc_metadata()and out ofPrompt.from_function/ResourceTemplate.from_function, failing registration.Step-by-step proof (reproduced empirically against the PR head):
class Profile(BaseModel): data: Annotated[dict[str, Any], Field(json_schema_extra={"$ref": "https://example.com/profile.json"})] @mcp.resource("resource://users/{user_id}") def get_user(user_id: str) -> Profile: ... # ResourceTemplate.from_function -> ExternalSchemaRefError @mcp.prompt() def my_prompt(topic: str) -> Profile: ... # Prompt.from_function -> ExternalSchemaRefError
Both registrations raise
ExternalSchemaRefError: Tool schema contains external $ref(s) that MUST NOT be dereferenced (SEP-2106): https://example.com/profile.json .... Before this PR both succeeded: an external$refinjson_schema_extraproduces no pydantic warning, so the output schema was generated and simply discarded.Impact. A behavioral regression outside the PR's stated SEP-2106 tool-schema scope: prompt/resource-template registration now fails for return types whose schema is never advertised to any client, with a misleading 'Tool schema contains external $ref(s)' message and no mention in
docs/migration.md(which only covers tool registration). Mitigating factors: the trigger requires deliberately injecting an external$refinto the return-type model of a prompt/resource function (uncommon), and the failure is loud at registration time rather than silent misbehavior — hence nit severity.How to fix. Options: (a) have
Prompt.from_functionandResourceTemplate.from_functionpassstructured_output=False, since they never use the output model; (b) catchExternalSchemaRefErrorin_try_create_model_and_schemafor callers that don't advertise the schema; or (c) enforce the external-$refrejection only on the actual tool schema paths (e.g. inTool.from_functionon the rendered input schema and onFuncMetadata.output_schemawhen it will actually be used as a tool output schema). This is distinct from the already-reported input-path warnings-to-errors change and the data-keyword false-positive: the trigger here is the return type of a non-tool callable.
# Conflicts: # .github/actions/conformance/expected-failures.2026-07-28.yml # .github/actions/conformance/expected-failures.yml
Patch socket.connect / create_connection to fail on any outbound connection while validating a tool result, proving the network $ref is never fetched.
The conformance scenario already proves no-fetch end to end with a real canary HTTP server; the unit test was a redundant local mirror.
|
Why the conformance wiring is the only change needed: SEP-2106's security requirement is a MUST NOT dereference network
So the scenario was already passing behaviourally; it failed in CI only because our conformance driver ( I deliberately did not add eager registration-time rejection of external The |
There was a problem hiding this comment.
LGTM — this revision only adds a conformance driver handler and burns down the two expected-failures baselines; no SDK code is touched.
Extended reasoning...
Overview
The PR now touches only three files under .github/actions/conformance/: a new json-schema-ref-no-deref scenario handler in the conformance driver (client.py), and removal of that scenario from the two expected-failures baselines. Earlier revisions of this branch modified SDK source (func_metadata.py, json_schema_ref.py, tools/base.py), but those changes have been dropped and my prior inline comments on them are resolved/moot — the final diff contains no src/ changes at all.
Security risks
None introduced by this diff. The change is CI test-harness wiring; it does not add any $ref-fetching or schema-handling code. The maintainer comment in the thread confirms the SDK already satisfies SEP-2106's MUST NOT (the jsonschema/referencing backend raises Unresolvable rather than fetching), and the conformance scenario was failing only because the driver lacked a registered handler.
Level of scrutiny
Low. The new handler mirrors the existing initialize handler pattern exactly (connect, initialize, list tools), and the baseline edits are mechanical removals. If the scenario did not actually pass, the conformance CI job would fail on the unexpected-failure check, so the change is self-verifying. The only finding is a nit about the PR description still mentioning a unit test that was deliberately dropped (commit c7522f2) — a documentation discrepancy, not a code defect.
Other factors
The conformance runner fails on stale baseline entries, so leaving the entries in place was not an option once the handler exists. The author reports both the 2025 and 2026-07-28 client legs pass locally, and the scope decision (no eager registration-time rejection) is explicitly discussed and endorsed in the thread.
| @register("json-schema-ref-no-deref") | ||
| async def run_json_schema_ref_no_deref(server_url: str) -> None: | ||
| """Initialize and list tools; the scenario fails only if the client fetches a network $ref. | ||
|
|
||
| ClientSession never walks inputSchema or resolves $refs, so listing is enough (SEP-2106). | ||
| """ | ||
| async with streamable_http_client(url=server_url) as (read_stream, write_stream): | ||
| async with ClientSession(read_stream, write_stream) as session: | ||
| await session.initialize() | ||
| await session.list_tools() |
There was a problem hiding this comment.
🟡 The PR description states that a client test was added confirming a network $ref in an output schema is validated without being dereferenced, but that test was deliberately dropped in commit c7522f2 ('Drop the Python no-deref unit test') and the final diff contains only the conformance handler and the two expected-failures baselines. This is a description/documentation discrepancy only — either update the description (including the Verification section) to reflect that coverage now lives solely in the external conformance scenario, or re-add the in-repo test.
Extended reasoning...
What the issue is. The PR description's "What changed" section claims: "Added a client test confirming a network $ref in an output schema, under a property the structured content does not set, is validated without being dereferenced." The Verification section likewise reads as if SDK-level test coverage exists. However, the final diff touches only three files — .github/actions/conformance/client.py, expected-failures.yml, and expected-failures.2026-07-28.yml — and contains no test file at all.
How this happened. The branch history shows the test did exist at one point: commit 2c47fc4 ("Guard the no-deref test with a socket tripwire") added/modified a test in tests/client/test_output_schema_validation.py, and the final commit c7522f2 ("Drop the Python no-deref unit test") deliberately removed it, on the rationale that the conformance scenario already proves the no-fetch behavior end to end. The drop was intentional and reasonable — but the PR description was never updated to match.
Step-by-step verification.
- The changed-files list for this PR is exactly:
.github/actions/conformance/client.py,.github/actions/conformance/expected-failures.yml,.github/actions/conformance/expected-failures.2026-07-28.yml. No file undertests/is touched. git logon the branch shows 2c47fc4 adding/guarding the unit test and c7522f2 removing it (≈43 lines deleted fromtests/client/test_output_schema_validation.py).- A search of
tests/for no-deref / dereference coverage of output-schema$refs finds nothing remaining. - Yet the description still says the client test was "Added", so a reviewer reading the description would expect an in-repo regression test that does not exist in the merged content.
Why it matters. Reviewers and future maintainers use the PR description to understand what coverage exists. As written, it overstates the in-repo coverage: the only thing asserting the no-dereference behavior is the external json-schema-ref-no-deref conformance scenario, which lives in the conformance harness rather than the SDK's own test suite. If that scenario is ever reorganized or the harness pin changes, there is no local regression test backing the behavior — and anyone relying on the description would believe there is.
This is not a code defect. All verifiers agreed the test removal itself was a defensible scoping decision (the unit test was a redundant local mirror of the conformance scenario), so this is purely a documentation discrepancy and does not block the PR.
How to fix. Either (a) update the PR description — remove or reword the "Added a client test…" bullet and adjust the Verification section to say the no-deref behavior is covered by the json-schema-ref-no-deref conformance scenario only — or (b) re-add the dropped unit test from 2c47fc4 if maintainers prefer in-repo coverage.
Closes #2903.
SEP-2106 permits the full JSON Schema 2020-12 vocabulary in tool schemas, including
$ref. A$refresolving to a network URI is an SSRF / fetch-DoS vector. The SEP requires:$refthat is not a same-document reference.$refSHOULD be rejected rather than treated as permissive.The SDK already satisfies the MUST NOT: there is no
$ref-fetching code anywhere, andjsonschema'sreferencingbackend raisesUnresolvablerather than performing an HTTP GET. So no$ref-resolution change is needed.What changed
json-schema-ref-no-derefdriver handler to the conformance client. Its absence (not actual dereferencing) was why the scenario failed; the SDK was already compliant.expected-failuresbaselines.$refin an output schema, under a property the structured content does not set, is validated without being dereferenced.Scope notes
$refs. The SEP only forbids dereferencing them; an external$refthat is never fetched is legitimate 2020-12 (the SEP lists$refamong the keywords it enables). Eager rejection would forbid valid schemas and go beyond the spec. The issue text says "server rejects unresolvable external$refat registration"; this PR follows the SEP's narrower, conditional SHOULD instead. Happy to revisit if maintainers prefer the stricter reading.structuredContentwidening to any JSON value already landed in Protocol types for 2026-07-28: superset monolith, committed per-version packages, and wire-method maps #2849; I audited the consumers and found no object/truthiness assumptions.Verification
json-schema-ref-no-derefpasses on both the 2025 and 2026-07-28 client legs locally, with "Baseline check passed: all failures are expected" (no stale-entry error). Full suite green, 100% coverage,strict-no-cover/pyright/ruff clean.AI Disclaimer
This PR was developed with the assistance of either Claude or Codex. I've reviewed and verified the changes.