Guidance for AI agents (and humans) contributing to the MCP conformance test framework.
A test harness that exercises MCP SDK implementations against the protocol spec. The coverage number that matters here is spec coverage — how much of the protocol the scenarios test.
Uses npm (not pnpm/yarn). Don't commit pnpm-lock.yaml or yarn.lock.
Open an issue first — whether you've hit a bug in the harness or want to propose a new scenario. For scenarios, sketch which part of the spec you want to cover and roughly how; for bugs, include the command you ran and the output. Either way, a short discussion up front beats review churn on a PR that overlaps existing work or heads in a direction we're not going.
Don't point an agent at the repo and ask it to "find bugs." Generic bug-hunting on a test harness produces low-signal PRs (typo fixes, unused-variable cleanups, speculative refactors). If you want to contribute via an agent, give it a concrete target:
- Pick a specific MUST or SHOULD from the MCP spec that has no scenario yet, and ask the agent to draft one.
- Pick an open issue and work on that.
The valuable contribution here is spec coverage, not harness polish.
The strongest rule in this repo: prefer one scenario with many checks over many scenarios with one check each.
Why:
- Each scenario often spins up its own HTTP server. These suites run in CI on every push for every SDK, so per-scenario overhead multiplies fast.
- Less code to maintain and update when the spec shifts.
- Progress on making an SDK better shows up as "pass 7/10 checks" rather than "pass 1 test, fail another" — finer-grained signal from the same run.
Ask: "Would it make sense for someone to implement a server/client that does just this scenario?"
If two scenarios would always be implemented together, merge them. Examples:
tools/list+ a simpletools/call→ one scenario- All content-type variants (image, audio, mixed, resource) → one scenario
- Full OAuth flow with token refresh → one scenario, not separate "basic" + "refresh" scenarios. A client that passes "basic" but not "refresh" just shows up as passing N−2 checks.
Keep scenarios separate when they're genuinely independent features or when they're mutually exclusive (e.g., an SDK should support writing a server that doesn't implement certain stateful features).
- Start with one end-to-end scenario covering the happy path with many checks along the way.
- Don't add "step 1 only" and "step 1+2" as separate scenarios — the second subsumes the first.
- Register the scenario in the appropriate suite list in
src/scenarios/index.ts(core,extensions,backcompat, etc.).
- Same
idfor SUCCESS and FAIL. A check should use one slug and flipstatus+errorMessage, not branch intofoo-successvsfoo-failureslugs. - Optimize for Ctrl+F on the slug. Repetitive check blocks are fine — easier to find the failing one than to unwind a clever helper.
- Reuse
ConformanceCheckand other types fromsrc/types.tsrather than defining parallel shapes. - Include
specReferencespointing to the relevant spec section.
Be precise about what's required vs optional. A scenario description that tests optional behavior should make that clear — e.g. "Tests that a client that wants a refresh token handles offline_access scope…" not "Tests that a client handles offline_access scope…". Don't accidentally promote a MAY/SHOULD to a MUST in the prose.
When in doubt about spec details (OAuth parameters, audiences, grant types), check the actual spec in modelcontextprotocol rather than guessing.
A new scenario should come with:
- A passing example — usually by extending
examples/clients/typescript/everything-client.tsor the everything-server, not a new file. - Evidence it fails when it should — ideally a negative example (a deliberately broken client), or at minimum a manual run showing the failure mode.
Delete unused example scenarios. If a scenario key in the everything-client has no corresponding test, remove it.
Use the existing CLI runner (npx @modelcontextprotocol/conformance client|server ...). If you need a feature the runner doesn't have, add it to the runner rather than building a parallel entry point.
npm run buildpassesnpm testpasses- For non-trivial scenario changes, run against at least one real SDK (typescript-sdk or python-sdk) to see actual output. For changes to shared infrastructure (runner, tier-check), test against go-sdk or csharp-sdk too.
- Scenario is registered in the right suite in
src/scenarios/index.ts