# AGENTS.md

Guidance for AI agents (and humans) contributing to the MCP conformance test framework.

## What this repo is

A test harness that exercises MCP SDK implementations against the protocol spec. The coverage number that matters here is **spec coverage** — how much of the protocol the scenarios test.

Uses **npm** (not pnpm/yarn). Don't commit `pnpm-lock.yaml` or `yarn.lock`.

## Where to start

**Open an issue first** — whether you've hit a bug in the harness or want to propose a new scenario. For scenarios, sketch which part of the spec you want to cover and roughly how; for bugs, include the command you ran and the output. Either way, a short discussion up front beats review churn on a PR that overlaps existing work or heads in a direction we're not going.
| 14 | + |
| 15 | +**Don't point an agent at the repo and ask it to "find bugs."** Generic bug-hunting on a test harness produces low-signal PRs (typo fixes, unused-variable cleanups, speculative refactors). If you want to contribute via an agent, give it a concrete target: |
| 16 | + |
| 17 | +- Pick a specific MUST or SHOULD from the [MCP spec](https://modelcontextprotocol.io/specification/) that has no scenario yet, and ask the agent to draft one. |
| 18 | +- Pick an [open issue](https://github.com/modelcontextprotocol/conformance/issues) and work on that. |
| 19 | + |
| 20 | +The valuable contribution here is **spec coverage**, not harness polish. |
| 21 | + |
| 22 | +## Scenario design: fewer scenarios, more checks |
| 23 | + |
| 24 | +**The strongest rule in this repo:** prefer one scenario with many checks over many scenarios with one check each. |
| 25 | + |
| 26 | +Why: |
| 27 | + |
| 28 | +- Each scenario often spins up its own HTTP server. These suites run in CI on every push for every SDK, so per-scenario overhead multiplies fast. |
| 29 | +- Less code to maintain and update when the spec shifts. |
| 30 | +- Progress on making an SDK better shows up as "pass 7/10 checks" rather than "pass 1 test, fail another" — finer-grained signal from the same run. |
| 31 | + |
| 32 | +### Granularity heuristic |
| 33 | + |
| 34 | +Ask: **"Would it make sense for someone to implement a server/client that does just this scenario?"** |
| 35 | + |
| 36 | +If two scenarios would always be implemented together, merge them. Examples: |
| 37 | + |
| 38 | +- `tools/list` + a simple `tools/call` → one scenario |
| 39 | +- All content-type variants (image, audio, mixed, resource) → one scenario |
| 40 | +- Full OAuth flow with token refresh → one scenario, not separate "basic" + "refresh" scenarios. A client that passes "basic" but not "refresh" just shows up as passing N−2 checks. |
| 41 | + |
| 42 | +Keep scenarios separate when they're genuinely independent features or when they're mutually exclusive (e.g., an SDK should support writing a server that _doesn't_ implement certain stateful features). |
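To make the merge rule concrete, here's a minimal sketch of one scenario that walks a single happy path and emits several checks along the way. The `Check` shape, `ServerUnderTest` interface, and function names are illustrative assumptions for this sketch, not the harness's real API (the real types live in `src/types.ts`):

```typescript
// Illustrative stand-ins only -- the real check types are in src/types.ts.
type Check = {
  id: string;
  status: "SUCCESS" | "FAILURE";
  errorMessage?: string;
};

// Hypothetical minimal view of a server under test.
interface ServerUnderTest {
  listTools(): { name: string }[];
  callTool(name: string): { content: { type: string }[] };
}

function check(id: string, ok: boolean, detail: string): Check {
  return { id, status: ok ? "SUCCESS" : "FAILURE", errorMessage: ok ? undefined : detail };
}

// One end-to-end scenario covering tools/list AND a simple tools/call,
// instead of a separate scenario for each.
function toolsScenario(server: ServerUnderTest): Check[] {
  const checks: Check[] = [];

  const tools = server.listTools();
  checks.push(check("tools-list-non-empty", tools.length > 0, "tools/list returned no tools"));

  const result = server.callTool(tools[0]?.name ?? "echo");
  checks.push(check("tools-call-has-content", result.content.length > 0, "tools/call returned no content"));
  checks.push(
    check(
      "tools-call-content-type-known",
      result.content.every((c) => ["text", "image", "audio", "resource"].includes(c.type)),
      "tools/call returned an unknown content type",
    ),
  );

  return checks;
}
```

An SDK that handles `tools/list` but mishandles `tools/call` then shows up as passing 1/3 checks within one scenario run, rather than failing a separate test outright.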

### When a PR adds scenarios

- Start with **one end-to-end scenario** covering the happy path with many checks along the way.
- Don't add "step 1 only" and "step 1+2" as separate scenarios — the second subsumes the first.
- Register the scenario in the appropriate suite list in `src/scenarios/index.ts` (`core`, `extensions`, `backcompat`, etc.).

## Check conventions

- **Same `id` for SUCCESS and FAIL.** A check should use one slug and flip `status` + `errorMessage`, not branch into `foo-success` vs `foo-failure` slugs.
- **Optimize for Ctrl+F on the slug.** Repetitive check blocks are fine — easier to find the failing one than to unwind a clever helper.
- Reuse `ConformanceCheck` and other types from `src/types.ts` rather than defining parallel shapes.
- Include `specReferences` pointing to the relevant spec section.
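A sketch of the same-slug convention, using an invented `ConformanceCheckLike` shape (the real `ConformanceCheck` is in `src/types.ts` and may differ; the check name and field names here are assumptions):

```typescript
// Invented stand-in for ConformanceCheck from src/types.ts.
type ConformanceCheckLike = {
  id: string; // one stable slug, findable with Ctrl+F
  status: "SUCCESS" | "FAILURE";
  errorMessage?: string;
  specReferences: string[];
};

// One slug with a flipped status -- not "refresh-token-used-success"
// vs "refresh-token-used-failure" as two separate slugs.
function refreshTokenCheck(usedRefreshToken: boolean): ConformanceCheckLike {
  return {
    id: "refresh-token-used",
    status: usedRefreshToken ? "SUCCESS" : "FAILURE",
    errorMessage: usedRefreshToken
      ? undefined
      : "client never sent a refresh_token grant",
    specReferences: ["https://modelcontextprotocol.io/specification/"],
  };
}
```

Grepping `refresh-token-used` then lands on exactly one block in the source whether the run passed or failed.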

## Descriptions and wording

Be precise about what's **required** vs **optional**. The description of a scenario that tests optional behavior should make that clear — e.g. "Tests that a client _that wants a refresh token_ handles offline_access scope…" not "Tests that a client handles offline_access scope…". Don't accidentally promote a MAY/SHOULD to a MUST in the prose.

When in doubt about spec details (OAuth parameters, audiences, grant types), check the actual spec in the `modelcontextprotocol` repo rather than guessing.

## Examples: prove it passes and fails

A new scenario should come with:

1. **A passing example** — usually by extending `examples/clients/typescript/everything-client.ts` or the everything-server, not a new file.
2. **Evidence it fails when it should** — ideally a negative example (a deliberately broken client), or at minimum a manual run showing the failure mode.

Delete unused example scenarios. If a scenario key in the everything-client has no corresponding test, remove it.

## Don't add new ways to run tests

Use the existing CLI runner (`npx @modelcontextprotocol/conformance client|server ...`). If you need a feature the runner doesn't have, add it to the runner rather than building a parallel entry point.

## Before opening a PR

- `npm run build` passes
- `npm test` passes
- For non-trivial scenario changes, run against at least one real SDK (typescript-sdk or python-sdk) to see actual output. For changes to shared infrastructure (runner, tier-check), test against go-sdk or csharp-sdk too.
- Scenario is registered in the right suite in `src/scenarios/index.ts`