docs(rfc-003): canonical MCP implementation blueprint (omnigraph-mcp crate)#182
docs(rfc-003): canonical MCP implementation blueprint (omnigraph-mcp crate)#182ragnorc wants to merge 5 commits into
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3009564437
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| - A generic `struct McpService<B>` implements rmcp's `ServerHandler`, delegating each method to `B` after extracting `&Parts` from `ctx.extensions` once (missing → `McpError::internal_error`). `get_info → backend.server_info()`; `initialize`/`ping` use rmcp's defaults. | ||
| - `pub fn mcp_router<B: McpBackend>(backend: B, body_limit: usize) -> axum::Router`: | ||
| - `let svc = StreamableHttpService::new(move || Ok(McpService::new(backend.clone())), Arc::new(LocalSessionManager::default()), config)` with `config = StreamableHttpServerConfig::default().with_stateful_mode(false).with_json_response(true)` (`StreamableHttpServerConfig` is `#[non_exhaustive]` — build from `Default`, mutate via the `with_*` setters; keep the loopback `allowed_hosts` default). |
There was a problem hiding this comment.
Configure allowed hosts for non-loopback MCP deployments
In any remote or cloud deployment, keeping rmcp's default allowed_hosts makes /mcp reject valid clients before bearer auth, because rmcp 1.7 defaults Host validation to loopback authorities only (localhost, 127.0.0.1, ::1). This RFC explicitly targets remotely reachable MCP clients, so the blueprint should require a server-configured allowed-host allowlist (or a documented reverse-proxy Host validation setup) instead of preserving the loopback default.
Useful? React with 👍 / 👎.
| - **Ad-hoc `query`/`mutate` always exposed, Cedar-only**; no `mcp.allow_adhoc`. [Open Q3 → resolved.] | ||
| - **`query`/`mutate` ids only**, no `read`/`change` aliases. [Open Q7 → resolved.] | ||
| - **Per-graph `/mcp` routing**; `graphs_list`/`omnigraph://graphs` **dropped from MCP** (graph discovery via REST `GET /graphs`); no server-scoped MCP tools. [Open Q5 → resolved.] |
There was a problem hiding this comment.
Reconcile the stale rollout with locked decisions
These locked decisions conflict with the still-active Rollout and Testing sections below, which continue to tell implementers to ship graphs_list, read/change aliases, a server-level /mcp, and an mcp.allow_adhoc switch. Since the status says the blueprint only supersedes the older §5 sketches, someone following the Rollout section would build surface area this canonical section says was dropped; please update those later sections or explicitly mark them historical too.
Useful? React with 👍 / 👎.
|
Addressed the valid review findings as correct-by-construction fixes (commit
The "missing |
| **Reuse, never reinvent.** First factor **10 thin `do_*` fns** out of the inline `server_*` HTTP handlers (each is `authorize_request(...) → engine call → DTO`) so REST and MCP dispatch one path: `do_snapshot`, `do_schema_get`, `do_branches_list`, `do_commits_list`, `do_commit_show`, `do_ingest`, `do_branch_create`, `do_branch_delete`, `do_branch_merge`, `do_schema_apply`. (No `do_graphs_list` — `graphs_list` is not an MCP tool, see B.10; `server_graphs_list` stays inline for `GET /graphs`.) Land that as a behavior-neutral refactor commit first (it keeps the REST handlers as thin wrappers; all server tests stay green). Then reuse as-is: `run_query` / `run_mutate` (already decoupled from request bodies), `authorize` → `Authz { Allowed, Denied(msg) }` (with `Err` reserved for operational 401/500), `api::query_catalog_entry` / `ParamKind` / `read_output`, `ApiError` (add `pub(crate) status_code()` + `message_str()` accessors for the error classifier). Mount in `build_app`: `.merge(mcp::mcp_router(state))` inside the `per_graph_protected` group, where the server's thin `mcp::mcp_router(state) = omnigraph_mcp::mcp_router(OmnigraphMcpBackend::new(state), INGEST_REQUEST_BODY_LIMIT_BYTES)`. | ||
|
|
||
| Represent the built-ins as a `Builtin` enum (one variant per tool; `descriptor` / `gate` / `call` as match arms) — lower liability than ~14 unit structs + `dyn` + `async-trait` per tool. Stored-query tools are a sibling populator over `handle.queries`. | ||
|
|
There was a problem hiding this comment.
B.4 call site missing
hosts: McpHostPolicy argument
The mcp_router function defined in B.3 has the signature pub fn mcp_router<B: McpBackend>(backend: B, body_limit: usize, hosts: McpHostPolicy) -> axum::Router (three parameters), but the mounting example in B.4 only passes two: omnigraph_mcp::mcp_router(OmnigraphMcpBackend::new(state), INGEST_REQUEST_BODY_LIMIT_BYTES). The hosts: McpHostPolicy argument is absent. An implementer following B.4 verbatim would get a Rust compile error at that call site. The call should be updated to pass the McpHostPolicy computed at startup per B.3.1 — for example omnigraph_mcp::mcp_router(OmnigraphMcpBackend::new(state), INGEST_REQUEST_BODY_LIMIT_BYTES, McpHostPolicy::from_bind_config(&state.config)).
…crate) Brings RFC-003 to the canonical from-scratch spec for the in-server MCP surface: a dedicated omnigraph-mcp transport crate + McpBackend trait (server -> mcp dep to avoid a Cargo cycle), the verified rmcp 1.7 design (conformance MUSTs for free), backend reuse (do_* / run_query / run_mutate / authorize / api), the 13-tool catalog + Cedar mapping, ParamKind -> JSON Schema, deny-masking + isError split, two resources, per-graph routing (graphs_list stays REST-only), auth + client-compat matrix + OAuth fast-follow, tests, and locked decisions. Build the implementation on this branch from blueprint sections B.1-B.13. The spec is informed by the #157 reference implementation.
Fold the valid #157 review findings into the blueprint as by-design fixes so a from-scratch build cannot reintroduce them: - B.3.1: host/origin policy derived from the server bind + config (loopback -> loopback hosts; non-loopback -> configured public host, else Host-allowlist off with bearer + Origin as the controls). Closes the loopback-only-default footgun that 403s every remote client before bearer auth. - B.6/B.7: stored-query params nested under a `params` object (mirrors POST /queries/{name}), so the branch/snapshot knobs cannot collide with a query param; mutation tools omit `snapshot` + additionalProperties:false, so mutation-against-a-snapshot is unrepresentable. Vector `dim` made intrinsic to the kind, killing the unwrap_or(0) zero-length-array schema. - B.8: one canonical classify(Result<_, ApiError>) for tool errors (5xx -> JSON-RPC, 4xx/409 -> isError), no second mapper to drift; list visibility uses the call-path default-branch authorization, not a branch:None probe that hid branch-scoped-grant tools. - Reconciled the stale Summary count and the section-5-era Rollout/Testing sections (banners pointing to B.12/B.4/B.13).
Rebased onto current main (0.7.0-dev) and reconciled the MCP blueprint
with the code it builds on, replacing the earlier blueprint sketch with a
spec grounded in real interfaces (each EXISTING snippet cited to file:line,
re-read from source). Corrections (delta table in §0):
- ParamKind::Vector is a unit variant; the dimension is
ParamDescriptor.vector_dim: Option<u32>, not an intrinsic struct field.
The JSON-Schema builder now omits minItems/maxItems when the dim is
absent instead of emitting 0.
- Server handlers, auth helpers, and run_query/run_mutate moved from
lib.rs to handlers.rs; stored-query config in config.rs; DTOs in api.rs.
- ingest/load unified: server_ingest calls load_as; a missing branch with
no `from` is a 404, never an implicit fork. The MCP tool is named `load`.
- Client credentials follow the shipped RFC-007 model (OperatorServer { url },
env -> credentials-file chain, no keychain step, no nested auth: block).
- Test references updated for the split server suites (mcp.rs / stored_queries.rs,
not --test server).
- Stored-query declaration source noted as cluster.yaml file-discovery with
omnigraph.yaml as the RFC-008-deprecated legacy surface.
Folds the standalone implementation-spec draft into this file so there is one
MCP RFC; cross-links rfc-005/007/009 now that they are in the corpus.
2ade97f to
61c18b6
Compare
There was a problem hiding this comment.
ragnorc has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
Rewrites the MCP RFC as a single self-contained specification, validated against the live upstream surfaces it builds on: - MCP protocol revision 2025-11-25 (Streamable HTTP transport requirements, authorization model, tool/resource shapes, SEP-1303 input-validation-as- tool-error, JSON Schema 2020-12 default). - rmcp 1.7.0 (StreamableHttpServerConfig real defaults — stateful_mode defaults true and must be flipped; NeverSessionManager for stateless mode; the `local` feature must stay off; RequestContext.extensions carries the http::request::Parts passthrough; RPITIT ServerHandler). - Tool/server best practice: domain-qualified tool names, explicit annotations, structured output + text mirror, and progressive disclosure (per_query vs meta projection modes) for large stored-query catalogs. Adds a provider compatibility matrix (Claude Code/Desktop/web, Messages API connector, OpenAI Responses API/ChatGPT, Cursor, VS Code, OpenCode) with a phased static-bearer -> OAuth 2.1 + RFC 9728 auth plan and the Claude Code PRM header-override caveat, and a code-mode compatibility section describing the server-side properties that make the surface code-execution-friendly without building client-side machinery.
There was a problem hiding this comment.
ragnorc has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 20233a3. Configure here.
| Token validation is offline (cached JWKS) — on-prem/air-gapped keeps working with no request-path cloud dependency. The MCP endpoint never terminates OAuth and never holds a client secret (Resource Server only). | ||
|
|
||
| **Cloud client negotiation — additive, no MCP changes:** when MR-956 lands, the server publishes RFC 9728 `/.well-known/oauth-protected-resource` and returns `WWW-Authenticate: Bearer ..., resource_metadata="..."` on 401. A compliant MCP client (Claude) then auto-negotiates: static bearer to an on-prem endpoint; on a cloud 401 it discovers the WorkOS AS and runs OAuth/PKCE itself — **same endpoint URL, zero client-side branching.** This RFC only requires that MCP routes flow through the standard 401 path so that hook can be added later without touching MCP. | ||
| | `health` | server | none (liveness/version) | |
There was a problem hiding this comment.
Server-scoped health contradicts routing
Medium Severity
The tool catalog still lists health as server-scoped MCP, while routing and locked decisions require every MCP tool to be graph-scoped with no server-level /mcp. Unlike graphs_list, which was dropped for the same reason, health has no mount point—implementers cannot satisfy both sections without guessing intent.
Additional Locations (2)
Reviewed by Cursor Bugbot for commit 20233a3. Configure here.
| detail_level?)` (returns names + descriptions; full param schema only at higher | ||
| detail) and `stored_query_run(name, params, branch?, snapshot?)`. This keeps | ||
| `tools/list` small and mirrors the progressive-disclosure shape (`search` + `execute`) | ||
| that scales to hundreds of queries. |
There was a problem hiding this comment.
Meta list tool lacks Cedar mapping
Medium Severity
In meta stored-query mode the spec adds stored_query_list alongside stored_query_run, but the Cedar mapping table only authorizes stored_query_run. Listing can expose query names and descriptions without a defined InvokeQuery (or equivalent) gate for the list tool.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 20233a3. Configure here.
Add §15.1 making the multi-graph model explicit (it was implied across
§9/§14/§15): one isolated MCP server per graph at /graphs/{id}/mcp with the
graph id in the URL path; cross-graph discovery is REST-only via GET /graphs
(server-scoped GraphList, default-denied without a server policy); stored-query
registries, tool-name uniqueness, projection mode, and InvokeQuery are all
per-graph; clients configure one connection per graph and the connection name
namespaces the otherwise-identical tool ids.
There was a problem hiding this comment.
ragnorc has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.


Summary
Updates RFC-003 into the authoritative, build-from-scratch spec for the in-server MCP surface, with a dedicated
omnigraph-mcpcrate from the start. Docs-only (one file); no code.A reference implementation shipped in #157 (it proved rmcp 1.7 integrates on edition 2024, the auth-extension passthrough works, the conformance MUSTs are met, and the surface splits cleanly into a transport crate + a server backend). This RFC folds those learnings into a clean spec so the surface can be implemented properly from
mainwith the crate baked in — not retrofitted.What's in the new "Implementation Blueprint" (B.1–B.13)
A self-contained, ordered build guide:
crates/omnigraph-mcp(rmcp Streamable-HTTP transport + anMcpBackendtrait + rmcp re-exports);omnigraph-serverdepends on it and implements the trait. The dependency must goserver → mcp(amcp → serverdep cycles at the package level:server-bin → mcp → server-lib). The trait inverts the direction, so the crate never names an omnigraph type.&http::request::Partspassthrough (how the server-injectedResolvedActor/GraphHandlereach the backend), and the verified rmcp config that givesGET→405,Origin→403, andMCP-Protocol-Versionvalidation for free.do_*fns; reuserun_query/run_mutate/authorize/api::query_catalog_entry/ApiError.ParamKind → JSON Schema, deny-masking +isErrorsplit, stored-query double-gate, two resources./mcp;graphs_list/omnigraph://graphsdropped from MCP (graph discovery stays REST-only viaGET /graphs); explicitly rules out a single flat/mcpwithgraph_id-in-call (breaks per-graph stored-query listing + isolation).cargo tree | grep rmcpconfirms rmcp is not a direct server dep), and the locked decisions (resolves the Open Questions).Housekeeping in the same edit
McpTooltrait became aBuiltinenum + theMcpBackendcrate trait).Relationship to #157
#157 is the working reference implementation (in-server MCP without the crate split). This RFC is the spec for the clean version; the implementation will be built on a fresh branch from
mainfollowing B.1–B.13, cribbing from #157.Note
Low Risk
Documentation-only change to a single RFC file; no runtime, auth, or deployment behavior is modified. The only follow-up risk is a minor spec inconsistency in the §15 routing example versus the documented
mcp_routerarity.Overview
Rewrites RFC-003 from a shorter “Proposed” sketch into a buildable v0.8.x spec for in-server MCP on Streamable HTTP (MCP 2025-11-25, rmcp 1.7), folding in reference work from #157.
The doc now centers a
omnigraph-mcpcrate (McpBackend+ stateless transport) withomnigraph-server → omnigraph-mcpdependency direction,&http::request::Partsextension passthrough forResolvedActor/GraphHandle, and server-sideOmnigraphMcpBackenddispatch that reuses existingauthorize,run_query,run_mutate, and stored-query gates.Behavioral/product decisions are locked or reframed: domain-qualified tool ids (e.g.
graph_query), explicitToolAnnotations,per_queryvsmetastored-query projection with auto threshold,structuredContent+outputSchema, SEP-1303-styleclassifyerror mapping, deny-masking for unknown/denied tools, per-graph/mcprouting with nographs_list/omnigraph://graphson MCP (discovery viaGET /graphsonly), plus provider/auth matrices (static bearer Phase 1, OAuth/RFC 9728 Phase 2, PRM config-gated for #59467).Adds rollout, test/verification plans, and defers per-query
InvokeQueryscope and stdio→HTTP proxy collapse. Note: the §15build_appsnippet showsmcp::mcp_router(state.clone())while the prose documents a three-arghost_policy_from_bindwrapper—implementers should follow the full signature in §7/§15 text.Reviewed by Cursor Bugbot for commit de9e28e. Bugbot is set up for automated code reviews on this repo. Configure here.
Greptile Summary
Promotes RFC-003 from "Proposed" to the canonical, authoritative build spec for a clean
omnigraph-mcpcrate implementation, folding in every verified decision from the reference work in #157. One concrete spec inconsistency was found.omnigraph-mcptransport +omnigraph-serverbackend), theMcpBackendtrait seam using&http::request::Partsfor auth passthrough, host/origin policy derived from deployment context, the 13-tool Cedar-mapped catalog, stored-query projection,classifyerror mapping, and a full test/verification plan.mcp_routercall site passes only two arguments while B.3 defines the function with three (backend,body_limit,hosts: McpHostPolicy); following B.4 literally would produce a compile error.Confidence Score: 4/5
Documentation-only change; safe to merge with the B.4 call-site inconsistency corrected before the implementation branch begins.
The only changed file is an RFC markdown document with no runtime impact. The spec is internally consistent except for one call site in B.4 that omits the required
hosts: McpHostPolicyargument that B.3 defines — an implementer following that example verbatim would hit a compile error at the first build. No code, migrations, or configs are modified.docs/dev/rfc-003-mcp-server-surface.md — the B.4 mcp_router call site needs the hosts argument added before implementation begins.
Important Files Changed
Sequence Diagram
sequenceDiagram participant Client as MCP Client participant Middleware as Auth Middleware participant McpCrate as omnigraph-mcp McpService participant Backend as OmnigraphMcpBackend participant Cedar as Cedar Policy Engine participant Engine as Graph Engine Client->>Middleware: "POST /graphs/{id}/mcp Authorization: Bearer" Middleware->>McpCrate: "Extensions: ResolvedActor + Arc<GraphHandle>" McpCrate->>Backend: call_tool(parts, name, args) Backend->>Cedar: authorize(PolicyAction, actor, branch) Cedar-->>Backend: "Allowed | Denied" alt Denied Backend-->>McpCrate: -32602 unknown tool else Allowed Backend->>Engine: "do_* / run_query" Engine-->>Backend: "Ok(result) | Err(ApiError)" Backend->>Backend: classify(result) Backend-->>McpCrate: "CallToolResult | McpError" end McpCrate-->>Client: JSON-RPC responseReviews (2): Last reviewed commit: "docs(rfc-003): correct-by-construction f..." | Re-trigger Greptile