Skip to content

docs(rfc-003): canonical MCP implementation blueprint (omnigraph-mcp crate)#182

Open
ragnorc wants to merge 5 commits into
mainfrom
ragnorc/omnigraph-mcp-crate
Open

docs(rfc-003): canonical MCP implementation blueprint (omnigraph-mcp crate)#182
ragnorc wants to merge 5 commits into
mainfrom
ragnorc/omnigraph-mcp-crate

Conversation

@ragnorc

@ragnorc ragnorc commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Summary

Updates RFC-003 into the authoritative, build-from-scratch spec for the in-server MCP surface, with a dedicated omnigraph-mcp crate from the start. Docs-only (one file); no code.

A reference implementation shipped in #157 (it proved rmcp 1.7 integrates on edition 2024, the auth-extension passthrough works, the conformance MUSTs are met, and the surface splits cleanly into a transport crate + a server backend). This RFC folds those learnings into a clean spec so the surface can be implemented properly from main with the crate baked in — not retrofitted.

What's in the new "Implementation Blueprint" (B.1–B.13)

A self-contained, ordered build guide:

  • B.1 Crate architecture + dependency rulecrates/omnigraph-mcp (rmcp Streamable-HTTP transport + an McpBackend trait + rmcp re-exports); omnigraph-server depends on it and implements the trait. The dependency must go server → mcp (a mcp → server dep cycles at the package level: server-bin → mcp → server-lib). The trait inverts the direction, so the crate never names an omnigraph type.
  • B.2–B.3 Seam + transport — the &http::request::Parts passthrough (how the server-injected ResolvedActor / GraphHandle reach the backend), and the verified rmcp config that gives GET→405, Origin→403, and MCP-Protocol-Version validation for free.
  • B.4 Backend reuse — factor 10 thin do_* fns; reuse run_query/run_mutate/authorize/api::query_catalog_entry/ApiError.
  • B.5–B.9 — the 13-tool catalog + Cedar mapping, ParamKind → JSON Schema, deny-masking + isError split, stored-query double-gate, two resources.
  • B.10 Routing (resolved) — per-graph /mcp; graphs_list / omnigraph://graphs dropped from MCP (graph discovery stays REST-only via GET /graphs); explicitly rules out a single flat /mcp with graph_id-in-call (breaks per-graph stored-query listing + isolation).
  • B.11 Auth — the decoupling invariant + a client-compatibility matrix (static bearer works for Claude Code / Cursor / VS Code / the API connector; claude.ai web + ChatGPT need the OAuth/RFC-9728 fast-follow).
  • B.12–B.13 — test plan + verification (cargo tree | grep rmcp confirms rmcp is not a direct server dep), and the locked decisions (resolves the Open Questions).

Housekeeping in the same edit

  • Marked the old "Reconciliation with shipped code" section as pre-MCP history.
  • Marked the §5 sketches as design rationale (superseded by the Blueprint where they differ — e.g. the per-tool McpTool trait became a Builtin enum + the McpBackend crate trait).
  • Resolved the Open Questions inline.

Relationship to #157

#157 is the working reference implementation (in-server MCP without the crate split). This RFC is the spec for the clean version; the implementation will be built on a fresh branch from main following B.1–B.13, cribbing from #157.


Note

Low Risk
Documentation-only change to a single RFC file; no runtime, auth, or deployment behavior is modified. The only follow-up risk is a minor spec inconsistency in the §15 routing example versus the documented mcp_router arity.

Overview
Rewrites RFC-003 from a shorter “Proposed” sketch into a buildable v0.8.x spec for in-server MCP on Streamable HTTP (MCP 2025-11-25, rmcp 1.7), folding in reference work from #157.

The doc now centers a omnigraph-mcp crate (McpBackend + stateless transport) with omnigraph-server → omnigraph-mcp dependency direction, &http::request::Parts extension passthrough for ResolvedActor / GraphHandle, and server-side OmnigraphMcpBackend dispatch that reuses existing authorize, run_query, run_mutate, and stored-query gates.

Behavioral/product decisions are locked or reframed: domain-qualified tool ids (e.g. graph_query), explicit ToolAnnotations, per_query vs meta stored-query projection with auto threshold, structuredContent + outputSchema, SEP-1303-style classify error mapping, deny-masking for unknown/denied tools, per-graph /mcp routing with no graphs_list / omnigraph://graphs on MCP (discovery via GET /graphs only), plus provider/auth matrices (static bearer Phase 1, OAuth/RFC 9728 Phase 2, PRM config-gated for #59467).

Adds rollout, test/verification plans, and defers per-query InvokeQuery scope and stdio→HTTP proxy collapse. Note: the §15 build_app snippet shows mcp::mcp_router(state.clone()) while the prose documents a three-arg host_policy_from_bind wrapper—implementers should follow the full signature in §7/§15 text.

Reviewed by Cursor Bugbot for commit de9e28e. Bugbot is set up for automated code reviews on this repo. Configure here.

Greptile Summary

Promotes RFC-003 from "Proposed" to the canonical, authoritative build spec for a clean omnigraph-mcp crate implementation, folding in every verified decision from the reference work in #157. One concrete spec inconsistency was found.

  • Implementation Blueprint (B.1–B.13) is the bulk of the addition: it specifies the crate split (omnigraph-mcp transport + omnigraph-server backend), the McpBackend trait seam using &http::request::Parts for auth passthrough, host/origin policy derived from deployment context, the 13-tool Cedar-mapped catalog, stored-query projection, classify error mapping, and a full test/verification plan.
  • Housekeeping: the old Reconciliation section is marked historical, §5 is reframed as superseded design rationale, Open Questions are resolved inline pointing to B.13, and the stdio package figures are refreshed to v0.6.1 / 16 tools.
  • Spec inconsistency: B.4's mcp_router call site passes only two arguments while B.3 defines the function with three (backend, body_limit, hosts: McpHostPolicy); following B.4 literally would produce a compile error.

Confidence Score: 4/5

Documentation-only change; safe to merge with the B.4 call-site inconsistency corrected before the implementation branch begins.

The only changed file is an RFC markdown document with no runtime impact. The spec is internally consistent except for one call site in B.4 that omits the required hosts: McpHostPolicy argument that B.3 defines — an implementer following that example verbatim would hit a compile error at the first build. No code, migrations, or configs are modified.

docs/dev/rfc-003-mcp-server-surface.md — the B.4 mcp_router call site needs the hosts argument added before implementation begins.

Important Files Changed

Filename Overview
docs/dev/rfc-003-mcp-server-surface.md Comprehensive implementation blueprint (B.1–B.13) added for the omnigraph-mcp crate; one spec inconsistency: B.4's example call to mcp_router omits the required hosts: McpHostPolicy third argument defined in B.3

Sequence Diagram

sequenceDiagram
    participant Client as MCP Client
    participant Middleware as Auth Middleware
    participant McpCrate as omnigraph-mcp McpService
    participant Backend as OmnigraphMcpBackend
    participant Cedar as Cedar Policy Engine
    participant Engine as Graph Engine

    Client->>Middleware: "POST /graphs/{id}/mcp Authorization: Bearer"
    Middleware->>McpCrate: "Extensions: ResolvedActor + Arc<GraphHandle>"
    McpCrate->>Backend: call_tool(parts, name, args)
    Backend->>Cedar: authorize(PolicyAction, actor, branch)
    Cedar-->>Backend: "Allowed | Denied"
    alt Denied
        Backend-->>McpCrate: -32602 unknown tool
    else Allowed
        Backend->>Engine: "do_* / run_query"
        Engine-->>Backend: "Ok(result) | Err(ApiError)"
        Backend->>Backend: classify(result)
        Backend-->>McpCrate: "CallToolResult | McpError"
    end
    McpCrate-->>Client: JSON-RPC response
Loading

Fix All in Claude Code

Reviews (2): Last reviewed commit: "docs(rfc-003): correct-by-construction f..." | Re-trigger Greptile

Greptile also left 1 inline comment on this PR.

@ragnorc ragnorc requested a review from aaltshuler as a code owner June 10, 2026 19:51

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3009564437

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread docs/dev/rfc-003-mcp-server-surface.md Outdated

- A generic `struct McpService<B>` implements rmcp's `ServerHandler`, delegating each method to `B` after extracting `&Parts` from `ctx.extensions` once (missing → `McpError::internal_error`). `get_info → backend.server_info()`; `initialize`/`ping` use rmcp's defaults.
- `pub fn mcp_router<B: McpBackend>(backend: B, body_limit: usize) -> axum::Router`:
- `let svc = StreamableHttpService::new(move || Ok(McpService::new(backend.clone())), Arc::new(LocalSessionManager::default()), config)` with `config = StreamableHttpServerConfig::default().with_stateful_mode(false).with_json_response(true)` (`StreamableHttpServerConfig` is `#[non_exhaustive]` — build from `Default`, mutate via the `with_*` setters; keep the loopback `allowed_hosts` default).

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Configure allowed hosts for non-loopback MCP deployments

In any remote or cloud deployment, keeping rmcp's default allowed_hosts makes /mcp reject valid clients before bearer auth, because rmcp 1.7 defaults Host validation to loopback authorities only (localhost, 127.0.0.1, ::1). This RFC explicitly targets remotely reachable MCP clients, so the blueprint should require a server-configured allowed-host allowlist (or a documented reverse-proxy Host validation setup) instead of preserving the loopback default.

Useful? React with 👍 / 👎.

Comment thread docs/dev/rfc-003-mcp-server-surface.md Outdated
Comment on lines +157 to +159
- **Ad-hoc `query`/`mutate` always exposed, Cedar-only**; no `mcp.allow_adhoc`. [Open Q3 → resolved.]
- **`query`/`mutate` ids only**, no `read`/`change` aliases. [Open Q7 → resolved.]
- **Per-graph `/mcp` routing**; `graphs_list`/`omnigraph://graphs` **dropped from MCP** (graph discovery via REST `GET /graphs`); no server-scoped MCP tools. [Open Q5 → resolved.]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reconcile the stale rollout with locked decisions

These locked decisions conflict with the still-active Rollout and Testing sections below, which continue to tell implementers to ship graphs_list, read/change aliases, a server-level /mcp, and an mcp.allow_adhoc switch. Since the status says the blueprint only supersedes the older §5 sketches, someone following the Rollout section would build surface area this canonical section says was dropped; please update those later sections or explicitly mark them historical too.

Useful? React with 👍 / 👎.

Comment thread docs/dev/rfc-003-mcp-server-surface.md Outdated
Comment thread docs/dev/rfc-003-mcp-server-surface.md Outdated
@ragnorc

ragnorc commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

Addressed the valid review findings as correct-by-construction fixes (commit 2ade97f) so a from-scratch build can't reintroduce them:

The "missing server.md MCP section" finding was stale — reviewers saw commit 94bbf67, before the docs commit added it.

Comment thread docs/dev/rfc-003-mcp-server-surface.md Outdated
**Reuse, never reinvent.** First factor **10 thin `do_*` fns** out of the inline `server_*` HTTP handlers (each is `authorize_request(...) → engine call → DTO`) so REST and MCP dispatch one path: `do_snapshot`, `do_schema_get`, `do_branches_list`, `do_commits_list`, `do_commit_show`, `do_ingest`, `do_branch_create`, `do_branch_delete`, `do_branch_merge`, `do_schema_apply`. (No `do_graphs_list` — `graphs_list` is not an MCP tool, see B.10; `server_graphs_list` stays inline for `GET /graphs`.) Land that as a behavior-neutral refactor commit first (it keeps the REST handlers as thin wrappers; all server tests stay green). Then reuse as-is: `run_query` / `run_mutate` (already decoupled from request bodies), `authorize` → `Authz { Allowed, Denied(msg) }` (with `Err` reserved for operational 401/500), `api::query_catalog_entry` / `ParamKind` / `read_output`, `ApiError` (add `pub(crate) status_code()` + `message_str()` accessors for the error classifier). Mount in `build_app`: `.merge(mcp::mcp_router(state))` inside the `per_graph_protected` group, where the server's thin `mcp::mcp_router(state) = omnigraph_mcp::mcp_router(OmnigraphMcpBackend::new(state), INGEST_REQUEST_BODY_LIMIT_BYTES)`.

Represent the built-ins as a `Builtin` enum (one variant per tool; `descriptor` / `gate` / `call` as match arms) — lower liability than ~14 unit structs + `dyn` + `async-trait` per tool. Stored-query tools are a sibling populator over `handle.queries`.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 B.4 call site missing hosts: McpHostPolicy argument

The mcp_router function defined in B.3 has the signature pub fn mcp_router<B: McpBackend>(backend: B, body_limit: usize, hosts: McpHostPolicy) -> axum::Router (three parameters), but the mounting example in B.4 only passes two: omnigraph_mcp::mcp_router(OmnigraphMcpBackend::new(state), INGEST_REQUEST_BODY_LIMIT_BYTES). The hosts: McpHostPolicy argument is absent. An implementer following B.4 verbatim would get a Rust compile error at that call site. The call should be updated to pass the McpHostPolicy computed at startup per B.3.1 — for example omnigraph_mcp::mcp_router(OmnigraphMcpBackend::new(state), INGEST_REQUEST_BODY_LIMIT_BYTES, McpHostPolicy::from_bind_config(&state.config)).

Fix in Claude Code

ragnorc added 3 commits June 13, 2026 11:47
…crate)

Brings RFC-003 to the canonical from-scratch spec for the in-server MCP surface:
a dedicated omnigraph-mcp transport crate + McpBackend trait (server -> mcp dep
to avoid a Cargo cycle), the verified rmcp 1.7 design (conformance MUSTs for
free), backend reuse (do_* / run_query / run_mutate / authorize / api), the
13-tool catalog + Cedar mapping, ParamKind -> JSON Schema, deny-masking +
isError split, two resources, per-graph routing (graphs_list stays REST-only),
auth + client-compat matrix + OAuth fast-follow, tests, and locked decisions.

Build the implementation on this branch from blueprint sections B.1-B.13. The
spec is informed by the #157 reference implementation.
Fold the valid #157 review findings into the blueprint as by-design fixes so a
from-scratch build cannot reintroduce them:

- B.3.1: host/origin policy derived from the server bind + config (loopback ->
  loopback hosts; non-loopback -> configured public host, else Host-allowlist
  off with bearer + Origin as the controls). Closes the loopback-only-default
  footgun that 403s every remote client before bearer auth.
- B.6/B.7: stored-query params nested under a `params` object (mirrors
  POST /queries/{name}), so the branch/snapshot knobs cannot collide with a
  query param; mutation tools omit `snapshot` + additionalProperties:false, so
  mutation-against-a-snapshot is unrepresentable. Vector `dim` made intrinsic to
  the kind, killing the unwrap_or(0) zero-length-array schema.
- B.8: one canonical classify(Result<_, ApiError>) for tool errors (5xx ->
  JSON-RPC, 4xx/409 -> isError), no second mapper to drift; list visibility uses
  the call-path default-branch authorization, not a branch:None probe that hid
  branch-scoped-grant tools.
- Reconciled the stale Summary count and the section-5-era Rollout/Testing
  sections (banners pointing to B.12/B.4/B.13).
Rebased onto current main (0.7.0-dev) and reconciled the MCP blueprint
with the code it builds on, replacing the earlier blueprint sketch with a
spec grounded in real interfaces (each EXISTING snippet cited to file:line,
re-read from source). Corrections (delta table in §0):

- ParamKind::Vector is a unit variant; the dimension is
  ParamDescriptor.vector_dim: Option<u32>, not an intrinsic struct field.
  The JSON-Schema builder now omits minItems/maxItems when the dim is
  absent instead of emitting 0.
- Server handlers, auth helpers, and run_query/run_mutate moved from
  lib.rs to handlers.rs; stored-query config in config.rs; DTOs in api.rs.
- ingest/load unified: server_ingest calls load_as; a missing branch with
  no `from` is a 404, never an implicit fork. The MCP tool is named `load`.
- Client credentials follow the shipped RFC-007 model (OperatorServer { url },
  env -> credentials-file chain, no keychain step, no nested auth: block).
- Test references updated for the split server suites (mcp.rs / stored_queries.rs,
  not --test server).
- Stored-query declaration source noted as cluster.yaml file-discovery with
  omnigraph.yaml as the RFC-008-deprecated legacy surface.

Folds the standalone implementation-spec draft into this file so there is one
MCP RFC; cross-links rfc-005/007/009 now that they are in the corpus.
@ragnorc ragnorc force-pushed the ragnorc/omnigraph-mcp-crate branch from 2ade97f to 61c18b6 Compare June 13, 2026 13:20

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ragnorc has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

Comment thread docs/dev/rfc-003-mcp-server-surface.md Outdated
Rewrites the MCP RFC as a single self-contained specification, validated
against the live upstream surfaces it builds on:

- MCP protocol revision 2025-11-25 (Streamable HTTP transport requirements,
  authorization model, tool/resource shapes, SEP-1303 input-validation-as-
  tool-error, JSON Schema 2020-12 default).
- rmcp 1.7.0 (StreamableHttpServerConfig real defaults — stateful_mode
  defaults true and must be flipped; NeverSessionManager for stateless mode;
  the `local` feature must stay off; RequestContext.extensions carries the
  http::request::Parts passthrough; RPITIT ServerHandler).
- Tool/server best practice: domain-qualified tool names, explicit
  annotations, structured output + text mirror, and progressive disclosure
  (per_query vs meta projection modes) for large stored-query catalogs.

Adds a provider compatibility matrix (Claude Code/Desktop/web, Messages API
connector, OpenAI Responses API/ChatGPT, Cursor, VS Code, OpenCode) with a
phased static-bearer -> OAuth 2.1 + RFC 9728 auth plan and the Claude Code
PRM header-override caveat, and a code-mode compatibility section describing
the server-side properties that make the surface code-execution-friendly
without building client-side machinery.

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ragnorc has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 20233a3. Configure here.

Token validation is offline (cached JWKS) — on-prem/air-gapped keeps working with no request-path cloud dependency. The MCP endpoint never terminates OAuth and never holds a client secret (Resource Server only).

**Cloud client negotiation — additive, no MCP changes:** when MR-956 lands, the server publishes RFC 9728 `/.well-known/oauth-protected-resource` and returns `WWW-Authenticate: Bearer ..., resource_metadata="..."` on 401. A compliant MCP client (Claude) then auto-negotiates: static bearer to an on-prem endpoint; on a cloud 401 it discovers the WorkOS AS and runs OAuth/PKCE itself — **same endpoint URL, zero client-side branching.** This RFC only requires that MCP routes flow through the standard 401 path so that hook can be added later without touching MCP.
| `health` | server | none (liveness/version) |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Server-scoped health contradicts routing

Medium Severity

The tool catalog still lists health as server-scoped MCP, while routing and locked decisions require every MCP tool to be graph-scoped with no server-level /mcp. Unlike graphs_list, which was dropped for the same reason, health has no mount point—implementers cannot satisfy both sections without guessing intent.

Additional Locations (2)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 20233a3. Configure here.

detail_level?)` (returns names + descriptions; full param schema only at higher
detail) and `stored_query_run(name, params, branch?, snapshot?)`. This keeps
`tools/list` small and mirrors the progressive-disclosure shape (`search` + `execute`)
that scales to hundreds of queries.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meta list tool lacks Cedar mapping

Medium Severity

In meta stored-query mode the spec adds stored_query_list alongside stored_query_run, but the Cedar mapping table only authorizes stored_query_run. Listing can expose query names and descriptions without a defined InvokeQuery (or equivalent) gate for the list tool.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 20233a3. Configure here.

Add §15.1 making the multi-graph model explicit (it was implied across
§9/§14/§15): one isolated MCP server per graph at /graphs/{id}/mcp with the
graph id in the URL path; cross-graph discovery is REST-only via GET /graphs
(server-scoped GraphList, default-denied without a server policy); stored-query
registries, tool-name uniqueness, projection mode, and InvokeQuery are all
per-graph; clients configure one connection per graph and the connection name
namespaces the otherwise-identical tool ids.

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ragnorc has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant