14 changes: 7 additions & 7 deletions CLAUDE.md
@@ -104,21 +104,21 @@ The XML you provide is wrapped in a minimal `w:document > w:body` structure auto

## MCP Server

Cloudflare Worker exposing two flavors of MCP tools backed by the same database.
Cloudflare Worker exposing two tool families over MCP, backed by the same database.

Semantic search over the spec PDF (powered by `spec_content`):
Prose search over the spec PDFs (powered by `spec_content`):

- `search_ecma_spec` - semantic vector search across 18,000+ spec chunks
- `get_section` - fetch a specific section by ID (e.g., "17.3.1.24")
- `list_parts` - browse the spec structure
- `ooxml_search` - semantic vector search across 18,000+ spec chunks
- `ooxml_section` - fetch a specific section by ID (e.g., "17.3.1.24")
- `ooxml_parts` - browse the spec structure

Structural queries over the XSD schema graph (powered by `xsd_*` tables):

- `ooxml_lookup_element` / `ooxml_lookup_type` - canonical symbol info
- `ooxml_element` / `ooxml_type` - canonical symbol info
- `ooxml_children` - legal children of an element/type/group, in document order
- `ooxml_attributes` - attributes including those inherited and unfolded from attributeGroup refs
- `ooxml_enum` - simpleType enumeration values
- `ooxml_namespace_info` - vocabularies and per-profile symbol counts for a namespace URI
- `ooxml_namespace` - vocabularies and per-profile symbol counts for a namespace URI

Uses PostgreSQL with pgvector (Neon serverless in production, Docker locally).
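
The tools listed above are invoked through MCP's JSON-RPC `tools/call` method. The following is a minimal sketch of what a request body for `ooxml_search` might look like. The argument names (`query`, `part`, `limit`) are taken from the handler in this PR; `buildToolCall` is a hypothetical helper introduced only for illustration and is not part of the repository.

```typescript
// Argument shape for ooxml_search, taken from the handler in this PR:
// query is required; part (1-4) and limit are optional.
type SearchArgs = {
  query: string;
  part?: number;
  limit?: number;
};

// Generic JSON-RPC 2.0 envelope for an MCP tools/call request.
type ToolCallRequest = {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
};

// Hypothetical helper (not part of the repo): wraps a tool name and its
// arguments in the tools/call envelope.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const searchArgs: SearchArgs = { query: "paragraph spacing", part: 1, limit: 3 };
const searchRequest = buildToolCall(1, "ooxml_search", searchArgs);

console.log(JSON.stringify(searchRequest, null, 2));
```

The same envelope should work for the schema tools by swapping the tool name; their argument names are not shown in this diff, so they are left out here.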

39 changes: 32 additions & 7 deletions README.md
@@ -1,7 +1,7 @@
<img width="300" alt="logo" src="https://github.com/user-attachments/assets/df6311a6-c050-4592-bbf1-4a2228655bc3" />

[![Web](https://img.shields.io/badge/Web-v0.1.3-blue)](https://ooxml.dev)
[![MCP Server](https://img.shields.io/badge/MCP_Server-v0.0.1-blue)](https://api.ooxml.dev/mcp)
[![Web](https://img.shields.io/github/v/tag/superdoc-dev/ooxml-dev?filter=web-v*&label=Web&color=blue)](https://ooxml.dev)
[![MCP Server](https://img.shields.io/github/v/tag/superdoc-dev/ooxml-dev?filter=mcp-v*&label=MCP%20Server&color=blue)](https://api.ooxml.dev/mcp)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

The OOXML spec, explained by people who actually implemented it.
@@ -23,16 +23,41 @@ We faced this at SuperDoc — building a document engine on native OOXML with no

## MCP Server

Ask questions in natural language and get answers grounded in the spec, or query the schema graph for precise structural answers.
Ask questions in natural language and get answers grounded in the spec, or query the schema graph for precise structural answers. Works with Claude Code, Codex CLI, Cursor, and any MCP-compatible client.

**Claude Code**

```bash
claude mcp add --transport http ooxml https://api.ooxml.dev/mcp
```

**Codex CLI**

```bash
claude mcp add --transport http ecma-spec https://api.ooxml.dev/mcp
codex mcp add ooxml --url https://api.ooxml.dev/mcp
```

Or in `~/.codex/config.toml`:

```toml
[mcp_servers.ooxml]
url = "https://api.ooxml.dev/mcp"
```

**Cursor** — add to your MCP settings:

```json
{
"mcpServers": {
"ooxml": { "url": "https://api.ooxml.dev/mcp" }
}
}
```

Works with Claude Code, Cursor, and any MCP-compatible client. Two flavors of tools share one server:
Two tool families share one server:

- **Semantic** (over the spec PDF): `search_ecma_spec`, `get_section`, `list_parts`
- **Structural** (over the parsed XSDs): `ooxml_lookup_element`, `ooxml_lookup_type`, `ooxml_children`, `ooxml_attributes`, `ooxml_enum`, `ooxml_namespace_info`
- **Prose search** (over the spec PDFs): `ooxml_search`, `ooxml_section`, `ooxml_parts`
- **Schema lookup** (over the parsed XSDs): `ooxml_element`, `ooxml_type`, `ooxml_children`, `ooxml_attributes`, `ooxml_enum`, `ooxml_namespace`
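
Because this change renames six tools, a client that still refers to the old names would need a translation step. The sketch below collects the old-to-new pairs visible in this diff into a lookup table; `currentToolName` is a hypothetical client-side helper, not something the server provides.

```typescript
// Old tool name -> new tool name, as renamed in this PR.
const RENAMED_TOOLS = new Map<string, string>([
  ["search_ecma_spec", "ooxml_search"],
  ["get_section", "ooxml_section"],
  ["list_parts", "ooxml_parts"],
  ["ooxml_lookup_element", "ooxml_element"],
  ["ooxml_lookup_type", "ooxml_type"],
  ["ooxml_namespace_info", "ooxml_namespace"],
]);

// Hypothetical client-side helper: returns the current name for a tool,
// leaving names that were not renamed (e.g. ooxml_children) untouched.
function currentToolName(name: string): string {
  return RENAMED_TOOLS.get(name) ?? name;
}

console.log(currentToolName("get_section"));    // ooxml_section
console.log(currentToolName("ooxml_children")); // ooxml_children
```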

## Development

88 changes: 69 additions & 19 deletions apps/mcp-server/README.md
@@ -1,35 +1,85 @@
# ECMA-376 Spec MCP Server
# OOXML Reference MCP Server

**The world's first ECMA-376 MCP server** - semantic search across the entire Office Open XML specification.
Cloudflare Worker that exposes ECMA-376 (Office Open XML) over the Model Context Protocol. Two tool families share one server:

- 18,000+ chunks from all 4 parts of ECMA-376
- Vector search powered by Voyage embeddings + pgvector
- Hosted on Cloudflare Workers
- **Prose search** — semantic search across the four ECMA-376 part PDFs (~18,000 chunks, embedded with Voyage, queried with pgvector).
- **Schema lookup** — deterministic queries over the parsed XSD graph (profiles, namespaces, symbols, content models, attributes, enums).

## Connect in Claude Code
Hosted at `https://api.ooxml.dev/mcp`.

## Connect

### Claude Code

```bash
claude mcp add --transport http ooxml https://api.ooxml.dev/mcp
```

### Codex CLI

```bash
claude mcp add --transport http ecma-spec https://api.ooxml.dev/mcp
codex mcp add ooxml --url https://api.ooxml.dev/mcp
```

Or add to `~/.codex/config.toml`:

```toml
[mcp_servers.ooxml]
url = "https://api.ooxml.dev/mcp"
```

### Cursor

Add to your Cursor MCP settings:

```json
{
"mcpServers": {
"ooxml": {
"url": "https://api.ooxml.dev/mcp"
}
}
}
```

## Endpoints
### Other clients

Any MCP-compatible client that speaks Streamable HTTP can connect to the endpoint directly.
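
For a client without built-in MCP support, the first message is a JSON-RPC `initialize` request sent as an HTTP POST. The sketch below only builds the request options, so it runs without network access. `buildInitializeInit` and the client name are hypothetical, and the `protocolVersion` string is an assumption: it is one published MCP revision, and this server may negotiate a different one.

```typescript
const MCP_ENDPOINT = "https://api.ooxml.dev/mcp";

// Subset of fetch() options used by this sketch.
type HttpRequestInit = {
  method: "POST";
  headers: Record<string, string>;
  body: string;
};

// Hypothetical helper: builds the options for an MCP initialize call.
function buildInitializeInit(clientName: string, clientVersion: string): HttpRequestInit {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Streamable HTTP clients accept either a JSON body or an SSE stream.
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion: "2025-03-26", // assumption: the server may negotiate another revision
        capabilities: {},
        clientInfo: { name: clientName, version: clientVersion },
      },
    }),
  };
}

const initOptions = buildInitializeInit("example-client", "0.0.0");
console.log(MCP_ENDPOINT, initOptions.body);
```

Passing `initOptions` to `fetch(MCP_ENDPOINT, initOptions)` would send the request; per the `handleInitialize` function in this PR, the response should carry `serverInfo` with the name `ooxml`.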

## Tools

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/mcp` | GET | MCP server info |
| `/search` | POST | Semantic search (`{query, part?, limit?}`) |
| `/section` | GET | Get section (`?id=17.3.2&part=1`) |
| `/stats` | GET | Database stats |
### Prose search

| Tool | Returns |
| --- | --- |
| `ooxml_search` | Semantic search over the spec PDFs |
| `ooxml_section` | Specific section by ID (e.g. `17.3.2`) |
| `ooxml_parts` | Spec part / section structure |

### Schema lookup

| Tool | Returns |
| --- | --- |
| `ooxml_element` | Canonical info for an element by qname |
| `ooxml_type` | Canonical info for a complexType or simpleType |
| `ooxml_children` | Legal children of an element, type, or group (walks inheritance) |
| `ooxml_attributes` | Attributes, including inherited ones and those unfolded from attributeGroup refs |
| `ooxml_enum` | Enumeration values for a simpleType |
| `ooxml_namespace` | Vocabularies and per-profile symbol counts for a namespace URI |

Default profile is `transitional`. Future profiles will compose Transitional with Office extension schemas.

## Development

```bash
# Install
# Install (from repo root)
bun install

# Run locally (needs .dev.vars with DATABASE_URL, VOYAGE_API_KEY)
wrangler dev
# Local dev — needs .dev.vars with DATABASE_URL and VOYAGE_API_KEY
bun run dev:mcp

# Deploy
wrangler deploy
# Deploy (from this directory)
bun run deploy
```

Database setup, ingest pipelines, and tests live at the repo root — see the top-level `README.md`.
62 changes: 13 additions & 49 deletions apps/mcp-server/src/index.ts
@@ -1,17 +1,16 @@
/**
* ECMA-376 Spec MCP Server
* OOXML Reference MCP Server
*
* Cloudflare Worker that exposes ECMA-376 specification search via MCP protocol.
*
* Tools:
* - search_ecma_spec: Semantic search across the spec
* - get_section: Get specific section by ID
* - list_parts: List spec parts and sections
* Cloudflare Worker exposing two tool families over MCP:
* - prose search over ECMA-376 PDFs (ooxml_search, ooxml_section, ooxml_parts)
* - schema lookup over the parsed XSD graph (ooxml_element, ooxml_type,
* ooxml_children, ooxml_attributes, ooxml_enum, ooxml_namespace)
*/

import { createDb } from "./db";
import { embedQuery } from "./embeddings";
import { handleMcpRequest } from "./mcp";
import { handleMcpRequest, TOOLS } from "./mcp";
import { OOXML_TOOL_DEFS } from "./ooxml-tools";

export interface Env {
DATABASE_URL: string;
@@ -169,7 +168,7 @@ export default {
return addCorsHeaders(
new Response(
JSON.stringify({
name: "ECMA-376 Spec MCP Server",
name: "OOXML Reference MCP Server",
version: "0.1.0",
endpoints: {
mcp: "/mcp",
@@ -188,50 +187,15 @@ export default {
},
};

// MCP info endpoint (GET for debugging)
// MCP info endpoint (GET for debugging). Tool list is derived from the same
// canonical exports as the JSON-RPC tools/list response so they can't drift.
function handleMcpInfo(): Response {
return new Response(
JSON.stringify({
name: "ecma-spec",
name: "ooxml",
version: "0.1.0",
description: "ECMA-376 (Office Open XML) specification search server",
tools: [
{
name: "search_ecma_spec",
description: "Search the ECMA-376 specification semantically",
inputSchema: {
type: "object",
properties: {
query: { type: "string", description: "Natural language search query" },
part: { type: "number", description: "Filter by part number (1-4)" },
limit: { type: "number", description: "Max results (default: 5)" },
},
required: ["query"],
},
},
{
name: "get_section",
description: "Get a specific section by ID",
inputSchema: {
type: "object",
properties: {
section_id: { type: "string", description: "Section ID (e.g., '17.3.2')" },
part: { type: "number", description: "Part number (1-4)" },
},
required: ["section_id"],
},
},
{
name: "list_parts",
description: "List spec parts and sections",
inputSchema: {
type: "object",
properties: {
part: { type: "number", description: "Filter by part number (1-4)" },
},
},
},
],
description: "OOXML (ECMA-376) reference server: prose search + schema lookup",
tools: [...TOOLS, ...OOXML_TOOL_DEFS],
}),
{
headers: { "Content-Type": "application/json" },
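
Deriving the info endpoint's tool list from the two exported arrays removes one source of drift, but nothing in the spread itself prevents the two families from declaring the same tool name. The sketch below shows one way such a guard could look. The `ToolDef` shape is copied from `mcp.ts` in this PR; `PROSE_TOOLS`, `SCHEMA_TOOLS`, and `mergeToolDefs` are hypothetical stand-ins with abbreviated schemas, not the real `TOOLS` and `OOXML_TOOL_DEFS` exports.

```typescript
// Local copy of the ToolDef shape introduced in mcp.ts in this PR.
interface ToolDef {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, unknown>;
    required?: string[];
  };
}

// Abbreviated stand-ins for the real exports (schemas left empty on purpose).
const PROSE_TOOLS: ToolDef[] = [
  { name: "ooxml_search", description: "Semantic search", inputSchema: { type: "object", properties: {}, required: ["query"] } },
  { name: "ooxml_section", description: "Section by ID", inputSchema: { type: "object", properties: {}, required: ["section_id"] } },
];
const SCHEMA_TOOLS: ToolDef[] = [
  { name: "ooxml_element", description: "Element info", inputSchema: { type: "object", properties: {} } },
];

// Hypothetical guard: concatenates tool groups and throws on a repeated name.
function mergeToolDefs(...groups: ToolDef[][]): ToolDef[] {
  const seen = new Set<string>();
  const merged: ToolDef[] = [];
  for (const group of groups) {
    for (const tool of group) {
      if (seen.has(tool.name)) {
        throw new Error(`Duplicate tool name: ${tool.name}`);
      }
      seen.add(tool.name);
      merged.push(tool);
    }
  }
  return merged;
}

const allTools = mergeToolDefs(PROSE_TOOLS, SCHEMA_TOOLS);
console.log(allTools.map((tool) => tool.name).join(", "));
```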
30 changes: 21 additions & 9 deletions apps/mcp-server/src/mcp.ts
@@ -37,10 +37,22 @@ const PART_DESCRIPTIONS: Record<number, string> = {
4: "Transitional Migration Features",
};

/** Shape of an MCP tool definition. Shared with OOXML_TOOL_DEFS so a future
* field added to one (annotations, outputSchema, etc.) widens both arrays. */
export interface ToolDef {
name: string;
description: string;
inputSchema: {
type: "object";
properties: Record<string, unknown>;
required?: string[];
};
}

// Tool definitions
const TOOLS = [
export const TOOLS: ToolDef[] = [
{
name: "search_ecma_spec",
name: "ooxml_search",
description:
"Semantic search across the ECMA-376 (Office Open XML) specification. Returns relevant sections based on natural language queries about WordprocessingML, SpreadsheetML, PresentationML, and more.",
inputSchema: {
@@ -61,7 +73,7 @@ const TOOLS = [
},
},
{
name: "get_section",
name: "ooxml_section",
description:
"Get a specific section of the ECMA-376 specification by section ID (e.g., '17.3.2' for paragraph properties).",
inputSchema: {
@@ -77,7 +89,7 @@ const TOOLS = [
},
},
{
name: "list_parts",
name: "ooxml_parts",
description: "List ECMA-376 specification parts and their top-level sections.",
inputSchema: {
type: "object" as const,
@@ -124,11 +136,11 @@ function handleInitialize(id: number | string | null): JsonRpcResponse {
tools: {},
},
serverInfo: {
name: "ecma-spec",
name: "ooxml",
version: "0.1.0",
},
instructions:
"ECMA-376 (Office Open XML) specification search server. Use search_ecma_spec for semantic search, get_section for specific sections, or list_parts to browse the spec structure.",
"OOXML (ECMA-376 / Office Open XML) reference server. Two tool families: prose search over the spec PDFs (ooxml_search, ooxml_section, ooxml_parts) and deterministic schema lookup over the parsed XSDs (ooxml_element, ooxml_type, ooxml_children, ooxml_attributes, ooxml_enum, ooxml_namespace).",
},
};
}
@@ -173,7 +185,7 @@ async function handleToolsCall(
}

switch (name) {
case "search_ecma_spec": {
case "ooxml_search": {
const query = args?.query as string;
const part = args?.part as number | undefined;
const limit = Math.min((args?.limit as number) || 5, 20);
@@ -194,7 +206,7 @@ async function handleToolsCall(
break;
}

case "get_section": {
case "ooxml_section": {
const sectionId = args?.section_id as string;
const part = args?.part as number | undefined;

@@ -213,7 +225,7 @@ async function handleToolsCall(
break;
}

case "list_parts": {
case "ooxml_parts": {
const part = args?.part as number | undefined;

const db = createDb(env.DATABASE_URL);
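
The `ooxml_search` case above computes its result limit as `Math.min((args?.limit as number) || 5, 20)`. The sketch below reproduces that expression in isolation to show how it treats a few inputs, next to a stricter variant. Both function names are hypothetical; `clampLimit` is an illustration of one possible tightening, not code from this PR.

```typescript
// Mirrors the expression in the ooxml_search handler:
// a falsy limit falls back to 5, and the result is capped at 20.
function limitAsInHandler(limit: unknown): number {
  return Math.min((limit as number) || 5, 20);
}

// Hypothetical stricter variant: non-numbers, non-finite values, and
// values below 1 fall back to the default; fractions are floored.
function clampLimit(limit: unknown, fallback = 5, max = 20): number {
  if (typeof limit !== "number" || !Number.isFinite(limit)) return fallback;
  const whole = Math.floor(limit);
  if (whole < 1) return fallback;
  return Math.min(whole, max);
}

console.log(limitAsInHandler(undefined)); // 5
console.log(limitAsInHandler(50));        // 20
console.log(limitAsInHandler(-3));        // -3 (negative values pass through)
console.log(clampLimit(-3));              // 5
console.log(clampLimit(7.9));             // 7
```

Whether a negative limit matters in practice depends on how the query layer uses the value, which this diff does not show.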
4 changes: 2 additions & 2 deletions apps/mcp-server/src/ooxml-queries.ts
@@ -1,7 +1,7 @@
/**
* Read-only schema-graph queries powering the OOXML MCP tools:
* ooxml_lookup_element, ooxml_lookup_type, ooxml_children,
* ooxml_attributes, ooxml_enum, ooxml_namespace_info.
* ooxml_element, ooxml_type, ooxml_children,
* ooxml_attributes, ooxml_enum, ooxml_namespace.
*
* These take a tagged-template SQL function (Neon in the deployed Worker,
* postgres.js in local tests). All queries are profile-scoped and walk