feat(mcp): add ooxml_package_part for OPC part metadata #7
Merged
The XSD schema graph answers "what's legal inside this XML body?" The prose corpus answers "what does this spec section say?" Neither answers "what kind of OPC part is /customXml/item1.xml?" That's a package-level concern: content type, source relationship type, root namespace, typical path. Agents working with .docx / .xlsx / .pptx packages reach for this constantly and have nowhere structural to land.

Adds `ooxml_package_part` backed by a curated static dataset of 25 OPC part types from ECMA-376 Part 1 §11.3.x (WML), §12.3.x (SML), §13.3.x (PML), §14.2.7.10 (theme), and §15.x (cross-cutting). Word covers document, styles, settings, numbering, comments, footnotes, endnotes, header, footer; Excel covers workbook, worksheet, shared strings; PowerPoint covers presentation, slide, slide layout, slide master; cross-cutting covers core / extended / custom properties, theme, image, custom XML data storage, custom XML data storage properties.

Four lookup modes: exact content_type, exact relationship_type, query substring, or no args → list-all. Where the spec prose and the XSD target namespace disagree (the custom XML data storage properties part is named .../customXmlDataProps in §15.2.6 but the shipped XSD targets .../customXml), rootNamespace pins the XSD URI so the value composes cleanly with ooxml_element.

Static typed data in apps/mcp-server/src/opc-parts.ts, no DB. The set is small, static across ECMA editions, and curated; the PR diff is the audit primitive. Add a new entry by appending to OPC_PARTS; the lookup index rebuilds lazily.

Tests cover dataset consistency (unique keys, non-empty required fields, every family represented), exact and substring lookups, and the four tool dispatch modes. No DB needed for any of them.
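The four lookup modes and the lazily built index described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in opc-parts.ts: the `OpcPart` field names other than `rootNamespace`, the `lookupParts` and `getIndex` helpers, and the two sample entries are hypothetical. It uses single-valued maps for brevity, which is only safe here because the sample keys are unique.

```typescript
// Hypothetical record shape; only OPC_PARTS and rootNamespace are named in the PR.
interface OpcPart {
  name: string;
  contentType: string;
  relationshipType: string;
  rootNamespace: string;
  typicalPath: string;
}

const REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
const CT = "application/vnd.openxmlformats-officedocument";

// Two illustrative entries, not the curated set of 25.
const OPC_PARTS: OpcPart[] = [
  {
    name: "Numbering Definitions Part",
    contentType: `${CT}.wordprocessingml.numbering+xml`,
    relationshipType: `${REL}/numbering`,
    rootNamespace: "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    typicalPath: "/word/numbering.xml",
  },
  {
    name: "Shared String Table Part",
    contentType: `${CT}.spreadsheetml.sharedStrings+xml`,
    relationshipType: `${REL}/sharedStrings`,
    rootNamespace: "http://schemas.openxmlformats.org/spreadsheetml/2006/main",
    typicalPath: "/xl/sharedStrings.xml",
  },
];

interface PartIndex {
  byContentType: Map<string, OpcPart>;
  byRelationshipType: Map<string, OpcPart>;
}

// Built on first access, so appending to OPC_PARTS needs no extra wiring.
let index: PartIndex | undefined;

function getIndex(): PartIndex {
  if (index === undefined) {
    index = {
      byContentType: new Map(OPC_PARTS.map((p) => [p.contentType, p] as const)),
      byRelationshipType: new Map(OPC_PARTS.map((p) => [p.relationshipType, p] as const)),
    };
  }
  return index;
}

interface LookupArgs {
  content_type?: string;
  relationship_type?: string;
  query?: string;
}

// Dispatch order: exact content type, exact relationship type, substring, list-all.
function lookupParts(args: LookupArgs = {}): OpcPart[] {
  if (args.content_type !== undefined) {
    const hit = getIndex().byContentType.get(args.content_type);
    return hit ? [hit] : [];
  }
  if (args.relationship_type !== undefined) {
    const hit = getIndex().byRelationshipType.get(args.relationship_type);
    return hit ? [hit] : [];
  }
  if (args.query !== undefined) {
    const q = args.query.toLowerCase();
    return OPC_PARTS.filter((p) =>
      [p.name, p.contentType, p.relationshipType, p.typicalPath].some((field) =>
        field.toLowerCase().includes(q),
      ),
    );
  }
  return [...OPC_PARTS];
}
```

Calling `lookupParts()` with no arguments returns every entry, while `lookupParts({ query: "numbering" })` returns only the parts whose name, content type, relationship type, or path contains that substring.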
Three issues from PR review:

- relationship_type lookup collapsed shared rels. The .../relationships/officeDocument URI points at the main part for WML, SML, and PML, but the Map<string, OpcPart> index let later entries overwrite earlier ones, so a lookup returned only the Presentation part. Index is now Map<string, OpcPart[]>; the dispatcher renders multi-match as a list with a note that the relationship is shared across families and the caller has to disambiguate by the source part.
- Image content type was a wildcard display string. Real [Content_Types].xml entries record a specific media type per image (image/png, image/jpeg, ...) so an exact lookup against the display string never matched. contentType is now `string | string[]`; the Image Part enumerates the spec-§15.2.13 set (png, jpeg, gif, tiff, x-emf, x-wmf, bmp). Each entry is indexed; the formatter renders multi-content-type records under a plural label with a "+N more" indicator in the list view.
- initialize handler and apps/mcp-server/README.md still advertised two tool families and omitted ooxml_package_part, hurting agent discoverability. Both updated to list three tool families and describe the package-metadata corpus.

New tests cover (a) every enumerated image media type resolving exactly, (b) the shared officeDocument relationship returning all three main parts, and (c) the tool's multi-match rendering for shared rels. Existing tests updated for the new helper name / array contract.
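A minimal sketch of the two data-model fixes, assuming a simplified record shape: a multi-valued index so shared relationship types keep every part, and a `string | string[]` content type where each media type is indexed separately. The helper names (`buildIndex`, `contentTypesOf`, `formatContentType`) and the exact label text are hypothetical, not taken from the PR.

```typescript
// Simplified, hypothetical record shape for illustration.
interface OpcPart {
  name: string;
  contentType: string | string[];
  relationshipType: string;
}

const REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
const CT = "application/vnd.openxmlformats-officedocument";

const OPC_PARTS: OpcPart[] = [
  {
    name: "Main Document Part",
    contentType: `${CT}.wordprocessingml.document.main+xml`,
    relationshipType: `${REL}/officeDocument`,
  },
  {
    name: "Workbook Part",
    contentType: `${CT}.spreadsheetml.sheet.main+xml`,
    relationshipType: `${REL}/officeDocument`,
  },
  {
    name: "Presentation Part",
    contentType: `${CT}.presentationml.presentation.main+xml`,
    relationshipType: `${REL}/officeDocument`,
  },
  {
    name: "Image Part",
    contentType: [
      "image/png", "image/jpeg", "image/gif", "image/tiff",
      "image/x-emf", "image/x-wmf", "image/bmp",
    ],
    relationshipType: `${REL}/image`,
  },
];

// Normalise the union type so callers always see an array.
const contentTypesOf = (p: OpcPart): string[] =>
  Array.isArray(p.contentType) ? p.contentType : [p.contentType];

// Each key maps to a bucket of parts, so later entries append instead of overwriting.
function buildIndex(keysOf: (p: OpcPart) => string[]): Map<string, OpcPart[]> {
  const index = new Map<string, OpcPart[]>();
  for (const part of OPC_PARTS) {
    for (const key of keysOf(part)) {
      const bucket = index.get(key);
      if (bucket) {
        bucket.push(part);
      } else {
        index.set(key, [part]);
      }
    }
  }
  return index;
}

const byRelationshipType = buildIndex((p) => [p.relationshipType]);
const byContentType = buildIndex(contentTypesOf);

// List-view rendering: plural label plus a "+N more" indicator for multi-type records.
function formatContentType(p: OpcPart): string {
  const types = contentTypesOf(p);
  return types.length === 1
    ? `Content type: ${types[0]}`
    : `Content types: ${types[0]} (+${types.length - 1} more)`;
}
```

With this shape, looking up the officeDocument relationship yields all three main parts in dataset order, and the caller still has to pick one based on the source part.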
The previous fix updated the server README and initialize text but missed the web-facing surfaces, which still advertised two tool families. Bringing every surface in sync:

- apps/web/src/pages/Mcp.tsx: hero copy updated, added a Package metadata section, refreshed the trailing "what is MCP" paragraph.
- apps/web/public/llms.txt: feeds llms.txt/llms-full.txt that AI crawlers and the build-time SEO pipeline consume.
- apps/mcp-server/src/index.ts: header comment in the worker entry.
- README.md + CLAUDE.md: project-level docs.
- brand.md: brand-voice copy that lists the MCP as an AI-native differentiator.

No behavior change; everything in this commit is documentation / agent-discoverability surface.
The XSD schema graph and the prose corpus don't answer "what kind of OPC part is `/customXml/item1.xml`?" That's package metadata: content type, source relationship type, root namespace, typical path. Agents working with .docx / .xlsx / .pptx packages need it constantly and currently have to reconstruct it from prose search.

Adds `ooxml_package_part` backed by a curated static dataset of 25 OPC part types in `apps/mcp-server/src/opc-parts.ts`. Covers Word (document, styles, settings, numbering, comments, footnotes, endnotes, header, footer), Excel (workbook, worksheet, shared strings), PowerPoint (presentation, slide, slide layout, slide master), and cross-cutting (core / extended / custom properties, theme, image, custom XML data storage and its properties part).

Four lookup modes: exact `content_type`, exact `relationship_type`, `query` substring, or no args → list-all. Where the spec prose and XSD target namespace disagree (custom XML data storage properties part is named `.../customXmlDataProps` in §15.2.6 but the XSD targets `.../customXml`), `rootNamespace` pins the XSD URI so the value composes cleanly with `ooxml_element`.

Static typed data, no DB. The set is small, static across ECMA editions, and curated; the PR diff is the audit primitive. Adding a new entry is appending to `OPC_PARTS`; the lookup index rebuilds lazily on first access.

Hyperlinks are intentionally out of scope: they're a relationship type, not a package part. If needed later they'd warrant a different model.

Review: confirm the curated set covers your common cases; flag any wrong content type / relationship URI / namespace pins (these were transcribed from Part 1 §11.3.x / §12.3.x / §13.3.x / §15.x). Ignore the rest of `ooxml-tools.ts`; it is additive only.

Verified: 71 pass / 3 skip / 0 fail. Format / lint / typecheck / build all clean. (The 3 skips are the xsd-cache-gated smoke tests in `tests/ingest-xsd/`, unrelated to this PR.)
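For reviewers checking the curated set, the kind of dataset-consistency check the tests describe (unique keys, non-empty required fields, every family represented) might look roughly like the sketch below. The `key` and `family` fields, the family labels, and the `checkDataset` helper are assumptions made for illustration; the real test suite may be structured differently.

```typescript
// Hypothetical family labels and record shape for illustration.
type Family = "wml" | "sml" | "pml" | "shared";
const FAMILIES: Family[] = ["wml", "sml", "pml", "shared"];

interface OpcPart {
  key: string;
  family: Family;
  contentType: string | string[];
  relationshipType: string;
  rootNamespace?: string; // optional here: binary parts such as images have no XML root
}

// Returns a list of human-readable problems; an empty list means the dataset passed.
function checkDataset(parts: OpcPart[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();

  for (const part of parts) {
    if (seen.has(part.key)) {
      problems.push(`duplicate key: ${part.key}`);
    }
    seen.add(part.key);

    const contentTypes = Array.isArray(part.contentType)
      ? part.contentType
      : [part.contentType];
    const required = [part.key, part.relationshipType, ...contentTypes];
    if (contentTypes.length === 0 || required.some((value) => value.trim() === "")) {
      problems.push(`empty required field in: ${part.key || "(no key)"}`);
    }
  }

  for (const family of FAMILIES) {
    if (!parts.some((part) => part.family === family)) {
      problems.push(`no parts for family: ${family}`);
    }
  }
  return problems;
}
```

Returning a list of problems rather than throwing on the first one makes a failing run report every bad entry at once, which suits a dataset that is audited through the PR diff.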