Hi — first, thanks for maintaining this fork. I spent some time getting it running on Windows with Claude Code and hit a cluster of bugs that interact. Filing as one umbrella because they overlap (all affect the same startup/IO path or the same dispatch function), with a follow-up PR providing fixes. Happy to split into separate issues/PRs if you prefer.
Environment: Windows 11, Python 3.13, Claude Code as MCP client, markitdown-mcp main (commit as of 2026-04-19), markitdown[all] installed.
Bug 1 — stdio defaults break the MCP protocol on Windows
Python's text-mode stdio defaults on Windows break line-delimited JSON-RPC in two ways:
- stdout: CRLF translation (\n → \r\n) corrupts the framing
- stdin: cp1252 decoding corrupts non-ASCII characters (e.g. a path containing ä arrives as Ã¤ and the subsequent file operation fails)
Repro:
- Run as an MCP server under any Windows client (Claude Code, Claude Desktop)
- Send a convert_file request with a file path containing a non-ASCII character (Jäger, Müller, …)
- Observe: Security violation: invalid path (the decoded path no longer matches the filesystem)
Root cause: main() relies on interpreter defaults. On Unix these are typically UTF-8 + LF; on Windows, piped stdio uses the ANSI code page (cp1252 on Western locales) + CRLF.
Proposed fix: reconfigure stdio at the top of main():
sys.stdout.reconfigure(encoding="utf-8", newline="\n")
sys.stdin.reconfigure(encoding="utf-8")
No-op on platforms that already default to UTF-8/LF.
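Both failure modes can be reproduced without a Windows machine. The following is a minimal sketch, assuming the client sends UTF-8 bytes; the helper names (decode_as, write_line) are illustrative and not part of markitdown-mcp:

```python
import io


def decode_as(raw: bytes, encoding: str) -> str:
    """Decode raw stdin bytes the way a text-mode stream with `encoding` would."""
    return raw.decode(encoding)


def write_line(text: str, newline: str) -> bytes:
    """Write one line through a TextIOWrapper and return the bytes that reach the pipe."""
    buf = io.BytesIO()
    stream = io.TextIOWrapper(buf, encoding="utf-8", newline=newline)
    stream.write(text + "\n")
    stream.flush()
    data = buf.getvalue()
    stream.detach()  # keep the wrapper from closing the buffer
    return data


path_bytes = "C:/Users/Jäger/report.docx".encode("utf-8")
mangled = decode_as(path_bytes, "cp1252")  # what a cp1252 stdin produces
intact = decode_as(path_bytes, "utf-8")    # what a reconfigured stdin produces

# newline="\r\n" emulates the Windows text-mode default on any platform.
crlf = write_line('{"jsonrpc":"2.0"}', newline="\r\n")
# newline="\n" is what reconfigure(newline="\n") selects.
lf = write_line('{"jsonrpc":"2.0"}', newline="\n")

print(repr(mangled), repr(intact))
print(crlf, lf)
```

The mangled path no longer names an existing file, which is consistent with the "invalid path" rejection described in the repro.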
Bug 2 — Path.home() can crash server init
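get_safe_working_directories() calls Path.home() unconditionally. If the home directory cannot be determined, Path.home() raises RuntimeError and the whole MarkItDownMCPServer.__init__ aborts with an opaque traceback before the server ever processes a request. On Windows this happens when USERPROFILE is unset and there is no HOMEPATH fallback; on Unix it additionally requires that the user has no entry in the password database, because Python falls back to pwd when HOME is unset.
Repro: launch the server with a cleared environment. Claude Code currently spawns stdio MCP servers with effectively empty env on Windows; on Unix, env -i python -m markitdown_mcp.server reproduces it only for a user without a passwd entry (e.g. an arbitrary UID in a container).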
Proposed fix: wrap the call in try/except, log a warning, skip the home-subdir additions:
try:
    home = Path.home()
except RuntimeError:
    logger.warning("Could not determine user home; skipping home subdirs")
    home = None
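To show the guard in a form that can be exercised on any machine, here is a small sketch. The function home_subdirs and the logger name are hypothetical stand-ins for the relevant part of get_safe_working_directories(), and the failure is simulated by patching Path.home:

```python
import logging
from pathlib import Path
from unittest import mock

logger = logging.getLogger("markitdown_mcp.sketch")  # hypothetical logger name


def home_subdirs(names: list[str]) -> list[Path]:
    """Return home-relative directories, or [] when no home can be determined."""
    try:
        home = Path.home()
    except RuntimeError:
        logger.warning("Could not determine user home; skipping home subdirs")
        return []
    return [home / name for name in names]


# Simulate an environment in which Path.home() cannot resolve a directory.
with mock.patch.object(Path, "home", side_effect=RuntimeError("no home")):
    without_home = home_subdirs(["Documents", "Downloads"])

# Simulate a normal environment with a known home directory.
with mock.patch.object(Path, "home", return_value=Path("/home/user")):
    with_home = home_subdirs(["Documents"])

print(without_home, with_home)
```

With this shape the server still starts when no home exists; it just offers fewer safe directories.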
Bug 3 — Server replies to notifications (JSON-RPC 2.0 §4.1 violation)
The message dispatch in MarkItDownMCPServer.run() builds a response for every incoming message, including notifications (messages without an id). JSON-RPC 2.0 §4.1 "Notification": "The Server MUST NOT reply to a Notification, including those that are within a batch request." MCP uses notifications/initialized during the handshake, so this breaks any strict MCP client.
Repro:
- Send {"jsonrpc":"2.0","method":"notifications/initialized","params":{}} (no id)
- Observe a response with fabricated id: "unknown"
Current code:
request = MCPRequest(
    id=message.get("id", "unknown"),  # fake id invented here
    ...
)
response = await self.handle_request(request)
# ... always writes a response
Proposed fix:
is_notification = "id" not in message
request = MCPRequest(id=message.get("id"), ...)
response = await self.handle_request(request)
if is_notification:
    continue
# else write response
(MCPRequest.id / MCPResponse.id type needs to become str | int | None — JSON-RPC allows numeric and null ids too.)
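The intended behaviour can be checked with a self-contained sketch of the loop. Here handle and dispatch are simplified stand-ins for handle_request() and run() (assumptions for illustration, not the server's actual code), operating on a list of lines instead of stdin:

```python
import asyncio
import json


async def handle(message: dict) -> dict:
    """Stand-in for handle_request(): returns a bare JSON-RPC result."""
    return {"jsonrpc": "2.0", "id": message.get("id"), "result": {}}


async def dispatch(lines: list[str]) -> list[str]:
    """Process line-delimited JSON-RPC messages; reply only to requests."""
    out = []
    for line in lines:
        message = json.loads(line)
        is_notification = "id" not in message
        response = await handle(message)
        if is_notification:
            continue  # JSON-RPC 2.0: never reply to a notification
        out.append(json.dumps(response))
    return out


replies = asyncio.run(dispatch([
    '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}',
    '{"jsonrpc":"2.0","method":"notifications/initialized","params":{}}',
    '{"jsonrpc":"2.0","id":"abc","method":"tools/list"}',
]))
print(replies)
```

Three messages go in and two replies come out, with the numeric and string ids passed through unchanged.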
Bug 4 — anyOf at top of inputSchema rejects the convert_file tool on the Anthropic API
The convert_file tool schema uses anyOf at the top level of inputSchema to express "either file_path OR file_content+filename". The Anthropic Messages API (and thus Claude Code / Claude Desktop) rejects this with:
input_schema does not support oneOf, allOf, or anyOf at the top level
→ the tool silently fails to load for any Anthropic-based client.
Proposed fix: drop anyOf, leave required-field enforcement to the runtime (the handler already validates), and clarify the either/or rule in the tool description:
"description": (
    "Convert a file to Markdown using MarkItDown. "
    "Provide either 'file_path' OR both 'file_content' (base64) and 'filename'."
),
"inputSchema": {
    "type": "object",
    "properties": { "file_path": {...}, "file_content": {...}, "filename": {...} },
    # no anyOf, no required
},
Note: the same schema is duplicated in get_tools() and inline in handle_request()'s tools/list branch — both need the fix. (Separately: consider having the inline branch call self.get_tools() so the duplication goes away.)
Bug 5 — "xml" in mime_type falsely matches openxmlformats → docx/xlsx/pptx broken
validate_file_content_security() dispatches to validate_xml_security() based on:
if (mime_type and "xml" in mime_type) or file_ext in [".xml", ".xhtml"]:
The MIME type for .docx is application/vnd.openxmlformats-officedocument.wordprocessingml.document — which contains the substring "xml". The file is then opened in text mode with errors="ignore", scanned for XML entity patterns, and written back as a "sanitized" .xml file. MarkItDown receives a broken UTF-8 text stream that started life as a ZIP container → ~400 KB of garbled ZIP bytes instead of Markdown.
Same failure mode applies to .xlsx (…spreadsheetml.sheet) and .pptx (…presentationml.presentation) — every Office OpenXML format.
The json/csv branches below have the same substring anti-pattern; less explosive in practice but worth fixing for consistency.
Proposed fix: exact MIME matching via module-level sets:
_XML_MIME_TYPES = {"text/xml", "application/xml"}
_JSON_MIME_TYPES = {"application/json", "text/json"}
_CSV_MIME_TYPES = {"text/csv", "application/csv"}
# ...
if (mime_type in _XML_MIME_TYPES) or file_ext in [".xml", ".xhtml"]:
    return validate_xml_security(file_path)
# same for json/csv
PR
All five fixes are implemented locally and verified on Windows 11 (Claude Code, Python 3.13, markitdown[all]) with test files covering docx/xlsx/pdf, non-ASCII paths, and safe-dir-rejected paths. I'll open a PR referencing this issue with one commit per bug so each change can be reviewed in isolation. A separate feature-request issue will cover a configurable safe-directory env var (not filed here because it's not a bug).
Happy to split this into five issues if you'd rather have them tracked individually.