Feature Request: Integrate MarkItDown as a lightweight document reading option for built-in tools
Problem
Currently, GoClaw agents read documents (PDF, DOCX, XLSX, PPTX, etc.) using custom Python scripts in each skill's scripts/ directory. While this works, it has several drawbacks:
- High token consumption: Each skill's extraction script may produce verbose output, consuming more LLM tokens than necessary
- Duplicated effort: Multiple skills (docx, pdf, xlsx, pptx) each maintain their own extraction logic
- Maintenance burden: Each skill script needs its own dependency management and updates
- No unified toggle: Users can't easily switch between extraction methods or disable document reading to save tokens
Proposed Solution
Integrate Microsoft MarkItDown as an optional, unified document-to-Markdown conversion engine for GoClaw's built-in tools.
What is MarkItDown?
- Lightweight Python utility from Microsoft's AutoGen team for converting files to Markdown
- Supports: PDF, PowerPoint, Word, Excel, Images (EXIF + OCR), Audio (metadata + transcription), HTML, CSV, JSON, XML, ZIP, YouTube URLs, EPUBs, and more
- Token-efficient output: Converts to clean Markdown that LLMs understand natively
- MCP server available: Already has a markitdown-mcp package for LLM integration
- Plugin system: Supports 3rd-party plugins (e.g., markitdown-ocr for image OCR in documents)
- MIT License: Permissive, suitable for integration
- CLI + Python API: markitdown file.pdf > output.md or MarkItDown().convert("file.pdf")
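For reference, a minimal sketch of how a built-in tool might call the Python API. The helper name to_markdown and its injectable converter parameter are hypothetical (they only make the sketch testable without the package installed); MarkItDown(enable_plugins=False), convert() and text_content follow the MarkItDown README.

```python
from typing import Any, Optional


def to_markdown(path: str, converter: Optional[Any] = None) -> str:
    """Convert a document to Markdown text.

    `converter` is any object with a `.convert(path)` method whose result
    exposes `.text_content`. It defaults to a real MarkItDown instance.
    """
    if converter is None:
        # Imported lazily so this module still loads when markitdown is absent.
        from markitdown import MarkItDown

        converter = MarkItDown(enable_plugins=False)
    result = converter.convert(path)
    return result.text_content
```

With markitdown installed, to_markdown("file.pdf") should return the same Markdown the CLI prints.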
Integration Design
1. Toggle in Settings
Add a settings toggle to enable/disable MarkItDown as the document reader:
{
  "builtInTools": {
    "documentReader": {
      "engine": "markitdown", // "markitdown" | "skill-scripts" | "auto"
      "enabled": true,
      "markitdownOptions": {
        "enablePlugins": false,
        "featureGroups": ["pdf", "docx", "xlsx", "pptx"],
        "llmClient": null, // Optional: for image descriptions / OCR
        "llmModel": null
      }
    }
  }
}
- "markitdown": Use MarkItDown for all supported file types
- "skill-scripts": Use existing per-skill Python scripts (current behavior)
- "auto": Use MarkItDown if available, fall back to skill scripts
2. System Dependency
Add markitdown as a recognized Python package in the dependency installer:
pip:markitdown[all] # or selective: pip:markitdown[pdf,docx,xlsx,pptx]
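As a rough illustration, the dependency string could be derived from the featureGroups setting. The function name pip_requirement is hypothetical, and the pip: prefix is taken from the line above rather than from any confirmed installer API.

```python
from typing import Iterable, Optional


def pip_requirement(feature_groups: Optional[Iterable[str]] = None) -> str:
    """Build the installer entry for markitdown from the configured groups."""
    # Drop duplicates while keeping the order the user configured.
    groups = list(dict.fromkeys(feature_groups or []))
    # With no groups configured, fall back to the full install.
    extras = ",".join(groups) if groups else "all"
    return f"pip:markitdown[{extras}]"
```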
3. Fallback Chain
User uploads document
  → MarkItDown enabled?
      → Yes → Convert to Markdown → Return
      → No → Fall back to skill scripts (docx/pdf/xlsx/pptx skills)
          → Skill scripts fail? → Return error
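A possible sketch of the engine selection and the chain above, not a definitive implementation. All names here (resolve_engine, read_document, DocumentReadError, the two converter callables) are hypothetical. The defaults for a missing "enabled" or "engine" key, and raising an error when reading is disabled, are assumptions this issue does not specify.

```python
import importlib.util
from typing import Callable, Optional

Converter = Callable[[str], str]  # takes a file path, returns Markdown text


class DocumentReadError(Exception):
    """Raised when the document could not be read by the selected engine."""


def markitdown_installed() -> bool:
    """True if the markitdown package is importable in this environment."""
    return importlib.util.find_spec("markitdown") is not None


def resolve_engine(
    reader_settings: dict,
    is_available: Callable[[], bool] = markitdown_installed,
) -> Optional[str]:
    """Map the documentReader settings to a concrete engine, or None if disabled."""
    if not reader_settings.get("enabled", True):  # assumed default: enabled
        return None
    engine = reader_settings.get("engine", "auto")  # assumed default: auto
    if engine == "auto":
        return "markitdown" if is_available() else "skill-scripts"
    if engine not in ("markitdown", "skill-scripts"):
        raise ValueError(f"unknown document reader engine: {engine!r}")
    return engine


def read_document(
    path: str,
    engine: Optional[str],
    markitdown_convert: Converter,
    skill_convert: Converter,
) -> str:
    """Follow the chain: MarkItDown if selected, otherwise the skill scripts."""
    if engine is None:
        raise DocumentReadError("document reading is disabled")
    if engine == "markitdown":
        return markitdown_convert(path)
    try:
        return skill_convert(path)
    except Exception as exc:
        raise DocumentReadError(f"skill scripts could not read {path}") from exc
```

As in the diagram, a failure inside MarkItDown is not retried with the skill scripts; whether "auto" should also fall back on conversion errors is left open here.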
Benefits
| Aspect | Current (Skill Scripts) | With MarkItDown |
| --- | --- | --- |
| Token usage | Varies by skill, often verbose | Optimized Markdown, token-efficient |
| Maintenance | Per-skill scripts to maintain | Single unified library |
| Format support | Limited to what skills implement | 12+ formats out of the box |
| Toggle | No global toggle | Settings toggle on/off |
| LLM image desc | Not supported | Built-in with llm_client |
| Plugin extensibility | Custom per skill | Standard plugin system |
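To make the llm_client and plugin rows concrete, here is one way the markitdownOptions block could be translated into MarkItDown constructor arguments. The function name markitdown_kwargs is hypothetical, and it assumes the caller builds the LLM client object itself, since a JSON setting cannot hold one. The keyword names enable_plugins, llm_client and llm_model are the ones shown in the MarkItDown README.

```python
from typing import Any, Optional


def markitdown_kwargs(options: dict, llm_client: Optional[Any] = None) -> dict:
    """Translate the markitdownOptions settings into MarkItDown(...) kwargs."""
    kwargs = {"enable_plugins": bool(options.get("enablePlugins", False))}
    model = options.get("llmModel")
    # Only pass the LLM settings when both a client and a model name exist.
    if llm_client is not None and model:
        kwargs["llm_client"] = llm_client
        kwargs["llm_model"] = model
    return kwargs
```

The result would be used as MarkItDown(**markitdown_kwargs(options, client)).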
Use Cases
- Token savings: Users on tight context windows can use MarkItDown's leaner output
- Quick document preview: Convert any supported file to Markdown without loading multiple skills
- Unified pipeline: One tool handles PDF, DOCX, XLSX, PPTX, images, audio, etc.
- Disable when not needed: Toggle off to skip document reading entirely and save processing time
Implementation Notes
- MarkItDown's converters read from file-like streams rather than file paths, so no temporary files are created
- Can be installed selectively: pip install 'markitdown[pdf,docx]' instead of [all]
- MCP server already exists (markitdown-mcp); it could be used directly or as a reference
- Plugin system supports OCR via markitdown-ocr for extracting text from images in documents
- Docker support available for sandboxed execution
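The stream-based note could look roughly like this for an uploaded file held in memory. The helper name bytes_to_markdown and the injectable converter parameter are hypothetical. The sketch assumes the installed MarkItDown version accepts the file_extension keyword on convert_stream(), which should be checked against the pinned release.

```python
import io
from typing import Any, Optional


def bytes_to_markdown(data: bytes, extension: str, converter: Optional[Any] = None) -> str:
    """Convert in-memory document bytes to Markdown without writing a temp file."""
    if converter is None:
        # Imported lazily so this module still loads when markitdown is absent.
        from markitdown import MarkItDown

        converter = MarkItDown()
    stream = io.BytesIO(data)  # binary file-like object, as convert_stream expects
    # The extension (e.g. ".pdf") hints at the format, since a stream has no name.
    result = converter.convert_stream(stream, file_extension=extension)
    return result.text_content
```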
Comparison with Existing Approach
We already have a comparison report between LiteParse and MarkItDown; the two scored similarly (8/10 vs 8.5/10). This issue is specifically about using MarkItDown as a built-in tool option with a settings toggle, not replacing existing skill scripts entirely.
Labels: enhancement, document-processing, token-optimization, built-in-tools