[Feature Request] Integrate microsoft/markitdown as lightweight document reader with settings toggle for built-in tools #907

@mrgoonie

Feature Request: Integrate MarkItDown as a lightweight document reading option for built-in tools

Problem

Currently, GoClaw agents read documents (PDF, DOCX, XLSX, PPTX, etc.) using custom Python scripts in each skill's scripts/ directory. While this works, it has several drawbacks:

  • High token consumption: Each skill's extraction script may produce verbose output, consuming more LLM tokens than necessary
  • Duplicated effort: Multiple skills (docx, pdf, xlsx, pptx) each maintain their own extraction logic
  • Maintenance burden: Each skill script needs its own dependency management and updates
  • No unified toggle: Users can't easily switch between extraction methods or disable document reading to save tokens

Proposed Solution

Integrate Microsoft MarkItDown as an optional, unified document-to-Markdown conversion engine for GoClaw's built-in tools.

What is MarkItDown?

  • Lightweight Python utility from Microsoft's AutoGen team for converting files to Markdown
  • Supports: PDF, PowerPoint, Word, Excel, Images (EXIF + OCR), Audio (metadata + transcription), HTML, CSV, JSON, XML, ZIP, YouTube URLs, EPUBs, and more
  • Token-efficient output: Converts to clean Markdown that LLMs understand natively
  • MCP server available: Already has a markitdown-mcp package for LLM integration
  • Plugin system: Supports 3rd-party plugins (e.g., markitdown-ocr for image OCR in documents)
  • MIT License: Permissive, suitable for integration
  • CLI + Python API: markitdown file.pdf > output.md or MarkItDown().convert("file.pdf")
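Here is a minimal sketch of the Python API mentioned above. It assumes markitdown 0.1.x is installed (for example `pip install 'markitdown[pdf,docx,xlsx,pptx]'`). The helper name `to_markdown` is hypothetical. It returns `None` instead of raising when the package is missing, so a caller can fall back to skill scripts.

```python
"""Sketch: convert a local file to Markdown with MarkItDown, if it is installed."""
from typing import Optional

try:
    from markitdown import MarkItDown  # provided by `pip install markitdown`
except ImportError:  # package not installed: callers should fall back
    MarkItDown = None


def to_markdown(path: str, enable_plugins: bool = False) -> Optional[str]:
    """Return Markdown for `path`, or None if MarkItDown is unavailable.

    `to_markdown` is a hypothetical helper, not part of MarkItDown itself.
    """
    if MarkItDown is None:
        return None
    md = MarkItDown(enable_plugins=enable_plugins)
    result = md.convert(path)  # accepts a local path (or a URL)
    return result.text_content
```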

Integration Design

1. Toggle in Settings

Add a settings toggle to enable/disable MarkItDown as the document reader:

{
  "builtInTools": {
    "documentReader": {
      "engine": "markitdown",
      "enabled": true,
      "markitdownOptions": {
        "enablePlugins": false,
        "featureGroups": ["pdf", "docx", "xlsx", "pptx"],
        "llmClient": null,
        "llmModel": null
      }
    }
  }
}

Supported "engine" values:

  • "markitdown": Use MarkItDown for all supported file types
  • "skill-scripts": Use existing per-skill Python scripts (current behavior)
  • "auto": Use MarkItDown if available, fall back to skill scripts

"llmClient" and "llmModel" are optional and are only needed for image descriptions / OCR.
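This sketch shows how the `documentReader` block could be loaded with defaults and validated. It assumes the settings are stored as strict JSON with any inline comments stripped. The key names come from the config above. The defaults (engine `"auto"`, plugins off), the `DocumentReaderSettings` class, and `load_document_reader_settings` are assumptions made for illustration.

```python
"""Sketch: parse and validate the proposed builtInTools.documentReader settings."""
import json
from dataclasses import dataclass, field
from typing import List, Optional

VALID_ENGINES = {"markitdown", "skill-scripts", "auto"}


@dataclass
class DocumentReaderSettings:  # hypothetical container for the settings
    engine: str = "auto"  # assumed default when the key is missing
    enabled: bool = True
    enable_plugins: bool = False
    feature_groups: List[str] = field(default_factory=list)
    llm_client: Optional[str] = None  # identifier of a configured LLM client (assumed)
    llm_model: Optional[str] = None


def load_document_reader_settings(raw: str) -> DocumentReaderSettings:
    """Parse the settings JSON; missing keys fall back to the dataclass defaults."""
    cfg = json.loads(raw).get("builtInTools", {}).get("documentReader", {})
    opts = cfg.get("markitdownOptions", {})
    engine = cfg.get("engine", "auto")
    if engine not in VALID_ENGINES:
        raise ValueError(f"unknown documentReader.engine: {engine!r}")
    return DocumentReaderSettings(
        engine=engine,
        enabled=bool(cfg.get("enabled", True)),
        enable_plugins=bool(opts.get("enablePlugins", False)),
        feature_groups=list(opts.get("featureGroups", [])),
        llm_client=opts.get("llmClient"),
        llm_model=opts.get("llmModel"),
    )
```

Rejecting an unknown engine early means a typo in settings surfaces as a clear error rather than silently falling back to skill scripts.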

2. System Dependency

Add markitdown as a recognized Python package in the dependency installer:

pip:markitdown[all]  # or selective: pip:markitdown[pdf,docx,xlsx,pptx]
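This sketch shows how the installer could turn the `featureGroups` setting into a requirement string and check whether the package is already importable. The `pip:` prefix follows the spec line above. The helper names are hypothetical. Extras such as `pdf`, `docx`, `xlsx`, `pptx` and `all` are the optional-dependency groups MarkItDown documents for selective installs.

```python
"""Sketch: build a pip requirement for MarkItDown from selected feature groups."""
import importlib.util
from typing import Iterable


def markitdown_requirement(feature_groups: Iterable[str]) -> str:
    """Return e.g. 'markitdown[docx,pdf]'; an empty selection means '[all]'."""
    # normalise, de-duplicate and sort so the spec is stable across runs
    groups = sorted({g.strip().lower() for g in feature_groups if g.strip()})
    extras = ",".join(groups) if groups else "all"
    return f"markitdown[{extras}]"


def installer_spec(feature_groups: Iterable[str]) -> str:
    """Prefix with the 'pip:' scheme used by the dependency installer (assumed)."""
    return "pip:" + markitdown_requirement(feature_groups)


def markitdown_installed() -> bool:
    """True if the `markitdown` package can be imported in this interpreter."""
    return importlib.util.find_spec("markitdown") is not None
```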

3. Fallback Chain

User uploads document
  → MarkItDown enabled? → Yes → Convert to Markdown → Return
  → No → Fall back to skill scripts (docx/pdf/xlsx/pptx skills)
  → Skill scripts fail? → Return error
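This sketch shows how the fallback chain and the three `engine` modes could be dispatched. Converters are injected as callables. `read_document`, `ConversionError` and the converter signature are hypothetical, not existing GoClaw APIs. In `auto` mode, a MarkItDown failure also falls back to skill scripts. That is one reasonable reading of "fall back to skill scripts".

```python
"""Sketch: dispatch document reading according to the documentReader engine."""
from typing import Callable, Optional

Converter = Callable[[str], str]  # takes a file path, returns Markdown/text


class ConversionError(Exception):
    """Raised when no configured reader could convert the document."""


def _run_skill_scripts(path: str, skill_scripts: Converter) -> str:
    try:
        return skill_scripts(path)
    except Exception as exc:  # "Skill scripts fail? -> Return error"
        raise ConversionError(f"skill scripts failed for {path!r}") from exc


def read_document(
    path: str,
    engine: str,
    enabled: bool,
    markitdown: Optional[Converter],  # None when the package is not installed
    skill_scripts: Converter,
) -> Optional[str]:
    """Return converted text, or None when document reading is toggled off."""
    if not enabled:
        return None  # toggle off: skip document reading entirely
    if engine == "skill-scripts":
        return _run_skill_scripts(path, skill_scripts)
    if engine == "markitdown":
        if markitdown is None:
            raise ConversionError("engine is 'markitdown' but the package is not installed")
        return markitdown(path)
    if engine != "auto":
        raise ValueError(f"unknown engine: {engine!r}")
    # auto: prefer MarkItDown, fall back to skill scripts if missing or failing
    if markitdown is not None:
        try:
            return markitdown(path)
        except Exception:
            pass
    return _run_skill_scripts(path, skill_scripts)
```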

Benefits

| Aspect | Current (skill scripts) | With MarkItDown |
|---|---|---|
| Token usage | Varies by skill, often verbose | Optimized, token-efficient Markdown |
| Maintenance | Per-skill scripts to maintain | Single unified library |
| Format support | Limited to what skills implement | 12+ formats out of the box |
| Toggle | No global toggle | Settings toggle on/off |
| LLM image descriptions | Not supported | Built in via llm_client |
| Plugin extensibility | Custom per skill | Standard plugin system |

Use Cases

  • Token savings: Users on tight context windows can use MarkItDown's leaner output
  • Quick document preview: Convert any supported file to Markdown without loading multiple skills
  • Unified pipeline: One tool handles PDF, DOCX, XLSX, PPTX, images, audio, etc.
  • Disable when not needed: Toggle off to skip document reading entirely and save processing time

Implementation Notes

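  • Since v0.1.0, MarkItDown's converters read from binary file-like streams rather than file paths, so no temporary files are created; convert() still accepts paths and URLs, while convert_stream() takes a binary stream directly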
  • Can be installed selectively: pip install 'markitdown[pdf,docx]' instead of [all]
  • An MCP server already exists (markitdown-mcp); it could be used directly or as a reference
  • Plugin system supports OCR via markitdown-ocr for extracting text from images in documents
  • Docker support available for sandboxed execution
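Because converters work on binary streams, uploaded bytes can be converted in memory. This minimal sketch assumes markitdown 0.1.x, where `MarkItDown.convert_stream()` takes a binary file-like object and rejects text-mode streams. Format hints can optionally be passed through `stream_info`. They are omitted here, so MarkItDown guesses the type itself. `convert_upload` is a hypothetical helper. It returns `None` on any failure so the caller can fall back to skill scripts.

```python
"""Sketch: convert uploaded bytes to Markdown in memory, without temp files."""
import io
from typing import Optional

try:
    from markitdown import MarkItDown
except ImportError:  # not installed: signal "unavailable" to the caller
    MarkItDown = None


def convert_upload(data: bytes, enable_plugins: bool = False) -> Optional[str]:
    """Return Markdown for `data`, or None if MarkItDown is missing or fails."""
    if MarkItDown is None:
        return None
    md = MarkItDown(enable_plugins=enable_plugins)
    try:
        # BytesIO is a binary, seekable stream; this helper writes nothing to disk
        result = md.convert_stream(io.BytesIO(data))
    except Exception:
        return None  # let the caller fall back to skill scripts
    return result.text_content
```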

Comparison with Existing Approach

We already have a comparison report on LiteParse and MarkItDown, and both scored similarly (8/10 vs 8.5/10). This issue is specifically about offering MarkItDown as a built-in tool option behind a settings toggle, not about replacing the existing skill scripts entirely.


Labels: enhancement, document-processing, token-optimization, built-in-tools
