feat: markitdown implementation #486
1 issue found across 2 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="src/converters/main.ts">
<violation number="1" location="src/converters/main.ts:131">
P2: Inconsistent naming convention: all other converter keys use lowercase (e.g., `pandoc`, `libreoffice`, `ffmpeg`), but `MarkitDown` uses PascalCase. This could cause lookup failures if callers pass `markitdown` (following the established pattern). Consider renaming to `markitdown` for consistency.</violation>
</file>
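A minimal sketch of the lookup failure described above. The registry shape, the `findConverter` helper, and the `targets` values are hypothetical, invented for illustration; the actual object in `src/converters/main.ts` may be structured differently. Only the key names come from the review comment.

```typescript
// Hypothetical registry; only the key names mirror the review comment.
type ConverterEntry = { targets: string[] };

const converters: Record<string, ConverterEntry> = {
  pandoc: { targets: ["md"] },
  libreoffice: { targets: ["pdf"] },
  MarkitDown: { targets: ["md"] }, // PascalCase key, as flagged in the review
};

// Property lookup on a plain object is case-sensitive.
function findConverter(name: string): ConverterEntry | undefined {
  return Object.prototype.hasOwnProperty.call(converters, name)
    ? converters[name]
    : undefined;
}

// A caller following the lowercase convention misses the entry.
const byLowercase = findConverter("markitdown");
// Only the exact PascalCase spelling resolves.
const byPascalCase = findConverter("MarkitDown");

console.log(byLowercase === undefined, byPascalCase !== undefined);
```

Renaming the key to `markitdown` would make the lowercase lookup succeed without any normalization step in the callers.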
Since this is your first cubic review, here's how it works:
- cubic automatically reviews your code and comments on bugs and improvements
- Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
- Ask questions if you need clarification on any suggestion
Reply to cubic to teach it or ask questions. Tag @cubic-dev-ai to re-run a review.
Great work! Feel free to merge when you want :)
✨ What’s new
This PR adds Markdown (.md) output support using Microsoft MarkItDown as a new CLI-backed converter.
Supported conversions:
PDF → Markdown
DOCX → Markdown
PPTX → Markdown
HTML → Markdown
🤔 Why MarkItDown
MarkItDown provides:
High-quality, semantic Markdown output
Better structure preservation (headings, lists, links)
Automatic input type detection
A CLI interface that aligns well with ConvertX’s architecture
This makes it a strong alternative to existing document converters for Markdown output.
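As a rough illustration of what a CLI-backed converter can look like, the sketch below shells out to the `markitdown` command. The function names `buildMarkitdownArgs` and `convertToMarkdown` and their signatures are assumptions made for this example, not the code added in this PR. It assumes the `markitdown` executable is on PATH and that it accepts `-o <file>` for the output path.

```typescript
import { execFile } from "node:child_process";

// Build the argument list for: markitdown <input> -o <output>
// (hypothetical helper, kept separate so it can be checked without running the CLI)
function buildMarkitdownArgs(inputPath: string, outputPath: string): string[] {
  return [inputPath, "-o", outputPath];
}

// Hypothetical wrapper: resolves with the output path once the CLI exits cleanly.
function convertToMarkdown(inputPath: string, outputPath: string): Promise<string> {
  return new Promise((resolve, reject) => {
    execFile(
      "markitdown",
      buildMarkitdownArgs(inputPath, outputPath),
      (error, _stdout, stderr) => {
        if (error) {
          // Surface stderr when available so failures are easier to diagnose.
          reject(new Error(`markitdown failed: ${stderr || error.message}`));
          return;
        }
        resolve(outputPath);
      },
    );
  });
}
```

Using `execFile` with an argument array, rather than building a shell string, avoids quoting problems with file names that contain spaces or shell metacharacters.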
Summary by cubic
Adds Markdown output via Microsoft MarkItDown, integrated as a new CLI-backed converter. Converts PDF, DOCX, PPTX, HTML, and Excel files to .md with better structure preservation.
New Features
Migration
`markitdown` must be available in PATH. No config changes required.
Written for commit f2dcc7f. Summary will update automatically on new commits.