Document structure extraction (PDF/DOCX/HTML → Markdown)
Goal
Extract structured Markdown from PDF, DOCX, and HTML sources, preserving the heading hierarchy.
Responsibilities
- Convert each source format (PDF, DOCX, HTML) → Markdown
- Preserve the heading hierarchy (#, ##, ###, etc.)
- Normalize heading levels across formats so that equivalent headings map to the same Markdown level
- Produce a single structured Markdown document
- Emit warnings for elements that may not render cleanly (e.g. tables)
- Provide unit tests covering the above
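A minimal sketch of the normalization, rendering, and warning steps, assuming each format-specific parser (not shown) has already reduced its input to a flat list of blocks. `Block`, `level_from_html_tag`, `level_from_docx_style`, `normalize_headings`, and `render_markdown` are hypothetical names. The normalization rule (shift levels so the shallowest heading becomes `#`, never skip a level, cap at six) and the table-warning criterion (ragged rows or multi-line cells) are assumptions, not requirements stated above. PDF has no structural heading tags, so a PDF adapter would have to infer levels (for example from font size); that part is left out.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical format-neutral block emitted by a per-format adapter."""
    kind: str                                  # "heading", "paragraph", or "table"
    text: str = ""
    level: int = 0                             # heading level as reported by the source
    rows: list = field(default_factory=list)   # list of rows of cell strings (tables only)


def level_from_html_tag(tag):
    """Map <h1>..<h6> to 1..6; any other tag is not a heading (None)."""
    m = re.fullmatch(r"h([1-6])", tag.lower())
    return int(m.group(1)) if m else None


def level_from_docx_style(style_name):
    """Map Word's built-in 'Heading 1'..'Heading 9' style names to 1..9; others -> None."""
    m = re.fullmatch(r"Heading ([1-9])", style_name)
    return int(m.group(1)) if m else None


def normalize_headings(blocks):
    """Shift levels so the shallowest heading becomes '#', forbid skipped levels, cap at 6."""
    levels = [b.level for b in blocks if b.kind == "heading"]
    offset = min(levels) - 1 if levels else 0
    out, prev = [], 0
    for b in blocks:
        if b.kind == "heading":
            # e.g. a jump from level 1 straight to level 3 is pulled up to level 2
            lvl = min(b.level - offset, prev + 1, 6)
            out.append(Block("heading", b.text, lvl))
            prev = lvl
        else:
            out.append(b)
    return out


def render_markdown(blocks):
    """Render blocks into one Markdown document; return (markdown, warnings)."""
    parts, warns = [], []
    for i, b in enumerate(normalize_headings(blocks)):
        if b.kind == "heading":
            parts.append("#" * b.level + " " + b.text.strip())
        elif b.kind == "table":
            if not b.rows:
                warns.append(f"block {i}: empty table skipped")
                continue
            width = max(len(r) for r in b.rows)
            # Pipe tables cannot express merged cells or line breaks inside a cell.
            if any(len(r) != width or any("\n" in c for c in r) for r in b.rows):
                warns.append(f"block {i}: table has ragged rows or multi-line cells; "
                             "output may not render cleanly")
            # Flatten line breaks, escape pipes, and pad short rows with empty cells.
            rows = [[c.replace("\n", " ").replace("|", "\\|") for c in r] + [""] * (width - len(r))
                    for r in b.rows]
            lines = ["| " + " | ".join(r) + " |" for r in rows]
            lines.insert(1, "|" + " --- |" * width)  # first row is treated as the header
            parts.append("\n".join(lines))
        else:
            parts.append(b.text.strip())
    return "\n\n".join(parts) + "\n", warns
```

For example, a DOCX whose headings use only `Heading 2` and `Heading 4` renders them as `#` and `##`, and a table whose second row has a single cell is padded and produces exactly one warning; assertions of that shape are a natural starting point for the unit tests.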