Enhanced Markdown Support
Summary
This issue tracks the comprehensive markdown support implementation that landed this week, addressing issues #3 (bullet point merging) and #1 (punctuation handling) while introducing full markdown syntax awareness.
What's Implemented
- Complete markdown processor with semantic line breaks respecting markdown syntax
- Automatic file type detection using Magika ML-based content analysis
- Syntax-aware processing for:
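As a rough illustration of what "respecting markdown syntax" can mean, the sketch below breaks prose at sentence boundaries while leaving headings and fenced code untouched and keeping list continuations indented. It is a toy, rule-based stand-in: `toy_sembr` and the regexes are hypothetical names invented for this example, and sembr's actual break prediction and markdown processor are not shown here.

```python
import re

# Hypothetical helpers for illustration only; not sembr's implementation.
FENCE = re.compile(r"^\s*(```|~~~)")
LIST_PREFIX = re.compile(r"^(\s*(?:[-*+]|\d+[.)])\s+)")
# Split on whitespace that follows sentence-ending punctuation.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=\S)")


def toy_sembr(text: str) -> str:
    """Break prose at sentence ends, skipping headings and fenced code."""
    out = []
    in_fence = False
    for line in text.splitlines():
        if FENCE.match(line):
            # Toggle fence state and keep the fence line verbatim.
            in_fence = not in_fence
            out.append(line)
            continue
        stripped = line.lstrip()
        if in_fence or not stripped or stripped.startswith("#"):
            # Code, blank lines, and headings pass through unchanged.
            out.append(line)
            continue
        match = LIST_PREFIX.match(line)
        prefix = match.group(1) if match else ""
        parts = SENTENCE_END.split(line[len(prefix):])
        out.append(prefix + parts[0])
        # Indent continuation lines so they stay inside the list item.
        indent = " " * len(prefix)
        out.extend(indent + part for part in parts[1:])
    return "\n".join(out)
```

A bullet such as `- First point. Second point.` becomes two lines, with the second indented under the first, so the item is not merged with or split from its neighbours.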
Technical Details
- New processors: sembr/processors/markdown.py (366 lines) with comprehensive AST parsing
- File type detection: Magika-based auto-detection in sembr/processors/utils.py
- DRY architecture: Refactored processor system with base classes in sembr/processors/base.py
- Backward compatibility: Preserves existing LaTeX and plain text support
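The base-class and auto-detection arrangement described above might look roughly like the following. This is a minimal sketch under stated assumptions: `BaseProcessor`, `MarkdownProcessor`, `PlainTextProcessor`, `PROCESSORS`, and `pick_processor` are hypothetical names, not the contents of sembr/processors/base.py or utils.py, and the `detect` callable stands in for whatever wraps Magika in the real code.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional, Type


class BaseProcessor(ABC):
    """Hypothetical shared pipeline: subclasses only declare protected lines."""

    @abstractmethod
    def is_protected(self, line: str) -> bool:
        """Return True if the line must be passed through unchanged."""

    def process(self, text: str, rewrap: Callable[[str], str]) -> str:
        # Apply the rewrap function to every line that is not protected.
        return "\n".join(
            line if self.is_protected(line) else rewrap(line)
            for line in text.splitlines()
        )


class PlainTextProcessor(BaseProcessor):
    def is_protected(self, line: str) -> bool:
        return False


class MarkdownProcessor(BaseProcessor):
    def is_protected(self, line: str) -> bool:
        # Assumption for this sketch: headings and table rows stay intact.
        return line.lstrip().startswith(("#", "|"))


PROCESSORS: Dict[str, Type[BaseProcessor]] = {
    "markdown": MarkdownProcessor,
    "text": PlainTextProcessor,
}


def pick_processor(
    content: bytes,
    forced_type: Optional[str] = None,
    detect: Optional[Callable[[bytes], str]] = None,
) -> BaseProcessor:
    """Prefer an explicit type, then a content detector, then plain text."""
    label = forced_type or (detect(content) if detect else "text")
    return PROCESSORS.get(label, PlainTextProcessor)()
```

Keeping the detector as an injected callable means an explicit type can bypass detection entirely, and unknown labels fall back to plain text, which is one way to preserve existing behaviour for inputs the detector does not recognise.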
Known Limitations
- Nested list edge cases with mixed indentation
- Footnote reference positioning
- Task list checkbox alignment
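To help reproduce these limitations, a small fixture along the following lines could be fed to the tool. The sample text and the names `build_edge_case_sample` and `write_sample` are made up for this sketch; which inputs actually trigger each limitation is an assumption, not something confirmed here.

```python
from pathlib import Path


def build_edge_case_sample() -> str:
    """Return markdown exercising the three limitation areas listed above."""
    nested_list = "\n".join([
        "- parent item. It has a second sentence.",
        "  - child indented with two spaces",
        "\t- child indented with a tab",
        "    - grandchild indented with four spaces",
    ])
    footnotes = "\n".join([
        "The marker should stay attached to its sentence.[^1] Another sentence follows.",
        "",
        "[^1]: Footnote text. It has two sentences.",
    ])
    task_list = "\n".join([
        "- [ ] Unchecked task. With a second sentence.",
        "- [x] Checked task.",
    ])
    return "\n\n".join([nested_list, footnotes, task_list]) + "\n"


def write_sample(path: str) -> Path:
    """Write the sample to disk so it can be passed to the CLI."""
    target = Path(path)
    target.write_text(build_edge_case_sample(), encoding="utf-8")
    return target
```

Calling `write_sample("edge_cases.md")` produces a file that can then be processed and inspected by hand for merged bullets, displaced footnote markers, or misaligned checkboxes.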
How to Test
# Install latest dev version
uv tool install sembr --from git+https://github.com/admko/sembr.git
# Test markdown files
sembr -i README.md -o README_sembr.md
# Force markdown processing
sembr -t markdown -i input.md -o output.md
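After running the commands above, one possible sanity check is to confirm that fenced code blocks come out unchanged. The helper below is a sketch, assuming that fenced code is expected to survive processing verbatim; `fenced_blocks` and `code_blocks_preserved` are hypothetical names and are not part of sembr.

```python
from typing import List


def fenced_blocks(text: str) -> List[str]:
    """Collect fenced code blocks (``` or ~~~), fences included."""
    blocks: List[str] = []
    current = None  # lines of the block being collected, if any
    fence = ""
    for line in text.splitlines():
        stripped = line.strip()
        if current is None:
            if stripped.startswith(("```", "~~~")):
                fence = stripped[:3]
                current = [line]
        else:
            current.append(line)
            if stripped.startswith(fence):
                blocks.append("\n".join(current))
                current = None
    # An unclosed fence at end of input is ignored in this sketch.
    return blocks


def code_blocks_preserved(before: str, after: str) -> bool:
    """True if both texts contain the same fenced blocks in the same order."""
    return fenced_blocks(before) == fenced_blocks(after)


# Example usage, assuming the files from the commands above exist:
# from pathlib import Path
# ok = code_blocks_preserved(
#     Path("README.md").read_text(encoding="utf-8"),
#     Path("README_sembr.md").read_text(encoding="utf-8"),
# )
```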
Related Issues
- #1 - Punctuation disruption (%), fixed by the new handling of % symbols