This document outlines a comprehensive plan to modernize and improve the mkdocs-combine project. The project currently converts MkDocs documentation into a single Markdown file, but it's built on older standards (Python 2.7) and needs significant updates to remain relevant and maintainable.
The codebase currently targets Python 2.7, which reached end-of-life in 2020. This creates security risks and rules out modern Python features.
Steps:
- Update all string handling to use Python 3's unified string type
- Replace all `unicode` references with proper Python 3 string handling
- Update print statements to print functions throughout the codebase
- Use `pathlib` for path operations instead of `os.path`
- Replace `codecs.open()` with built-in `open()` with encoding parameter
- Update exception handling syntax from `except Exception, e:` to `except Exception as e:`
- Use type hints throughout the codebase for better maintainability
- Update minimum Python version to 3.8+ (or 3.9+ for better type hints)
- Update all dependencies to their latest compatible versions
- Add proper dependency version constraints in setup.py
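Several of the migration steps above can be seen together in one small function. The following is only a sketch: `read_page` and its parameters are hypothetical names chosen for illustration, not code from the project.

```python
from pathlib import Path
from typing import List


def read_page(docs_dir: Path, relative_name: str) -> List[str]:
    """Return the lines of one Markdown page, or an empty list if unreadable.

    Hypothetical helper used only to illustrate the Python 3 idioms.
    """
    page_path = docs_dir / relative_name  # pathlib instead of os.path.join
    try:
        # Built-in open() with an explicit encoding replaces codecs.open()
        with open(page_path, encoding="utf-8") as handle:
            return handle.read().splitlines()
    except OSError as exc:  # Python 3 form of "except OSError, exc:"
        print(f"Skipping {page_path}: {exc}")  # print is a function
        return []
```

In Python 3 every `str` is already Unicode text, so no `unicode()` conversion is needed once the file is opened with an encoding.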
The README mentions that MkDocs now supports plugins that provide better architecture. We should consider:
- Investigating the MkDocs plugin architecture
- Potentially rewriting this as a proper MkDocs plugin
- Or, at minimum, aligning with MkDocs plugin conventions
The current MkDocsCombiner class is monolithic (300+ lines). Break it down into:
- `ConfigManager`: Handle configuration loading and validation
- `PageProcessor`: Handle individual page processing
- `FilterChain`: Manage the filter pipeline
- `OutputFormatter`: Handle different output formats
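One possible shape for this split is sketched below. It is an assumption about how the pieces could fit together, not the project's design: the method names, the `combine` function, and the `strip_trailing_whitespace` filter are all hypothetical, and `ConfigManager` is left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A filter is modelled here as a plain function from lines to lines
LineFilter = Callable[[List[str]], List[str]]


@dataclass
class FilterChain:
    """Applies filters in order; each filter maps lines to lines."""
    filters: List[LineFilter] = field(default_factory=list)

    def run(self, lines: List[str]) -> List[str]:
        for line_filter in self.filters:
            lines = line_filter(lines)
        return lines


@dataclass
class PageProcessor:
    """Processes one page by passing its lines through the chain."""
    chain: FilterChain

    def process(self, text: str) -> List[str]:
        return self.chain.run(text.splitlines())


class OutputFormatter:
    """Joins processed pages into a single Markdown document."""

    def format(self, pages: List[List[str]]) -> str:
        return "\n\n".join("\n".join(page) for page in pages)


def strip_trailing_whitespace(lines: List[str]) -> List[str]:
    # Hypothetical example filter, not taken from the project
    return [line.rstrip() for line in lines]


def combine(page_texts: List[str]) -> str:
    # The top-level function only wires the components together
    processor = PageProcessor(FilterChain([strip_trailing_whitespace]))
    return OutputFormatter().format([processor.process(t) for t in page_texts])
```

Keeping each component small like this would let the filter pipeline and the output step be tested in isolation.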
Current filters are somewhat scattered. Create a proper filter interface:
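
    from abc import ABC, abstractmethod
    from typing import List


    class BaseFilter(ABC):
        @abstractmethod
        def process(self, lines: List[str]) -> List[str]:
            """Return the transformed lines of a page."""

Output is currently limited to Markdown and basic HTML. Add: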
- PDF generation: Direct PDF output using libraries like WeasyPrint or ReportLab
- EPUB generation: Direct EPUB output without requiring pandoc
- DOCX support: Microsoft Word format output
- AsciiDoc output: For technical documentation workflows
- Support for Mermaid diagrams
- Better handling of footnotes
- Support for definition lists
- Enhanced table processing with alignment preservation
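To preserve table alignment, the processor first has to read it from the delimiter row of a pipe table (for example `| :--- | :---: | ---: |`). The sketch below shows one way to do that; `parse_alignments` is a hypothetical helper, and it assumes the common pipe-table syntax where colons mark alignment.

```python
import re
from typing import List, Optional

# One delimiter cell such as ":---", "---:", ":---:" or "---"
_CELL = re.compile(r"^\s*(:?)-+(:?)\s*$")


def parse_alignments(delimiter_row: str) -> Optional[List[str]]:
    """Return per-column alignment, or None if the row is not a delimiter row."""
    cells = delimiter_row.strip().strip("|").split("|")
    alignments = []
    for cell in cells:
        match = _CELL.match(cell)
        if match is None:
            return None  # an ordinary table row or unrelated text
        left, right = match.group(1), match.group(2)
        if left and right:
            alignments.append("center")
        elif right:
            alignments.append("right")
        elif left:
            alignments.append("left")
        else:
            alignments.append("default")
    return alignments
```

The returned list can then be carried through to whichever output format is being written.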
Currently, image handling is basic. Improve:
- Copy referenced images to output directory
- Support for relative image paths
- Handle other assets (CSS, JS for HTML output)
- Option to embed images as base64 in HTML output
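The base64 option can be built on the standard library alone. This is a minimal sketch; `image_to_data_uri` is a hypothetical name, and a real implementation would also need to decide how to handle very large images.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_uri(image_path: Path) -> str:
    """Encode an image file as a data: URI usable in an <img src="..."> attribute."""
    mime_type, _ = mimetypes.guess_type(image_path.name)
    if mime_type is None:
        mime_type = "application/octet-stream"  # fallback for unknown extensions
    encoded = base64.b64encode(image_path.read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

Embedding makes the HTML output self-contained, at the cost of a file roughly a third larger than the original images.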
There is currently no visible test suite. Implement:
- Unit tests for each filter
- Integration tests for full document processing
- Test fixtures with sample MkDocs projects
- Property-based testing for filters
- Coverage target: 80%+
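A unit test for a filter could look like the following. The sketch assumes pytest-style test functions with plain `assert` statements; `collapse_blank_lines` is a hypothetical filter written here only so there is something to test.

```python
from typing import List


def collapse_blank_lines(lines: List[str]) -> List[str]:
    """Hypothetical filter: squeeze runs of blank lines into one."""
    result: List[str] = []
    for line in lines:
        if line.strip() == "" and result and result[-1].strip() == "":
            continue  # skip a blank line that follows another blank line
        result.append(line)
    return result


def test_collapses_consecutive_blank_lines():
    assert collapse_blank_lines(["a", "", "", "b"]) == ["a", "", "b"]


def test_is_idempotent():
    # Running the filter twice should change nothing further
    once = collapse_blank_lines(["a", "", "", "", "b", ""])
    assert collapse_blank_lines(once) == once
```

Invariants such as idempotence are also natural candidates for the property-based tests mentioned above.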
- Add pre-commit hooks
- Configure black for code formatting
- Use ruff for linting
- Add mypy for type checking
- Configure GitHub Actions for CI/CD
- Comprehensive API documentation
- Usage examples for each feature
- Migration guide from current version
- Contributing guidelines
- Migrate from setup.py to pyproject.toml
- Use Poetry or PDM for dependency management
- Proper semantic versioning
- Automated releases via GitHub Actions
- Ensure pip installation works smoothly
- Consider conda-forge distribution
- Docker image for easy usage
- Homebrew formula for macOS users
- Profile current performance bottlenecks
- Implement parallel processing for multiple files
- Optimize regex operations in filters
- Consider using compiled regex patterns
- Stream processing for large documents
- Lazy loading of files
- Efficient string concatenation
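Three of the performance items above (compiled patterns, stream processing, and efficient concatenation) can be combined in a few lines. This is a sketch with hypothetical names, and the comment-stripping pattern is just an example; profiling should confirm whether any of these changes matter for real projects.

```python
import re
from typing import Iterable, Iterator

# Compiled once at import time instead of on every call
_HTML_COMMENT = re.compile(r"<!--.*?-->")


def strip_comments(lines: Iterable[str]) -> Iterator[str]:
    """Lazily yield lines with single-line HTML comments removed."""
    for line in lines:
        yield _HTML_COMMENT.sub("", line)


def combine_lines(lines: Iterable[str]) -> str:
    # str.join builds the result in one pass, unlike repeated "+="
    return "\n".join(strip_comments(lines))
```

Because `strip_comments` is a generator, it can be fed directly from an open file object without loading the whole document first.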
- Better progress indicators
- Colored output for better readability
- Interactive mode for configuration
- Dry-run option to preview changes
- Clear error messages with actionable advice
- Validate MkDocs configuration before processing
- Handle missing files gracefully
- Provide debugging mode with detailed logs
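Graceful handling of missing files and a debugging mode could both rest on the standard `logging` module. The sketch below is an assumption about how that might look: the logger name `mkdocs_combine` and both function names are hypothetical.

```python
import logging
from pathlib import Path
from typing import List

# Hypothetical logger name, chosen for illustration
logger = logging.getLogger("mkdocs_combine")


def find_missing_pages(docs_dir: Path, pages: List[str]) -> List[str]:
    """Return the page names that do not exist under docs_dir."""
    missing = [name for name in pages if not (docs_dir / name).is_file()]
    for name in missing:
        # The message says what is wrong and what the user can do about it
        logger.warning(
            "Page %s is listed in the navigation but %s does not exist; "
            "fix the path in mkdocs.yml or create the file.",
            name,
            docs_dir / name,
        )
    return missing


def configure_logging(debug: bool) -> None:
    # Debug mode switches to detailed logs
    logging.basicConfig(level=logging.DEBUG if debug else logging.INFO)
```

Reporting all missing pages at once, rather than failing on the first one, lets a user fix the configuration in a single pass.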
- Test with multiple MkDocs versions
- Handle deprecated features gracefully
- Support for MkDocs themes and their specific requirements
- VS Code extension for live preview
- GitHub Action for automated documentation building
- GitLab CI templates
- Integration with Read the Docs
- Python 3 migration
- Basic test suite
- CI/CD setup
- Code formatting and linting
- Modular design implementation
- Filter system refactoring
- Comprehensive testing
- Performance optimization
- New output formats
- Enhanced Markdown processing
- Asset management
- CLI improvements
- Documentation
- Examples and tutorials
- Community building
- Release preparation
- Python 3 only codebase
- 80%+ test coverage
- Support for 3+ additional output formats
- 50% performance improvement
- Active community with regular contributions
- Compatible with latest MkDocs versions