A PDF sorting application that automatically organizes documents by their content, reading embedded text directly and falling back to OCR for scanned pages.
- Download the installer: OCR_File_Sorter_Installer.exe
- Run the installer and follow the setup wizard
- Start sorting your PDFs!
# Clone and setup
git clone https://github.com/Friedrice04/PDF-Sorter.git
cd PDF-Sorter
# Create virtual environment
python -m venv .venv
.venv\Scripts\activate
# Install dependencies
pip install -r config/requirements.txt
# Run the application
python src/main.py

PDF-Sorter/
├── src/                        # Main application code
│   ├── main.py                 # Application entry point
│   ├── gui.py                  # User interface
│   ├── sorter.py               # Core sorting logic
│   ├── utils.py                # Utility functions
│   ├── icons/                  # Application icons
│   ├── mappings/               # Sorting rule examples
│   └── mapping_editor/         # Mapping editor module
├── tests/                      # Test suite
│   ├── test_runner/            # PDF testing framework
│   └── ...                     # Unit and integration tests
├── scripts/                    # Build and utility scripts
│   ├── build.bat               # Simple build script
│   ├── build_complete.bat      # Complete build with installer
│   ├── build_exe.py            # PyInstaller build script
│   └── installer.py            # Unified installer (build & install)
├── config/                     # Configuration files
│   ├── requirements.txt        # Runtime dependencies
│   ├── requirements-build.txt  # Build dependencies
│   └── requirements-dev.txt    # Development dependencies
├── docs/                       # Documentation
│   ├── DISTRIBUTION_GUIDE.md   # Distribution instructions
│   └── INSTALLER_README.md     # Installer technical details
├── build/                      # Build artifacts (ignored)
├── dist/                       # Distribution files
└── quick-build.bat             # Quick build convenience script

- Text Extraction: Direct PDF text reading with OCR fallback
- Pattern Matching: Flexible phrase-based sorting rules
- OCR Support: Handles scanned documents with Tesseract
- Robust Parsing: Handles OCR quirks and text variations
- Custom Mappings: Create your own sorting rules
- Template System: Predefined folder structures
- Batch Processing: Sort multiple files at once
- File Naming: Configurable output file naming schemes
- Drag & Drop: Easy file selection
- Progress Tracking: Real-time sorting progress
- Visual Feedback: Clear status updates and error messages
- Mapping Editor: Built-in rule editor with preview
- Comprehensive Testing: PDF testing framework included
- Easy Distribution: Single-file installer with dependencies
- Cross-Platform: Windows focus with portable codebase
- Extensible: Modular architecture for easy enhancement
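
To make the text extraction and pattern matching features above more concrete, here is a minimal sketch of how an OCR fallback and phrase-based rules could fit together. It is not taken from `src/sorter.py`: the function names (`normalize`, `extract_text`, `choose_folder`), the `min_chars` threshold, and the folder-to-phrases mapping shape are all assumptions made for illustration, and the PDF reader and OCR engine are passed in as plain callables rather than tied to a specific library.

```python
import re
from typing import Callable, Dict, List, Optional


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace runs into single spaces,
    so OCR line breaks and stray symbols do not block a phrase match."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def extract_text(
    path: str,
    read_embedded: Callable[[str], str],  # hypothetical: reads the PDF text layer
    run_ocr: Callable[[str], str],        # hypothetical: runs OCR on the pages
    min_chars: int = 20,                  # assumed threshold for "has real text"
) -> str:
    """Use the embedded text layer when it has content, otherwise fall back to OCR."""
    text = read_embedded(path)
    if len(text.strip()) >= min_chars:
        return text
    return run_ocr(path)


def choose_folder(text: str, mapping: Dict[str, List[str]]) -> Optional[str]:
    """Return the first folder whose phrase list matches the text, else None.

    `mapping` is an assumed shape: destination folder -> list of trigger phrases.
    Padding with spaces makes phrases match on whole-word boundaries.
    """
    haystack = f" {normalize(text)} "
    for folder, phrases in mapping.items():
        for phrase in phrases:
            needle = normalize(phrase)
            if needle and f" {needle} " in haystack:
                return folder
    return None


if __name__ == "__main__":
    rules = {
        "Invoices": ["invoice number", "amount due"],
        "Contracts": ["this agreement"],
    }
    scanned = "INVOICE\nNumber: 1042   Amount  due: $300"
    print(choose_folder(scanned, rules))
```

Normalizing both the document text and the rule phrases is one simple way to tolerate OCR quirks such as broken lines or extra punctuation; the real application may handle these differently.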
# Build everything (application + installer)
quick-build.bat

# 1. Install build dependencies
pip install -r config/requirements-build.txt
# 2. Build main application
cd scripts
python build_exe.py
# 3. Create installer (optional)
python installer.py --build

- dist/OCR File Sorter.exe - Main application
- dist/OCR_File_Sorter_Installer.exe - Complete installer

# Run all tests
python -m pytest tests/
# Test PDF sorting specifically
cd tests/test_runner
python run_pdf_tests.py --verbose

The included test runner lets you easily test PDF sorting:
- Add PDFs to tests/test_runner/input_pdfs/
- Add mapping files to tests/test_runner/test_mappings/
- Run run_pdf_tests.py to see where each PDF would be sorted

- OS: Windows 10/11 (64-bit)
- Python: 3.8+ (for source)
- Dependencies: See config/requirements.txt
- Tesseract OCR: For scanned PDF support (auto-installed with installer)
- Document Management: Organize invoices, contracts, reports
- Office Automation: Sort incoming documents by type
- Archive Organization: Clean up document collections
- Workflow Integration: Part of larger document processing pipelines
This project is licensed under the terms specified in LICENCE.txt.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request
- Distribution Guide - Complete distribution instructions
- Installer README - Installer technical details
- Test Runner Guide - PDF testing framework
- Check the documentation in the docs/ folder
- Review test examples in tests/test_runner/
- Open an issue for bugs or feature requests
Transform your document chaos into organized bliss! ✨