📄 OCR File Sorter

A PDF sorting application that automatically organizes documents by their content. It reads a PDF's embedded text directly and falls back to OCR for scanned documents.

Quick Start

For Users

Download the installer: OCR_File_Sorter_Installer.exe
Run the installer and follow the setup wizard
Start sorting your PDFs!

For Developers

# Clone and setup
git clone https://github.com/Friedrice04/PDF-Sorter.git
cd PDF-Sorter

# Create virtual environment
python -m venv .venv
.venv\Scripts\activate

# Install dependencies
pip install -r config/requirements.txt

# Run the application
python src/main.py

Project Structure

PDF-Sorter/
├── src/                    # Main application code
│   ├── main.py               # Application entry point
│   ├── gui.py                # User interface
│   ├── sorter.py             # Core sorting logic
│   ├── utils.py              # Utility functions
│   ├── icons/             # Application icons
│   ├── mappings/          # Sorting rule examples
│   └── mapping_editor/    # Mapping editor module
├── tests/                 # Test suite
│   ├── test_runner/       # PDF testing framework
│   └── ...                   # Unit and integration tests
├── scripts/               # Build and utility scripts
│   ├── build.bat            # Simple build script
│   ├── build_complete.bat   # Complete build with installer
│   ├── build_exe.py         # PyInstaller build script
│   └── installer.py         # Unified installer (build & install)
├── config/                # Configuration files
│   ├── requirements.txt     # Runtime dependencies
│   ├── requirements-build.txt # Build dependencies
│   └── requirements-dev.txt # Development dependencies
├── docs/                  # Documentation
│   ├── DISTRIBUTION_GUIDE.md # Distribution instructions
│   └── INSTALLER_README.md  # Installer technical details
├── build/                 # Build artifacts (ignored)
├── dist/                  # Distribution files
└── quick-build.bat           # Quick build convenience script

Features

Intelligent PDF Processing

Text Extraction: Direct PDF text reading with OCR fallback
Pattern Matching: Flexible phrase-based sorting rules
OCR Support: Handles scanned documents with Tesseract
Robust Parsing: Handles OCR quirks and text variations
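
The extraction and matching flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/sorter.py: the names `normalize`, `extract_text`, and `contains_phrase`, the `min_chars` threshold, and the injected `read_text` / `run_ocr` callables are all assumptions made for the example. In practice the callables would wrap a PDF text reader and Tesseract.

```python
import re
from typing import Callable


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so OCR line breaks and
    irregular spacing do not defeat phrase matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def extract_text(
    path: str,
    read_text: Callable[[str], str],  # hypothetical: reads the embedded text layer
    run_ocr: Callable[[str], str],    # hypothetical: runs OCR on the pages
    min_chars: int = 20,              # assumed threshold for "has usable text"
) -> str:
    """Try the embedded text first; fall back to OCR when it is (nearly) empty."""
    text = read_text(path)
    if len(text.strip()) >= min_chars:
        return text
    return run_ocr(path)


def contains_phrase(text: str, phrase: str) -> bool:
    """Phrase match that ignores case and whitespace differences."""
    return normalize(phrase) in normalize(text)
```

Passing the readers in as callables keeps the fallback logic testable without a real PDF or a Tesseract installation.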

Smart Sorting

Custom Mappings: Create your own sorting rules
Template System: Predefined folder structures
Batch Processing: Sort multiple files at once
File Naming: Configurable output file naming schemes
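
As a rough sketch of how phrase-based rules and batch sorting could fit together: the example below assumes a mapping is a simple dictionary from phrase to destination folder. The real mapping format used in src/mappings/ may differ, and `pick_folder`, `plan_batch`, and the `Unsorted` fallback folder are hypothetical names chosen for illustration.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def pick_folder(text: str, mapping: Dict[str, str]) -> Optional[str]:
    """Return the folder of the first phrase found in the text
    (case-insensitive), or None when no rule matches."""
    haystack = text.lower()
    for phrase, folder in mapping.items():
        if phrase.lower() in haystack:
            return folder
    return None


def plan_batch(
    texts: Dict[str, str],
    mapping: Dict[str, str],
    unsorted: str = "Unsorted",  # assumed fallback folder for unmatched files
) -> Dict[str, str]:
    """Map each file name to a destination path without moving anything.

    PurePosixPath is used only to keep the example output stable;
    an application on Windows would normally use pathlib.Path.
    """
    plan = {}
    for name, text in texts.items():
        folder = pick_folder(text, mapping) or unsorted
        plan[name] = str(PurePosixPath(folder) / name)
    return plan


if __name__ == "__main__":
    rules = {"invoice": "Finance/Invoices", "contract": "Legal/Contracts"}
    extracted = {"a.pdf": "Service contract for 2024", "b.pdf": "Meeting notes"}
    for name, dest in plan_batch(extracted, rules).items():
        print(f"{name} -> {dest}")
```

Building a plan before touching the file system makes it easy to preview the result of a batch before any file is moved.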

User-Friendly Interface

Drag & Drop: Easy file selection
Progress Tracking: Real-time sorting progress
Visual Feedback: Clear status updates and error messages
Mapping Editor: Built-in rule editor with preview

Professional Features

Comprehensive Testing: PDF testing framework included
Easy Distribution: Single-file installer that bundles dependencies
Portable Codebase: Targets Windows, with code written to be portable to other platforms
Extensible: Modular architecture for easy enhancement

Building

Quick Build

# Build everything (application + installer)
quick-build.bat

Manual Build Steps

# 1. Install build dependencies
pip install -r config/requirements-build.txt

# 2. Build main application
cd scripts
python build_exe.py

# 3. Create installer (optional)
python installer.py --build

Output Files

dist/OCR File Sorter.exe - Main application
dist/OCR_File_Sorter_Installer.exe - Complete installer

Testing

Run Tests

# Run all tests
python -m pytest tests/

# Test PDF sorting specifically
cd tests/test_runner
python run_pdf_tests.py --verbose

PDF Testing Framework

The included test runner lets you easily test PDF sorting:

Add PDFs to tests/test_runner/input_pdfs/
Add mapping files to tests/test_runner/test_mappings/
Run run_pdf_tests.py to see where each PDF would be sorted
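
To illustrate the folder layout used in the steps above, here is a small sketch that lists which PDF and mapping combinations a run might cover. It is not the implementation of run_pdf_tests.py: `list_cases` is a hypothetical helper, and pairing every PDF with every mapping file is an assumption about how the runner behaves.

```python
from itertools import product
from pathlib import Path
from typing import List, Tuple


def list_cases(runner_dir: Path) -> List[Tuple[str, str]]:
    """Pair every PDF in input_pdfs/ with every file in test_mappings/."""
    pdfs = sorted(p.name for p in (runner_dir / "input_pdfs").glob("*.pdf"))
    mappings = sorted(
        p.name for p in (runner_dir / "test_mappings").iterdir() if p.is_file()
    )
    return list(product(pdfs, mappings))


if __name__ == "__main__":
    # Assumes the script is started from the repository root.
    for pdf, mapping in list_cases(Path("tests/test_runner")):
        print(f"{pdf} with {mapping}")
```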

Requirements

Runtime

OS: Windows 10/11 (64-bit)
Python: 3.8+ (for source)
Dependencies: See config/requirements.txt

Optional

Tesseract OCR: Required for scanned PDF support (installed automatically when you use the installer)

Use Cases

Document Management: Organize invoices, contracts, reports
Office Automation: Sort incoming documents by type
Archive Organization: Clean up document collections
Workflow Integration: Part of larger document processing pipelines

License

This project is licensed under the terms specified in LICENCE.txt.

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests for new functionality
Submit a pull request

Documentation

Distribution Guide - Complete distribution instructions
Installer README - Installer technical details
Test Runner Guide - PDF testing framework

Support

Check the documentation in the docs/ folder
Review test examples in tests/test_runner/
Open an issue for bugs or feature requests

Transform your document chaos into organized bliss! 📄✨
