classifiers

A document processing system written in Go that extracts data from documents and classifies them based on configurable rules.

How It Works

1. Classification Rules

Rules define document types and the keywords that identify them.
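As a rough illustration of keyword-based rules, here is a minimal sketch. The `Rule` struct, its field names, and the `Classify` function are hypothetical and are not taken from this project's code; scoring by keyword hit count is one plausible strategy, not necessarily the one used here.

```go
package main

import (
	"fmt"
	"strings"
)

// Rule is a hypothetical shape for a classification rule: a document
// type plus the keywords that suggest it.
type Rule struct {
	DocType  string
	Keywords []string
}

// Classify returns the DocType of the rule with the most keyword hits
// in text, or "unknown" when no keyword matches. Matching is a
// case-insensitive substring search; ties go to the earlier rule.
func Classify(text string, rules []Rule) string {
	lower := strings.ToLower(text)
	best, bestHits := "unknown", 0
	for _, r := range rules {
		hits := 0
		for _, kw := range r.Keywords {
			if strings.Contains(lower, strings.ToLower(kw)) {
				hits++
			}
		}
		if hits > bestHits {
			best, bestHits = r.DocType, hits
		}
	}
	return best
}

func main() {
	rules := []Rule{
		{DocType: "invoice", Keywords: []string{"invoice", "amount due"}},
		{DocType: "contract", Keywords: []string{"agreement", "hereby"}},
	}
	fmt.Println(Classify("INVOICE #1: amount due in 30 days", rules))
}
```

Substring matching keeps the sketch short; a real rule set would probably need word boundaries or weights to avoid false positives.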

2. Document Processing

Extract → Classify → Organize

Extract: Pull text from various document formats

Classify: Apply rules to determine document type

Organize: Sort documents by classification
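The three stages above can be sketched as a small pipeline. Every name here (`Input`, `Keyword`, `Document`, `Process`, and the stage functions) is an assumption made for illustration; the extract stage only trims text and stands in for real format-specific extraction such as OCR.

```go
package main

import (
	"fmt"
	"strings"
)

// Input is a hypothetical raw document: a path and its raw content.
type Input struct{ Path, Raw string }

// Keyword maps one illustrative keyword to a document type.
type Keyword struct{ Word, DocType string }

// Document carries state through the pipeline.
type Document struct{ Path, Text, DocType string }

// extract stands in for format-specific extraction; it only trims whitespace.
func extract(in Input) Document {
	return Document{Path: in.Path, Text: strings.TrimSpace(in.Raw)}
}

// classify assigns the type of the first keyword found in the text,
// or "unknown" when none matches.
func classify(d Document, keywords []Keyword) Document {
	d.DocType = "unknown"
	lower := strings.ToLower(d.Text)
	for _, k := range keywords {
		if strings.Contains(lower, strings.ToLower(k.Word)) {
			d.DocType = k.DocType
			break
		}
	}
	return d
}

// organize groups document paths by their assigned type.
func organize(docs []Document) map[string][]string {
	groups := make(map[string][]string)
	for _, d := range docs {
		groups[d.DocType] = append(groups[d.DocType], d.Path)
	}
	return groups
}

// Process runs extract, classify, and organize over all inputs.
func Process(inputs []Input, keywords []Keyword) map[string][]string {
	docs := make([]Document, 0, len(inputs))
	for _, in := range inputs {
		docs = append(docs, classify(extract(in), keywords))
	}
	return organize(docs)
}

func main() {
	keywords := []Keyword{{"invoice", "invoice"}, {"receipt", "receipt"}}
	inputs := []Input{
		{"a.txt", "Invoice 42\n"},
		{"b.txt", "Receipt for coffee"},
		{"c.txt", "meeting notes"},
	}
	for docType, paths := range Process(inputs, keywords) {
		fmt.Println(docType, paths)
	}
}
```

Keeping each stage as a plain function makes it easy to swap one out, for example replacing `extract` with an OCR-backed version.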

3. File Browser Interface

📁 Browse directories with visual representation

📄 Select files for processing

🔍 Filter by supported types

4. Classification Rules Management

📋 View and edit rules

🔄 Reload from external files

🔀 Select different rule sets
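Reloading rules from an external file could look like the sketch below. The JSON layout, the field names `doc_type` and `keywords`, and the functions `ParseRules` and `LoadRulesFile` are all assumptions; the rule file format this project actually uses is not described here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Rule mirrors one entry of a hypothetical JSON rule file.
type Rule struct {
	DocType  string   `json:"doc_type"`
	Keywords []string `json:"keywords"`
}

// ParseRules decodes a JSON array of rules and rejects incomplete entries.
func ParseRules(data []byte) ([]Rule, error) {
	var rules []Rule
	if err := json.Unmarshal(data, &rules); err != nil {
		return nil, fmt.Errorf("decode rules: %w", err)
	}
	for i, r := range rules {
		if r.DocType == "" || len(r.Keywords) == 0 {
			return nil, fmt.Errorf("rule %d: doc_type and keywords are required", i)
		}
	}
	return rules, nil
}

// LoadRulesFile reads and parses a rule file. Calling it again with the
// same path picks up edits made on disk, which is one way to "reload".
func LoadRulesFile(path string) ([]Rule, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("read %s: %w", path, err)
	}
	return ParseRules(data)
}

func main() {
	data := []byte(`[{"doc_type":"receipt","keywords":["total","cash"]}]`)
	rules, err := ParseRules(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("loaded %d rule(s); first type: %s\n", len(rules), rules[0].DocType)
}
```

Validating entries at load time means a malformed rule file is reported immediately rather than silently classifying everything as unknown.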

Supported Document Types

📊 Invoices
📜 Contracts
🧾 Receipts
📝 Reports

Project Structure

/models - Data structure definitions
/services - Core business logic
/services/extractors - Document data extraction components
/services/classifiers - Document classification logic
/ui - User interface components

Dependencies

Tesseract OCR

Required for image text extraction:

Windows:

Download the installer from the UB-Mannheim/tesseract repository on GitHub.
During installation, select the language packs you need (for example, 'eng' for English and 'por' for Portuguese).

macOS:

brew install tesseract tesseract-lang

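The tesseract formula already includes English ('eng'); tesseract-lang adds the remaining language packs, including Portuguese ('por'). Homebrew does not provide separate per-language formulae.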

Linux (Ubuntu/Debian):

# For English language support
sudo apt update && sudo apt install -y tesseract-ocr tesseract-ocr-eng

# For Portuguese language support
sudo apt install -y tesseract-ocr-por

Language Packs:

'eng': English
'por': Portuguese
'spa': Spanish
'fra': French
'deu': German

Install the language packs appropriate for your documents' content.
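To confirm that the needed packs are present, one option is to parse the output of `tesseract --list-langs`. The sketch below assumes that output is a header line followed by one language code per line; the helper names `parseLangs` and `missingLangs` are hypothetical and not part of this project.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// parseLangs extracts language codes from `tesseract --list-langs` output,
// assumed to be a "List of ..." header followed by one code per line.
func parseLangs(out string) map[string]bool {
	langs := make(map[string]bool)
	for _, line := range strings.Split(out, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "List of") {
			continue
		}
		langs[line] = true
	}
	return langs
}

// missingLangs returns the wanted codes absent from installed, in input order.
func missingLangs(installed map[string]bool, wanted []string) []string {
	var missing []string
	for _, w := range wanted {
		if !installed[w] {
			missing = append(missing, w)
		}
	}
	return missing
}

func main() {
	// CombinedOutput is used because some Tesseract versions print the
	// language list to stderr rather than stdout.
	out, err := exec.Command("tesseract", "--list-langs").CombinedOutput()
	if err != nil {
		fmt.Println("tesseract not available:", err)
		return
	}
	missing := missingLangs(parseLangs(string(out)), []string{"eng", "por"})
	if len(missing) > 0 {
		fmt.Println("missing language packs:", strings.Join(missing, ", "))
		return
	}
	fmt.Println("all required language packs are installed")
}
```

Running a check like this at startup gives a clearer error than a failed OCR call later on.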

Usage

Run with path: ./classifiers
