AI Document Scanner - Prototype

AI-powered document scanner prototype with alignment detection and glare warnings.

Quick Start

# 1. Install dependencies
pip install -r requirements.txt

# 2. Download base images (optional; or add your own images to data/source_documents/)
python scripts/1_download_documents.py

# 3. Generate synthetic training data (2-4 hours)
python scripts/2_generate_synthetic_data.py

# 4. Train model on Google Colab (recommended - free GPU)
# Upload notebooks/colab_training.ipynb to Google Colab

# 5. Test inference
python scripts/4_test_inference.py --image test.jpg
python scripts/4_test_inference.py --video 0  # Webcam

Project Structure

document_scanner/
├── data/
│   ├── source_documents/     # Base document images
│   ├── synthetic/            # Generated training data (50k+ images)
│   └── test_samples/         # Real test images
├── models/
│   ├── saved_model/          # Trained TensorFlow model
│   └── tflite/               # TFLite conversions (future)
├── scripts/
│   ├── 1_download_documents.py
│   ├── 2_generate_synthetic_data.py
│   ├── 3_train_model.py
│   ├── 4_test_inference.py
│   └── model_architecture.py
├── notebooks/
│   └── colab_training.ipynb  # Google Colab training
├── config.py                 # Configuration
└── requirements.txt          # Dependencies

Features

✅ Zero manual labeling: training data and labels are generated synthetically
✅ 4 outputs: aligned_conf, glare_conf, direction_x, direction_y
✅ Lightweight MobileNetV3-based model (~10-12 MB)
✅ Fast inference (20-30 ms on CPU)
✅ Google Colab support (free GPU training)
✅ Real-time processing (webcam/video)

How It Works

1. Synthetic Data Generation

The generator script automatically creates 50,000+ training images
It applies random transformations (position, rotation, scale, glare)
Labels are derived automatically from the applied transformations, so no human annotation is needed
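How the labels are derived is not spelled out here, so the following is a minimal, hypothetical sketch of the idea: draw random transformation parameters, then compute the four targets directly from them. Every name, range, and tolerance below (`labels_from_params`, `max_shift=0.5`, `shift_tol=0.05`, `min_scale=0.8`, etc.) is an assumption, and the actual image warping and glare rendering are omitted. Offsets use image coordinates (positive y = below center), so a negative `direction_x` (document left of center) is consistent with the "negative = move right" convention under Model Outputs, assuming "move" refers to moving the document.

```python
import random
from dataclasses import dataclass


@dataclass
class Labels:
    aligned_conf: float  # 1.0 if the document counts as aligned, else 0.0
    glare_conf: float    # 1.0 if a synthetic glare patch was applied, else 0.0
    direction_x: float   # normalized horizontal offset, clipped to [-1, 1]
    direction_y: float   # normalized vertical offset, clipped to [-1, 1]


def _clip(value: float, low: float = -1.0, high: float = 1.0) -> float:
    return max(low, min(high, value))


def labels_from_params(dx: float, dy: float, rotation_deg: float, scale: float,
                       has_glare: bool, max_shift: float = 0.5,
                       shift_tol: float = 0.05, rotation_tol_deg: float = 3.0,
                       min_scale: float = 0.8) -> Labels:
    """Hypothetical: derive the four targets from the applied transformation.

    dx, dy: document-center offset from the frame center as a fraction of the
    frame size (image coordinates, so positive dy means below center).
    """
    aligned = (abs(dx) <= shift_tol and abs(dy) <= shift_tol
               and abs(rotation_deg) <= rotation_tol_deg and scale >= min_scale)
    return Labels(
        aligned_conf=1.0 if aligned else 0.0,
        glare_conf=1.0 if has_glare else 0.0,
        direction_x=_clip(dx / max_shift),
        direction_y=_clip(dy / max_shift),
    )


def sample_example(rng: random.Random, glare_prob: float = 0.3):
    """Draw random transformation parameters (hypothetical ranges) and their labels."""
    params = {
        "dx": rng.uniform(-0.5, 0.5),
        "dy": rng.uniform(-0.5, 0.5),
        "rotation_deg": rng.uniform(-15.0, 15.0),
        "scale": rng.uniform(0.6, 1.0),
        "has_glare": rng.random() < glare_prob,
    }
    return params, labels_from_params(**params)
```

In a real generator, the same `params` would also drive the image transformation, so image and labels can never disagree; that is what makes manual annotation unnecessary.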

2. Model Training

MobileNetV3-based CNN with multi-task head
Trains on Google Colab (free T4 GPU)
Training takes roughly 2-3 hours

3. Inference

Real-time document analysis
Direction guidance ("Move LEFT", "Move RIGHT", etc.)
Glare warnings ("AVOID REFLECTIONS")
Ready-to-capture signal once the document is aligned (auto-capture itself is listed under Next Steps)
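Auto-capture is not implemented yet. One common approach, shown here as a hypothetical sketch rather than the project's design, is to fire only after the "ready" state has held for several consecutive frames, so a single noisy prediction cannot trigger a capture. `AutoCaptureTrigger` and `required_frames` are illustrative names.

```python
class AutoCaptureTrigger:
    """Hypothetical debounce: capture once 'ready' holds for N consecutive frames."""

    def __init__(self, required_frames: int = 10):
        self.required_frames = required_frames
        self._streak = 0  # number of consecutive ready frames seen so far

    def update(self, ready: bool) -> bool:
        """Feed one frame's ready flag; return True when a capture should happen."""
        self._streak = self._streak + 1 if ready else 0
        if self._streak >= self.required_frames:
            self._streak = 0  # reset so one steady period yields one capture
            return True
        return False
```

Per frame, `ready` would be true when the decision logic returns the "HOLD STEADY" state; at roughly 30 fps, `required_frames=10` corresponds to about a third of a second of stability.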

Model Outputs

aligned_conf (0-1): Document alignment confidence
glare_conf (0-1): Glare/reflection confidence
direction_x (-1 to +1): Horizontal direction (negative = move right)
direction_y (-1 to +1): Vertical direction (negative = move down)
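A small container for one prediction helps keep the four values and their ranges straight. This sketch assumes the model returns the outputs as a single vector in the order listed above; if it exposes four named heads instead, map them by name. `ScannerOutput` is a hypothetical name, and clipping simply guards against values slightly outside the documented ranges.

```python
from dataclasses import dataclass
from typing import Sequence


def _clip(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


@dataclass(frozen=True)
class ScannerOutput:
    aligned_conf: float  # 0-1
    glare_conf: float    # 0-1
    direction_x: float   # -1 to +1 (negative = move right)
    direction_y: float   # -1 to +1 (negative = move down)

    @classmethod
    def from_vector(cls, values: Sequence[float]) -> "ScannerOutput":
        """Build from one prediction, assuming the order aligned, glare, x, y."""
        if len(values) != 4:
            raise ValueError(f"expected 4 outputs, got {len(values)}")
        aligned, glare, dx, dy = (float(v) for v in values)
        return cls(
            aligned_conf=_clip(aligned, 0.0, 1.0),
            glare_conf=_clip(glare, 0.0, 1.0),
            direction_x=_clip(dx, -1.0, 1.0),
            direction_y=_clip(dy, -1.0, 1.0),
        )
```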

Decision Logic

if glare_conf >= 0.5:
    status = "⚠️ AVOID REFLECTIONS"
elif aligned_conf > 0.8 and glare_conf < 0.2:
    status = "✓ HOLD STEADY - Ready to capture"
else:
    status = None  # not ready: show direction guidance from direction_x / direction_y
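The rules above can be wrapped into one function that also produces the direction text. The glare and alignment thresholds are the ones stated here; the direction mapping for positive values ("Move LEFT", "Move UP") is assumed as the mirror of the documented negative meanings, and `guidance`, `dead_zone=0.1`, and the "Adjust position" fallback are hypothetical.

```python
def guidance(aligned_conf: float, glare_conf: float,
             direction_x: float, direction_y: float,
             dead_zone: float = 0.1) -> str:
    """Return the on-screen message for one prediction (hypothetical helper)."""
    if glare_conf >= 0.5:
        return "⚠️ AVOID REFLECTIONS"
    if aligned_conf > 0.8 and glare_conf < 0.2:
        return "✓ HOLD STEADY - Ready to capture"
    hints = []
    # Negative = move right / move down (documented); positive assumed to be the opposite.
    if direction_x <= -dead_zone:
        hints.append("Move RIGHT")
    elif direction_x >= dead_zone:
        hints.append("Move LEFT")
    if direction_y <= -dead_zone:
        hints.append("Move DOWN")
    elif direction_y >= dead_zone:
        hints.append("Move UP")
    return ", ".join(hints) if hints else "Adjust position"
```

Note that a well-aligned frame with glare_conf between 0.2 and 0.5 matches neither of the first two rules, so it falls through to direction guidance; the dead zone keeps tiny offsets from producing jittery hints.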

Configuration

All settings live in config.py:

Model architecture options
Training hyperparameters
Data generation parameters
Inference thresholds
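For orientation, a config.py covering these four groups might look roughly like the sketch below. All variable names are hypothetical, and the training values are placeholders rather than the project's settings; only the backbone, the 50,000-image target, and the three inference thresholds come from this README.

```python
# config.py (illustrative sketch; the project's actual names and values may differ)

# Model architecture
BACKBONE = "MobileNetV3"
NUM_OUTPUTS = 4  # aligned_conf, glare_conf, direction_x, direction_y

# Training hyperparameters (placeholder values)
BATCH_SIZE = 32
EPOCHS = 30
LEARNING_RATE = 1e-3

# Data generation
NUM_SYNTHETIC_IMAGES = 50_000

# Inference thresholds (from the Decision Logic rules)
GLARE_WARNING_THRESHOLD = 0.5  # glare_conf >= this -> "AVOID REFLECTIONS"
ALIGNED_THRESHOLD = 0.8        # aligned_conf > this ...
GLARE_OK_THRESHOLD = 0.2       # ... and glare_conf < this -> ready to capture
```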

Requirements

Python 3.8+
TensorFlow 2.13+
OpenCV 4.8+
8GB RAM (16GB recommended for training)
GPU optional (use Colab if no local GPU)

Time Estimates

Phase	Duration
Download documents	30-60 min (optional)
Generate synthetic data	2-4 hours
Train model (Colab GPU)	2-3 hours
Test & validate	30 min
Total	~5-8.5 hours (sum of the phases above; ~4.5-7.5 without the optional download)
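A quick way to sanity-check the total is to sum the per-phase ranges from the table, with the optional download counted as included:

```python
# (min_hours, max_hours) per phase, taken from the Time Estimates table
phases = {
    "Download documents (optional)": (0.5, 1.0),
    "Generate synthetic data": (2.0, 4.0),
    "Train model (Colab GPU)": (2.0, 3.0),
    "Test & validate": (0.5, 0.5),
}

total_min = sum(lo for lo, _ in phases.values())
total_max = sum(hi for _, hi in phases.values())
print(f"Total: {total_min:g}-{total_max:g} hours")  # Total: 5-8.5 hours
```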

Next Steps After Training

Test on 10-20 real document images
Validate accuracy and performance
Fine-tune if needed
Convert to TFLite for mobile deployment
Implement auto-capture logic
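For the validation step, a few summary numbers are usually enough to judge a 10-20 image test set. The sketch below assumes you record expected values for each real image (0/1 for the two confidences, plus a rough direction); `evaluate` and its 0.5 decision threshold are hypothetical choices, not part of the project.

```python
from typing import Dict, Sequence

Quad = Sequence[float]  # (aligned_conf, glare_conf, direction_x, direction_y)


def evaluate(predictions: Sequence[Quad], labels: Sequence[Quad],
             threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy for the two confidences and mean absolute error for directions."""
    if not predictions or len(predictions) != len(labels):
        raise ValueError("need the same, non-zero number of predictions and labels")
    n = len(predictions)
    pairs = list(zip(predictions, labels))
    aligned_hits = sum((p[0] >= threshold) == (y[0] >= 0.5) for p, y in pairs)
    glare_hits = sum((p[1] >= threshold) == (y[1] >= 0.5) for p, y in pairs)
    direction_mae = sum(abs(p[2] - y[2]) + abs(p[3] - y[3]) for p, y in pairs) / (2 * n)
    return {
        "aligned_accuracy": aligned_hits / n,
        "glare_accuracy": glare_hits / n,
        "direction_mae": direction_mae,
    }
```

With so few real images, each misclassified sample moves accuracy by 5-10 points, so treat these numbers as a smoke test rather than a benchmark.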

Support

For detailed instructions, see documentation in parent folder.

Status: Ready for execution
Version: 0.1.0 (Prototype)
Last Updated: 2025-11-11
