AI-powered document scanner with alignment detection and glare warnings.

Quick start:
# 1. Install dependencies
pip install -r requirements.txt
# 2. Download base images (optional; or add them manually to data/source_documents/)
python scripts/1_download_documents.py
# 3. Generate synthetic training data (2-4 hours)
python scripts/2_generate_synthetic_data.py
# 4. Train model on Google Colab (recommended - free GPU)
# Upload notebooks/colab_training.ipynb to Google Colab
# 5. Test inference
python scripts/4_test_inference.py --image test.jpg
python scripts/4_test_inference.py --video 0  # Webcam

Project structure:

document_scanner/
├── data/
│ ├── source_documents/ # Base document images
│ ├── synthetic/ # Generated training data (50k+ images)
│ └── test_samples/ # Real test images
├── models/
│ ├── saved_model/ # Trained TensorFlow model
│ └── tflite/ # TFLite conversions (future)
├── scripts/
│ ├── 1_download_documents.py
│ ├── 2_generate_synthetic_data.py
│ ├── 3_train_model.py
│ ├── 4_test_inference.py
│ └── model_architecture.py
├── notebooks/
│ └── colab_training.ipynb # Google Colab training
├── config.py # Configuration
└── requirements.txt # Dependencies
Features:

- ✅ Zero manual labeling: synthetic training data is generated automatically
- ✅ 4 outputs: aligned_conf, glare_conf, direction_x, direction_y
- ✅ Lightweight MobileNetV3 model (~10-12 MB)
- ✅ Fast inference (20-30 ms on CPU)
- ✅ Google Colab support (free GPU training)
- ✅ Real-time processing (webcam/video)
Synthetic data generation:

- The generator script automatically creates 50,000+ training images
- Applies random transformations (position, rotation, scale, glare)
- Labels are computed automatically from the applied transformations, so no human annotation is needed
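The README does not spell out how the labels are derived. A minimal sketch, assuming each synthetic image comes from a known random placement of a document in the frame, could compute the four targets as below. The Placement class, the sampling ranges, the glare probability and the alignment tolerances are all hypothetical; only the output names and the sign conventions (negative direction_x = move right, negative direction_y = move down) come from this README.

```python
import random
from dataclasses import dataclass


@dataclass
class Placement:
    """Hypothetical record of how one document was pasted into a frame."""
    cx: float          # document centre x, pixels
    cy: float          # document centre y, pixels (image y grows downward)
    angle_deg: float   # in-plane rotation in degrees
    scale: float       # document width / frame width
    has_glare: bool    # whether a glare overlay was added


def random_placement(rng: random.Random, width: int, height: int) -> Placement:
    # Hypothetical sampling ranges for position, rotation, scale and glare.
    return Placement(
        cx=rng.uniform(0.2, 0.8) * width,
        cy=rng.uniform(0.2, 0.8) * height,
        angle_deg=rng.uniform(-30.0, 30.0),
        scale=rng.uniform(0.4, 1.0),
        has_glare=rng.random() < 0.3,
    )


def labels_for(p: Placement, width: int, height: int) -> dict:
    # Offset of the frame centre from the document centre, normalised to [-1, 1].
    # Document right of centre -> negative direction_x ("move right").
    # Document below centre -> negative direction_y ("move down").
    dx = (width / 2 - p.cx) / (width / 2)
    dy = (height / 2 - p.cy) / (height / 2)
    # Hypothetical tolerances for calling a placement "aligned".
    aligned = (
        abs(dx) < 0.1
        and abs(dy) < 0.1
        and abs(p.angle_deg) < 5.0
        and 0.6 <= p.scale <= 0.95
    )
    return {
        "aligned_conf": 1.0 if aligned else 0.0,
        "glare_conf": 1.0 if p.has_glare else 0.0,
        "direction_x": max(-1.0, min(1.0, dx)),
        "direction_y": max(-1.0, min(1.0, dy)),
    }


rng = random.Random(42)
sample = random_placement(rng, 640, 480)
print(labels_for(sample, 640, 480))
```

Because every label is a deterministic function of the sampled placement, generating more data never requires annotation; the trade-off is that label quality depends entirely on how well the hand-picked tolerances match what "aligned" means for a real capture.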
Model training:

- MobileNetV3-based CNN with a multi-task head
- Trains on Google Colab (free T4 GPU)
- Training time: 2-3 hours
Inference:

- Real-time document analysis
- Direction guidance ("Move LEFT", "Move RIGHT", etc.)
- Glare warnings ("AVOID REFLECTIONS")
- Ready-to-capture signal when the document is aligned (basis for auto-capture)
Model outputs:

- aligned_conf (0-1): Document alignment confidence
- glare_conf (0-1): Glare/reflection confidence
- direction_x (-1 to +1): Horizontal direction (negative = move right)
- direction_y (-1 to +1): Vertical direction (negative = move down)
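Together with the decision rule that follows, these four outputs could be turned into an on-screen message roughly as sketched below. The three thresholds (0.5, 0.8, 0.2) come from that rule; the Prediction class, the function name, the 0.2 direction dead zone, the "Move UP" wording and the "Adjust position" fallback are hypothetical, and positive direction values are assumed to mean the opposite moves (left, up).

```python
from dataclasses import dataclass

# Thresholds taken from the README's decision rule.
GLARE_THRESHOLD = 0.5
ALIGNED_THRESHOLD = 0.8
READY_GLARE_MAX = 0.2
# Hypothetical dead zone: direction magnitudes below this count as centred.
DIRECTION_DEADZONE = 0.2


@dataclass
class Prediction:
    aligned_conf: float  # 0-1
    glare_conf: float    # 0-1
    direction_x: float   # -1..+1, negative = move right
    direction_y: float   # -1..+1, negative = move down


def guidance_message(p: Prediction) -> str:
    # Glare takes priority over everything else.
    if p.glare_conf >= GLARE_THRESHOLD:
        return "⚠️ AVOID REFLECTIONS"
    # Ready only when well aligned and nearly glare-free.
    if p.aligned_conf > ALIGNED_THRESHOLD and p.glare_conf < READY_GLARE_MAX:
        return "✓ HOLD STEADY - Ready to capture"
    # Otherwise, build direction hints from the signed offsets.
    hints = []
    if p.direction_x <= -DIRECTION_DEADZONE:
        hints.append("Move RIGHT")
    elif p.direction_x >= DIRECTION_DEADZONE:
        hints.append("Move LEFT")
    if p.direction_y <= -DIRECTION_DEADZONE:
        hints.append("Move DOWN")
    elif p.direction_y >= DIRECTION_DEADZONE:
        hints.append("Move UP")
    # Hypothetical fallback when not aligned but no clear direction is given.
    return " + ".join(hints) if hints else "Adjust position"


print(guidance_message(Prediction(0.3, 0.05, -0.6, 0.1)))  # Move RIGHT
```

The dead zone keeps tiny, noisy direction values from flipping the hint between LEFT and RIGHT on consecutive frames.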
Guidance logic:

if glare_conf >= 0.5:
    → "⚠️ AVOID REFLECTIONS"
elif aligned_conf > 0.8 and glare_conf < 0.2:
    → "✓ HOLD STEADY - Ready to capture"
else:
    → Show direction guidance

All settings in config.py:
- Model architecture options
- Training hyperparameters
- Data generation parameters
- Inference thresholds
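The README lists what config.py controls but not its contents. One way such a file could group these four areas is sketched below; apart from the MobileNetV3 backbone, the 50,000-image target and the three inference thresholds (all taken from this README), every class name, field name and value is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelConfig:
    backbone: str = "MobileNetV3"   # from the README
    input_size: int = 224           # hypothetical input resolution


@dataclass(frozen=True)
class TrainingConfig:
    batch_size: int = 64            # hypothetical
    epochs: int = 30                # hypothetical
    learning_rate: float = 1e-3     # hypothetical


@dataclass(frozen=True)
class DataConfig:
    num_synthetic_images: int = 50_000  # README: 50,000+ images
    max_rotation_deg: float = 30.0      # hypothetical
    glare_probability: float = 0.3      # hypothetical


@dataclass(frozen=True)
class InferenceConfig:
    glare_threshold: float = 0.5    # from the guidance logic
    aligned_threshold: float = 0.8  # from the guidance logic
    ready_glare_max: float = 0.2    # from the guidance logic


@dataclass(frozen=True)
class Config:
    # default_factory gives each Config its own sub-config instances.
    model: ModelConfig = field(default_factory=ModelConfig)
    training: TrainingConfig = field(default_factory=TrainingConfig)
    data: DataConfig = field(default_factory=DataConfig)
    inference: InferenceConfig = field(default_factory=InferenceConfig)


CONFIG = Config()
print(CONFIG.inference)
```

Frozen dataclasses make accidental mutation of a shared setting raise an error, and keeping the thresholds in one place means the training, test and inference scripts cannot silently disagree.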
Requirements:

- Python 3.8+
- TensorFlow 2.13+
- OpenCV 4.8+
- 8 GB RAM (16 GB recommended for training)
- GPU optional (use Colab if no local GPU)
Estimated timeline:

| Phase | Duration |
|---|---|
| Download documents | 30-60 min (optional) |
| Generate synthetic data | 2-4 hours |
| Train model (Colab GPU) | 2-3 hours |
| Test & validate | 30 min |
| Total | ~5-8.5 hours |
Next steps:

- Test on 10-20 real document images
- Validate accuracy and performance
- Fine-tune if needed
- Convert to TFLite for mobile deployment
- Implement auto-capture logic
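Auto-capture is listed as a next step rather than an existing feature. A minimal sketch, assuming capture should fire only after the "ready" condition from the guidance logic (aligned_conf > 0.8 and glare_conf < 0.2) has held for several consecutive frames, is shown below. The AutoCapture class, its method name and the 10-frame default are hypothetical.

```python
class AutoCapture:
    """Hypothetical debounced trigger: fire after N consecutive ready frames."""

    def __init__(self, required_frames: int = 10,
                 aligned_threshold: float = 0.8, glare_max: float = 0.2):
        self.required_frames = required_frames
        self.aligned_threshold = aligned_threshold
        self.glare_max = glare_max
        self._streak = 0  # consecutive frames that met the ready condition

    def update(self, aligned_conf: float, glare_conf: float) -> bool:
        """Feed one frame's outputs; return True on the frame that should capture."""
        ready = aligned_conf > self.aligned_threshold and glare_conf < self.glare_max
        self._streak = self._streak + 1 if ready else 0
        if self._streak >= self.required_frames:
            self._streak = 0  # reset so one steady period triggers one capture
            return True
        return False


trigger = AutoCapture(required_frames=3)
for aligned, glare in [(0.9, 0.1), (0.9, 0.1), (0.9, 0.1)]:
    if trigger.update(aligned, glare):
        print("capture")
```

Requiring a streak, rather than firing on the first ready frame, keeps a single noisy prediction from triggering a capture while the camera is still moving.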
For detailed instructions, see the documentation in the parent folder.
Status: Ready for execution
Version: 0.1.0 (Prototype)
Last Updated: 2025-11-11