Devpanchal37/Smart-Card-Detection

AI Document Scanner - Prototype

An AI-powered document scanner prototype that detects document alignment and warns about glare.

Quick Start

# 1. Install dependencies
pip install -r requirements.txt

# 2. Download base images (optional; you can also add images to data/source_documents/ manually)
python scripts/1_download_documents.py

# 3. Generate synthetic training data (2-4 hours)
python scripts/2_generate_synthetic_data.py

# 4. Train model on Google Colab (recommended - free GPU)
# Upload notebooks/colab_training.ipynb to Google Colab

# 5. Test inference
python scripts/4_test_inference.py --image test.jpg
python scripts/4_test_inference.py --video 0  # Webcam

Project Structure

document_scanner/
├── data/
│   ├── source_documents/     # Base document images
│   ├── synthetic/            # Generated training data (50k+ images)
│   └── test_samples/         # Real test images
├── models/
│   ├── saved_model/          # Trained TensorFlow model
│   └── tflite/               # TFLite conversions (future)
├── scripts/
│   ├── 1_download_documents.py
│   ├── 2_generate_synthetic_data.py
│   ├── 3_train_model.py
│   ├── 4_test_inference.py
│   └── model_architecture.py
├── notebooks/
│   └── colab_training.ipynb  # Google Colab training
├── config.py                 # Configuration
└── requirements.txt          # Dependencies

Features

  • Zero manual labeling - Automatic synthetic data generation
  • 4 outputs: aligned_conf, glare_conf, direction_x, direction_y
  • Lightweight model (~10-12 MB) - MobileNetV3
  • Fast inference (20-30ms on CPU)
  • Google Colab support (free GPU training)
  • Real-time processing (webcam/video)

How It Works

1. Synthetic Data Generation

  • The generator script creates 50,000+ training images automatically
  • Applies random transformations (position, rotation, scale, glare)
  • Labels calculated automatically - no human annotation needed!
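
The sketch below illustrates how labels can fall out of the random transformation parameters without any annotation. It is not the code in scripts/2_generate_synthetic_data.py. The function name, the parameter limits, the alignment tolerances, and the binary label values are all assumptions made for illustration, and the image warping itself is left out. The sign convention assumes image coordinates (y grows downward) and that a negative offset means the document sits left of or above the frame centre.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class SyntheticLabel:
    aligned_conf: float  # 1.0 when the placement counts as aligned, else 0.0
    glare_conf: float    # 1.0 when a glare patch would be drawn, else 0.0
    direction_x: float   # -1..+1, negative = move right
    direction_y: float   # -1..+1, negative = move down


def sample_label(rng, max_shift=0.4, max_rot_deg=25.0, min_scale=0.5,
                 align_shift=0.05, align_rot_deg=5.0, align_min_scale=0.7,
                 glare_prob=0.3):
    """Draw one random placement and derive its labels (all limits are assumed)."""
    # Offset of the document centre from the frame centre, as a fraction of frame size.
    dx = rng.uniform(-max_shift, max_shift)
    dy = rng.uniform(-max_shift, max_shift)
    rot = rng.uniform(-max_rot_deg, max_rot_deg)
    scale = rng.uniform(min_scale, 1.0)
    has_glare = rng.random() < glare_prob

    # The placement is "aligned" only if every transformation is within tolerance.
    aligned = (
        math.hypot(dx, dy) <= align_shift
        and abs(rot) <= align_rot_deg
        and scale >= align_min_scale
    )
    return SyntheticLabel(
        aligned_conf=1.0 if aligned else 0.0,
        glare_conf=1.0 if has_glare else 0.0,
        direction_x=dx / max_shift,  # normalised to -1..+1
        direction_y=dy / max_shift,
    )


rng = random.Random(0)
labels = [sample_label(rng) for _ in range(1000)]
```

Because the labels are computed from the same parameters that would drive the image transform, they are exact by construction, which is what makes hand labeling unnecessary.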

2. Model Training

  • MobileNetV3-based CNN with multi-task head
  • Trains on Google Colab (free T4 GPU)
  • 2-3 hours training time

3. Inference

  • Real-time document analysis
  • Direction guidance ("Move LEFT", "Move RIGHT", etc.)
  • Glare warnings ("AVOID REFLECTIONS")
  • Auto-capture ready when aligned

Model Outputs

  1. aligned_conf (0-1): Document alignment confidence
  2. glare_conf (0-1): Glare/reflection confidence
  3. direction_x (-1 to +1): Horizontal direction (negative = move right, positive = move left)
  4. direction_y (-1 to +1): Vertical direction (negative = move down, positive = move up)

Decision Logic

if glare_conf >= 0.5:
    → "⚠️ AVOID REFLECTIONS"
elif aligned_conf > 0.8 and glare_conf < 0.2:
    → "✓ HOLD STEADY - Ready to capture"
else:
    → Show direction guidance
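
The rules above can be written as a small runnable function. This is a minimal sketch, not the code in scripts/4_test_inference.py. The three thresholds come from the rules above. The DEAD_ZONE value, the function name, and the "ADJUST POSITION" fallback message are assumptions added for illustration. The direction mapping follows the sign convention listed under Model Outputs.

```python
GLARE_WARN = 0.5   # glare_conf at or above this triggers the warning
ALIGNED_MIN = 0.8  # aligned_conf must exceed this to be capture-ready
GLARE_CLEAR = 0.2  # glare_conf must be below this to be capture-ready
DEAD_ZONE = 0.1    # assumed: direction components smaller than this are ignored


def guidance(aligned_conf, glare_conf, direction_x, direction_y):
    """Turn the four model outputs into one user-facing message."""
    if glare_conf >= GLARE_WARN:
        return "⚠️ AVOID REFLECTIONS"
    if aligned_conf > ALIGNED_MIN and glare_conf < GLARE_CLEAR:
        return "✓ HOLD STEADY - Ready to capture"

    # Otherwise show direction guidance (negative x = right, negative y = down).
    hints = []
    if abs(direction_x) > DEAD_ZONE:
        hints.append("Move RIGHT" if direction_x < 0 else "Move LEFT")
    if abs(direction_y) > DEAD_ZONE:
        hints.append("Move DOWN" if direction_y < 0 else "Move UP")
    # Hypothetical fallback when neither axis gives a clear hint.
    return ", ".join(hints) if hints else "ADJUST POSITION"


print(guidance(0.3, 0.1, -0.5, 0.0))
```

Note that a frame with glare_conf between 0.2 and 0.5 never reaches the ready state, even when aligned_conf is high; it falls through to direction guidance.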

Configuration

All settings live in config.py:

  • Model architecture options
  • Training hyperparameters
  • Data generation parameters
  • Inference thresholds

Requirements

  • Python 3.8+
  • TensorFlow 2.13+
  • OpenCV 4.8+
  • 8 GB RAM (16 GB recommended for training)
  • GPU optional (use Colab if no local GPU)

Time Estimates

Phase                     Duration
Download documents        30-60 min (optional)
Generate synthetic data   2-4 hours
Train model (Colab GPU)   2-3 hours
Test & validate           30 min
Total                     6-10 hours

Next Steps After Training

  1. Test on 10-20 real document images
  2. Validate accuracy and performance
  3. Fine-tune if needed
  4. Convert to TFLite for mobile deployment
  5. Implement auto-capture logic
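
One possible shape for the auto-capture step is to require several consecutive ready frames before triggering, so that a single lucky frame does not fire the capture. The sketch below is an assumption about how this could be done, not part of the repository. The class name, the `required` frame count, and the reset-after-capture behaviour are hypothetical. The default thresholds reuse the values from Decision Logic.

```python
class AutoCapture:
    """Fire once after `required` consecutive ready frames (assumed policy)."""

    def __init__(self, required=10, aligned_min=0.8, glare_max=0.2):
        self.required = required
        self.aligned_min = aligned_min
        self.glare_max = glare_max
        self.streak = 0  # number of ready frames seen in a row

    def update(self, aligned_conf, glare_conf):
        """Feed one frame's outputs; return True on the frame to capture."""
        ready = aligned_conf > self.aligned_min and glare_conf < self.glare_max
        self.streak = self.streak + 1 if ready else 0
        if self.streak >= self.required:
            self.streak = 0  # the next capture needs a fresh streak
            return True
        return False


capture = AutoCapture(required=3)
fired = [capture.update(0.9, 0.1) for _ in range(3)]
```

With `required=3`, `fired` ends up as two misses followed by one trigger. Any frame that is not ready resets the streak to zero.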

Support

For detailed instructions, see the documentation in the parent folder.


Status: Ready for execution
Version: 0.1.0 (Prototype)
Last Updated: 2025-11-11
