Skip to content

HansQueja/ocr-cnn

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Optical Character Recognition: Raw CNN Implementation from Scratch

This project is a low-level implementation of a Convolutional Neural Network (CNN) built entirely from the ground up using only NumPy. This is my raw dive towards the intricacies of CNN, where I also tried to implement the mathematical theories behind the gradient descent and even the convolutions.

Technical Specifications

  • Input Dimension: pixel grayscale inputs representing handwritten characters (A–Z and 0–9).
  • Kernel Operations: Supports varying kernel counts (1, 3, 8, 16) to explore feature extraction depth.
  • Pooling Layers: Custom-built max-pooling logic with options for different stride/pooling configurations to achieve spatial invariance.
  • Data Pipeline: Manually designed training pipeline with ASCII code mapping and custom activation functions.

Mathematical Foundation

The model updates its weights via a manually implemented gradient descent. For a given loss function, the weight update for a kernel is calculated as:

Where represents the learning rate and is the partial derivative of the loss with respect to the kernel weights, computed via the chain rule across the pooling and convolutional layers.

Usage and Simulation

The system is designed with a command-line interface (CLI) to allow for rapid experimentation with different hyperparameters.

Validation and Verification

image

The training loss graph shows the model successfully learned the OCR patterns. Starting at 4.1, the loss steadily decreased to approximately 0.4 by epoch 400. The smooth curve, free of fluctuations, indicates stable training with good convergence and no overfitting.

image

The graph visualizes the model's learning progression. Starting near 0% accuracy, the model steadily improved in a stepwise pattern, reaching approximately 40% by epoch 100, then continued to climb through several accuracy plateaus. By epoch 250, the accuracy achieved around 95% and stabilized near 100% by epoch 300, maintaining this high performance through epoch 400.

CNN Training

image

OCR-CNN Predicting Accuracy: Max Pooling - 5 Kernels

image

OCR-CNN Predicting Accuracy: Mean Pooling - 5 Kernels

image

OCR-CNN Predicting Accuracy: No Pooling - 5 Kernels

image

OCR-CNN Predicting Accuracy: No Pooling - 8 Kernels

Command Template

python main.py --method [train, predict] --epochs [integer] --pooling [0-2] --kernel [1,3,8,16]

Examples

  • To Train: Execute 1,000 epochs with pooling enabled to optimize feature reduction.
python main.py --method train --epochs 1000 --pooling 1
  • To Predict: Test the model's inference using 8 kernels without a pooling layer.
python main.py --method predict --pooling 0 --kernel 8

About

A low-level implementation of a Convolutional Neural Network (CNN) built entirely from the ground up using only NumPy.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages