Optical Character Recognition: Raw CNN Implementation from Scratch

This project is a low-level implementation of a Convolutional Neural Network (CNN) built entirely from the ground up using only NumPy. This is my raw dive towards the intricacies of CNN, where I also tried to implement the mathematical theories behind the gradient descent and even the convolutions.

Technical Specifications

Input Dimension: pixel grayscale inputs representing handwritten characters (A–Z and 0–9).
Kernel Operations: Supports varying kernel counts (1, 3, 8, 16) to explore feature extraction depth.
Pooling Layers: Custom-built max-pooling logic with options for different stride/pooling configurations to achieve spatial invariance.
Data Pipeline: Manually designed training pipeline with ASCII code mapping and custom activation functions.

Mathematical Foundation

The model updates its weights via a manually implemented gradient descent. For a given loss function, the weight update for a kernel is calculated as:

Where represents the learning rate and is the partial derivative of the loss with respect to the kernel weights, computed via the chain rule across the pooling and convolutional layers.

Usage and Simulation

The system is designed with a command-line interface (CLI) to allow for rapid experimentation with different hyperparameters.

Validation and Verification

The training loss graph shows the model successfully learned the OCR patterns. Starting at 4.1, the loss steadily decreased to approximately 0.4 by epoch 400. The smooth curve, free of fluctuations, indicates stable training with good convergence and no overfitting.

The graph visualizes the model's learning progression. Starting near 0% accuracy, the model steadily improved in a stepwise pattern, reaching approximately 40% by epoch 100, then continued to climb through several accuracy plateaus. By epoch 250, the accuracy achieved around 95% and stabilized near 100% by epoch 300, maintaining this high performance through epoch 400.

CNN Training

OCR-CNN Predicting Accuracy: Max Pooling - 5 Kernels

OCR-CNN Predicting Accuracy: Mean Pooling - 5 Kernels

OCR-CNN Predicting Accuracy: No Pooling - 5 Kernels

OCR-CNN Predicting Accuracy: No Pooling - 8 Kernels

Command Template

python main.py --method [train, predict] --epochs [integer] --pooling [0-2] --kernel [1,3,8,16]

Examples

To Train: Execute 1,000 epochs with pooling enabled to optimize feature reduction.

python main.py --method train --epochs 1000 --pooling 1

To Predict: Test the model's inference using 8 kernels without a pooling layer.

python main.py --method predict --pooling 0 --kernel 8

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
characters		characters
dataset		dataset
helpers		helpers
model		model
README.md		README.md
main.py		main.py
test.py		test.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Optical Character Recognition: Raw CNN Implementation from Scratch

Technical Specifications

Mathematical Foundation

Usage and Simulation

Validation and Verification

CNN Training

Command Template

Examples

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Optical Character Recognition: Raw CNN Implementation from Scratch

Technical Specifications

Mathematical Foundation

Usage and Simulation

Validation and Verification

CNN Training

Command Template

Examples

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages