
Multi-Digit Number Recognition on SVHN

Deep Learning Project · University of Tennessee, Knoxville · Apr 2026 – May 2026

This repository implements multi-digit number recognition on the Street View House Numbers (SVHN) dataset using TensorFlow/Keras. The project trains a model to recognize variable-length numbers in real-world images, using k+1 softmax outputs (one length head plus k digit heads) for sequences of up to k digits.


Project Overview

We built a deep learning pipeline to recognize variable-length multi-digit numbers in SVHN images (33,402 training / 13,068 test per assignment spec). Unlike MNIST, SVHN images have variable dimensions and multiple digits per image; we iterate through three model versions to move past an early ~50% digit-accuracy ceiling toward strong whole-number accuracy.

Presentation: See Multi-Digit_Number_Recognition_Deck.pdf (team deck: approach, training curves, results, challenges, lessons learned).


Dataset & Assignment Specifications

| Requirement | How we address it |
| --- | --- |
| Variable image size | Cropping + resize to fixed-size tensors in preprocessing and training |
| Label convention | SVHN: digit 0 → class 10, digits 1–9 → classes 1–9; null positions use label 0 |
| Multi-digit objective | Up to 5 digits → 1 length head + 5 digit heads (6 softmax outputs total), consistent with k+1 heads for max length k |
| Deliverables (assignment) | TensorFlow .py code; presentation (PDF deck linked above) covering performance, what worked / didn't, training details, challenges |
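
The label convention above can be sketched as a small encoder. This is an illustrative helper, not code from the scripts; the name `encode_label` is hypothetical, and digits are assumed already sorted left-to-right:

```python
import numpy as np

MAX_DIGITS = 5  # k; the model has k+1 = 6 softmax heads

def encode_label(digits):
    """Map a left-to-right digit list to [length, d1, ..., d5].

    SVHN convention: digit 0 -> class 10, digits 1-9 -> classes 1-9,
    unused (null) positions -> label 0.
    """
    vec = np.zeros(MAX_DIGITS + 1, dtype=np.int64)
    vec[0] = len(digits)                  # length-head target
    for i, d in enumerate(digits[:MAX_DIGITS]):
        vec[i + 1] = 10 if d == 0 else d  # "0" maps to class 10
    return vec

# e.g. house number "205" -> [3, 2, 10, 5, 0, 0]
```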

Repository

What’s in this repo

| Path | Description |
| --- | --- |
| README.md | Project overview, rubric alignment, setup, and results (this file). |
| preprocess_svhn.py | Bounding-box cropping, label assembly; writes NumPy bundles under preprocessed/. |
| model_svhn_v3.py | Best-performing model (V3): EfficientNetB0, multi-head outputs, 3-phase fine-tuning. |
| Multi-Digit_Number_Recognition_Deck.pdf | Team slides (approach, training, results, challenges). |
| Snapshot_of_Model_Performance.png | Snapshot of final model performance metrics (per-head + whole-number accuracy). |
| assignment.pdf (optional) | Course Assignment 3 rubric; add if you want grading criteria visible on GitHub. |

Layout (after download + preprocess)

.
├── README.md
├── assignment.pdf                    # optional
├── Multi-Digit_Number_Recognition_Deck.pdf
├── preprocess_svhn.py
├── model_svhn_v3.py
├── train/                            # from SVHN.zip — gitignored
│   ├── *.png
│   └── digitStruct.json
├── test/
│   ├── *.png
│   └── digitStruct.json
└── preprocessed/                     # generated — gitignored
    ├── train/
    │   ├── images.npy
    │   ├── labels.npy
    │   ├── filenames.json
    │   └── *.png                     # cropped thumbnails (optional saves)
    └── test/
        ├── images.npy
        ├── labels.npy
        └── filenames.json

Scripts at a Glance

| File | Role |
| --- | --- |
| preprocess_svhn.py | Loads train/test folders + digitStruct.json, crops to a tight box around all digit bounding boxes (with padding), sorts digits left-to-right, builds label vectors [length, d1, …, d5], resizes, and saves preprocessed/ artifacts (images.npy, labels.npy). |
| model_svhn_v3.py | V3 model: EfficientNetB0 backbone, shared dense trunk (batch norm + dropout), 6 softmax heads (length + five digit positions), 3-phase fine-tuning, augmentation, label smoothing, gradient clipping, metrics, and checkpoints. |
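
The tight-crop step can be sketched as follows. This is a simplified stand-in for the logic in preprocess_svhn.py, assuming each digitStruct entry gives per-digit boxes as `(left, top, width, height)`; the padding fraction is illustrative:

```python
def union_crop_box(boxes, img_w, img_h, pad=0.15):
    """Union of all digit boxes, expanded by `pad` and clamped to the image.

    Each box is (left, top, width, height); returns (x0, y0, x1, y1).
    """
    x0 = min(b[0] for b in boxes)
    y0 = min(b[1] for b in boxes)
    x1 = max(b[0] + b[2] for b in boxes)
    y1 = max(b[1] + b[3] for b in boxes)
    px = pad * (x1 - x0)  # expand so digits aren't flush with the edge
    py = pad * (y1 - y0)
    return (max(0, int(x0 - px)), max(0, int(y0 - py)),
            min(img_w, int(x1 + px)), min(img_h, int(y1 + py)))
```

Cropping to this union box before resizing is what removes most of the background clutter noted in the findings below.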

Model Architecture (v3)

  • Backbone: EfficientNetB0 pretrained on ImageNet (transfer learning).
  • Trunk: Shared dense layer + BatchNormalization + ReLU + Dropout.
  • Outputs: One sequence-length softmax (6 classes: lengths 0–5) plus five digit-position heads (11 classes each: null + digits 1–9 + “0” as class 10).
  • Objective: Structured multi-output learning so the model predicts a full number (length + per-position digits), not a single MNIST-style digit.
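
A minimal sketch of this multi-head layout, assuming 64×64 RGB inputs and illustrative trunk sizes (the real hyperparameters live in model_svhn_v3.py; the script uses `weights="imagenet"`, set to `None` here only to avoid a download):

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_v3_sketch(input_shape=(64, 64, 3), max_digits=5):
    backbone = tf.keras.applications.EfficientNetB0(
        include_top=False, weights=None, input_shape=input_shape, pooling="avg")
    inputs = tf.keras.Input(shape=input_shape)
    x = backbone(inputs)
    # Shared dense trunk: Dense + BatchNorm + ReLU + Dropout.
    x = layers.Dense(256)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Dropout(0.3)(x)
    # Length head (6 classes: lengths 0-5) + 5 digit heads (11 classes each).
    outputs = [layers.Dense(max_digits + 1, activation="softmax", name="length")(x)]
    for i in range(max_digits):
        outputs.append(layers.Dense(11, activation="softmax", name=f"digit_{i+1}")(x))
    return tf.keras.Model(inputs, outputs)
```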

Training Strategy

Three-phase fine-tuning (model_svhn_v3.py):

  1. Phase 1: Frozen backbone - warm up classification heads at higher LR.
  2. Phase 2: Unfreeze full model - fine-tune end-to-end at low LR (main unlock vs frozen-only training).
  3. Phase 3: Very low LR polish for stability.
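
The phase schedule can be sketched like this (learning rates and epoch counts are illustrative, not the values in the script):

```python
import tensorflow as tf

# Illustrative schedule; the real LRs/epochs are set in model_svhn_v3.py.
PHASES = [
    {"name": "warmup",    "unfreeze": False, "lr": 1e-3, "epochs": 5},
    {"name": "fine-tune", "unfreeze": True,  "lr": 1e-4, "epochs": 15},
    {"name": "polish",    "unfreeze": True,  "lr": 1e-5, "epochs": 5},
]

def run_phases(model, backbone, train_ds, val_ds):
    for phase in PHASES:
        # Freeze/unfreeze the EfficientNetB0 backbone, then recompile
        # so the new learning rate and trainable state take effect.
        backbone.trainable = phase["unfreeze"]
        model.compile(
            optimizer=tf.keras.optimizers.Adam(phase["lr"], clipnorm=1.0),
            loss="sparse_categorical_crossentropy")
        model.fit(train_ds, validation_data=val_ds, epochs=phase["epochs"])
```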

Additional Techniques:

  • Bounding-box cropping in preprocessing (reduces background clutter).
  • Data augmentation: brightness, contrast, saturation, hue, random crop (after resize/crop pipeline in tf.data).
  • Label smoothing on digit heads (reduces overconfidence on null digit slots).
  • Gradient clipping (clipnorm=1.0) for stable backbone fine-tuning.
  • Callbacks: checkpointing, reduce LR on plateau, early stopping (as implemented in script).
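
The photometric augmentations above map onto standard `tf.image` ops. A sketch of the kind of function applied per-element in the `tf.data` pipeline, with illustrative ranges:

```python
import tensorflow as tf

def augment(image):
    """Photometric augmentation for a float image in [0, 1] (ranges illustrative)."""
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_contrast(image, 0.8, 1.2)
    image = tf.image.random_saturation(image, 0.8, 1.2)
    image = tf.image.random_hue(image, max_delta=0.05)
    return tf.clip_by_value(image, 0.0, 1.0)  # keep pixels in valid range
```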

Results (Best Model - v3)

Model performance snapshot

| Metric | Value |
| --- | --- |
| Whole-number accuracy | 82.31% |
| Length head | ~96.8% |
| Digit positions (d1–d5) | ~92.6%–100% per-head accuracy (see deck / training logs) |

Version progression (deck): V1 24% → V2 29% → V3 82% whole-number accuracy, showing that architecture, preprocessing, and staged fine-tuning mattered far more than incremental tweaks.
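
Whole-number accuracy counts a prediction as correct only when the length head and every digit head all match. A sketch with NumPy, assuming per-head argmax predictions stacked into `[length, d1, …, d5]` rows:

```python
import numpy as np

def whole_number_accuracy(y_true, y_pred):
    """Fraction of samples where all 6 heads (length + 5 digits) are correct.

    y_true, y_pred: int arrays of shape (n_samples, 6), columns =
    [length, d1, ..., d5], null positions labeled 0.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(np.all(y_true == y_pred, axis=1)))

# A single wrong digit makes the whole number wrong, which is why this
# metric sits well below the per-head accuracies.
```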


Key Findings

  1. Bounding-box cropping dramatically reduced background noise and improved usable signal.
  2. Fine-tuning the full backbone (not leaving it frozen forever) was critical to break past the mid-accuracy plateau.
  3. Per-digit accuracy can mislead: the product metric is entire sequence correct (whole-number accuracy).
  4. Label smoothing helped calibration on null digit positions.
  5. A large share of remaining errors (~82% in our error analysis) came from digit misclassification rather than length prediction, so future effort is best spent on the digit heads and harder examples.

Deliverables

| Deliverable | Location |
| --- | --- |
| TensorFlow code | preprocess_svhn.py, model_svhn_v3.py |
| Presentation (performance, attempts, training, challenges) | Multi-Digit_Number_Recognition_Deck.pdf |
| Live demo | Target ≤ 5 minutes; test AV/setup before class |

Tech Stack

Python · TensorFlow / Keras · EfficientNetB0 · NumPy · Pillow · scikit-learn · Matplotlib · VS Code · GitHub


Skills Demonstrated

Python · TensorFlow · Deep Learning · Convolutional Neural Networks · Transfer Learning · Computer Vision · Data Preprocessing · Multi-output / structured prediction · Model tuning & evaluation


Team & Credits

Course project associated with University of Tennessee, Knoxville.
Team (presentation deck): Katie Watts, Jazmin Elias, Carys Van Atta, Olivia Helms.


References

  • Assignment: Multi-digit Number Recognition (SVHN, digitStruct, k+1 softmax heads).
  • Goodfellow et al., Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks (arXiv:1312.6082); source of the multi-head formulation.
  • TensorFlow image loading: Load images, image_dataset_from_directory.

