
Multi-Digit Number Recognition on SVHN

Deep Learning Project · University of Tennessee, Knoxville · Apr 2026 – May 2026

This repository implements multi-digit number recognition on the Street View House Numbers (SVHN) dataset using TensorFlow/Keras. The project trains a model to recognize variable-length numbers in real-world images, using k+1 softmax outputs (one length head plus k digit heads) for sequences of up to k digits.


Project Overview

We built a deep learning pipeline to recognize variable-length multi-digit numbers in SVHN images (33,402 training / 13,068 test per assignment spec). Unlike MNIST, SVHN images have variable dimensions and multiple digits per image; we iterate through three model versions to move past an early ~50% digit-accuracy ceiling toward strong whole-number accuracy.

Presentation: See Multi-Digit_Number_Recognition_Deck.pdf (team deck: approach, training curves, results, challenges, lessons learned).


Dataset & Assignment Specifications

| Requirement | How we address it |
| --- | --- |
| Variable image size | Cropping + resize to fixed-size tensors in preprocessing and training |
| Label convention | SVHN: digit 0 → class 10, digits 1–9 → classes 1–9; null positions use label 0 |
| Multi-digit objective | Up to 5 digits → 1 length head + 5 digit heads (6 softmax outputs total), consistent with k+1 heads for max length k |
| Deliverables (assignment) | TensorFlow .py code; presentation (PDF deck linked above) covering performance, what worked / didn't, training details, challenges |
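
The label convention above can be sketched as a small encoder. This is an illustrative helper, not code from the scripts; the name `encode_label` is hypothetical, and digits are assumed already sorted left-to-right:

```python
import numpy as np

MAX_DIGITS = 5  # k; the model has k+1 = 6 softmax heads

def encode_label(digits):
    """Map a left-to-right digit list to [length, d1, ..., d5].

    SVHN convention: digit 0 -> class 10, digits 1-9 -> classes 1-9,
    unused (null) positions -> label 0.
    """
    vec = np.zeros(MAX_DIGITS + 1, dtype=np.int64)
    vec[0] = len(digits)                  # length-head target
    for i, d in enumerate(digits[:MAX_DIGITS]):
        vec[i + 1] = 10 if d == 0 else d  # "0" maps to class 10
    return vec

# e.g. house number "205" -> [3, 2, 10, 5, 0, 0]
```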

Repository

What’s in this repo

| Path | Description |
| --- | --- |
| README.md | Project overview, rubric alignment, setup, and results (this file). |
| preprocess_svhn.py | Bounding-box cropping, label assembly; writes NumPy bundles under preprocessed/. |
| model_svhn_v3.py | Best-performing model (V3): EfficientNetB0, multi-head outputs, 3-phase fine-tuning. |
| Multi-Digit_Number_Recognition_Deck.pdf | Team slides (approach, training, results, challenges). |
| Snapshot_of_Model_Performance.png | Snapshot of final model performance metrics (per-head + whole-number accuracy). |
| assignment.pdf (optional) | Course Assignment 3 rubric; add if you want grading criteria visible on GitHub. |

Layout (after download + preprocess)

.
├── README.md
├── assignment.pdf                    # optional
├── Multi-Digit_Number_Recognition_Deck.pdf
├── preprocess_svhn.py
├── model_svhn_v3.py
├── train/                            # from SVHN.zip — gitignored
│   ├── *.png
│   └── digitStruct.json
├── test/
│   ├── *.png
│   └── digitStruct.json
└── preprocessed/                     # generated — gitignored
    ├── train/
    │   ├── images.npy
    │   ├── labels.npy
    │   ├── filenames.json
    │   └── *.png                     # cropped thumbnails (optional saves)
    └── test/
        ├── images.npy
        ├── labels.npy
        └── filenames.json

Scripts at a Glance

| File | Role |
| --- | --- |
| preprocess_svhn.py | Loads train/test folders + digitStruct.json, crops to a tight box around all digit bounding boxes (with padding), sorts digits left-to-right, builds label vectors [length, d1, …, d5], resizes, and saves preprocessed/ artifacts (images.npy, labels.npy). |
| model_svhn_v3.py | V3 model: EfficientNetB0 backbone, shared dense trunk (batch norm + dropout), 6 softmax heads (length + five digit positions), 3-phase fine-tuning, augmentation, label smoothing, gradient clipping, metrics, and checkpoints. |
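
The tight-crop step can be sketched as follows. This is a simplified stand-in for the logic in preprocess_svhn.py, assuming each digitStruct entry gives per-digit boxes as `(left, top, width, height)`; the padding fraction is illustrative:

```python
def union_crop_box(boxes, img_w, img_h, pad=0.15):
    """Union of all digit boxes, expanded by `pad` and clamped to the image.

    Each box is (left, top, width, height); returns (x0, y0, x1, y1).
    """
    x0 = min(b[0] for b in boxes)
    y0 = min(b[1] for b in boxes)
    x1 = max(b[0] + b[2] for b in boxes)
    y1 = max(b[1] + b[3] for b in boxes)
    px = pad * (x1 - x0)  # expand so digits aren't flush with the edge
    py = pad * (y1 - y0)
    return (max(0, int(x0 - px)), max(0, int(y0 - py)),
            min(img_w, int(x1 + px)), min(img_h, int(y1 + py)))
```

Cropping to this union box before resizing is what removes most of the background clutter noted in the findings below.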

Model Architecture (v3)

  • Backbone: EfficientNetB0 pretrained on ImageNet (transfer learning).
  • Trunk: Shared dense layer + BatchNormalization + ReLU + Dropout.
  • Outputs: One sequence-length softmax (6 classes: lengths 0–5) plus five digit-position heads (11 classes each: null + digits 1–9 + “0” as class 10).
  • Objective: Structured multi-output learning so the model predicts a full number (length + per-position digits), not a single MNIST-style digit.
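
A minimal sketch of this multi-head layout, assuming 64×64 RGB inputs and illustrative trunk sizes (the real hyperparameters live in model_svhn_v3.py; the script uses `weights="imagenet"`, set to `None` here only to avoid a download):

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_v3_sketch(input_shape=(64, 64, 3), max_digits=5):
    backbone = tf.keras.applications.EfficientNetB0(
        include_top=False, weights=None, input_shape=input_shape, pooling="avg")
    inputs = tf.keras.Input(shape=input_shape)
    x = backbone(inputs)
    # Shared dense trunk: Dense + BatchNorm + ReLU + Dropout.
    x = layers.Dense(256)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Dropout(0.3)(x)
    # Length head (6 classes: lengths 0-5) + 5 digit heads (11 classes each).
    outputs = [layers.Dense(max_digits + 1, activation="softmax", name="length")(x)]
    for i in range(max_digits):
        outputs.append(layers.Dense(11, activation="softmax", name=f"digit_{i+1}")(x))
    return tf.keras.Model(inputs, outputs)
```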

Training Strategy

Three-phase fine-tuning (model_svhn_v3.py):

  1. Phase 1: Frozen backbone - warm up classification heads at higher LR.
  2. Phase 2: Unfreeze full model - fine-tune end-to-end at low LR (main unlock vs frozen-only training).
  3. Phase 3: Very low LR polish for stability.
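
The phase schedule can be sketched like this (learning rates and epoch counts are illustrative, not the values in the script):

```python
import tensorflow as tf

# Illustrative schedule; the real LRs/epochs are set in model_svhn_v3.py.
PHASES = [
    {"name": "warmup",    "unfreeze": False, "lr": 1e-3, "epochs": 5},
    {"name": "fine-tune", "unfreeze": True,  "lr": 1e-4, "epochs": 15},
    {"name": "polish",    "unfreeze": True,  "lr": 1e-5, "epochs": 5},
]

def run_phases(model, backbone, train_ds, val_ds):
    for phase in PHASES:
        # Freeze/unfreeze the EfficientNetB0 backbone, then recompile
        # so the new learning rate and trainable state take effect.
        backbone.trainable = phase["unfreeze"]
        model.compile(
            optimizer=tf.keras.optimizers.Adam(phase["lr"], clipnorm=1.0),
            loss="sparse_categorical_crossentropy")
        model.fit(train_ds, validation_data=val_ds, epochs=phase["epochs"])
```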

Additional Techniques:

  • Bounding-box cropping in preprocessing (reduces background clutter).
  • Data augmentation: brightness, contrast, saturation, hue, random crop (after resize/crop pipeline in tf.data).
  • Label smoothing on digit heads (reduces overconfidence on null digit slots).
  • Gradient clipping (clipnorm=1.0) for stable backbone fine-tuning.
  • Callbacks: checkpointing, reduce LR on plateau, early stopping (as implemented in script).
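
The photometric augmentations above map onto standard `tf.image` ops. A sketch of the kind of function applied per-element in the `tf.data` pipeline, with illustrative ranges:

```python
import tensorflow as tf

def augment(image):
    """Photometric augmentation for a float image in [0, 1] (ranges illustrative)."""
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_contrast(image, 0.8, 1.2)
    image = tf.image.random_saturation(image, 0.8, 1.2)
    image = tf.image.random_hue(image, max_delta=0.05)
    return tf.clip_by_value(image, 0.0, 1.0)  # keep pixels in valid range
```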

Results (Best Model - v3)

Model performance snapshot

| Metric | Value |
| --- | --- |
| Whole-number accuracy | 82.31% |
| Length head | ~96.8% |
| Digit positions (d1–d5) | ~92.6%–100% per-head accuracy (see deck / training logs) |

Version progression (deck): V1 24% → V2 29% → V3 82% whole-number accuracy, showing that architecture, preprocessing, and staged fine-tuning mattered far more than incremental tweaks.
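
Whole-number accuracy counts a prediction as correct only when the length head and every digit head all match. A sketch with NumPy, assuming per-head argmax predictions stacked into `[length, d1, …, d5]` rows:

```python
import numpy as np

def whole_number_accuracy(y_true, y_pred):
    """Fraction of samples where all 6 heads (length + 5 digits) are correct.

    y_true, y_pred: int arrays of shape (n_samples, 6), columns =
    [length, d1, ..., d5], null positions labeled 0.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(np.all(y_true == y_pred, axis=1)))

# A single wrong digit makes the whole number wrong, which is why this
# metric sits well below the per-head accuracies.
```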


Key Findings

  1. Bounding-box cropping dramatically reduced background noise and improved usable signal.
  2. Fine-tuning the full backbone (not leaving it frozen forever) was critical to break past the mid-accuracy plateau.
  3. Per-digit accuracy can mislead: the product metric is entire sequence correct (whole-number accuracy).
  4. Label smoothing helped calibration on null digit positions.
  5. A large share of remaining errors (~82% in our error analysis) came from digit misclassification rather than length prediction, so future effort is best spent on the digit heads and harder examples.

Deliverables

| Deliverable | Location |
| --- | --- |
| TensorFlow code | preprocess_svhn.py, model_svhn_v3.py |
| Presentation (performance, attempts, training, challenges) | Multi-Digit_Number_Recognition_Deck.pdf |
| Live demo | Target ≤ 5 minutes; test AV/setup before class |

Tech Stack

Python · TensorFlow / Keras · EfficientNetB0 · NumPy · Pillow · scikit-learn · Matplotlib · VS Code · GitHub


Skills Demonstrated

Python · TensorFlow · Deep Learning · Convolutional Neural Networks · Transfer Learning · Computer Vision · Data Preprocessing · Multi-output / structured prediction · Model tuning & evaluation


Team & Credits

Course project associated with University of Tennessee, Knoxville.
Team (presentation deck): Katie Watts, Jazmin Elias, Carys Van Atta, Olivia Helms.


References

  • Assignment: Multi-digit Number Recognition (SVHN, digitStruct, k+1 softmax heads).
  • Goodfellow et al., Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks (arXiv:1312.6082); source of the multi-head formulation.
  • TensorFlow image loading: Load images, image_dataset_from_directory.

