Deep Learning Project · University of Tennessee, Knoxville · Apr 2026 – May 2026
This repository implements multi-digit number recognition on the Street View House Numbers (SVHN) dataset using TensorFlow/Keras. The project trains a model that reads variable-length numbers in real-world images (using k+1 softmax outputs for sequences of up to k digits), targeting the highest possible whole-number accuracy.
We built a deep learning pipeline to recognize variable-length multi-digit numbers in SVHN images (33,402 training / 13,068 test per assignment spec). Unlike MNIST, SVHN images have variable dimensions and multiple digits per image; we iterate through three model versions to move past an early ~50% digit-accuracy ceiling toward strong whole-number accuracy.
Presentation: See Multi-Digit_Number_Recognition_Deck.pdf (team deck: approach, training curves, results, challenges, lessons learned).
| Requirement | How we address it |
|---|---|
| Variable image size | Cropping + resize to fixed tensors in preprocessing and training |
| Label convention | SVHN: digit 0 → class 10, digits 1–9 → 1–9; null positions use label 0 |
| Multi-digit objective | Up to 5 digits → 1 length head + 5 digit heads (6 softmax outputs total), consistent with k+1 heads for max length k |
| Deliverables (assignment) | TensorFlow .py code; presentation (PDF deck linked above) covering performance, what worked / didn’t, training details, challenges |
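The label convention above can be sketched as a small helper. This is an illustrative sketch, not code from the project; `encode_label` and `MAX_DIGITS` are hypothetical names.

```python
# Hypothetical helper illustrating the SVHN label convention used above:
# digit '0' -> class 10, digits 1-9 -> classes 1-9, null positions -> class 0.
MAX_DIGITS = 5

def encode_label(number_str):
    """Encode a house-number string as [length, d1, ..., d5]."""
    digits = [10 if ch == "0" else int(ch) for ch in number_str]
    length = len(digits)
    digits += [0] * (MAX_DIGITS - length)  # pad unused slots with null class 0
    return [length] + digits

print(encode_label("205"))  # [3, 2, 10, 5, 0, 0]
```

Together with the length head, this gives the 6-vector (1 length + 5 digit positions) that the k+1-head model predicts.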
| Path | Description |
|---|---|
| README.md | Project overview, rubric alignment, setup, and results (this file). |
| preprocess_svhn.py | Bounding-box cropping, label assembly; writes NumPy bundles under preprocessed/. |
| model_svhn_v3.py | Best-performing model (V3): EfficientNetB0, multi-head outputs, 3-phase fine-tuning. |
| Multi-Digit_Number_Recognition_Deck.pdf | Team slides (approach, training, results, challenges). |
| Snapshot_of_Model_Performance.png | Snapshot of final model performance metrics (per-head + whole-number accuracy). |
| assignment.pdf (optional) | Course Assignment 3 rubric; add if you want grading criteria visible on GitHub. |
```
.
├── README.md
├── assignment.pdf                           # optional
├── Multi-Digit_Number_Recognition_Deck.pdf  # optional
├── preprocess_svhn.py
├── model_svhn_v3.py
├── train/                                   # from SVHN.zip — gitignored
│   ├── *.png
│   └── digitStruct.json
├── test/
│   ├── *.png
│   └── digitStruct.json
└── preprocessed/                            # generated — gitignored
    ├── train/
    │   ├── images.npy
    │   ├── labels.npy
    │   ├── filenames.json
    │   └── *.png                            # cropped thumbnails (optional saves)
    └── test/
        ├── images.npy
        ├── labels.npy
        └── filenames.json
```
| File | Role |
|---|---|
| preprocess_svhn.py | Loads train/test folders + digitStruct.json, crops to a tight box around all digit bounding boxes (with padding), sorts digits left-to-right, builds label vectors [length, d1, …, d5], resizes, and saves preprocessed/ artifacts (images.npy, labels.npy). |
| model_svhn_v3.py | V3 model: EfficientNetB0 backbone, shared dense trunk (batch norm + dropout), 6 softmax heads (length + five digit positions), 3-phase fine-tuning, augmentation, label smoothing, gradient clipping, metrics, and checkpoints. |
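The crop-and-label steps described above can be sketched roughly as follows. This is an assumption-laden illustration: the box field names, padding fraction, and target size are guesses, not taken from `preprocess_svhn.py`.

```python
# Sketch of the preprocessing described above: union bounding box with
# padding, left-to-right digit ordering, [length, d1, ..., d5] labels.
import numpy as np
from PIL import Image

MAX_DIGITS = 5
TARGET_SIZE = (64, 64)  # assumed resize target

def crop_and_label(image, boxes, pad=0.1):
    """`boxes` is a list of dicts with keys left, top, width, height, label
    (field names assumed from the digitStruct convention)."""
    left = min(b["left"] for b in boxes)
    top = min(b["top"] for b in boxes)
    right = max(b["left"] + b["width"] for b in boxes)
    bottom = max(b["top"] + b["height"] for b in boxes)
    # expand the union box by a padding fraction, clamped to the image
    pw, ph = pad * (right - left), pad * (bottom - top)
    box = (max(0, left - pw), max(0, top - ph),
           min(image.width, right + pw), min(image.height, bottom + ph))
    crop = image.crop(box).resize(TARGET_SIZE)
    # sort digits left-to-right, then pad null positions with class 0
    digits = [b["label"] for b in sorted(boxes, key=lambda b: b["left"])]
    label = [len(digits)] + digits + [0] * (MAX_DIGITS - len(digits))
    return np.asarray(crop), np.asarray(label)
```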
- Backbone: EfficientNetB0 pretrained on ImageNet (transfer learning).
- Trunk: Shared dense layer + BatchNormalization + ReLU + Dropout.
- Outputs: One sequence-length softmax (6 classes: lengths 0–5) plus five digit-position heads (11 classes each: null + digits 1–9 + “0” as class 10).
- Objective: Structured multi-output learning so the model predicts a full number (length + per-position digits), not a single MNIST-style digit.
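A minimal Keras sketch of this architecture, assuming illustrative layer sizes (the 256-unit trunk and 0.3 dropout are guesses, not the project's exact configuration):

```python
# Sketch of the V3 architecture: EfficientNetB0 backbone, shared dense
# trunk, one 6-way length head, five 11-way digit heads.
import tensorflow as tf
from tensorflow.keras import layers

MAX_DIGITS = 5

def build_model(input_shape=(64, 64, 3), weights="imagenet"):
    backbone = tf.keras.applications.EfficientNetB0(
        include_top=False, weights=weights, input_shape=input_shape)
    inputs = tf.keras.Input(shape=input_shape)
    x = layers.GlobalAveragePooling2D()(backbone(inputs))
    # shared trunk: Dense -> BatchNorm -> ReLU -> Dropout
    x = layers.Dense(256)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.Dropout(0.3)(x)
    # one length head (lengths 0-5) plus five 11-way digit heads
    outputs = [layers.Dense(6, activation="softmax", name="length")(x)]
    outputs += [layers.Dense(11, activation="softmax", name=f"digit{i + 1}")(x)
                for i in range(MAX_DIGITS)]
    return tf.keras.Model(inputs, outputs)
```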
Three-phase fine-tuning (model_svhn_v3.py):
- Phase 1: frozen backbone; warm up the classification heads at a higher LR.
- Phase 2: unfreeze the full model; fine-tune end-to-end at a low LR (the main unlock vs. frozen-only training).
- Phase 3: very low LR polish for stability.
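The three phases can be sketched as a freeze/learning-rate schedule. Learning rates and epoch counts here are illustrative assumptions, and `run_phases` is a hypothetical helper, not the script's actual structure:

```python
# Sketch of the three-phase fine-tuning schedule described above.
import tensorflow as tf

def run_phases(model, backbone, train_ds, val_ds):
    phases = [
        # (backbone trainable?, learning rate, epochs)
        (False, 1e-3, 5),   # Phase 1: frozen backbone, warm up the heads
        (True,  1e-4, 15),  # Phase 2: unfreeze, fine-tune end-to-end
        (True,  1e-5, 5),   # Phase 3: very low LR polish
    ]
    for trainable, lr, epochs in phases:
        backbone.trainable = trainable
        # recompile after toggling trainability so the change takes effect
        model.compile(
            optimizer=tf.keras.optimizers.Adam(learning_rate=lr, clipnorm=1.0),
            loss="sparse_categorical_crossentropy")
        model.fit(train_ds, validation_data=val_ds, epochs=epochs)
```

Recompiling between phases matters: Keras bakes the trainable flags into the training function at compile time.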
Additional Techniques:
- Bounding-box cropping in preprocessing (reduces background clutter).
- Data augmentation: brightness, contrast, saturation, hue, and random crop (after the resize/crop pipeline in tf.data).
- Label smoothing on digit heads (reduces overconfidence on null digit slots).
- Gradient clipping (clipnorm=1.0) for stable backbone fine-tuning.
- Callbacks: checkpointing, reduce-LR-on-plateau, early stopping (as implemented in the script).
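The augmentation and label-smoothing items above can be sketched as a `tf.data` map function. Parameter ranges and the pad-then-crop sizes are illustrative assumptions, not the script's values:

```python
# Sketch of the augmentation pipeline described above.
import tensorflow as tf

def augment(image, label):
    # photometric jitter: brightness, contrast, saturation, hue
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_contrast(image, 0.8, 1.2)
    image = tf.image.random_saturation(image, 0.8, 1.2)
    image = tf.image.random_hue(image, max_delta=0.05)
    # random crop after resize: pad slightly, then crop back to target size
    image = tf.image.resize_with_crop_or_pad(image, 72, 72)
    image = tf.image.random_crop(image, size=(64, 64, 3))
    return image, label

# label smoothing on the digit heads (requires one-hot labels)
digit_loss = tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1)
```

In a `tf.data` pipeline this would be applied as `train_ds.map(augment)` before batching.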
| Metric | Value |
|---|---|
| Whole-number accuracy | 82.31% |
| Length head | ~96.8% |
| Digit positions (d1–d5) | ~92.6% – 100% per-head accuracy (see deck / training logs) |
Version progression (deck): V1 24% → V2 29% → V3 82% whole-number accuracy, showing that architecture, preprocessing, and staged fine-tuning mattered far more than incremental tweaks alone.
- Bounding-box cropping dramatically reduced background noise and improved usable signal.
- Fine-tuning the full backbone (not leaving it frozen forever) was critical to break past the mid-accuracy plateau.
- Per-digit accuracy can mislead: the metric that matters is the entire sequence being correct (whole-number accuracy), which behaves roughly like the product of per-head accuracies.
- Label smoothing helped calibration on null digit positions.
- A large share of remaining errors (~82% in our error analysis) came from digit misclassification, not length prediction, so effort is best spent on the digit heads and harder examples.
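The product effect behind the per-digit vs. whole-number gap can be made concrete. The numbers below are illustrative (loosely in the range of the per-head accuracies reported above), not project results, and the independence assumption is only approximate:

```python
# Why ~93-100% per-head accuracy still yields ~80% whole-number accuracy:
# every head must be right at once, so accuracies roughly multiply.
length_acc = 0.968
digit_accs = [0.93, 0.93, 0.95, 0.99, 1.00]

whole_number_estimate = length_acc
for acc in digit_accs:
    whole_number_estimate *= acc

print(f"{whole_number_estimate:.3f}")  # → 0.787
```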
| Deliverable | Location |
|---|---|
| TensorFlow code | preprocess_svhn.py, model_svhn_v3.py |
| Presentation (performance, attempts, training, challenges) | Multi-Digit_Number_Recognition_Deck.pdf |
| Live demo | Target ≤ 5 minutes; test AV/setup before class |
Python · TensorFlow / Keras · EfficientNetB0 · NumPy · Pillow · scikit-learn · Matplotlib · VS Code · GitHub
Python · TensorFlow · Deep Learning · Convolutional Neural Networks · Transfer Learning · Computer Vision · Data Preprocessing · Multi-output / structured prediction · Model tuning & evaluation
Course project associated with University of Tennessee, Knoxville.
Team (presentation deck): Katie Watts, Jazmin Elias, Carys Van Atta, Olivia Helms.
- Assignment: Multi-digit Number Recognition (SVHN, digitStruct, k+1 softmax heads).
- Goodfellow et al., "Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks" (arXiv:1312.6082), context for the multi-head formulation.
- TensorFlow image loading: "Load images" tutorial, image_dataset_from_directory.
