Sign Language Detection using ANN & CNN

A comparative study of Artificial Neural Networks (ANN) vs Convolutional Neural Networks (CNN) for sign language recognition on the Sign MNIST dataset - part of a final-year B.Tech project.

Overview • Dataset • ANN vs CNN • Architectures • Installation • Comparison

📖 Overview

This project compares two deep learning approaches for American Sign Language (ASL) letter recognition:

ANN (Artificial Neural Network) - A fully-connected feedforward network that treats images as flat pixel vectors
CNN (Convolutional Neural Network) - A convolutional architecture that preserves spatial structure and learns hierarchical visual features

Both models are trained on the Sign MNIST dataset to classify hand gesture images into 24 ASL letter classes (A–Y, excluding J and Z which require motion).

Companion Repository: This is the baseline comparison. The object detection approach using YOLOv5 can be found in Sign-Language-Detection-Using-YOLO-V5.

📊 Dataset

Sign MNIST

The Sign Language MNIST dataset is a drop-in replacement for the classic MNIST, formatted identically but with hand gesture images instead of handwritten digits.

Property	Value
Image Size	28 × 28 pixels (grayscale)
Training Samples	27,455 images
Test Samples	7,172 images
Classes	24 letters (A–Y, excluding J & Z)
Format	CSV (label + 784 pixel values per row)
Source	Kaggle Sign Language MNIST

Why exclude J and Z?

The letters J and Z require motion to perform - J involves a downward arc and Z traces a zigzag. Since single static images can't capture motion, they're excluded from this image classification task.

Gesture Mapping

A=0  B=1  C=2  D=3  E=4  F=5  G=6  H=7  I=8
K=10 L=11 M=12 N=13 O=14 P=15 Q=16 R=17 S=18
T=19 U=20 V=21 W=22 X=23 Y=24

(J=9 and Z=25 are excluded - they require motion)

🧠 ANN vs CNN

Aspect	ANN (Fully Connected)	CNN (Convolutional)
Input handling	Flattens 28×28 → 784-pixel vector	Keeps 28×28×1 spatial structure
Feature learning	Learns from raw pixel values	Learns spatial features (edges, textures, shapes)
Parameters	Many (every pixel to every neuron)	Fewer (shared convolutional filters)
Translation invariance	None - position matters	Built-in - same gesture anywhere
Best for	Simple patterns, tabular data	Images, spatial data
Expected accuracy	~70–80%	~90–95%+

🏗 Model Architectures

ANN Architecture (ANN.ipynb)

Input (784 pixels - flattened 28×28)
        │
   Dense(128/256, activation='relu')     ← Hidden Layer 1
        │
   Dense(128, activation='relu')         ← Hidden Layer 2
        │
   Dense(24, activation='softmax')       ← Output (24 ASL classes)

The ANN treats each image as a flat vector of 784 numbers. It has no concept of pixel neighborhoods - pixel (1,1) and pixel (28,28) are treated equally.

CNN Architecture (CNN.ipynb)

Input (28 × 28 × 1 grayscale image)
        │
   Conv2D(32, 3×3, activation='relu')   ← Edge/texture detection
   MaxPooling2D(2×2)                     ← Spatial downsampling
        │
   Conv2D(64, 3×3, activation='relu')   ← Shape detection
   MaxPooling2D(2×2)                     ← Further downsampling
        │
   Flatten()                             ← Convert to 1D
        │
   Dense(128, activation='relu')         ← Classification
        │
   Dense(24, activation='softmax')       ← Output (24 ASL classes)

The CNN preserves 2D spatial structure. Convolutional layers learn local patterns (edges → textures → hand shapes), and pooling layers progressively reduce spatial dimensions.

🛠 Tech Stack

Category	Technology	Purpose
Language	Python 3.7+	Core implementation
Deep Learning	TensorFlow / Keras	Model building and training
Data	NumPy, Pandas	Data loading and preprocessing
Visualization	Matplotlib	Training curves and results
Notebook	Jupyter	Interactive development
Dataset	Sign MNIST (CSV)	28×28 hand gesture images

🚀 Installation

# Clone the repository
git clone https://github.com/zishnusarker/Sign-language-Detection-Using-ANN-CNN.git
cd Sign-language-Detection-Using-ANN-CNN

# Create virtual environment (recommended)
python -m venv venv
venv\Scripts\activate       # Windows
source venv/bin/activate    # Linux/macOS

# Install dependencies
pip install tensorflow numpy pandas matplotlib jupyter scikit-learn

📋 Usage

Run the ANN Model

jupyter notebook ANN.ipynb

Run all cells to: load data → flatten images to 784-d vectors → build dense network → train → evaluate

Run the CNN Model

jupyter notebook CNN.ipynb

Run all cells to: load data → reshape to 28×28×1 → build Conv2D network → train → evaluate and compare

📁 Project Structure

Sign-language-Detection-Using-ANN-CNN/
├── README.md                    # Project documentation
├── ANN.ipynb                    # ANN implementation notebook
├── CNN.ipynb                    # CNN implementation notebook
├── sign_mnist_train.csv         # Training data (27,455 samples)
├── sign_mnist_test.csv          # Test data (7,172 samples)
└── .gitattributes               # Git config

📊 Full Comparison: ANN vs CNN vs YOLOv5

This repo is part of a three-way comparison conducted as a 7th semester final-year project:

Model	Repository	Approach	Best For
ANN	This repo	Flat pixel classification	Baseline benchmark
CNN	This repo	Spatial feature learning	Static image classification
YOLOv5	Sign-Language-Detection-Using-YOLO-V5	Object detection + localization	Real-time webcam detection

Key Findings

ANN establishes a baseline - raw pixel classification has fundamental limitations for image tasks
CNN proves the power of spatial feature learning - convolutional layers extract patterns dense layers cannot
YOLOv5 adds real-time detection, bounding boxes, and multi-sign recognition

🎓 Key Concepts Demonstrated

Why do CNNs outperform ANNs on images?

ANNs treat every pixel independently - pixel (1,1) has no relationship to pixel (1,2). But images have spatial structure: neighboring pixels form edges, edges form textures, textures form shapes. CNNs exploit this with local receptive fields, weight sharing, translation invariance, and hierarchical feature learning.

What is MaxPooling and why is it used?

MaxPooling reduces spatial dimensions by taking the maximum value in each patch (e.g., 2×2). It reduces computational cost, provides slight translation invariance, and forces the network to learn increasingly abstract features.

Why Softmax for 24-class output?

Softmax converts raw outputs into a probability distribution across all 24 classes where probabilities sum to 1.0. For multi-class classification, it gives interpretable results like "85% A, 10% B, 5% C."

Why exclude J and Z from the dataset?

J and Z are dynamic signs requiring motion (J traces a curve, Z traces a zigzag). Static images can't capture this. Recognizing them would require temporal models (LSTMs or video-based CNNs).

🔮 Future Improvements

Add confusion matrix and classification report for both models
Implement data augmentation to improve generalization
Try deeper architectures (ResNet, VGG via transfer learning)
Add dropout regularization to reduce overfitting
Deploy as a web app using Streamlit or Flask
Extend to video-based recognition to include J and Z
Add real-time webcam prediction
Visualize learned convolutional filters

📄 License

This project is available as open source. Understanding why architecture matters in deep learning

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sign Language Detection using ANN & CNN

📖 Overview

📊 Dataset

Sign MNIST

Why exclude J and Z?

Gesture Mapping

🧠 ANN vs CNN

🏗 Model Architectures

ANN Architecture (ANN.ipynb)

CNN Architecture (CNN.ipynb)

🛠 Tech Stack

🚀 Installation

📋 Usage

Run the ANN Model

Run the CNN Model

📁 Project Structure

📊 Full Comparison: ANN vs CNN vs YOLOv5

Key Findings

🎓 Key Concepts Demonstrated

🔮 Future Improvements

📄 License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.gitattributes		.gitattributes
.gitignore		.gitignore
ANN.ipynb		ANN.ipynb
CNN.ipynb		CNN.ipynb
README.md		README.md
sign_mnist_test.csv		sign_mnist_test.csv
sign_mnist_train.csv		sign_mnist_train.csv

Folders and files

Latest commit

History

Repository files navigation

Sign Language Detection using ANN & CNN

📖 Overview

📊 Dataset

Sign MNIST

Why exclude J and Z?

Gesture Mapping

🧠 ANN vs CNN

🏗 Model Architectures

ANN Architecture (ANN.ipynb)

CNN Architecture (CNN.ipynb)

🛠 Tech Stack

🚀 Installation

📋 Usage

Run the ANN Model

Run the CNN Model

📁 Project Structure

📊 Full Comparison: ANN vs CNN vs YOLOv5

Key Findings

🎓 Key Concepts Demonstrated

🔮 Future Improvements

📄 License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages