ASLVizNet is a real-time computer vision framework for recognizing static American Sign Language (ASL) alphabet letters and numbers using deep convolutional neural networks and transfer learning.
The system leverages the TensorFlow Object Detection API and SSD MobileNet v2 to perform bounding box localization and classification of hand gestures from live webcam input.
ASLVizNet was developed as a research-driven project and presented at the IACIT 2021 Conference, with publication in the International Journal of Advanced Research in Computer Science (IJARCS).
Traditional sign language translation systems:
- Depend on expensive sensor gloves
- Require specialized hardware
- Lack real-time responsiveness
- Provide limited accessibility
ASLVizNet proposes a low-cost, vision-based deep learning approach that:
- Uses only a webcam
- Performs real-time detection
- Achieves high accuracy (96–99%)
- Requires no wearable devices
```text
Webcam Input (OpenCV)
        ↓
Image Annotation (LabelImg - XML)
        ↓
XML → TFRecord Conversion
        ↓
TensorFlow Object Detection API
        ↓
SSD MobileNet v2 (Transfer Learning)
        ↓
Real-Time Detection with Bounding Box + Confidence Score
```
- Model: SSD MobileNet v2
- Framework: TensorFlow Object Detection API
- Approach: Transfer Learning
- Detection Type: Object Detection (Bounding Box + Classification)
- Lightweight architecture
- Optimized for real-time inference
- Efficient for low-compute environments
- Strong balance between speed and accuracy
The complete dataset (gesture images + annotations) is available here:
Google Drive Dataset Link:
https://drive.google.com/drive/folders/1_vZt3Jn-JPQU5viHmGyGMdwQshuFqZOT?usp=sharing
The dataset contains:
- ASL Alphabets (A–Z)
- Numbers (0–9)
- XML annotation files (LabelImg format)
- Images used for training
```text
ASLVizNet/
│
├── annotations/           # XML files from LabelImg
├── images/                # Gesture images
├── training/              # Model checkpoints
├── exported-model/        # Final exported model
│
├── ImageCapture.ipynb     # Dataset capture notebook
├── MainCode.ipynb         # Real-time detection notebook
├── generate_tfrecord.py   # XML → TFRecord converter
├── label_map.pbtxt        # Class label definitions
├── pipeline.config        # Training configuration
└── README.md
```
- Python 3.7
- TensorFlow 2.4.1
- CUDA (optional, for GPU acceleration)
```bash
pip install tensorflow==2.4.1
pip install opencv-python
pip install pandas numpy pillow lxml
```

Install the TensorFlow Object Detection API:

```bash
git clone https://github.com/tensorflow/models.git
cd models/research
protoc object_detection/protos/*.proto --python_out=.
cp object_detection/packages/tf2/setup.py .
pip install .
```

- Download the dataset from Google Drive.
- Place images inside `/images`.
- Place XML files inside `/annotations`.
- Ensure `label_map.pbtxt` contains the correct class mappings.
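For reference, a TensorFlow Object Detection label map pairs each class name with a positive integer id. A minimal illustrative fragment (the ids and ordering shown here are assumptions, not the project's actual file) looks like:

```text
item {
  id: 1
  name: 'A'
}
item {
  id: 2
  name: 'B'
}
```

The ids in this file must match the ids written into the TFRecords, and id 0 is reserved for the background class.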
```bash
python generate_tfrecord.py \
  -x annotations \
  -l label_map.pbtxt \
  -o train.record \
  -i images
```

This script:
- Parses the XML files
- Converts annotations to TFRecord format
- Maps labels using `label_map.pbtxt`
- Optionally generates a CSV file
```bash
python model_main_tf2.py \
  --pipeline_config_path=training/pipeline.config \
  --model_dir=training/ \
  --alsologtostderr
```

- Training Steps: 10,000
- Final Training Loss: 0.086
- Real-Time Accuracy: 96–99%
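For orientation, the fields that typically need editing in `pipeline.config` are the class count, batch size, step count, and the checkpoint/label-map paths. A hedged fragment (the batch size and checkpoint path below are illustrative assumptions; the class count follows from the 26 letters plus 10 digits in the dataset):

```text
model {
  ssd {
    num_classes: 36   # 26 letters + 10 digits
  }
}
train_config {
  batch_size: 4
  fine_tune_checkpoint: "pre-trained-models/ssd_mobilenet_v2/checkpoint/ckpt-0"
  fine_tune_checkpoint_type: "detection"
  num_steps: 10000
}
train_input_reader {
  label_map_path: "label_map.pbtxt"
  tf_record_input_reader {
    input_path: "train.record"
  }
}
```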
```bash
python exporter_main_v2.py \
  --input_type image_tensor \
  --pipeline_config_path training/pipeline.config \
  --trained_checkpoint_dir training/ \
  --output_directory exported-model
```

Open `MainCode.ipynb` and run all cells (for example, via `jupyter notebook MainCode.ipynb`).
The webcam will activate and display:
- Bounding box
- Predicted ASL character
- Confidence score
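Per frame, the detector returns parallel arrays of boxes, class ids, and scores; the overlay step keeps only confident hits and maps ids back to ASL characters. A stdlib-only sketch of that filtering (the threshold, labels, and sample values here are illustrative; the actual notebook draws the results with OpenCV):

```python
def filter_detections(boxes, class_ids, scores, id_to_label, min_score=0.8):
    """Keep detections at or above min_score and attach readable labels."""
    results = []
    for box, cid, score in zip(boxes, class_ids, scores):
        if score >= min_score:
            results.append({"label": id_to_label[cid], "score": score, "box": box})
    return results

# Hypothetical label mapping; in practice this comes from label_map.pbtxt
id_to_label = {1: "A", 2: "B", 3: "C"}
boxes = [(0.1, 0.2, 0.5, 0.6), (0.0, 0.0, 0.3, 0.3)]
detections = filter_detections(boxes, [1, 3], [0.97, 0.42], id_to_label)
print(detections)  # only the high-confidence 'A' detection survives
```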
| Metric | Value |
|---|---|
| Training Steps | 10,000 |
| Final Loss | 0.086 |
| Real-Time Accuracy | 96–99% |
| Detection Output | Bounding Box + Confidence Score |
The system successfully performs real-time gesture detection with high confidence prediction scores.
Presented at:
IACIT 2021 Conference
Published in:
International Journal of Advanced Research in Computer Science (IJARCS)
"Sign Language Recognition using Convolutional Neural Networks in Machine Learning", IJARCS, Vol. 12, pp. 16–20, Aug. 2021.
DOI: 10.26483/ijarcs.v12i0.6713
- Computer Vision
- Deep Learning
- TensorFlow Object Detection API
- Transfer Learning
- Dataset Engineering
- TFRecord Pipeline Development
- Real-Time ML Deployment
- Research Publication & Presentation
Model checkpoints and dataset files are not included in the repository due to GitHub size limits. Please use the provided dataset link and training instructions to reproduce results.
Developed as a research-driven computer vision framework integrating deep learning and real-time detection for assistive communication systems.