A real-time American Sign Language (ASL) recognition application powered by Computer Vision and Deep Learning. This desktop application detects hand gestures via webcam and translates them into text instantly using MediaPipe and TensorFlow.
- Real-Time Detection - Instant feedback using MediaPipe's efficient hand tracking
- High Accuracy - Custom-trained neural network for classifying ASL alphabet letters (A-Y, excluding the dynamic gestures J and Z)
- Visual Feedback - Confidence meter, FPS counter, and letter confirmation animations
- Offline Capability - Fully functional without an internet connection after initial setup
- User-Friendly Interface - Clean OpenCV-based GUI with word construction features
- Extensible - Includes scripts for data collection and retraining the model on your own dataset
- Python 3.8 or higher
- A working webcam
- pip (Python package manager)
1. Clone the Repository

   ```bash
   git clone https://github.com/AmanSinghNp/ASL-Interpreter-AI.git
   cd ASL-Interpreter-AI
   ```

2. Create a Virtual Environment (recommended)

   ```bash
   python -m venv venv
   # Windows
   venv\Scripts\activate
   # macOS/Linux
   source venv/bin/activate
   ```

3. Install Dependencies

   ```bash
   pip install -r requirements.txt
   ```
```bash
python asl_app.py
```

| Key | Action |
|---|---|
| Space | Add a space to the current word |
| Backspace | Delete the last character |
| C | Clear the current word |
| Q | Quit the application |
Tip: Ensure your hand is clearly visible to the camera with good lighting for best results.
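The word-construction controls above could be wired up roughly as follows; `handle_key` is a hypothetical name (the real dispatch lives in `asl_app.py`), and the key codes follow `cv2.waitKey`'s ASCII conventions, though the dispatch itself is plain Python:

```python
# Hypothetical sketch of the keyboard controls; not the app's actual code.
from typing import Tuple

def handle_key(key: int, word: str) -> Tuple[str, bool]:
    """Apply one keypress to the current word; return (new_word, keep_running)."""
    if key == ord(' '):                  # Space: add a space to the word
        return word + ' ', True
    if key == 8:                         # Backspace: delete the last character
        return word[:-1], True
    if key in (ord('c'), ord('C')):      # C: clear the current word
        return '', True
    if key in (ord('q'), ord('Q')):      # Q: quit the application
        return word, False
    return word, True                    # any other key: no change
```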
The model and training data are generated artifacts and are not intended to be committed to Git. You can train your own model to improve accuracy or customize for your needs.
Use the interactive data collection tool to record hand landmarks:
```bash
python -m scripts.data_collection
```

Follow the on-screen instructions to record samples for each ASL letter.
If you have an ASL image dataset organized in folders by letter:
```bash
python -m scripts.process_dataset --dataset_dir path/to/your/dataset
```

Once you have data in `asl_data.csv`, train the neural network:
```bash
python -m scripts.train_model
```

Training Options:
```bash
python -m scripts.train_model --epochs 100 --batch_size 64
```

The trained model will be saved to `saved_model/asl_model/`.
The label list will be saved to `saved_model/classes.txt` (used by the app for the correct index→label mapping).
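A minimal sketch of how that index→label lookup can work, assuming `classes.txt` holds one label per line in training order (the exact file format is an assumption — check `scripts/train_model.py` for the real one; `load_classes` and `index_to_label` are illustrative names, not the app's API):

```python
# Sketch of the index→label mapping driven by saved_model/classes.txt.
from pathlib import Path

def load_classes(path):
    """Read one class label per line, preserving training order."""
    return [line.strip() for line in Path(path).read_text().splitlines() if line.strip()]

def index_to_label(classes, index):
    """Map a softmax argmax index back to its ASL letter."""
    return classes[index]
```

Keeping the labels in a sidecar file like this means the app never has to hard-code the class order the model was trained with.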
ASL-Interpreter-AI/
├── asl_app.py # Main desktop application
├── utils.py # Shared utilities and configuration
├── asl_data.csv # Training data (generated)
├── requirements.txt # Python dependencies
├── LICENSE # MIT License
├── saved_model/ # Generated TensorFlow model + classes.txt (not committed)
│ └── asl_model/
└── scripts/ # Utility scripts
├── data_collection.py # Interactive data gathering tool
├── process_dataset.py # Batch image processing
├── train_model.py # Model training script
└── README.md # Scripts documentation
| Technology | Purpose |
|---|---|
| MediaPipe | Hand landmark detection |
| TensorFlow | Neural network training and inference |
| OpenCV | Image processing and GUI |
| scikit-learn | Data preprocessing and metrics |
The classifier is a feed-forward neural network:
- Input: 42 features (21 x-coordinates + 21 y-coordinates, normalized)
- Hidden Layers: Dense(128) → Dense(64) → Dense(32) with BatchNorm and Dropout
- Output: Softmax over 24 classes (A-Y, excluding J and Z)
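For reference, a sketch of how 21 MediaPipe landmarks might be flattened into the 42-feature input. The app's actual normalization lives in `utils.py`; this wrist-relative, max-scaled variant is only one common choice, with the x-then-y ordering following the description above:

```python
# Illustrative feature extraction, assuming wrist-relative normalization.
import numpy as np

def landmarks_to_features(landmarks):
    """Flatten 21 (x, y) hand landmarks into a 42-feature vector.

    landmarks: array of shape (21, 2), e.g. MediaPipe x/y coordinates.
    Returns shape (42,): all 21 x-values, then all 21 y-values,
    made wrist-relative and scaled into [-1, 1].
    """
    landmarks = np.asarray(landmarks, dtype=np.float64)
    rel = landmarks - landmarks[0]      # wrist (landmark 0) becomes the origin
    scale = np.abs(rel).max()
    if scale == 0:                      # degenerate input: all points coincide
        scale = 1.0
    return np.concatenate([rel[:, 0], rel[:, 1]]) / scale
```

Making the features wrist-relative and scale-invariant lets the classifier ignore where the hand sits in the frame and how close it is to the camera.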
Problem: "Could not open webcam" error
Solutions:
- Ensure your webcam is connected and not in use by another application
- Try a different camera by changing `cv2.VideoCapture(0)` to `cv2.VideoCapture(1)` in `asl_app.py`
- Check camera permissions in your system settings
Problem: "Error loading model" message
Solutions:
- Ensure the model exists at `saved_model/asl_model/`
- Run `python -m scripts.train_model` to train a new model
- Check that TensorFlow is properly installed: `python -c "import tensorflow as tf; print(tf.__version__)"`
Problem: Inaccurate or unstable predictions
Solutions:
- Ensure good lighting conditions
- Keep your hand clearly visible and centered in frame
- Try retraining with more diverse data
- Adjust `CONFIDENCE_THRESHOLD` in `utils.py` (default: 0.7)
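The thresholding that `CONFIDENCE_THRESHOLD` controls amounts to a small check on the softmax output; a minimal illustration (not the app's actual code, and `classify` is a hypothetical name):

```python
# Sketch of confidence gating on the model's softmax probabilities.
import numpy as np

CONFIDENCE_THRESHOLD = 0.7  # mirrors the default mentioned above

def classify(probs, classes, threshold=CONFIDENCE_THRESHOLD):
    """Return (label, confidence); label is None when the top probability
    falls below the threshold, so the app can withhold uncertain output."""
    idx = int(np.argmax(probs))
    conf = float(probs[idx])
    return (classes[idx] if conf >= threshold else None, conf)
```

Raising the threshold trades responsiveness for fewer false letters; lowering it does the opposite.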
Problem: Low FPS or laggy response
Solutions:
- Close other applications using the webcam
- Reduce webcam resolution if supported
- Ensure you're not running on a very low-powered machine
Contributions are welcome! Here's how you can help:
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Commit your changes: `git commit -m 'Add amazing feature'`
- Push to the branch: `git push origin feature/amazing-feature`
- Open a Pull Request
- Add support for dynamic gestures (J, Z)
- Implement text-to-speech output
- Add word prediction/autocomplete
- Create a web-based version
- Improve the UI/UX design
This project is licensed under the MIT License - see the LICENSE file for details.
- ASL alphabet images from various open datasets
- MediaPipe team for the excellent hand tracking solution
- TensorFlow team for the machine learning framework