🗣️ Sign2Sound Euphoria

Bridging the gap between Sign Language and Spoken English with Real-Time, Edge-Computed AI.

📖 Overview

Sign2Sound Euphoria is a bi-directional Sign Language Translation system designed to run entirely offline on consumer-grade hardware (Offline-First). It needs neither paid cloud APIs nor server-grade GPUs.

Using a novel Dual-Expert architecture built on Spatial-Temporal Graph Convolutional Networks (ST-GCN), the system distinguishes between Dynamic Words (WLASL) and Static Finger-Spelling (ASL) in real time. A Small Language Model (SLM) then rewrites the raw glosses as grammatically natural English sentences.

🚀 Key Innovations

Dual-Expert Routing: Separate specialized models for Spelling vs. Signing to eliminate the "Hold vs. Letter" confusion.
Edge-Optimized: Runs at 22+ FPS on a laptop RTX 3050 (4GB VRAM).
Hybrid Pipeline: Combines Vision (ST-GCN) + Language (SLM) for context-aware translation.
Privacy First: Zero data leaves the device; fully offline execution.
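
The README does not say how a clip is routed to one expert or the other. As a minimal sketch of one possible approach, the snippet below routes on average keypoint motion: a nearly still hand goes to the letter expert, a moving one to the word expert. The threshold value, the function names, and the motion heuristic itself are assumptions made for illustration, not the project's actual routing logic.

```python
from math import hypot

# Hypothetical threshold in normalized coordinate units per frame.
# This value is an illustration, not a tuned project setting.
MOTION_THRESHOLD = 0.02


def mean_motion(window):
    """Average displacement of each keypoint between consecutive frames.

    `window` is a list of frames; each frame is a list of (x, y) keypoints.
    """
    if len(window) < 2:
        return 0.0
    total, count = 0.0, 0
    for prev, curr in zip(window, window[1:]):
        for (x0, y0), (x1, y1) in zip(prev, curr):
            total += hypot(x1 - x0, y1 - y0)
            count += 1
    return total / count if count else 0.0


def route(window, threshold=MOTION_THRESHOLD):
    """Pick an expert for the window: "letters" (static) or "words" (dynamic)."""
    return "letters" if mean_motion(window) < threshold else "words"
```

A window of identical frames yields zero motion and is routed to the letter expert, while a hand that sweeps across the frame exceeds the threshold and is routed to the word expert.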

🛠️ System Architecture

The pipeline processes video input in four distinct stages:

1. Skeletal Extraction:
- Tool: Google MediaPipe Holistic.
- Data: Extracts 109 Keypoints (Body, Hands, Face) per frame.
- Normalization: Relative Nose-Centric Alignment (invariant to user position).
2. Dual-Expert Inference (ST-GCN):
- Expert A (WLASL): Tracks temporal motion for dynamic words (e.g., "Mother", "Eat").
- Expert B (ASL): Recognizes static spatial features for finger-spelling (e.g., "A-D-A-M").
3. Grammar Correction (SLM):
- Input: Raw Glosses (e.g., "Who Eat Now").
- Model: Quantized Microsoft Phi-2 / DistilGPT-2.
- Output: Natural English (e.g., "Who is eating now?").
4. Vocalization (Coming Soon):
- Engine: KokoroTTS (High-fidelity, <80ms latency).
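
The nose-centric alignment in the first stage can be sketched as a per-frame translation: subtract the nose position from every keypoint so the skeleton no longer depends on where the user stands in the frame. The index of the nose within the 109-keypoint layout is not given in this README, so `NOSE_INDEX = 0` is an assumption (it matches the nose landmark in MediaPipe Pose), and any additional scaling the project may apply is not shown.

```python
# Assumption: the nose is the first keypoint in each frame, as in MediaPipe Pose.
NOSE_INDEX = 0


def nose_center(frame, nose_index=NOSE_INDEX):
    """Translate a frame of (x, y) keypoints so the nose sits at the origin."""
    nose_x, nose_y = frame[nose_index]
    return [(x - nose_x, y - nose_y) for x, y in frame]


def nose_center_sequence(frames, nose_index=NOSE_INDEX):
    """Apply nose-centric alignment to every frame of a clip independently."""
    return [nose_center(frame, nose_index) for frame in frames]
```

Two frames that show the same pose at different positions in the image map to the same centered coordinates, up to floating-point rounding, which is the position invariance the pipeline relies on.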

📊 Performance Metrics

We evaluated the system on a held-out test set (20% split) using an ASUS TUF A15 (RTX 3050).

Dataset / Task	Accuracy	F1-Score	Latency / Throughput
ASL Letters (Static)	99.04%	0.99	45 ms
WLASL-100 (Dynamic)	92.05%	0.91	45 ms
End-to-End Pipeline	N/A	N/A	~22 FPS

Note: Training graphs and confusion matrices are available in the results/ directory.
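
For readers who want to reproduce figures of this kind, the sketch below shows how accuracy and an F1 score can be computed from predicted labels, and how a per-frame latency converts to frames per second. The README does not state which F1 averaging was used, so macro averaging here is an assumption, and the function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (averaging mode is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def fps_from_latency_ms(latency_ms):
    """Frames per second when each frame takes `latency_ms` with no overlap."""
    return 1000.0 / latency_ms
```

If the 45 ms figure is read as the per-frame cost with no overlapping work, `fps_from_latency_ms(45)` gives roughly 22.2, which is consistent with the ~22 FPS reported for the end-to-end pipeline.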

📦 Installation

Prerequisites

Python 3.10+
NVIDIA GPU (Recommended) or CPU
Webcam

Setup

Clone the Repository

git clone https://github.com/yourusername/Sign2Sound-Euphoria.git
cd Sign2Sound-Euphoria

Install Dependencies
```
pip install -r requirements.txt
```
Download Models
- Place stgcn_wlasl100_final.pth in models/.
- Place stgcn_letters_scratch.pth in models/. (Pre-trained weights link to come)
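
Before running any demo it can help to confirm that both weight files are where the scripts expect them. The helper below is a small sketch that only checks for the two filenames listed above; the function name is hypothetical and it does not load or validate the weights.

```python
from pathlib import Path

# The two weight files named in the setup steps above.
EXPECTED_WEIGHTS = ("stgcn_wlasl100_final.pth", "stgcn_letters_scratch.pth")


def missing_weights(models_dir="models"):
    """Return the expected weight filenames that are not present in `models_dir`."""
    root = Path(models_dir)
    return [name for name in EXPECTED_WEIGHTS if not (root / name).is_file()]
```

Calling `missing_weights()` from the repository root returns an empty list once both files are in place, and otherwise lists the files still to be downloaded.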

💻 Usage / Demos

1. The "Final Pipeline" Demo (Words + Grammar)

Runs the full stack: Video -> Gloss -> SLM Correction.

python inference/final_pipeline.py

Input: Sequence of videos (e.g., who.mp4, eat.mp4, now.mp4).
Output: [SLM]: Who is eating now?
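
The internals of `inference/final_pipeline.py` are not shown in this README, so the following is only a sketch of the data flow described above: each video is classified into a gloss, the glosses are joined into a raw string, and a correction step rewrites that string. `run_pipeline`, `fake_classifier`, and `fake_corrector` are hypothetical stand-ins; in the real system the ST-GCN expert and the SLM would take their place.

```python
from pathlib import Path


def run_pipeline(video_paths, classify_video, correct_grammar):
    """Chain the stages: videos -> glosses -> raw gloss string -> corrected text."""
    glosses = [classify_video(path) for path in video_paths]
    raw = " ".join(glosses)
    return raw, correct_grammar(raw)


def fake_classifier(path):
    """Stand-in for the ST-GCN expert: derives the gloss from the filename."""
    return Path(path).stem.capitalize()


def fake_corrector(raw):
    """Stand-in for the SLM: a lookup with one hard-coded example."""
    return {"Who Eat Now": "Who is eating now?"}.get(raw, raw)
```

With these stand-ins, `run_pipeline(["who.mp4", "eat.mp4", "now.mp4"], fake_classifier, fake_corrector)` returns the raw gloss string `"Who Eat Now"` together with the corrected sentence from the example above.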

📂 Dataset Information

We utilized a Split-Dataset Strategy to solve class imbalance and confusion:

IEEE DataPort ASL Dataset: Used for training the static Spelling Expert (Filtered to ~200 samples/class).
WLASL (Word-Level American Sign Language): The top 100 classes were used to train the Dynamic Word Expert.
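
The two filtering steps can be sketched as follows: capping each class at a fixed number of samples, and keeping only the most frequent classes. The exact selection procedure is not described here, so random down-sampling with a fixed seed and ranking classes by sample count are both assumptions, as are the function names.

```python
import random
from collections import Counter, defaultdict


def top_k_classes(labels, k=100):
    """Return the k most frequent labels, most frequent first."""
    return [label for label, _ in Counter(labels).most_common(k)]


def cap_per_class(samples, cap=200, seed=0):
    """Down-sample so that no class keeps more than `cap` items.

    `samples` is a list of (item, label) pairs; classes at or under the cap
    are kept unchanged.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append(item)
    balanced = []
    for label, items in sorted(by_label.items()):
        if len(items) > cap:
            items = rng.sample(items, cap)
        balanced.extend((item, label) for item in items)
    return balanced
```

Capping the large classes rather than up-sampling the small ones keeps every retained sample a real recording, at the cost of discarding some data from the over-represented letters.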

Access: Dataset composition details available here.

🔮 Future Roadmap

KokoroTTS Integration: Replace text output with natural voice synthesis.
Streaming Decoder: Optimize SLM to decode tokens asynchronously for lower latency.
Mobile Port: Quantize models for deployment on Android/iOS via TFLite.

👥 Team

Roshan Robin - AI Engineer & Architecture
Jayalakshmy Jayakrishnan - Data Processing & Evaluation
Nima Fathima - Data Processing & Evaluation
Sakhil N Maju - Frontend & Integration

📜 License

Distributed under the MIT License. See LICENSE for more information.
