
Stefanos0710/SignAI



SignAI — Sign Language Translator

Real-time sign language recognition and gloss translation using deep learning


Python 3.8+ TensorFlow 2.16 Keras 3.7 MediaPipe PySide6 Flask


SignAI is a real-time sign language recognition and translation system for German Sign Language (DGS). It uses a sequence-to-sequence model with multi-head attention, trained on MediaPipe Holistic keypoint features. The project won 1st place at the Jugend forscht state competition and received coverage in SZ, BR, and other media outlets.

Primary languages: Python (core, app), CSS/HTML/JavaScript (product website).




Quick Start

Component                                  Command
Desktop app                                cd app && python app.py
Web API (Flask + SocketIO, port 8000)      python main.py
Flask API (port 5000)                      python -m api.signai_api
API client (background server + upload)    python -c "import request; request.start('video.mp4')"
Product website                            cd product_webside && python main.py

Install dependencies before running any component:

pip install -r requirements.txt

Requirements

  • OS: Windows (primary). macOS/Linux support in development.
  • Hardware: Webcam for live recognition. GPU recommended; CPU-only supported but slower.
  • Disk: At least 5 GB free; models and caches can require more.
  • Python: 3.8+
  • Core stack: TensorFlow 2.16.2 / Keras 3.7.0, MediaPipe 0.10.21, numpy 1.26.4, protobuf 4.25.8
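
The pinned versions above can be checked against the active environment before launching anything. The following is a minimal sketch, not part of the repository: `PINS`, `installed_version`, and `find_mismatches` are hypothetical names, and the distribution names are assumed to match the package names on PyPI.

```python
from importlib import metadata

# Pins copied from the "Core stack" line above.
PINS = {
    "tensorflow": "2.16.2",
    "keras": "3.7.0",
    "mediapipe": "0.10.21",
    "numpy": "1.26.4",
    "protobuf": "4.25.8",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package whose installed version differs from its pin to (wanted, found)."""
    mismatches = {}
    for package, wanted in pins.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches
```

Calling `find_mismatches(PINS)` returns an empty dict when the environment matches the pins; any entry it does return names the package to reinstall.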

Models & Training

Sentence Seq2Seq

Primary translation model — BiLSTM encoder + LSTM decoder with 8-head MultiHeadAttention.

python train-seq2seq.py

Configuration is at the bottom of train-seq2seq.py (defaults: version 38.4, 200 epochs, batch 64, multi_attention).

Key features:

  • Mixed precision training (mixed_float16 global policy)
  • Per-epoch WER, BLEU-1..4, ROUGE-1/2/L evaluation
  • Epoch-wise augmentation (temporal: stretch/warp/freeze/dropout; spatial: shift/scale/rotate/noise)
  • Transformer architecture also available in utils_experimental_train.py
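
For readers unfamiliar with the WER metric listed above, here is a minimal sketch of how a corpus-level word error rate over gloss sequences is commonly computed. It is an illustration of the metric only; the function names are hypothetical and the evaluation code in train-seq2seq.py may differ (for example in tokenization or in averaging per sentence instead of per corpus).

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two token lists (each edit costs 1)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_token in enumerate(reference, start=1):
        current = [i]
        for j, hyp_token in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_token != hyp_token)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def word_error_rate(references, hypotheses):
    """Corpus-level WER: total edit distance divided by total reference tokens."""
    total_errors = 0
    total_tokens = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens, hyp_tokens = ref.split(), hyp.split()
        total_errors += edit_distance(ref_tokens, hyp_tokens)
        total_tokens += len(ref_tokens)
    return total_errors / total_tokens if total_tokens else 0.0
```

A WER of 0.0 means every predicted gloss sequence matches its reference exactly; values above 1.0 are possible when the hypothesis contains many insertions.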

Latest trained models:

Version   Type             Notes
v36       BiLSTM-Seq2Seq   Latest internal version (June 2026)
v30       Seq2Seq          Latest public version (April 2026)
v29       Seq2Seq          200+ epochs, full history
v28       Seq2Seq          200+ epochs, full history

  • Vocabulary: 800+ gloss tokens
  • Output length: Up to 15 tokens per sentence
  • Input features: 426 per frame (7 pose + 21 left hand + 21 right hand + 93 face landmarks, each x/y/z)
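
The feature count follows directly from the landmark counts: (7 + 21 + 21 + 93) × 3 = 142 × 3 = 426. The sketch below makes that arithmetic explicit and shows one way a frame could be flattened into a fixed-length vector. The group order and the zero fill for a missing group are assumptions made for illustration; `flatten_frame` is a hypothetical helper, and the actual preprocessing interpolates missing keypoints as described under Preprocessing.

```python
# Landmark counts per frame, taken from the "Input features" line above.
LANDMARK_COUNTS = {"pose": 7, "left_hand": 21, "right_hand": 21, "face": 93}
COORDS_PER_LANDMARK = 3  # x, y, z

FEATURES_PER_FRAME = sum(LANDMARK_COUNTS.values()) * COORDS_PER_LANDMARK


def flatten_frame(landmarks):
    """Flatten {"group": [(x, y, z), ...]} into one flat feature list.

    Groups are concatenated in LANDMARK_COUNTS order (an assumed ordering).
    A missing group is zero-filled here so the length stays constant.
    """
    features = []
    for group, count in LANDMARK_COUNTS.items():
        points = landmarks.get(group) or [(0.0, 0.0, 0.0)] * count
        if len(points) != count:
            raise ValueError(f"{group}: expected {count} landmarks, got {len(points)}")
        for point in points:
            features.extend(point)
    return features
```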

Single-Word Classifier

BiLSTM classifier using 150 features (pose + hands only, no face).

python train.py

Supports --rebuild-cache to force re-parsing of training CSVs.

Metric                Value
Training accuracy     99.8%
Validation accuracy   98.7%
Architecture          BiLSTM(64) → BiLSTM(32) → Dense(64) → Softmax

Trained on a compressed subset of PHOENIX-Weather-2014T. Performance improves significantly with the full dataset.

Training Data

Training CSVs are stored in data/train_data/. A parsed cache is kept at .parsed_cache.pkl — delete it or pass --rebuild-cache to re-parse. CSVs are git-ignored; only example_for_train_data.csv is tracked.


Preprocessing

MediaPipe Holistic is used for keypoint extraction.

Script                           Purpose          Features                       Landmarks
preprocessing_train_data.py      Training data    426 (142 landmarks × 3, xyz)   7 pose + 42 hand + 93 face
api/preprocessing_live_data.py   Live inference   151 (averaged)                 543 landmarks × 2 (xy)

Normalization pipeline:

  1. Video-wise shoulder midpoint centering
  2. Shoulder-distance scaling
  3. Savitzky–Golay temporal smoothing (window 9, polyorder 2)
  4. Linear interpolation for missing keypoints
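
Each step of the pipeline above can be sketched as a standalone function. This is a dependency-free illustration, not the project's implementation: the function names, the shoulder indices, and the 2-D point layout are hypothetical, "video-wise" is interpreted as averaging over all frames, and the smoothing leaves the first and last four samples unchanged (a library routine such as SciPy's savgol_filter handles edges differently). How the project chains the steps and treats gaps during smoothing is not shown here.

```python
import math

# Tabulated Savitzky-Golay smoothing weights for window 9, polyorder 2.
SG_WEIGHTS_9_2 = [w / 231 for w in (-21, 14, 39, 54, 59, 54, 39, 14, -21)]


def center_and_scale(frames, left_shoulder, right_shoulder):
    """Steps 1-2: centre on the mean shoulder midpoint, divide by mean shoulder distance.

    frames: list of frames, each a list of (x, y) points.
    left_shoulder / right_shoulder: landmark indices (layout-dependent).
    """
    pairs = [(frame[left_shoulder], frame[right_shoulder]) for frame in frames]
    n = len(pairs)
    mid_x = sum(a[0] + b[0] for a, b in pairs) / (2 * n)
    mid_y = sum(a[1] + b[1] for a, b in pairs) / (2 * n)
    scale = sum(math.dist(a, b) for a, b in pairs) / n or 1.0  # guard against zero
    return [[((x - mid_x) / scale, (y - mid_y) / scale) for x, y in frame]
            for frame in frames]


def savgol_smooth(track):
    """Step 3: smooth a gap-free 1-D coordinate track (edges left unchanged)."""
    half = len(SG_WEIGHTS_9_2) // 2
    smoothed = list(track)
    for i in range(half, len(track) - half):
        window = track[i - half:i + half + 1]
        smoothed[i] = sum(w * v for w, v in zip(SG_WEIGHTS_9_2, window))
    return smoothed


def interpolate_missing(track):
    """Step 4: fill None gaps linearly; edge gaps take the nearest known value."""
    known = [i for i, value in enumerate(track) if value is not None]
    if not known:
        return [0.0] * len(track)
    filled = list(track)
    for i in range(len(track)):
        if filled[i] is not None:
            continue
        before = max((k for k in known if k < i), default=None)
        after = min((k for k in known if k > i), default=None)
        if before is None:
            filled[i] = track[after]
        elif after is None:
            filled[i] = track[before]
        else:
            weight = (i - before) / (after - before)
            filled[i] = track[before] + weight * (track[after] - track[before])
    return filled
```

Centering and scaling make the features independent of where the signer stands and how far they are from the camera, which is why both use the shoulders as a stable reference.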

Architecture

Seq2Seq (multi_attention)

Encoder: Input(426) → Dense(1024) → LayerNorm → Dropout → DepthwiseConv1D → BiLSTM(512) → LayerNorm
Decoder: Embedding(256) → LayerNorm → LSTM(512) → LayerNorm → MultiHeadAttention(8 heads, residual) → Concat → Dense(512) → Dropout → LayerNorm → Dense(vocab, softmax)
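
As background for the MultiHeadAttention step in the decoder, the sketch below shows scaled dot-product attention for a single query, which is the core operation inside each head. It is a plain-Python illustration under simplifying assumptions: the learned query/key/value projections, the split into 8 heads, and the residual connection are all omitted, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    query: a decoder state; keys/values: one vector per encoder time step.
    Returns (context_vector, attention_weights).
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    context = [sum(w * value[d] for w, value in zip(weights, values))
               for d in range(len(values[0]))]
    return context, weights
```

The weights indicate which encoder frames the decoder attends to when emitting a gloss token; the context vector is their weighted average.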

Classifier

Input(150) → Masking → BiLSTM(64) → Dropout(0.2) → BiLSTM(32) → Dropout(0.2) → Dense(64, ReLU) → Dropout(0.2) → Dense(classes, softmax)
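
The Masking layer at the front of the classifier suggests that variable-length clips are padded to a common length before batching. A minimal sketch of such padding follows, assuming post-padding with Keras Masking's default mask value of 0.0; `pad_sequences` and `valid_lengths` here are hypothetical plain-Python helpers, not the Keras utility of the same name, and the project may pad differently.

```python
MASK_VALUE = 0.0  # default mask_value of keras.layers.Masking
FEATURES = 150    # classifier input width from the line above


def pad_sequences(sequences, max_len, features=FEATURES):
    """Pad (or truncate) each sequence of frames to exactly max_len frames.

    Padding frames consist entirely of MASK_VALUE, so a Masking layer
    would skip them.
    """
    padding_frame = [MASK_VALUE] * features
    padded = []
    for frames in sequences:
        clipped = [list(frame) for frame in frames[:max_len]]
        clipped.extend(list(padding_frame) for _ in range(max_len - len(clipped)))
        padded.append(clipped)
    return padded


def valid_lengths(padded):
    """Count the frames per sequence that are not entirely MASK_VALUE."""
    return [sum(any(v != MASK_VALUE for v in frame) for frame in seq)
            for seq in padded]
```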

Desktop App (PySide6)

  • Workflow: Press Record → perform signs → press again → upload to API → display translation
  • Result display: QPlainTextEdit, hidden until ready, shows translation with optional debug info
  • Single-instance lock: TCP port 52391
  • Logging: stdout/stderr tee'd to logs/desktop_app.log
  • Settings: app/settings/settings.json
  • Path handling: resource_path() for bundled assets, writable_path() for per-user data (%LOCALAPPDATA%\SignAI\)
  • Qt fix: fix_qt_plugin_path() must run before any PySide6 import
  • User-site cleanup: User site-packages stripped from sys.path to avoid protobuf version conflicts
  • Build: PyInstaller spec at app/SignAI - Desktop.spec, output at build/SignAI - Desktop/SignAI - Desktop.exe
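
A TCP-port lock like the one listed above is commonly implemented by binding a local socket and treating a bind failure as "another instance is running". The sketch below illustrates that pattern under assumptions: `acquire_instance_lock` is a hypothetical name, and binding to 127.0.0.1 is a choice made here, not something the project documents.

```python
import socket

LOCK_PORT = 52391  # port named in the list above


def acquire_instance_lock(port=LOCK_PORT, host="127.0.0.1"):
    """Try to bind the lock port; return the socket, or None if it is taken.

    The caller must keep the returned socket alive for the lifetime of the
    app, because closing it releases the lock.
    """
    lock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        lock.bind((host, port))
        lock.listen(1)
    except OSError:
        lock.close()
        return None
    return lock
```

The operating system releases the port when the process exits, so a crashed instance does not leave a stale lock behind, unlike a lock file.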

Build & Deploy

  • PyInstaller spec: app/SignAI - Desktop.spec — bundles models, tokenizers, UI, icons (pathex set to repo root)
  • Updater: app/start_updater.py, spec at app/SignAI - Updater.spec
  • API overrides: SIGNAI_MODEL_PATH / SIGNAI_MODEL for custom model path, SIGNAI_DISABLE_SITE_CLEANUP=1 to disable user-site cleanup
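
Assuming the overrides above are environment variables, they could be resolved along these lines. This is a sketch only: the function names are hypothetical, and the precedence of SIGNAI_MODEL_PATH over SIGNAI_MODEL is an assumption, since the text does not say which one wins when both are set.

```python
import os


def resolve_model_path(default_path, env=None):
    """Pick the model path: SIGNAI_MODEL_PATH, then SIGNAI_MODEL, then the default.

    The precedence between the two variables is an assumption of this sketch.
    """
    env = os.environ if env is None else env
    for name in ("SIGNAI_MODEL_PATH", "SIGNAI_MODEL"):
        value = env.get(name, "").strip()
        if value:
            return value
    return default_path


def site_cleanup_enabled(env=None):
    """User-site cleanup stays on unless SIGNAI_DISABLE_SITE_CLEANUP is exactly "1"."""
    env = os.environ if env is None else env
    return env.get("SIGNAI_DISABLE_SITE_CLEANUP") != "1"
```

Passing a plain dict as `env` keeps both functions easy to test without touching the real process environment.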

Known Issues

  • Camera feed: If no image appears, press "Switch Camera" repeatedly. Close other camera-using apps.
  • Admin privileges: Some operations may require elevation. Future releases will reduce this.
  • First-run delay: Models load from disk on first launch — wait a few seconds for the UI to become responsive.
  • Recognition quality: Degrades for casual or atypical signing. Addressed by planned augmentation and larger datasets.

Roadmap

  • Improve accuracy 3x via full datasets, larger compute, synthetic augmentation, and transformer architectures
  • Expand vocabulary to thousands of gloss tokens
  • Reduce admin access requirements
  • Natural language rendering (gloss → grammatical sentences)
  • Multilingual support (ASL planned)

Contributing

  1. Fork and create a branch: git checkout -b feat/my-change
  2. Add tests and documentation for changes
  3. Open a Pull Request with a clear description
  4. Do not commit large model binaries — use release assets

License

Non-commercial license. See LICENSE. Contact maintainers for alternative arrangements.


Contact
