SignAI is a real-time sign language recognition and translation system for German Sign Language (DGS). It uses a sequence-to-sequence model with multi-head attention, trained on MediaPipe Holistic keypoint features. The project won 1st place at the Jugend forscht state competition and received coverage in SZ, BR, and other media outlets.
Primary languages: Python (core, app), CSS/HTML/JavaScript (product website).
- Quick Start
- Requirements
- Models & Training
- Preprocessing
- Architecture
- Desktop App
- Build & Deploy
- Known Issues
- Roadmap
- Contributing
- License
- Contact
| Component | Command |
|---|---|
| Desktop app | `cd app && python app.py` |
| Web API (Flask + SocketIO, port 8000) | `python main.py` |
| Flask API (port 5000) | `python -m api.signai_api` |
| API client (background server + upload) | `python -c "import request; request.start('video.mp4')"` |
| Product website | `cd product_webside && python main.py` |
Install dependencies with `pip install -r requirements.txt`.
- OS: Windows (primary). macOS/Linux support in development.
- Hardware: Webcam for live recognition. GPU recommended; CPU-only supported but slower.
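- Disk: at least 5 GB free; models and caches may need more.
- Python: 3.9 to 3.12 (the pinned TensorFlow 2.16.2, Keras 3.7.0, and numpy 1.26.4 do not support Python 3.8)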
- Core stack: TensorFlow 2.16.2 / Keras 3.7.0, MediaPipe 0.10.21, numpy 1.26.4, protobuf 4.25.8
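To confirm that an environment actually matches the pinned core stack, installed distribution versions can be compared against the pins. The helper below is an illustrative sketch, not a script shipped with the repository; the `PINS` dict mirrors the versions listed above, and the distribution names (`tensorflow`, `keras`, `mediapipe`, `numpy`, `protobuf`) are assumed to be the PyPI names used in `requirements.txt`.

```python
"""Hypothetical helper: compare installed package versions with the pinned stack."""
from importlib import metadata
from typing import Dict, Iterable, List, Optional

# Pins copied from the Requirements list; distribution names are assumed PyPI names.
PINS: Dict[str, str] = {
    "tensorflow": "2.16.2",
    "keras": "3.7.0",
    "mediapipe": "0.10.21",
    "numpy": "1.26.4",
    "protobuf": "4.25.8",
}


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return {name: installed version, or None if the distribution is missing}."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def check_pins(installed: Dict[str, Optional[str]], pins: Dict[str, str]) -> List[str]:
    """List human-readable problems; an empty list means every pin matches."""
    problems = []
    for name, wanted in pins.items():
        have = installed.get(name)
        if have is None:
            problems.append(f"{name}: not installed (expected {wanted})")
        elif have != wanted:
            problems.append(f"{name}: {have} installed, expected {wanted}")
    return problems
```

Usage: `print(check_pins(installed_versions(PINS), PINS) or "All pins match")`.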
Primary translation model: BiLSTM encoder + LSTM decoder with 8-head MultiHeadAttention.
Train with `python train-seq2seq.py`. The configuration is at the bottom of `train-seq2seq.py` (defaults: version 38.4, 200 epochs, batch size 64, `multi_attention`).
Key features:
- Mixed precision training (`mixed_float16` global policy)
- Per-epoch WER, BLEU-1..4, ROUGE-1/2/L evaluation
- Epoch-wise augmentation (temporal: stretch/warp/freeze/dropout; spatial: shift/scale/rotate/noise)
- Transformer architecture also available in `utils_experimental_train.py`
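In its standard form, the per-epoch WER mentioned above is the word-level (here: gloss-level) Levenshtein distance between hypothesis and reference, divided by the reference length. The function below is a minimal sketch of that definition; it is not taken from `train-seq2seq.py`, whose tokenization and corpus-level averaging may differ.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Gloss-level WER = (substitutions + deletions + insertions) / len(reference)."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    # prev[j] is the edit distance between the reference prefix processed so far
    # and the first j hypothesis tokens (one row of the classic DP table).
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(reference)
```

For example, `word_error_rate("MORGEN REGEN".split(), "MORGEN SONNE".split())` returns `0.5` (one substitution over two reference glosses). WER can exceed 1.0 when the hypothesis contains many insertions.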
Latest trained models:
| Version | Type | Notes |
|---|---|---|
| v36 | BiLSTM-Seq2Seq | Latest internal version (June 2026) |
| v30 | Seq2Seq | Latest public version (April 2026) |
| v29 | Seq2Seq | 200+ epochs, full history |
| v28 | Seq2Seq | 200+ epochs, full history |
- Vocabulary: 800+ gloss tokens
- Output length: Up to 15 tokens per sentence
- Input features: 426 per frame (7 pose + 21 left hand + 21 right hand + 93 face landmarks, each x/y/z)
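The 426-dimensional frame vector follows from the landmark counts: (7 + 21 + 21 + 93) = 142 landmarks × 3 coordinates = 426. The sketch below shows one way to flatten such a frame. The group order (pose, left hand, right hand, face), the x/y/z interleaving, and the NaN fill for undetected groups are assumptions for illustration, not the documented layout of `preprocessing_train_data.py`.

```python
import numpy as np

# Counts match the README; the group order is a hypothetical choice.
LANDMARK_GROUPS = [("pose", 7), ("left_hand", 21), ("right_hand", 21), ("face", 93)]
COORDS = 3  # x, y, z
FRAME_DIM = sum(n for _, n in LANDMARK_GROUPS) * COORDS  # 142 * 3 = 426


def group_slices() -> dict:
    """Map each landmark group to its slice in the flat per-frame vector."""
    slices, start = {}, 0
    for name, count in LANDMARK_GROUPS:
        slices[name] = slice(start, start + count * COORDS)
        start += count * COORDS
    return slices


def flatten_frame(groups: dict) -> np.ndarray:
    """groups: {name: array of shape (count, 3), or None if not detected}.

    Undetected groups stay NaN so a later interpolation step can repair them.
    """
    frame = np.full(FRAME_DIM, np.nan, dtype=np.float32)
    for name, sl in group_slices().items():
        pts = groups.get(name)
        if pts is not None:
            # (count, 3) -> x0, y0, z0, x1, y1, z1, ...
            frame[sl] = np.asarray(pts, dtype=np.float32).reshape(-1)
    return frame
```

Under this layout the face block occupies indices 147 to 425, so dropping it (as the 150-feature classifier does with face data) is a single slice.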
BiLSTM classifier using 150 features (pose + hands only, no face).
Train with `python train.py`. Pass `--rebuild-cache` to force re-parsing of the training CSVs.
| Metric | Value |
|---|---|
| Training accuracy | 99.8% |
| Validation accuracy | 98.7% |
| Architecture | BiLSTM(64) → BiLSTM(32) → Dense(64) → Softmax |
Trained on a compressed subset of PHOENIX-Weather-2014T. Performance improves significantly with the full dataset.
Training CSVs are stored in `data/train_data/`. A parsed cache is kept at `.parsed_cache.pkl`; delete it or pass `--rebuild-cache` to re-parse. CSVs are git-ignored; only `example_for_train_data.csv` is tracked.
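The cache-or-reparse behaviour described above can be sketched as follows, assuming the parsed result is a picklable object and that parsing simply reads every CSV in the directory. `parse_csvs`, `load_training_data`, and `main` are hypothetical names; the real `train.py` may structure this differently.

```python
import argparse
import csv
import pickle
from pathlib import Path
from typing import List, Optional

DATA_DIR = Path("data/train_data")
CACHE_PATH = Path(".parsed_cache.pkl")


def parse_csvs(data_dir: Path) -> dict:
    """Hypothetical parser: read every CSV into a list of rows keyed by file name."""
    parsed = {}
    for path in sorted(data_dir.glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as fh:
            parsed[path.name] = list(csv.reader(fh))
    return parsed


def load_training_data(data_dir: Path, cache_path: Path, rebuild: bool = False) -> dict:
    """Return the cached data unless a rebuild is requested or no cache exists."""
    if cache_path.exists() and not rebuild:
        # Only unpickle caches you created yourself; pickle is unsafe for untrusted files.
        with cache_path.open("rb") as fh:
            return pickle.load(fh)
    data = parse_csvs(data_dir)
    with cache_path.open("wb") as fh:
        pickle.dump(data, fh)
    return data


def main(argv: Optional[List[str]] = None) -> dict:
    parser = argparse.ArgumentParser()
    parser.add_argument("--rebuild-cache", action="store_true",
                        help="ignore .parsed_cache.pkl and re-parse the CSVs")
    args = parser.parse_args(argv)
    return load_training_data(DATA_DIR, CACHE_PATH, rebuild=args.rebuild_cache)
```

The cache is not invalidated when CSVs change, which is why a manual delete or `--rebuild-cache` is needed after adding training data.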
MediaPipe Holistic is used for keypoint extraction.
| Script | Purpose | Features | Landmarks |
|---|---|---|---|
| `preprocessing_train_data.py` | Training data | 426 (142 landmarks × 3, xyz) | 7 pose + 42 hand + 93 face |
| `api/preprocessing_live_data.py` | Live inference | 151 (averaged) | 543 landmarks × 2 (xy) |
Normalization pipeline:
- Video-wise shoulder midpoint centering
- Shoulder-distance scaling
- Savitzky–Golay temporal smoothing (window 9, polyorder 2)
- Linear interpolation for missing keypoints
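The normalization steps can be sketched with NumPy as follows. This is an illustrative reimplementation under stated assumptions, not the code in `preprocessing_train_data.py`. Frames are arrays of shape `(T, landmarks, 3)` with NaN for missing keypoints, and `left_shoulder`/`right_shoulder` are hypothetical landmark indices. Interpolation runs first here, although it is listed last above, because the smoothing filter cannot bridge NaN gaps. The Savitzky–Golay filter is built from a least-squares fit with edge padding rather than by calling `scipy.signal.savgol_filter`, so boundary frames may differ slightly from SciPy's default.

```python
import numpy as np


def interpolate_missing(x: np.ndarray) -> np.ndarray:
    """Linearly interpolate NaN values along time, independently per coordinate."""
    flat = x.reshape(x.shape[0], -1).copy()
    t = np.arange(flat.shape[0])
    for col in range(flat.shape[1]):
        valid = ~np.isnan(flat[:, col])
        if valid.all():
            continue
        if valid.any():
            # np.interp holds the first/last valid value constant beyond the ends.
            flat[:, col] = np.interp(t, t[valid], flat[valid, col])
        else:
            flat[:, col] = 0.0  # assumption: a never-detected coordinate becomes 0
    return flat.reshape(x.shape)


def normalize_video(x: np.ndarray, left_shoulder: int, right_shoulder: int) -> np.ndarray:
    """Video-wise centering on the shoulder midpoint, then shoulder-distance scaling."""
    left, right = x[:, left_shoulder], x[:, right_shoulder]   # (T, 3) each
    center = ((left + right) / 2.0).mean(axis=0)              # one midpoint per video
    scale = np.linalg.norm(left - right, axis=-1).mean()      # mean shoulder distance
    return (x - center) / max(scale, 1e-6)


def savgol_smooth(x: np.ndarray, window: int = 9, polyorder: int = 2) -> np.ndarray:
    """Savitzky-Golay smoothing along axis 0 (time) with edge padding."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    design = np.vander(offsets, polyorder + 1, increasing=True)  # (window, polyorder+1)
    weights = np.linalg.pinv(design)[0]  # least-squares fit value at offset 0
    flat = x.reshape(x.shape[0], -1)
    padded = np.pad(flat, ((half, half), (0, 0)), mode="edge")
    out = np.stack([weights @ padded[t:t + window] for t in range(flat.shape[0])])
    return out.reshape(x.shape)


def preprocess(x, left_shoulder: int, right_shoulder: int) -> np.ndarray:
    x = interpolate_missing(np.asarray(x, dtype=np.float64))
    x = normalize_video(x, left_shoulder, right_shoulder)
    return savgol_smooth(x)
```

With `window=9, polyorder=2` the interior weights reduce to the classic coefficients (-21, 14, 39, 54, 59, 54, 39, 14, -21) / 231, and any quadratic trajectory passes through unchanged away from the edges.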
Encoder: Input(426) → Dense(1024) → LayerNorm → Dropout → DepthwiseConv1D → BiLSTM(512) → LayerNorm
Decoder: Embedding(256) → LayerNorm → LSTM(512) → LayerNorm → MultiHeadAttention(8 heads, residual) → Concat → Dense(512) → Dropout → LayerNorm → Dense(vocab, softmax)
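A compact Keras 3 sketch of the encoder and decoder above follows. Several details are assumptions because the README does not specify them:

- the DepthwiseConv1D kernel size (5)
- the dropout rate (0.3)
- a ReLU on the Dense(512) layer
- that the decoder attends over the full encoder sequence without receiving the encoder's final states
- that "Concat" joins the residual attention output with the LSTM output

Layer sizes are parameters, so the same function can build a small test model. The defaults follow the sizes listed above.

```python
import keras
from keras import layers


def build_seq2seq(vocab_size: int, feat_dim: int = 426, proj_dim: int = 1024,
                  enc_units: int = 512, embed_dim: int = 256, dec_units: int = 512,
                  dense_units: int = 512, num_heads: int = 8, dropout: float = 0.3,
                  kernel_size: int = 5) -> keras.Model:
    # Encoder: per-frame projection, depthwise temporal convolution, BiLSTM.
    frames = keras.Input(shape=(None, feat_dim), name="frames")
    x = layers.Dense(proj_dim)(frames)
    x = layers.LayerNormalization()(x)
    x = layers.Dropout(dropout)(x)
    x = layers.DepthwiseConv1D(kernel_size, padding="same")(x)  # kernel size assumed
    x = layers.Bidirectional(layers.LSTM(enc_units, return_sequences=True))(x)
    enc_out = layers.LayerNormalization()(x)                    # (batch, T, 2 * enc_units)

    # Decoder: teacher-forced gloss tokens attend over the encoder sequence.
    tokens = keras.Input(shape=(None,), dtype="int32", name="tokens")
    y = layers.Embedding(vocab_size, embed_dim)(tokens)
    y = layers.LayerNormalization()(y)
    y = layers.LSTM(dec_units, return_sequences=True)(y)
    dec_out = layers.LayerNormalization()(y)                    # (batch, U, dec_units)
    attn = layers.MultiHeadAttention(num_heads=num_heads,
                                     key_dim=dec_units // num_heads)(dec_out, enc_out)
    attn = layers.Add()([dec_out, attn])                        # residual connection
    z = layers.Concatenate()([attn, dec_out])                   # assumed concat inputs
    z = layers.Dense(dense_units, activation="relu")(z)         # activation assumed
    z = layers.Dropout(dropout)(z)
    z = layers.LayerNormalization()(z)
    # Keep the softmax in float32 when training under the mixed_float16 policy.
    probs = layers.Dense(vocab_size, activation="softmax", dtype="float32")(z)
    return keras.Model([frames, tokens], probs, name="signai_seq2seq_sketch")
```

To train under mixed precision, call `keras.mixed_precision.set_global_policy("mixed_float16")` before building the model. The float32 output layer keeps the softmax and loss numerically stable.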
Classifier: Input(150) → Masking → BiLSTM(64) → Dropout(0.2) → BiLSTM(32) → Dropout(0.2) → Dense(64, ReLU) → Dropout(0.2) → Dense(classes, softmax)
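The classifier pipeline maps directly onto a Keras `Sequential` model. This minimal sketch makes three assumptions: sequences are zero-padded (hence `mask_value=0.0`), clips have a variable number of frames, and the caller supplies the class count.

```python
import keras
from keras import layers


def build_classifier(num_classes: int, feat_dim: int = 150) -> keras.Model:
    return keras.Sequential([
        keras.Input(shape=(None, feat_dim)),             # (frames, features)
        layers.Masking(mask_value=0.0),                   # skip zero-padded frames (assumed padding)
        layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
        layers.Dropout(0.2),
        layers.Bidirectional(layers.LSTM(32)),            # returns one vector per clip
        layers.Dropout(0.2),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(num_classes, activation="softmax"),
    ])
```

For integer labels, a common (not README-confirmed) setup is `model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])`.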
- Workflow: Press Record → perform signs → press again → upload to API → display translation
- Result display: `QPlainTextEdit`, hidden until ready, shows the translation with optional debug info
- Single-instance lock: TCP port 52391
- Logging: stdout/stderr tee'd to `logs/desktop_app.log`
- Settings: `app/settings/settings.json`
- Path handling: `resource_path()` for bundled assets, `writable_path()` for per-user data (`%LOCALAPPDATA%\SignAI\`)
- Qt fix: `fix_qt_plugin_path()` must run before any PySide6 import
- User-site cleanup: user site-packages are stripped from `sys.path` to avoid protobuf version conflicts
- Build: PyInstaller spec at `app/SignAI - Desktop.spec`, output at `build/SignAI - Desktop/SignAI - Desktop.exe`
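A single-instance lock on a fixed TCP port works by binding a listening socket on the loopback interface: a second copy of the app fails to bind and can exit. The sketch below uses the standard `socket` module and assumes the lock only needs to last for the lifetime of the process. The function name is hypothetical, and the real app may also notify or focus the running instance.

```python
import socket
from typing import Optional

LOCK_PORT = 52391  # port listed above


def acquire_single_instance_lock(port: int = LOCK_PORT) -> Optional[socket.socket]:
    """Return a bound socket if this is the first instance, else None.

    Keep the returned socket referenced for the whole session; the OS frees
    the port automatically when the process exits, even after a crash.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if hasattr(socket, "SO_EXCLUSIVEADDRUSE"):
        # Windows only: prevent another process from sharing the port via SO_REUSEADDR.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_EXCLUSIVEADDRUSE, 1)
    try:
        sock.bind(("127.0.0.1", port))
        sock.listen(1)
    except OSError:
        sock.close()
        return None
    return sock
```

At startup, call `lock = acquire_single_instance_lock()` and exit if it returns `None`. Unlike a lock file, the port is freed automatically after a crash, so no stale-lock cleanup is needed.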
- PyInstaller spec: `app/SignAI - Desktop.spec` bundles models, tokenizers, UI, and icons (`pathex` set to repo root)
- Updater: `app/start_updater.py`, spec at `app/SignAI - Updater.spec`
- API overrides: `SIGNAI_MODEL_PATH`/`SIGNAI_MODEL` for a custom model path, `SIGNAI_DISABLE_SITE_CLEANUP=1` to disable user-site cleanup
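The two environment overrides can be sketched with the standard library. This sketch assumes three things: `SIGNAI_MODEL_PATH` takes precedence over `SIGNAI_MODEL` when both are set, the default path passed in is supplied by the caller, and the cleanup removes only the directory reported by `site.getusersitepackages()`. The app's actual resolution order is not documented here.

```python
import os
import site
from typing import List, Mapping, Optional


def resolve_model_path(default: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the model path: SIGNAI_MODEL_PATH, then SIGNAI_MODEL, then the default."""
    env = os.environ if env is None else env
    for var in ("SIGNAI_MODEL_PATH", "SIGNAI_MODEL"):  # precedence order is an assumption
        value = env.get(var, "").strip()
        if value:
            return value
    return default


def strip_user_site(path_list: List[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Drop the user site-packages dir so a stray protobuf there cannot shadow the pinned one."""
    env = os.environ if env is None else env
    if env.get("SIGNAI_DISABLE_SITE_CLEANUP") == "1":
        return list(path_list)
    user_site = os.path.normcase(os.path.abspath(site.getusersitepackages()))
    return [p for p in path_list
            if os.path.normcase(os.path.abspath(p)) != user_site]
```

Apply the cleanup with `sys.path[:] = strip_user_site(sys.path)` before importing TensorFlow or MediaPipe; once protobuf has been imported, changing `sys.path` no longer affects which copy is loaded.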
- Camera feed: If no image appears, press "Switch Camera" repeatedly. Close other camera-using apps.
- Admin privileges: Some operations may require elevation. Future releases will reduce this.
- First-run delay: Models load from disk on first launch — wait a few seconds for the UI to become responsive.
- Recognition quality: Degrades for casual or atypical signing. Addressed by planned augmentation and larger datasets.
- Improve accuracy 3x via full datasets, larger compute, synthetic augmentation, and transformer architectures
- Expand vocabulary to thousands of gloss tokens
- Reduce admin access requirements
- Natural language rendering (gloss → grammatical sentences)
- Multilingual support (ASL planned)
- Fork and create a branch: `git checkout -b feat/my-change`
- Add tests and documentation for changes
- Open a Pull Request with a clear description
- Do not commit large model binaries; use release assets
Non-commercial license. See LICENSE. Contact maintainers for alternative arrangements.
- General / press: hello@signai.dev
- Support: open an issue at GitHub Issues