Commit abd01b9

fix(tts): remove 440Hz sine wave placeholder, implement ALBERT encoder (#183)

Fixes #179 - TTS sample outputs beep sound instead of speech

Changes:
- Remove 440Hz sine wave placeholder generation in _forward_simple()
- Implement ALBERT encoder (Kokoro uses ALBERT, not standard BERT)
- Add WeightNormConv1d for weight-normalized convolutions
- Add InstanceNorm1d for per-channel normalization
- Add AdaIN (Adaptive Instance Normalization) for style conditioning
- Add KokoroTextEncoder (CNN + BiLSTM architecture)
- Add AdaINResBlock for style-conditioned residual blocks
- Add builder functions: build_albert_from_weights(), build_text_encoder_from_weights()
- Update model.py to use actual neural network layers
- Generate silence placeholder instead of beep when decoder not implemented

Note: Full decoder/vocoder implementation requires additional weight mapping.
Current implementation runs through ALBERT and text encoder, generating
placeholder audio while decoder pipeline is being completed.

Testing: Not yet verified - requires model weights and audio playback.
Testing will be done separately as noted in Issue #179.

Build: No C++/CUDA build required. Python-only changes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
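For readers unfamiliar with the normalization layers this commit names, here is a minimal NumPy sketch of instance normalization followed by AdaIN-style conditioning. It is not the code in layers.py: the function names, argument shapes, and the `(1 + gamma)` scaling (a convention used in StyleTTS2-style models) are assumptions made for illustration only.

```python
import numpy as np

def instance_norm_1d(x, eps=1e-5):
    """Normalize each channel of each sample over the time axis.

    x has shape (batch, channels, time). Hypothetical helper, not the
    InstanceNorm1d class added in this commit.
    """
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def adain_1d(x, style, w, b, eps=1e-5):
    """Adaptive instance normalization conditioned on a style vector.

    style: (batch, style_dim)
    w:     (2 * channels, style_dim)  assumed linear projection weights
    b:     (2 * channels,)            assumed linear projection bias
    The projection is split into a per-channel scale (gamma) and shift (beta).
    """
    channels = x.shape[1]
    h = style @ w.T + b                      # (batch, 2 * channels)
    gamma = h[:, :channels, None]            # (batch, channels, 1)
    beta = h[:, channels:, None]             # (batch, channels, 1)
    # Assumption: scale is parameterized as (1 + gamma), so a zero
    # projection leaves the normalized signal unchanged.
    return (1.0 + gamma) * instance_norm_1d(x, eps) + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 50))              # batch=2, channels=3, time=50
style = rng.normal(size=(2, 4))              # style_dim=4
w = np.zeros((6, 4))
b = np.concatenate([np.zeros(3), np.full(3, 2.0)])  # gamma=0, beta=2
out = adain_1d(x, style, w, b)
```

With a zero projection weight and a bias that sets beta to 2, each output channel keeps unit variance and is shifted to a mean of 2, which shows how the style vector controls per-channel statistics.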

1 parent 26df666 commit abd01b9

2 files changed

src/pygpukit/tts/kokoro
- layers.py
- model.py
