Add Moonshine ASR backend (CPU-friendly, with Arabic-tuned default) by Ahmed-Ezzat20 · Pull Request #19 · bakrianoo/mazinger

Ahmed-Ezzat20 · 2026-05-04T13:18:22Z

Summary

Adds a sixth transcription backend, Moonshine (Useful Sensors), via the
Hugging Face Transformers integration. Moonshine is a small (27–61 M params),
CPU-friendly ASR model, and the Arabic-specialised checkpoint
moonshine-tiny-ar matches whisper-medium quality on Arabic
Fleurs/CommonVoice with 28× fewer parameters. That makes it a good fit for
users who want a free, local, Arabic-strong alternative to Deepgram or full
Whisper.

pip install "mazinger[transcribe-moonshine]"
mazinger transcribe audio.mp3 --method moonshine --language ar -o subs.srt

The default model is picked from --language: moonshine-tiny-ar for Arabic
and moonshine-base otherwise. Users can override with --model.
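
The selection rule above can be sketched as follows. This is only an illustration: `pick_moonshine_model` and its parameters are hypothetical names (the PR text does not show the real picker's signature), the checkpoint names are written exactly as they appear in this description, and treating regional codes such as `ar-EG` as Arabic is an assumption.

```python
from typing import Optional

# Checkpoint names as written in this PR description.
ARABIC_MODEL = "moonshine-tiny-ar"
FALLBACK_MODEL = "moonshine-base"


def pick_moonshine_model(language: Optional[str], model: Optional[str] = None) -> str:
    """Return the explicit --model if given, else a default chosen from --language.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if model:
        return model  # an explicit override always wins
    code = (language or "").strip().lower()
    # Assumption: "ar" and regional variants ("ar-EG", "ar_SA") count as Arabic.
    if code == "ar" or code.startswith(("ar-", "ar_")):
        return ARABIC_MODEL
    return FALLBACK_MODEL
```

Keeping the override check first means a user who passes both `--language ar` and `--model` still gets the model they asked for.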

Implementation notes

- _transcribe_moonshine() in mazinger/transcribe.py: loads
  MoonshineForConditionalGeneration + AutoProcessor, runs Silero VAD
  ourselves (Moonshine has no native long-audio chunker and was trained on
  ≤30 s clips), then transcribes each VAD chunk and stamps it with the chunk's
  start/end times.
- Hallucination guard: applies the model card's 13 tokens/sec cap as
  max_length per chunk to suppress runaway outputs on near-silent regions.
- Model caching: reuses the existing _whisper_cache, so
  transcribe.clear_cache() correctly frees both Moonshine and Whisper models
  between runs.
- Audio preprocessing: reuses _preprocess_audio() so a project that
  switches between faster-whisper and moonshine doesn't re-encode the audio.
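
The per-chunk flow and the token cap described above could look roughly like the sketch below. It is not the code in this PR: `max_tokens_for`, `transcribe_chunks`, and the `run_model` callback are hypothetical, the model call is stubbed out so the example runs without Transformers, and the span format (dicts with `start`/`end` sample offsets at 16 kHz) is an assumption about what the VAD step hands over.

```python
import math
from typing import Callable, Dict, List

SAMPLE_RATE = 16_000   # assumed: 16 kHz mono audio after preprocessing
TOKENS_PER_SEC = 13    # cap quoted in this PR from the model card


def max_tokens_for(num_samples: int, sample_rate: int = SAMPLE_RATE) -> int:
    """Upper bound on generated tokens for a chunk of the given length."""
    return max(1, math.ceil(num_samples / sample_rate * TOKENS_PER_SEC))


def transcribe_chunks(
    spans: List[Dict[str, int]],
    run_model: Callable[[int, int, int], str],
    sample_rate: int = SAMPLE_RATE,
) -> List[Dict[str, object]]:
    """Transcribe each VAD span and stamp it with the span's start/end seconds.

    `run_model(start, end, max_length)` is a hypothetical stand-in for the
    processor + model.generate() call on the samples in [start, end).
    """
    segments = []
    for span in spans:
        length = span["end"] - span["start"]
        text = run_model(span["start"], span["end"], max_tokens_for(length, sample_rate)).strip()
        if text:  # drop chunks that produced no text
            segments.append({
                "start": span["start"] / sample_rate,
                "end": span["end"] / sample_rate,
                "text": text,
            })
    return segments
```

Because timestamps come from the VAD span rather than from the model, segment granularity is one entry per speech chunk, not per word.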

Cross-cutting fix triggered by Moonshine's hallucination behaviour

While testing on a real Arabic clip with a music outro, Moonshine's third VAD
chunk got stuck in a "كما قلت، كما قلت، كما قلت..." loop. The existing
_REPEATED_WORD_RE only collapses whitespace-separated repeats, so
punctuation-separated stuck-token loops slipped through.

Added _REPEATED_PHRASE_RE that collapses 3+ phrase repeats joined by Latin
or Arabic punctuation. This benefits every backend that can produce
stuck-token loops on near-silent / OOD audio — Whisper, faster-whisper, and
WhisperX all do this occasionally too. Verified with 7 positive cases (Arabic
multi-word, Latin multi-word, single-word + punct, mixed real text + loop)
and a negative case ensuring normal sentences with shared words like "and"
are not over-collapsed.
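
For readers who want a feel for the technique, here is one way such a pattern could be written. It is a sketch, not the `_REPEATED_PHRASE_RE` added in this PR: the names, the six-word phrase limit, and the exact punctuation set are all assumptions made for illustration.

```python
import re

# Latin separators plus Arabic comma (U+060C), semicolon (U+061B),
# and question mark (U+061F). The exact set is an assumption.
_SEP = r"[,.;:!?\u060C\u061B\u061F]"

# A phrase of 1-6 words, followed by at least two more copies of itself,
# each joined by punctuation (3+ occurrences in total).
REPEATED_PHRASE_SKETCH_RE = re.compile(
    r"\b(?P<phrase>\w+(?:\s+\w+){0,5})"
    r"(?:\s*" + _SEP + r"+\s*(?P=phrase)\b){2,}"
)


def collapse_repeated_phrases(text: str) -> str:
    """Replace each punctuation-joined run of 3+ identical phrases with one copy."""
    return REPEATED_PHRASE_SKETCH_RE.sub(lambda m: m.group("phrase"), text)


if __name__ == "__main__":
    print(collapse_repeated_phrases("thank you, thank you, thank you"))
```

The word boundaries keep the pattern from matching in the middle of a word, and requiring the separator to contain punctuation leaves plain whitespace repeats to the existing word-level regex.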

Tests

tests/test_moonshine.py — 9 unit tests covering default-model selection,
the new phrase-cleanup regex (positive + negative), method-literal
membership, and the dispatch error message. They run without downloading
the model so they're cheap to add to CI. The full test suite (14 tests
including the existing MLX ones) passes.

End-to-end test result

Tested on the same 58-second Arabic clip used to validate Deepgram earlier:

| Run | Wall-clock | Real-time factor | Notes |
|---|---|---|---|
| Cold (downloads model) | 122 s | ~0.5× | Surfaced the punctuation-loop bug above |
| Warm (cached model) | 24 s | ~2.4× | Loop collapsed correctly, two real-content chunks transcribe cleanly |

Quality is comparable to Deepgram on the speech portions; the local zero-cost
zero-network properties are the main win.
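
The real-time factors in the table follow from dividing the clip length by the wall-clock time. The snippet below just reproduces that arithmetic; the function name is illustrative, and the convention (audio seconds per wall-clock second, so values above 1 are faster than real time) is inferred from the table's numbers rather than stated in the PR.

```python
def real_time_factor(audio_seconds: float, wall_clock_seconds: float) -> float:
    """Seconds of audio processed per second of wall-clock time."""
    return audio_seconds / wall_clock_seconds


CLIP_SECONDS = 58  # length of the Arabic test clip

for label, wall in (("cold", 122), ("warm", 24)):
    print(f"{label}: {real_time_factor(CLIP_SECONDS, wall):.1f}x")
# prints:
# cold: 0.5x
# warm: 2.4x
```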

Files

| File | Change |
|---|---|
| `mazinger/transcribe.py` | New `_transcribe_moonshine()`, default-model picker, dispatch case, docstring updates, new `_REPEATED_PHRASE_RE` |
| `mazinger/cli/_transcribe.py` | `--method moonshine` choice + help text |
| `mazinger/cli/_groups.py` | `--transcribe-method moonshine` choice + help text |
| `pyproject.toml` | New `transcribe-moonshine` extra (`transformers>=4.49`, `torch>=2.4`, `silero-vad>=5.0`) |
| `tests/test_moonshine.py` | 9 new unit tests |
| `README.md` | Feature list, install extras, dedicated Quick Start section |
| `docs/installation.md` | Local transcription extras + task matrix row |
| `docs/cli-reference.md` | Choice lists + transcribe example |
| `docs/quick-start.md` | Moonshine usage example |

Adds a sixth transcription backend (--method moonshine / --transcribe-method moonshine) using Useful Sensors' Moonshine ASR via Hugging Face Transformers. Moonshine is small (27-61M params), CPU-friendly, and the Arabic-tuned moonshine-tiny-ar variant matches whisper-medium quality at 28x fewer params.

Implementation:
- _transcribe_moonshine() in transcribe.py: loads MoonshineForConditionalGeneration, runs Silero VAD ourselves (Moonshine has no native long-audio chunker and was trained on <=30s clips), and applies the model card's 13 tokens/sec hallucination cap on each chunk.
- Default model auto-picked from --language: moonshine-tiny-ar for Arabic, moonshine-base for English / unknown. Users can always pass --model to override.
- Reuses _whisper_cache so transcribe.clear_cache() correctly frees both the model and the processor between runs.
- Reuses _preprocess_audio (16 kHz mono WAV) so subsequent backend calls on the same project don't reconvert.

Cross-cutting fix triggered by Moonshine's hallucination behaviour:
- Add _REPEATED_PHRASE_RE that collapses 3+ punctuation-separated phrase repeats (Latin or Arabic punctuation). Catches stuck-token loops like "كما قلت، كما قلت، كما قلت" that the existing whitespace-only regex missed. This benefits ALL backends (Whisper, faster-whisper, Moonshine) on near-silent / OOD audio.

Tests:
- tests/test_moonshine.py: 9 unit tests covering default-model selection, the new phrase-cleanup regex (positive + negative cases), method-literal membership, and the dispatch error message. All 14 tests pass (5 existing MLX + 9 new Moonshine).

Docs: README, docs/installation.md, docs/cli-reference.md, docs/quick-start.md all updated with install extras, choice lists, and a Quick Start example highlighting the Arabic checkpoint.

Ahmed-Ezzat20 wants to merge 1 commit into bakrianoo:master from Ahmed-Ezzat20:feat/moonshine-asr-backend