Add Moonshine ASR backend (CPU-friendly, with Arabic-tuned default) #19
Open
Ahmed-Ezzat20 wants to merge 1 commit into
Adds a sixth transcription backend (--method moonshine / --transcribe-method moonshine) using Useful Sensors' Moonshine ASR via Hugging Face Transformers. Moonshine is small (27-61M params), CPU-friendly, and the Arabic-tuned moonshine-tiny-ar variant matches whisper-medium quality at 28x fewer params.

Implementation:
- _transcribe_moonshine() in transcribe.py: loads MoonshineForConditionalGeneration, runs Silero VAD ourselves (Moonshine has no native long-audio chunker and was trained on <=30s clips), and applies the model card's 13 tokens/sec hallucination cap on each chunk.
- Default model auto-picked from --language: moonshine-tiny-ar for Arabic, moonshine-base for English / unknown. Users can always pass --model to override.
- Reuses _whisper_cache so transcribe.clear_cache() correctly frees both the model and the processor between runs.
- Reuses _preprocess_audio (16 kHz mono WAV) so subsequent backend calls on the same project don't reconvert.

Cross-cutting fix triggered by Moonshine's hallucination behaviour:
- Add _REPEATED_PHRASE_RE that collapses 3+ punctuation-separated phrase repeats (Latin or Arabic punctuation). Catches stuck-token loops like "كما قلت، كما قلت، كما قلت" that the existing whitespace-only regex missed. This benefits ALL backends (Whisper, faster-whisper, Moonshine) on near-silent / OOD audio.

Tests:
- tests/test_moonshine.py: 9 unit tests covering default-model selection, the new phrase-cleanup regex (positive + negative cases), method-literal membership, and the dispatch error message. All 14 tests pass (5 existing MLX + 9 new Moonshine).

Docs: README, docs/installation.md, docs/cli-reference.md, docs/quick-start.md all updated with install extras, choice lists, and a Quick Start example highlighting the Arabic checkpoint.
Summary
Adds a sixth transcription backend, Moonshine (Useful Sensors), via the
Hugging Face Transformers integration. Moonshine is a small (27–61 M params),
CPU-friendly ASR model, and the Arabic-specialised checkpoint
moonshine-tiny-ar matches whisper-medium quality on Arabic
FLEURS/Common Voice at 28× fewer parameters. That makes it a good fit for
users who want a free, local, Arabic-strong alternative to Deepgram or full
Whisper.
The default model is picked from --language: moonshine-tiny-ar for Arabic
and moonshine-base otherwise. Users can override with --model.
Implementation notes
- _transcribe_moonshine() in mazinger/transcribe.py: loads MoonshineForConditionalGeneration + AutoProcessor, runs Silero VAD ourselves (Moonshine has no native long-audio chunker and was trained on ≤30 s clips), then transcribes each VAD chunk and stamps it with the chunk's start/end times.
- Applies the model card's 13 tokens/sec cap as max_length per chunk to suppress runaway outputs on near-silent regions.
- Reuses _whisper_cache, so transcribe.clear_cache() correctly frees both Moonshine and Whisper models between runs.
- Reuses _preprocess_audio() so a project that switches between faster-whisper and moonshine doesn't re-encode the audio.
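To make the per-chunk cap and timestamp stamping concrete, here is a minimal sketch, not the code in this PR. It assumes each VAD chunk is a dict with `start` and `end` given as sample offsets into 16 kHz audio (the shape Silero VAD is assumed to return by default). The helper names `max_tokens_for` and `to_segment` are hypothetical, and rounding the cap up is an assumption.

```python
import math

SAMPLE_RATE = 16_000      # matches the 16 kHz mono WAV mentioned above
TOKENS_PER_SECOND = 13    # cap quoted from the model card in the notes above


def max_tokens_for(chunk, sample_rate=SAMPLE_RATE):
    """Upper bound on generated tokens for one VAD chunk (hypothetical helper)."""
    seconds = (chunk["end"] - chunk["start"]) / sample_rate
    # Rounding up and allowing at least one token are assumptions of this sketch.
    return max(1, math.ceil(seconds * TOKENS_PER_SECOND))


def to_segment(chunk, text, sample_rate=SAMPLE_RATE):
    """Stamp a transcribed chunk with its start/end time in seconds."""
    return {
        "start": chunk["start"] / sample_rate,
        "end": chunk["end"] / sample_rate,
        "text": text.strip(),
    }


if __name__ == "__main__":
    chunk = {"start": 16_000, "end": 40_000}  # a 1.5 s speech region
    print(max_tokens_for(chunk))
    print(to_segment(chunk, " example text "))
```

The value from `max_tokens_for` would then be passed as the generation length limit for that chunk, so a near-silent two-second region cannot produce hundreds of tokens.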
Cross-cutting fix triggered by Moonshine's hallucination behaviour
While testing on a real Arabic clip with a music outro, Moonshine's third VAD
chunk got stuck in a "كما قلت، كما قلت، كما قلت..." loop. The existing
_REPEATED_WORD_RE only collapses whitespace-separated repeats, so
punctuation-separated stuck-token loops slipped through.
Added _REPEATED_PHRASE_RE that collapses 3+ phrase repeats joined by Latin
or Arabic punctuation. This benefits every backend that can produce
stuck-token loops on near-silent / OOD audio: Whisper, faster-whisper, and
WhisperX all do this occasionally too. Verified with 7 positive cases (Arabic
multi-word, Latin multi-word, single-word + punct, mixed real text + loop)
and a negative case ensuring normal sentences with shared words like "and"
are not over-collapsed.
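For readers who want to see the idea in isolation, the following is an illustrative sketch of a punctuation-aware loop collapser. It is not the actual _REPEATED_PHRASE_RE from this PR: the pattern, the punctuation set, and the name `collapse_phrase_loops` are all assumptions made for the example.

```python
import re

# Assumed separator set: Latin punctuation plus Arabic comma (U+060C),
# Arabic semicolon (U+061B) and Arabic question mark (U+061F).
_PUNCT = r",.;:!?\u060C\u061B\u061F"

_PHRASE_LOOP_RE = re.compile(
    rf"(?<![^\s{_PUNCT}])"                      # start at a token boundary
    rf"(?P<phrase>[^\s{_PUNCT}][^{_PUNCT}]*?)"  # shortest punctuation-free phrase
    # ...followed by 2+ punctuation-separated repeats (3+ occurrences in total),
    # each ending at whitespace, punctuation, or end of text.
    rf"(?:\s*[{_PUNCT}]+\s*(?P=phrase)(?=[\s{_PUNCT}]|$)){{2,}}"
)


def collapse_phrase_loops(text):
    """Replace a run of 3+ identical phrases with a single occurrence."""
    return _PHRASE_LOOP_RE.sub(r"\g<phrase>", text)


if __name__ == "__main__":
    print(collapse_phrase_loops("كما قلت، كما قلت، كما قلت"))
    print(collapse_phrase_loops("thank you, thank you, thank you. Next topic"))
    print(collapse_phrase_loops("bread and butter, and jam, and tea"))
```

Requiring three occurrences and anchoring both ends of each repeat at a token boundary is what keeps ordinary sentences, such as a list that reuses "and", from being collapsed.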
Tests
tests/test_moonshine.py: 9 unit tests covering default-model selection,
the new phrase-cleanup regex (positive + negative), method-literal
membership, and the dispatch error message. They run without downloading
the model, so they're cheap to add to CI. The full test suite (14 tests
including the existing MLX ones) passes.
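As an illustration of how default-model selection can be checked without downloading anything, here is a small sketch. `pick_default_model` is a hypothetical stand-in for the PR's picker, and the accepted spellings of the language value ("ar", "ar-EG", "arabic") are assumptions; only the two model names and the Arabic/otherwise rule come from the description above.

```python
def pick_default_model(language=None):
    """Return the default Moonshine checkpoint name for a --language value.

    Hypothetical helper: the real picker in the PR may normalise
    language values differently.
    """
    code = (language or "").strip().lower().replace("_", "-")
    if code in ("ar", "arabic") or code.startswith("ar-"):
        return "moonshine-tiny-ar"
    # English, unknown, or unset languages fall back to the base model.
    return "moonshine-base"


def test_default_model_selection():
    assert pick_default_model("ar") == "moonshine-tiny-ar"
    assert pick_default_model("ar-EG") == "moonshine-tiny-ar"
    assert pick_default_model("en") == "moonshine-base"
    assert pick_default_model(None) == "moonshine-base"


if __name__ == "__main__":
    test_default_model_selection()
    print("ok")
```

Because the function only maps strings to strings, a test like this needs neither torch nor network access.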
End-to-end test result
Tested on the same 58-second Arabic clip used to validate Deepgram earlier:
Quality is comparable to Deepgram on the speech portions; the main win is
that Moonshine runs locally, at zero cost and with no network access.
Files
- mazinger/transcribe.py: _transcribe_moonshine(), default-model picker, dispatch case, docstring updates, new _REPEATED_PHRASE_RE
- mazinger/cli/_transcribe.py: --method moonshine choice + help text
- mazinger/cli/_groups.py: --transcribe-method moonshine choice + help text
- pyproject.toml: transcribe-moonshine extra (transformers>=4.49, torch>=2.4, silero-vad>=5.0)
- tests/test_moonshine.py
- README.md
- docs/installation.md
- docs/cli-reference.md
- docs/quick-start.md