feat: add CoreML export for Sherpa-ONNX Zipformer2 transducer #35
JarbasAl wants to merge 6 commits into FluidInference:main
Conversation
Convert icefall Zipformer2 transducer checkpoints (used by Vosk and sherpa-onnx) to CoreML .mlpackage format from original PyTorch .pt checkpoints. Exports encoder, stateless decoder, and joiner as three separate models.

Key challenges solved:
- Patched coremltools _cast to handle numpy ndarray constants from aten::Int ops in traced graphs
- Replaced dynamic reshapes in Zipformer2 attention with unflatten/flatten to avoid traced shape variables
- Froze CompactRelPositionalEncoding outputs via forward hooks to eliminate pe.size() indexing during tracing
- Patched Conv2dSubsampling to use permute+flatten instead of reshape(b, t, c*f)
- Replaced convert_num_channels zeros+cat with constant_pad_nd
- SimpleUpsample uses repeat_interleave instead of expand+reshape

Includes a comparison script (PyTorch vs CoreML encoder cosine similarity and greedy RNNT transcription), int8 quantization (3.5x compression), mel spectrogram extraction, and a greedy RNNT decoder.

Validated: FP32 cosine=1.000000, INT8 cosine=0.999861, FP16 cosine=0.991. Encoder 258MB FP32 → 75MB INT8.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
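Two of the rewrites listed above can be illustrated with a minimal sketch. The tensor sizes below are hypothetical, chosen only for illustration; the real model uses its own dimensions.

```python
import torch
import torch.nn.functional as F

# Hypothetical attention tensor: (batch, time, heads * head_dim).
x = torch.randn(2, 10, 512)
heads, head_dim = 8, 64

# reshape() on traced size variables bakes aten::Int ops into the graph;
# unflatten/flatten take static dim arguments and convert cleanly.
y = x.unflatten(-1, (heads, head_dim))   # (2, 10, 8, 64)
assert torch.equal(y.flatten(-2), x)     # round-trips back to (2, 10, 512)

# zeros+cat channel padding needs a zeros() sized from a traced shape;
# F.pad with constant pad amounts lowers to constant_pad_nd instead.
c = torch.randn(2, 5, 192)
target = 256
padded_cat = torch.cat([c, torch.zeros(2, 5, target - 192)], dim=-1)
padded_pad = F.pad(c, (0, target - 192))  # pad last dim on the right
assert torch.equal(padded_cat, padded_pad)
```

Both replacements are numerically identical to the originals; only the traced graph changes.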
```python
dec_path = output_dir / "decoder.mlpackage"
dec_ml.short_description = f"Zipformer2 Stateless Decoder (context_size={context_size})"
dec_ml.author = AUTHOR
dec_ml.save(str(dec_path))
```
🔴 Missing shutil.rmtree before saving decoder and joiner mlpackages causes failure on re-runs
The encoder save at convert-coreml.py:458-459 correctly checks for an existing .mlpackage directory and removes it before calling save(), but the decoder save at line 487 and the joiner save at line 520 lack this protection. CoreML's save() for .mlpackage format (which creates a directory structure) will fail if the target path already exists. This means re-running the conversion with the same --output-dir will succeed for the encoder (existing package is cleaned up) but fail for the decoder or joiner. The quantize-coreml.py at lines 89-91 follows the correct pattern of rmtree before save, confirming this is an oversight.
```diff
  dec_path = output_dir / "decoder.mlpackage"
  dec_ml.short_description = f"Zipformer2 Stateless Decoder (context_size={context_size})"
  dec_ml.author = AUTHOR
+ if dec_path.exists():
+     shutil.rmtree(dec_path)
  dec_ml.save(str(dec_path))
```
```python
join_path = output_dir / "joiner.mlpackage"
join_ml.short_description = "Zipformer2 Joiner (encoder_out + decoder_out -> logits)"
join_ml.author = AUTHOR
join_ml.save(str(join_path))
```
🔴 Missing shutil.rmtree before saving joiner mlpackage causes failure on re-runs
Same issue as the decoder save — the joiner save at line 520 is missing the if exists: shutil.rmtree() guard that the encoder save has at convert-coreml.py:458-459. This will cause the conversion to fail when the output directory already contains joiner.mlpackage from a previous run.
```diff
  join_path = output_dir / "joiner.mlpackage"
  join_ml.short_description = "Zipformer2 Joiner (encoder_out + decoder_out -> logits)"
  join_ml.author = AUTHOR
+ if join_path.exists():
+     shutil.rmtree(join_path)
  join_ml.save(str(join_path))
```
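Since the same guard is needed in three places, one option is a small helper capturing the pattern. The helper name and the fake model in the usage note are illustrative, not from the PR:

```python
import shutil
from pathlib import Path

def save_mlpackage(model, path: Path) -> None:
    """Remove any stale .mlpackage directory before saving.

    CoreML writes .mlpackage as a directory, and save() fails if the
    target already exists, so re-runs need this guard.
    """
    if path.exists():
        shutil.rmtree(path)
    model.save(str(path))
```

With this helper, encoder, decoder, and joiner saves all become one-liners and re-running with the same --output-dir is safe for all three.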
```python
# Patch Conv2dSubsampling to avoid aten::Int ops that coremltools rejects
# ---------------------------------------------------------------------------


def _freeze_rel_pos_encoding(module: nn.Module) -> None:
```
🟡 _freeze_rel_pos_encoding return type annotated as None but returns a tuple
The function _freeze_rel_pos_encoding at line 145 is annotated -> None but actually returns (captured, hooks) at line 178. The caller at convert-coreml.py:426 correctly unpacks the return value as captured, hooks = _freeze_rel_pos_encoding(encoder). While this works at runtime (Python doesn't enforce return type annotations), it's incorrect and would be flagged by any type checker as trying to unpack None.
```diff
- def _freeze_rel_pos_encoding(module: nn.Module) -> None:
+ def _freeze_rel_pos_encoding(module: nn.Module) -> Tuple[dict, list]:
```
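The hook-based freezing that this function implements can be sketched roughly as follows. This is a simplified stand-in, not the PR's actual code: it captures each matching submodule's output on the first call and replays it as a constant afterwards, so tracing never walks the module's shape logic.

```python
from typing import Dict, List, Tuple
import torch
import torch.nn as nn

def freeze_outputs(root: nn.Module, cls: type) -> Tuple[Dict[str, torch.Tensor], List]:
    """Capture each matching submodule's first output and replay it
    on later calls (a non-None hook return replaces the output)."""
    captured: Dict[str, torch.Tensor] = {}
    hooks = []
    for name, sub in root.named_modules():
        if isinstance(sub, cls):
            def hook(mod, inputs, output, _name=name):
                if _name not in captured:
                    captured[_name] = output.detach()
                return captured[_name]
            hooks.append(sub.register_forward_hook(hook))
    return captured, hooks
```

The caller unpacks `captured, hooks = ...` and later calls `h.remove()` on each hook to restore normal behavior, which is why the annotated return type must be a tuple rather than None.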
Add FusedPreprocessorForExport that bakes kaldi fbank mel extraction into the CoreML encoder. When --fuse-mel is passed, the output is Preprocessor.mlpackage taking raw audio (1, 239120) like Parakeet, instead of encoder.mlpackage taking mel frames (1, 1495, 80).

New file: fused_fbank.py — a traceable kaldi-compatible fbank extractor using index-gather framing (no as_strided/unfold, which coremltools rejects). Includes preemphasis (0.97), DC offset removal, the povey window, and an HTK mel filterbank via torchaudio.

Both export modes are preserved: --fuse-mel for FluidAudio integration, and the default mel-frames mode for standalone encoder validation.
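Index-gather framing, the trick that replaces as_strided/unfold here, can be sketched as below. The function name is illustrative, and the window/hop defaults follow the usual 16 kHz kaldi values (25 ms / 10 ms):

```python
import torch

def frame_signal(wave: torch.Tensor, win: int = 400, hop: int = 160) -> torch.Tensor:
    """Slice a (1, T) waveform into overlapping (num_frames, win) frames
    with a plain index gather, which coremltools can convert, instead of
    as_strided/unfold, which it rejects."""
    num_frames = (wave.shape[1] - win) // hop + 1
    starts = torch.arange(num_frames).unsqueeze(1) * hop  # (F, 1) frame offsets
    idx = starts + torch.arange(win).unsqueeze(0)         # (F, win) sample indices
    return wave[0, idx]
```

Advanced indexing with a precomputed index matrix traces to a gather op with static shapes, so the converter never sees a stride trick.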
Fused preprocessor (audio → encoder features) is now the default. Use --no-fuse-mel for standalone encoder with mel frame input.
Three fixes to fused_fbank.py:
- Use high_freq=0.0 (Nyquist=8000) for the mel filterbank, matching the torchaudio.kaldi.fbank default. Was using 7600 (the kaldi CLI default).
- Fix the padding offset: left_pad = win//2 - hop//2 = 120, not win//2 = 200
- Apply preemphasis per-frame after extraction (kaldi order), not on the full waveform before framing
- Use reflection padding (torch.flip + cat) instead of zero padding
- Use machine epsilon for the log floor instead of energy_floor=1.0

Verified with debug-fbank.py: all 10 processing steps pass with cosine=1.000000 and max_diff=0.000000 against the torchaudio reference.

New file: debug-fbank.py — a step-by-step comparison script for validating fbank parity at each processing stage.
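Two of these fixes can be sketched as small helpers (the names are illustrative, not the PR's actual functions):

```python
import torch

def reflect_pad(wave: torch.Tensor, left: int, right: int) -> torch.Tensor:
    """Reflection padding built from flip+cat: mirrors the signal
    without repeating the edge sample, avoiding pad modes the
    converter may not lower for 1-D input."""
    lpad = torch.flip(wave[:, 1:left + 1], dims=[1])
    rpad = torch.flip(wave[:, -right - 1:-1], dims=[1])
    return torch.cat([lpad, wave, rpad], dim=1)

def preemphasize(frames: torch.Tensor, coeff: float = 0.97) -> torch.Tensor:
    """Kaldi-order preemphasis: applied per frame after framing, with
    the first sample of each frame subtracting itself (x[-1] := x[0])."""
    prev = torch.cat([frames[:, :1], frames[:, :-1]], dim=1)
    return frames - coeff * prev
```

Applying preemphasis per frame (rather than once over the whole waveform) matters because frames overlap: the per-frame version differences each frame's own first sample against itself, exactly as kaldi does.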
- model_avg has substantially different weights (cosine=0.82 vs ONNX), causing ~26% WER instead of ~9%
- Add compare-pipeline.py for stage-by-stage PyTorch vs CoreML validation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Converts icefall Zipformer2 transducer .pt files to CoreML .mlpackage files.

Key technical challenges solved

- Patched coremltools _cast to handle numpy ndarray constants from aten::Int ops
- Replaced dynamic reshapes in attention with unflatten/flatten
- Froze CompactRelPositionalEncoding outputs via forward hooks to eliminate pe.size() indexing during tracing
- Patched Conv2dSubsampling to use permute+flatten instead of reshape(b,t,c*f)
- convert_num_channels uses constant_pad_nd instead of zeros+cat
- SimpleUpsample uses repeat_interleave instead of expand+reshape

Validated accuracy

FP32 cosine=1.000000, INT8 cosine=0.999861, FP16 cosine=0.991. Encoder 258MB FP32 → 75MB INT8.
Test plan
- uv sync && uv run python convert-coreml.py --checkpoint <path> --tokens <path> --output-dir ./build/test
- uv run python compare-models.py --checkpoint <path> --tokens <path> --coreml-dir ./build/test --audio-file <wav>
- uv run python quantize-coreml.py --input-dir ./build/test --output-dir ./build/test-int8

🤖 Generated with Claude Code