Moxin Voice

AI-powered Text-to-Speech desktop application with voice cloning — built on OminiX MLX

Moxin Voice is a modern, GPU-accelerated desktop TTS application built entirely in Rust. It uses the Makepad UI framework for native performance and the OminiX MLX inference stack for high-speed, Python-free speech synthesis on Apple Silicon.

🪄 New: Live Translation

Moxin Voice now includes a built-in Live Translation mode for real-time bilingual subtitles.

Microphone or system audio input — translate speech from your mic, browser, meeting app, or video player
Real-time subtitle overlay — compact or fullscreen floating window with adjustable text size, position, and opacity
Low-latency streaming pipeline — VAD-segmented ASR + rolling translation commits for readable subtitle chunks
Bilingual display — original text and translated text shown together in the overlay
No extra virtual audio driver required — system audio capture uses macOS ScreenCaptureKit directly
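
The README does not say which VAD the Live Translation pipeline uses. As a rough illustration of what "VAD-segmented" means, the sketch below splits audio into speech segments with a plain energy threshold. The function names, frame length, threshold, and hangover count are all hypothetical and are not taken from the Moxin Voice source.

```rust
/// A speech segment expressed as a half-open range of frame indices.
#[derive(Debug, PartialEq, Clone, Copy)]
pub struct Segment {
    pub start_frame: usize,
    pub end_frame: usize, // exclusive
}

/// Root-mean-square energy of one frame of samples in [-1.0, 1.0].
pub fn rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = frame.iter().map(|s| s * s).sum();
    (sum_sq / frame.len() as f32).sqrt()
}

/// Splits audio into speech segments using an energy threshold.
/// This is an illustrative stand-in, not the app's actual VAD.
/// A segment closes once `hangover_frames` consecutive quiet frames are seen.
pub fn segment_speech(
    samples: &[f32],
    frame_len: usize,
    threshold: f32,
    hangover_frames: usize,
) -> Vec<Segment> {
    assert!(frame_len > 0, "frame_len must be positive");
    let mut segments = Vec::new();
    let mut start: Option<usize> = None;
    let mut quiet_run = 0usize;
    let mut frame_count = 0usize;

    for (i, frame) in samples.chunks(frame_len).enumerate() {
        frame_count = i + 1;
        let is_speech = rms(frame) >= threshold;
        match (start, is_speech) {
            (None, true) => {
                start = Some(i);
                quiet_run = 0;
            }
            (Some(_), true) => quiet_run = 0,
            (Some(s), false) => {
                quiet_run += 1;
                if quiet_run >= hangover_frames {
                    // The segment ends at the first frame of the quiet run.
                    segments.push(Segment { start_frame: s, end_frame: i + 1 - quiet_run });
                    start = None;
                    quiet_run = 0;
                }
            }
            (None, false) => {}
        }
    }
    // Flush a segment that is still open when the audio ends.
    if let Some(s) = start {
        segments.push(Segment { start_frame: s, end_frame: frame_count - quiet_run });
    }
    segments
}
```

In a streaming setup, each closed segment would be handed to ASR and then to translation, which is what keeps subtitle chunks short and readable.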

Hardware / System Requirements

Apple Silicon Mac required — M1 / M2 / M3 / M4
macOS 14.0+ recommended for the full app
Live Translation system audio input is macOS-only
System audio capture requires Screen Recording permission. On first use, macOS prompts for Screen Recording access because system audio capture is implemented with ScreenCaptureKit.
A display must be available. ScreenCaptureKit requires a display-backed capture session even when you only want audio.

If Screen Recording permission is denied or ScreenCaptureKit is unavailable, Live Translation still works with the microphone input source.
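
One way to model the requirements above is a small resolver that picks the input source actually used. This is a sketch under assumptions: the type and field names are hypothetical, and whether the app falls back automatically or asks the user to switch is not stated in this README.

```rust
/// Input sources offered by Live Translation, per the README.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InputSource {
    Microphone,
    SystemAudio,
}

/// Hypothetical snapshot of the conditions system audio capture depends on.
#[derive(Debug, Clone, Copy)]
pub struct CaptureEnvironment {
    pub screen_recording_granted: bool,
    pub screencapturekit_available: bool,
    pub display_available: bool,
}

/// Returns the source that can actually be used.
/// Assumption: any unmet system-audio requirement falls back to the microphone.
pub fn resolve_input_source(requested: InputSource, env: CaptureEnvironment) -> InputSource {
    match requested {
        InputSource::Microphone => InputSource::Microphone,
        InputSource::SystemAudio => {
            let usable = env.screen_recording_granted
                && env.screencapturekit_available
                && env.display_available;
            if usable {
                InputSource::SystemAudio
            } else {
                InputSource::Microphone
            }
        }
    }
}
```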

⚡ Powered by OminiX MLX

The inference engine behind Moxin Voice is OminiX MLX — a comprehensive Rust-native ML inference ecosystem for Apple Silicon.

OminiX MLX provides:

Pure Rust inference — no Python runtime required at synthesis time
Metal GPU acceleration — optimized for M1/M2/M3/M4 chips via Apple's MLX framework
Unified memory — zero-copy CPU/GPU data sharing
Qwen3-TTS-MLX — the TTS engine used by Moxin Voice (9 built-in voices, 12 languages, ICL voice cloning, 2.3× real-time on M3 Max)

Moxin Voice uses OminiX MLX's dora-qwen3-tts-mlx node as its sole TTS backend. Source: node-hub/dora-qwen3-tts-mlx/
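
To make the "2.3× real-time" figure concrete, the sketch below assumes it means that audio is produced 2.3 times faster than it plays back, which is the usual reading of such figures. The helper names are hypothetical, and the sample rate in the usage note is only an example value.

```rust
use std::time::Duration;

/// Playback length of a mono buffer at the given sample rate.
pub fn audio_duration(num_samples: usize, sample_rate_hz: u32) -> Duration {
    Duration::from_secs_f64(num_samples as f64 / sample_rate_hz as f64)
}

/// Speed factor: seconds of audio produced per second of wall-clock time.
/// A value above 1.0 means synthesis runs faster than playback.
pub fn speed_factor(audio: Duration, wall_clock: Duration) -> f64 {
    audio.as_secs_f64() / wall_clock.as_secs_f64()
}

/// Wall-clock time needed to synthesize `audio` at a given speed factor.
pub fn expected_wall_clock(audio: Duration, factor: f64) -> Duration {
    Duration::from_secs_f64(audio.as_secs_f64() / factor)
}
```

Under that reading, `expected_wall_clock(Duration::from_secs(10), 2.3)` comes to roughly 4.35 seconds for a 10-second clip, and `audio_duration(240_000, 24_000)` is 10 seconds for an illustrative 24 kHz buffer.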

✨ Features

🎙️ Zero-Shot Voice Cloning — Clone any voice with 5–30 seconds of audio (ICL Express mode)
🎵 Text-to-Speech — 9 preset voices across Chinese, English, Japanese, and Korean
🌍 Live Translation — Real-time subtitles from microphone or system audio with a floating overlay
🔮 Qwen3-TTS-MLX Backend — 2.3× real-time synthesis via OminiX MLX on Apple Silicon
🎤 Audio Recording — Built-in real-time recording with waveform visualization
🔍 ASR Integration — Automatic text transcription for cloning reference audio
💾 Audio Export — Save generated speech as WAV files
🌓 Dark Mode — Native dark theme via Makepad GPU rendering
🌐 Bilingual UI — Chinese and English interface
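
For readers curious what WAV export involves, here is a minimal sketch that encodes mono or interleaved float samples as a 16-bit PCM WAV byte buffer using only the standard library. It is not the app's export code, and `encode_wav_pcm16` is a hypothetical name. The header layout is the standard 44-byte RIFF/WAVE PCM header.

```rust
/// Encodes samples in [-1.0, 1.0] as a 16-bit PCM WAV file held in memory.
/// Illustrative only; the app's own export path may differ.
pub fn encode_wav_pcm16(samples: &[f32], sample_rate_hz: u32, channels: u16) -> Vec<u8> {
    let bits_per_sample: u16 = 16;
    let block_align: u16 = channels * (bits_per_sample / 8);
    let byte_rate: u32 = sample_rate_hz * block_align as u32;
    let data_len: u32 = (samples.len() * 2) as u32;

    let mut out = Vec::with_capacity(44 + data_len as usize);
    // RIFF container header: chunk size is everything after these 8 bytes.
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes());
    out.extend_from_slice(b"WAVE");
    // "fmt " chunk: 16 bytes describing uncompressed PCM (format tag 1).
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes());
    out.extend_from_slice(&1u16.to_le_bytes());
    out.extend_from_slice(&channels.to_le_bytes());
    out.extend_from_slice(&sample_rate_hz.to_le_bytes());
    out.extend_from_slice(&byte_rate.to_le_bytes());
    out.extend_from_slice(&block_align.to_le_bytes());
    out.extend_from_slice(&bits_per_sample.to_le_bytes());
    // "data" chunk: little-endian signed 16-bit samples.
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    for &s in samples {
        let v = (s.clamp(-1.0, 1.0) * i16::MAX as f32).round() as i16;
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}
```

The returned bytes can be written to disk with `std::fs::write("out.wav", bytes)`. The sample rate to pass depends on the model output and is not specified in this README.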

🏗️ Architecture

moxin-voice/
├── moxin-voice-shell/          # Application entry point (binary)
├── apps/moxin-voice/           # UI + application logic
│   └── dataflow/tts.yml        # Dora dataflow graph
├── moxin-widgets/              # Shared Makepad UI components
├── moxin-ui/                   # Application infrastructure
├── moxin-dora-bridge/          # Dora dataflow integration bridge
└── node-hub/
    ├── dora-qwen3-tts-mlx/     # ★ OminiX MLX Qwen3-TTS Rust node
    │   └── previews/           # Pre-generated voice preview WAVs
    └── dora-qwen3-asr/         # ★ OminiX MLX Qwen3-ASR Rust node

The TTS pipeline runs as a Dora dataflow: the UI sends text, the qwen-tts-node (built from dora-qwen3-tts-mlx) synthesizes audio using OminiX MLX, and the audio player receives the stream.
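
The three-stage shape described above (UI sends text, a TTS stage synthesizes, a player receives audio) can be mimicked with standard-library channels. This sketch deliberately does not use the Dora API. The channel wiring, `AudioChunk`, and the placeholder synthesizer are stand-ins invented for illustration.

```rust
use std::sync::mpsc;
use std::thread;

/// Audio produced for one piece of input text.
pub struct AudioChunk {
    pub text: String,
    pub samples: Vec<f32>,
}

/// Placeholder synthesizer: 100 samples of silence per character.
/// The real node runs Qwen3-TTS through OminiX MLX instead.
fn fake_synthesize(text: &str) -> Vec<f32> {
    vec![0.0; text.chars().count() * 100]
}

/// Pushes texts through "UI -> TTS -> player" and returns what the
/// player stage received as (text, sample count) pairs, in order.
pub fn run_pipeline(texts: Vec<String>) -> Vec<(String, usize)> {
    let (text_tx, text_rx) = mpsc::channel::<String>();
    let (audio_tx, audio_rx) = mpsc::channel::<AudioChunk>();

    // TTS stage: consumes text and emits audio until the text channel closes.
    let tts = thread::spawn(move || {
        for text in text_rx {
            let samples = fake_synthesize(&text);
            if audio_tx.send(AudioChunk { text, samples }).is_err() {
                break;
            }
        }
    });

    // Player stage: here it only records what arrived.
    let player = thread::spawn(move || {
        audio_rx
            .iter()
            .map(|chunk| (chunk.text, chunk.samples.len()))
            .collect::<Vec<_>>()
    });

    // UI stage: send every text, then close the channel.
    for text in texts {
        text_tx.send(text).expect("TTS stage stopped early");
    }
    drop(text_tx);

    tts.join().expect("TTS thread panicked");
    player.join().expect("player thread panicked")
}
```

In the actual project, the equivalent wiring lives in the Dora dataflow graph at apps/moxin-voice/dataflow/tts.yml rather than in hand-written channels.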

🚀 Quick Start (macOS)

Prerequisites

macOS 14.0+ (Sonoma), Apple Silicon (M1/M2/M3/M4)
Rust 1.82+
Dora CLI (cargo install dora-cli)
Python 3.8+ (for the one-time model download script; not required at runtime)

1. Download Models

bash scripts/init_qwen3_models.sh

This downloads all three model snapshots into ~/.OminiX/models/:

Model	Purpose
`Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit`	Preset voice synthesis
`Qwen3-TTS-12Hz-1.7B-Base-8bit`	ICL zero-shot voice cloning
`Qwen3-ASR-1.7B-8bit`	Voice cloning reference audio transcription

huggingface_hub is installed automatically if not present.
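
A quick way to confirm the download finished is to check that each snapshot exists under ~/.OminiX/models/. The sketch below assumes each model lives in a subdirectory named after the model. That layout is a guess based on the table above, and the function names are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Model names listed in the table above.
pub const REQUIRED_MODELS: [&str; 3] = [
    "Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit",
    "Qwen3-TTS-12Hz-1.7B-Base-8bit",
    "Qwen3-ASR-1.7B-8bit",
];

/// Models root under a given home directory: <home>/.OminiX/models
pub fn models_root(home: &Path) -> PathBuf {
    home.join(".OminiX").join("models")
}

/// Returns the required models that have no directory under the models root.
/// Assumption: one subdirectory per model, named exactly like the model.
pub fn missing_models(home: &Path) -> Vec<&'static str> {
    let root = models_root(home);
    REQUIRED_MODELS
        .iter()
        .copied()
        .filter(|name| !root.join(name).is_dir())
        .collect()
}
```

An empty result from `missing_models` suggests the script completed. It does not verify that the files inside each snapshot are intact.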

2. Build

cargo build --release

This builds all binaries including dora-qwen3-asr (the ASR Dora node) and qwen-tts-node.

3. Run

dora up
cargo run -p moxin-voice-shell

First-Time Distribution (macOS .app)

For end-users receiving the distributed .app, model download and initialization happen automatically via the in-app bootstrap wizard on first launch.

🔮 Qwen3-TTS Voice Library

9 built-in preset voices, with UI names localized to Chinese or English:

ID	Language	Character
`vivian`	zh	薇薇安 — bright, slightly edgy young female
`serena`	zh	赛琳娜 — warm, gentle young female
`uncle_fu`	zh	傅叔 — low, mellow seasoned male
`dylan`	zh	迪伦 — clear Beijing young male
`eric`	zh	埃里克 — lively Chengdu young male
`ryan`	en	Ryan — dynamic male with rhythmic drive
`aiden`	en	Aiden — sunny American male
`ono_anna`	ja	小野安奈 — playful Japanese female
`sohee`	ko	素熙 — warm Korean female
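
The table above maps naturally onto a static lookup. The following sketch copies the IDs, language codes, and display names from the table. The `Voice` struct and helper functions are hypothetical and do not reflect how the app stores its voice list.

```rust
/// One preset voice, using the ID, language code, and name from the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Voice {
    pub id: &'static str,
    pub language: &'static str,
    pub display_name: &'static str,
}

/// The nine preset voices listed in the README.
pub static VOICES: [Voice; 9] = [
    Voice { id: "vivian", language: "zh", display_name: "薇薇安" },
    Voice { id: "serena", language: "zh", display_name: "赛琳娜" },
    Voice { id: "uncle_fu", language: "zh", display_name: "傅叔" },
    Voice { id: "dylan", language: "zh", display_name: "迪伦" },
    Voice { id: "eric", language: "zh", display_name: "埃里克" },
    Voice { id: "ryan", language: "en", display_name: "Ryan" },
    Voice { id: "aiden", language: "en", display_name: "Aiden" },
    Voice { id: "ono_anna", language: "ja", display_name: "小野安奈" },
    Voice { id: "sohee", language: "ko", display_name: "素熙" },
];

/// Looks up a voice by its ID.
pub fn find_voice(id: &str) -> Option<&'static Voice> {
    VOICES.iter().find(|v| v.id == id)
}

/// Lists all voices for a language code such as "zh" or "en".
pub fn voices_for_language(language: &str) -> Vec<&'static Voice> {
    VOICES.iter().filter(|v| v.language == language).collect()
}
```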

Voice Cloning (Express Mode)

Upload or record 5–30 seconds of reference audio. Moxin Voice uses Qwen3-TTS's In-Context Learning (ICL) to clone the voice in real time — no training required. ASR auto-transcription is optional; if ASR is unavailable, users can enter reference text manually.
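
The 5 to 30 second window can be checked before a clip is sent for cloning. This is a minimal sketch. The error type, constant names, and function name are hypothetical, and the app may enforce the limits differently.

```rust
/// Reasons a reference clip might be rejected in this sketch.
#[derive(Debug, PartialEq)]
pub enum ReferenceError {
    TooShort(f64),
    TooLong(f64),
    InvalidSampleRate,
}

/// Duration limits stated in the README, in seconds.
pub const MIN_REFERENCE_SECS: f64 = 5.0;
pub const MAX_REFERENCE_SECS: f64 = 30.0;

/// Returns the clip length in seconds if it falls within 5 to 30 seconds.
/// `num_samples` counts samples per channel.
pub fn check_reference(num_samples: usize, sample_rate_hz: u32) -> Result<f64, ReferenceError> {
    if sample_rate_hz == 0 {
        return Err(ReferenceError::InvalidSampleRate);
    }
    let secs = num_samples as f64 / sample_rate_hz as f64;
    if secs < MIN_REFERENCE_SECS {
        Err(ReferenceError::TooShort(secs))
    } else if secs > MAX_REFERENCE_SECS {
        Err(ReferenceError::TooLong(secs))
    } else {
        Ok(secs)
    }
}
```

For example, a 10-second recording at an illustrative 16 kHz rate passes, while a 3-second one is rejected as too short.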

📦 Build

Development

cargo build -p moxin-voice-shell

macOS App Bundle

bash scripts/build_macos_app.sh --version 0.1.0
bash scripts/build_macos_dmg.sh

Distribution Bootstrap (user machine)

bash scripts/macos_bootstrap.sh

Downloads the Qwen3-TTS and Qwen3-ASR models and sets up the app-private conda env (needed only for the TTS download script).

🔧 Technology Stack

Component	Technology
UI framework	Makepad — GPU-accelerated, pure Rust
TTS inference	OminiX MLX · Qwen3-TTS-MLX
TTS model	Qwen3-TTS (Alibaba)
ML runtime	Apple MLX via `mlx-sys` / `mlx-rs` (OminiX MLX)
Dataflow	Dora
Audio I/O	CPAL
ASR	OminiX MLX · Qwen3-ASR-MLX (Rust, Metal GPU)
Language	Rust 2021 edition

📝 License

Apache License 2.0 — see LICENSE.

🙏 Acknowledgments

OminiX MLX — the core ML inference engine powering all synthesis in this project
Qwen3-TTS — the TTS model (Alibaba)
Makepad — GPU-accelerated UI framework
Dora — dataflow architecture
Apple MLX — foundation for OminiX MLX

Repository: https://github.com/moxin-org/Moxin-Voice
