MellowMic routes your microphone through a chain of low-latency voice effects, optional AI voice conversion, and VB-Cable so any app — Discord, OBS, Teams, Zoom, Twitch — can pick up your transformed voice without sending audio to a cloud API.
- Local-only. Audio never leaves your machine.
- ~5 ms one-way latency at the default block size — usable on a live stream.
- Eight built-in DSP voices plus optional licensed RVC ONNX models.
- A real desktop UI — dark / light themes, live VU meters, latency display, drop counter.
- Hot-swappable effects with a 40 ms crossfade — switch presets while you talk without a click or pop.
> [!NOTE]
> MellowMic does not ship celebrity voices or any speaker identity. The built-in DSP presets are effects (deeper, brighter, robotic, etc.) — they cannot make you sound like a specific person. AI voice models must be supplied by the user and must be owned or licensed for the intended use.
- What you'll need
- Install — fast path
- Install — manual
- First run tutorial
- Using MellowMic with Discord / OBS / Teams
- The eight DSP voices
- AI voice models (optional)
- CLI mode
- Customizing effects
- VAD backends
- Formant-preserving pitch
- Building a single-exe
- Troubleshooting
- FAQ
- Architecture
- Development
- License
| Requirement | Details |
|---|---|
| OS | Windows 10 or 11 |
| Python | 3.11 or 3.13 (3.12 also works) |
| Routing | VB-Audio Virtual Cable — free download, install once |
| Headphones | Required for live monitoring (so you can hear yourself) |
| GPU (optional) | NVIDIA + CUDA for AI voice mode at low latency |
Clone the repo, double-click the launcher. The launcher creates .venv on first run and installs everything for you.
```
git clone https://github.com/gauravsoodtech/Mellow-mic.git
cd Mellow-mic
.\MellowMic.cmd
```

That's it. Use `MellowMic-Debug.cmd` if you want the console window to stay open so you can read errors during setup.
If you'd rather control the venv yourself:
```
py -3.11 -m venv .venv
.\.venv\Scripts\activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
python -m pip install -r requirements-dev.txt  # for tests
python main.py
```

Optional AI dependencies (only if you have an RVC ONNX model bundle):

```
python -m pip install -r requirements-ai.txt
```

1. Install VB-Cable. Download from https://vb-audio.com/Cable/, run the installer as administrator, reboot. You should now have a `CABLE Input` output device and a `CABLE Output` input device in Windows sound settings.
2. Launch MellowMic with `MellowMic.cmd`.
3. In the Routing panel:
   - Microphone Input — your real mic.
   - MellowMic Output — `CABLE Input (VB-Audio Virtual Cable)`. The app auto-detects this on first run.
   - Headphone Monitor — your headphones (so you can hear the processed voice). Toggle the monitor switch on.
4. Click Start Processing.
5. Talk. You should hear your voice in your headphones.
6. Try a preset:
   - Start with Clean Bypass to confirm routing.
   - Switch to DSP Effects → Feminine DSP (or any other) and slide the Voice Blend slider — left is dry, right is fully processed.
7. Open the app you want to use the voice in (Discord, OBS, Teams) and set its microphone to `CABLE Output (VB-Audio Virtual Cable)`. That app now hears your processed voice.
That's the loop. Your settings — engine mode, active effect, blend level, devices, monitor toggle — persist between runs.
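If you're scripting the setup, the VB-Cable endpoints can be located by name. Below is a hypothetical helper (not MellowMic code), assuming a device list shaped like the dicts returned by `sounddevice.query_devices()`:

```python
# Hypothetical helper: find the VB-Cable endpoints in a device list.
# Assumes each entry has "name", "max_input_channels", "max_output_channels",
# the fields exposed by sounddevice.query_devices().

def find_cable_devices(devices):
    """Return (cable_input_index, cable_output_index), None where missing.

    'CABLE Input' is a playback device (MellowMic renders into it) and
    'CABLE Output' is a capture device (Discord/OBS/Teams record from it).
    """
    cable_in = cable_out = None
    for i, dev in enumerate(devices):
        name = dev["name"]
        if "CABLE Input" in name and dev["max_output_channels"] > 0:
            cable_in = i   # where processed audio should be sent
        elif "CABLE Output" in name and dev["max_input_channels"] > 0:
            cable_out = i  # what other apps should use as their microphone
    return cable_in, cable_out
```

This only illustrates the name-matching idea; the app's real auto-detection may use different criteria.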
Once routing is working, point each app's microphone setting at `CABLE Output`:

**Discord**
- Settings → Voice & Video
- Input Device → `CABLE Output (VB-Audio Virtual Cable)`
- Voice Processing → turn off "Echo Cancellation" and "Noise Suppression" (Discord's processing fights ours).

**OBS**
- Settings → Audio
- Mic / Auxiliary Audio → `CABLE Output (VB-Audio Virtual Cable)`

**Teams**
- Settings → Devices
- Microphone → `CABLE Output`. Teams' built-in noise suppression will partially fight MellowMic's chain — set it to "Low" or off.

**Zoom**
- Settings → Audio
- Microphone → `CABLE Output`
- Disable "Automatically adjust microphone volume" and "Suppress background noise" for predictable behavior.
| Preset | What it does |
|---|---|
| Bypass | Clean passthrough, no processing. Use to confirm routing. |
| Deep Voice | -1.5 semitone shift + warm/smooth EQ + tanh saturation. Lower, broader. |
| Bright Voice | +4.0 semitone shift, mostly wet. Playful chipmunk-style. |
| Feminine DSP | +1.8 semitone shift + 4-stage EQ (HP 120 Hz, body, air, deess) + limiting. Low-latency feminine approximation. |
| Robot | 72 Hz ring modulation, kept readable. |
| Radio | 300–3400 Hz bandpass plus tanh distortion — AM-radio crackle. |
| Echo | 280 ms delay, 38 % feedback, 55 % wet. Spacious. |
| Demon | -3.0 semitone shift + 45 Hz ring mod + sub-bass emphasis. Dramatic growl. |
The Voice Blend slider mixes dry and wet for every preset, so you can dial any effect from subtle to extreme.
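The blend is just a dry/wet mix. Here is a minimal sketch of what the slider computes, assuming a plain linear crossfade (MellowMic's exact curve may differ, e.g. it could be equal-power):

```python
import numpy as np

def blend(dry, wet, amount):
    """Mix dry and wet signals: amount=0.0 is fully dry, 1.0 is fully wet.

    Linear mix shown for illustration; the app's actual blend law is an
    implementation detail not documented here.
    """
    dry = np.asarray(dry, dtype=float)
    wet = np.asarray(wet, dtype=float)
    amount = min(max(float(amount), 0.0), 1.0)  # clamp slider range
    return (1.0 - amount) * dry + amount * wet
```

Because the mix happens per sample, any preset can be pulled back toward the dry signal without touching its internal parameters.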
For a true new-speaker voice you need an RVC ONNX model bundle. Drop bundles under `models/` like:

```
models/
└── my_voice/
    ├── metadata.json   # see models/README.md for schema
    └── voice.onnx
```
Then in the app, switch to AI Model Voice, pick the model, and choose `CUDAExecutionProvider` if you have an NVIDIA GPU.
> [!IMPORTANT]
> AI models are NOT shipped with MellowMic. You must own or license any model you use. The app does not generate, train, or distribute speaker identities.
Useful for testing and scripting:
```
python main.py --cli --list-devices
python main.py --cli --engine dsp --effect feminine_dsp
python main.py --cli --engine bypass
python main.py --cli --list-models
python main.py --cli --engine ai --model-id my_voice --ai-provider CUDAExecutionProvider
```

`Ctrl+C` stops cleanly.
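For tooling around the CLI, the documented flags map onto a parser roughly like the sketch below. It mirrors the commands above; `main.py`'s real parser may name or group things differently:

```python
import argparse

def build_cli_parser():
    """Sketch of an argument parser covering the documented CLI flags."""
    p = argparse.ArgumentParser(prog="mellowmic")
    p.add_argument("--cli", action="store_true", help="run headless, no UI")
    p.add_argument("--list-devices", action="store_true")
    p.add_argument("--list-models", action="store_true")
    p.add_argument("--engine", choices=["bypass", "dsp", "ai"], default="bypass")
    p.add_argument("--effect", default=None, help="DSP preset id, e.g. feminine_dsp")
    p.add_argument("--model-id", default=None, help="bundle folder name under models/")
    p.add_argument("--ai-provider", default="CPUExecutionProvider")
    return p
```

The `CPUExecutionProvider` default is an assumption here, chosen because it is ONNX Runtime's always-available provider.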
The built-in DSP presets ship with hand-tuned constants. You can override them without editing Python by copying `presets/effect_overrides.example.json` to `presets/effect_overrides.json`:
```json
{
  "deep": { "pitch_semitones": -2.0 },
  "feminine_dsp": { "pitch_semitones": 2.4, "air_freq_hz": 5000.0 },
  "chipmunk": { "pitch_semitones": 3.5 },
  "demon": { "pitch_semitones": -4.0 }
}
```

Restart the app to pick up changes. Unknown keys and fields are silently ignored, so it is safe to leave the file in place across upgrades. The live `effect_overrides.json` is gitignored — only the `.example.json` template is checked in.
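The documented merge semantics (known fields override, unknown presets and fields ignored) can be sketched like this. The `DEFAULTS` values come from the preset table above; the real loader may differ in detail:

```python
import json

# Illustrative defaults for two built-in presets (from the preset table).
DEFAULTS = {
    "deep": {"pitch_semitones": -1.5},
    "feminine_dsp": {"pitch_semitones": 1.8, "air_freq_hz": 4500.0},
}

def apply_overrides(defaults, overrides_json):
    """Merge a user override JSON string onto preset defaults.

    Unknown presets and unknown fields are dropped, mirroring the
    'silently ignored' behavior described above.
    """
    merged = {k: dict(v) for k, v in defaults.items()}
    for preset, fields in json.loads(overrides_json).items():
        if preset not in merged:
            continue                    # unknown preset: ignore
        for key, value in fields.items():
            if key in merged[preset]:   # unknown field: ignore
                merged[preset][key] = value
    return merged
```

The `air_freq_hz` default of 4500.0 is a placeholder for illustration, not the shipped constant.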
Voice activity detection gates silence so the effects chain only runs while you are actually speaking. Two backends:
```yaml
# config.yaml
vad:
  backend: webrtc                 # default — fast, no extra deps, good enough for most voices
  # backend: silero               # better on whispers / music / crosstalk; needs an ONNX model
  silero_model_path: models/vad/silero_vad.onnx
  silero_threshold: 0.5           # 0.0 = always speech, 1.0 = never speech
```

If `backend: silero` is set but the model file or onnxruntime is missing, MellowMic silently falls back to webrtcvad so audio always works.
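The fallback amounts to: use Silero only when both onnxruntime and the model file are present, otherwise use webrtcvad. An illustrative sketch (the real factory is `make_vad()` in `core/vad.py`; the string return values stand in for engine objects):

```python
import os

def make_vad(backend, silero_model_path=None):
    """Sketch of a VAD factory with the documented silent fallback."""
    if backend == "silero":
        try:
            import onnxruntime  # noqa: F401  (required by the Silero model)
            if silero_model_path and os.path.exists(silero_model_path):
                return "silero"  # would construct the Silero ONNX VAD here
        except ImportError:
            pass
        # Model file or onnxruntime missing: fall back so audio keeps working.
        backend = "webrtc"
    return "webrtc"              # would construct the webrtcvad backend here
```

The point of the design is that a misconfigured VAD degrades quality, never breaks the stream.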
By default the pitch shifter uses rational resampling — fast but not formant-aware (deep voices sound a bit "tube-y"). Set this in config.yaml to use Pedalboard's PitchShift instead:
```yaml
effects:
  formant_preserving_pitch: true
```

Pedalboard wraps Rubberband / SoundTouch under the hood, which is noticeably more natural for shifts greater than ±2 semitones at the cost of a small extra CPU hit. If Pedalboard isn't installed, MellowMic falls back to resampling automatically.
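For intuition, the resampling approach boils down to a semitone-to-ratio conversion plus interpolated reads. A deliberately naive sketch (no anti-alias filtering, not the project's actual shifter):

```python
import numpy as np

def semitones_to_ratio(semitones):
    """Pitch ratio for a shift in semitones: +12 doubles the frequency."""
    return 2.0 ** (semitones / 12.0)

def pitch_shift_resample(block, semitones):
    """Naive resampling pitch shift (not formant-aware).

    Reading the block `ratio` input samples per output sample raises the
    pitch; wrapping the read position keeps the output at the original
    block length. Real rational resamplers also filter properly.
    """
    ratio = semitones_to_ratio(semitones)
    n = len(block)
    positions = (np.arange(n) * ratio) % n
    return np.interp(positions, np.arange(n), block)
```

Because nothing here separates the spectral envelope from the pitch, formants move with the shift, which is exactly the "tube-y" artifact the Pedalboard path avoids.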
```
.\.venv\Scripts\activate
python -m pip install pyinstaller
python -m PyInstaller voiceforge.spec --noconfirm
```

Output: `dist/VoiceForge.exe`. CI also builds and uploads this on every `v*` tag — see `.github/workflows/release.yml`.
**I hear nothing in my headphones.**
- Did you click Start Processing? The big button must say STOP PROCESSING.
- Is the Headphone Monitor toggle on?
- Is the right output device selected as the monitor?
- Look at the Drops counter on the right panel — if it's racing up, your block size is too small for your CPU. Edit `config.yaml`: `audio.block_size: 512`.
**Discord still hears my normal voice.**
Discord caches the input device. Switch its Input Device to something else, then back to CABLE Output. Or restart Discord.
**VB-Cable is not detected.**
The status chip at the top right shows "VB-Cable not found". Reinstall VB-Cable as administrator and reboot. After reboot, both CABLE Input and CABLE Output should appear in Windows Sound settings.
**Latency is too high.**
Edit `config.yaml`:

```yaml
audio:
  block_size: 128        # default 256 (~5.3 ms), drop to 128 (~2.7 ms) on fast CPUs
  ring_buffer_depth: 4   # default 8, lower = less buffering
```

If drops spike, go back up. The trade-off is real on slower hardware.
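The millisecond figures in those comments are just block duration. A quick check, assuming a 48 kHz sample rate (the config above doesn't state the project's actual rate):

```python
def block_latency_ms(block_size, sample_rate=48000):
    """One block's duration in milliseconds at the given sample rate.

    48 kHz is an assumption for illustration; substitute your real rate.
    """
    return 1000.0 * block_size / sample_rate
```

At 48 kHz this gives ~5.3 ms for 256 samples and ~2.7 ms for 128, matching the comments above.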
**"Audio start failed" toast on launch.**
The selected input or output device disappeared. Re-pick a device in the Routing panel, then click Start. The typed `AudioStartError` carries the original cause — check the bottom status bar for the underlying message.
**I switched effects and heard a click.**
You shouldn't — engine swaps run through a 40 ms crossfade. If you do hear a click, the most likely cause is changing the block size or sample rate mid-session, which forces a full audio-engine restart. Stop, change the value, then start again.
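The crossfade itself is simple: ramp the old engine's output down while the new one ramps up across the fade window (1920 samples at 48 kHz for 40 ms). A linear-ramp sketch; MellowMic's actual ramp shape may differ:

```python
import numpy as np

def crossfade(old_sig, new_sig):
    """Linear crossfade across two equal-length fade windows.

    A hard switch puts a discontinuity in the waveform (the click);
    ramping keeps the output continuous while the engines swap.
    """
    old = np.asarray(old_sig, dtype=float)
    new = np.asarray(new_sig, dtype=float)
    ramp = np.linspace(0.0, 1.0, len(old))  # 0 → all old, 1 → all new
    return (1.0 - ramp) * old + ramp * new
```

The shipped `CrossfadeProcessor` presumably spreads this ramp across several audio blocks, since 40 ms is longer than one 256-sample block.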
**Does MellowMic send audio anywhere?**
No. Every effect runs locally on your CPU (and optionally GPU for AI models). There is no cloud component, no telemetry, no API key.

**Can it imitate a specific person's voice?**
Not with the built-in DSP presets — those are effects, not speaker identities. AI mode can, but only with an RVC ONNX model that you supply. Use only models you own or are licensed to use.
**Does it work on macOS or Linux?**
Not yet. The audio path is Windows-only via WASAPI + VB-Cable. macOS BlackHole / Linux PulseAudio support is on the roadmap — stubbed in `core/routing.py`.
**Why Python and not C++?**
Real-time audio in Python works fine when the hot path is NumPy / SciPy / sounddevice (PortAudio) — those release the GIL and run native code. The pure-Python overhead per block is well under a millisecond at 256 samples.
**Can I add my own effect?**
Right now you have to add a method to `EffectsChain` in `core/effects.py` and register it in `PRESET_INFO`. A plugin manifest system is on the roadmap.
```
mic ──► [WASAPI duplex stream]
                  │
                  ▼
         [input ring buffer]
                  │
                  ▼
┌────────────────────────────────────┐
│          Processor Chain           │
│                                    │
│  VAD ──► Noise Suppressor ──►      │
│                 │                  │
│                 ▼                  │
│  Voice Engine (DSP / AI / Bypass)  │
│                 │                  │
│                 ▼                  │
│           Output Limiter           │
└─────────────────┬──────────────────┘
                  ▼
        [output ring buffer]
                  │
                  ▼
        [WASAPI duplex stream]
                  │
                  ├──► VB-Cable ──► Discord / OBS / Teams
                  └──► Headphone monitor (optional separate stream)
```
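The two ring buffers decouple the audio callback from the processing thread. A toy bounded FIFO with drop-oldest overflow illustrates the behavioral idea (the real buffers in `core/audio_io.py` are RT-safe and preallocated; this sketch is not):

```python
from collections import deque

class RingBuffer:
    """Toy bounded FIFO of audio blocks, dropping the oldest on overflow.

    Counting overflows is analogous to what the UI's Drops counter shows:
    the processor couldn't keep up with the callback for that block.
    """

    def __init__(self, depth=8):
        self._q = deque(maxlen=depth)
        self.drops = 0

    def push(self, block):
        if len(self._q) == self._q.maxlen:
            self.drops += 1          # oldest block is about to be discarded
        self._q.append(block)

    def pop(self):
        return self._q.popleft() if self._q else None
```

Lower depth means less buffering (and latency) but more drops under load, which is the `ring_buffer_depth` trade-off described in the troubleshooting section.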
Module overview:
| Module | Responsibility |
|---|---|
| `core/audio_io.py` | WASAPI duplex stream, ring buffers, RT-safe processing thread, `AudioStartError`. Read the module docstring before changing the audio path. |
| `core/processor_chain.py` | Composable VAD → noise suppressor → engine → limiter chain, plus `CrossfadeProcessor` for hot-swap. |
| `core/effects.py` | Eight built-in DSP presets and the `EffectsChain` state machine. |
| `core/engines/` | `BypassVoiceEngine`, `DspVoiceEngine`, `RvcOnnxVoiceEngine` — all share a `process()` / `set_params()` / `set_enabled()` / `close()` / `stats()` protocol. |
| `core/vad.py` + `core/vad_silero.py` | webrtcvad and Silero ONNX backends behind a single `make_vad()` factory. |
| `core/noise_suppression.py` | 80 Hz HPF → spectral denoise (noisereduce) → expander gate. |
| `core/model_registry.py` | Discovery and validation of licensed RVC ONNX bundles under `models/`. |
| `ui/main_window.py` | PyQt5 control-room UI — header, mode rail, effects, routing, diagnostics. Settings persistence via `ui/theme.py`. |
| `presets/` | `effect_overrides.py` user-tweakable preset constants + Phase 2 `PresetConfig` dataclass for future neural pipeline. |
Run the test suite:

```
.\.venv\Scripts\python.exe -m pytest tests -q --basetemp .pytest-tmp
```

Currently 56 passing tests covering: real-time safety contract (lock under stress, passthrough on processor exception, no `print()` on the audio thread, `AudioStartError` cause chain, engine `close()`), Tier 2 polish (crossfade ramp shape, `NoiseSuppressor` reset-on-toggle, UI settings merge, VAD scratch buffer reuse), and Tier 3 features (Silero fallback, override JSON parsing, `EffectsChain` pitch override path).
If you change `core/audio_io.py`, re-read the module docstring — the audio-thread safety contract (no blocking I/O, no zero-fill on error, lock around processor swap) is load-bearing and easy to regress.
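Two pieces of that contract, the lock around the processor swap and passthrough on processor exception, look roughly like this (illustrative names, not the module's actual API):

```python
import threading

class ProcessorHost:
    """Sketch of the swap-under-lock / passthrough-on-error contract."""

    def __init__(self, processor=None):
        self._lock = threading.Lock()
        self._processor = processor  # a callable taking and returning a block

    def swap(self, new_processor):
        with self._lock:             # brief, bounded critical section
            self._processor = new_processor

    def process(self, block):
        with self._lock:             # grab a consistent reference, then release
            proc = self._processor
        if proc is None:
            return block
        try:
            return proc(block)
        except Exception:
            return block             # passthrough beats silence on error
```

Passing the dry block through on failure is the "no zero-fill on error" half of the contract: the user keeps hearing something instead of a dropout.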
MIT — Copyright (c) 2026 Gaurav Sood.