MellowMic

Real-time voice styles for creators — local, low-latency, zero cloud.

License: MIT · Python 3.11+ · Platform: Windows · Tests: 56 passing


MellowMic routes your microphone through a chain of low-latency voice effects, optional AI voice conversion, and VB-Cable so any app — Discord, OBS, Teams, Zoom, Twitch — can pick up your transformed voice without sending audio to a cloud API.

  • Local-only. Audio never leaves your machine.
  • 5 ms one-way latency at the default block size — usable on a live stream.
  • Eight built-in DSP voices plus optional licensed RVC ONNX models.
  • A real desktop UI — dark / light themes, live VU meters, latency display, drop counter.
  • Hot-swappable effects with a 40 ms crossfade — switch presets while you talk without a click or pop.

Note

MellowMic does not ship celebrity voices or any speaker identity. The built-in DSP presets are effects (deeper, brighter, robotic, etc.) — they cannot make you sound like a specific person. AI voice models must be supplied by the user and must be owned or licensed for the intended use.


Table of Contents

  1. What you'll need
  2. Install — fast path
  3. Install — manual
  4. First run tutorial
  5. Using MellowMic with Discord / OBS / Teams
  6. The eight DSP voices
  7. AI voice models (optional)
  8. CLI mode
  9. Customizing effects
  10. VAD backends
  11. Formant-preserving pitch
  12. Building a single-exe
  13. Troubleshooting
  14. FAQ
  15. Architecture
  16. Development
  17. License

What you'll need

OS Windows 10 or 11
Python 3.11, 3.12, or 3.13
Routing VB-Audio Virtual Cable — free download, install once
Headphones Required for live monitoring (so you can hear yourself)
GPU (optional) NVIDIA + CUDA for AI voice mode at low latency

Install — fast path

Clone the repo, double-click the launcher. The launcher creates .venv on first run and installs everything for you.

git clone https://github.com/gauravsoodtech/Mellow-mic.git
cd Mellow-mic
.\MellowMic.cmd

That's it. Use MellowMic-Debug.cmd if you want the console window to stay open so you can read errors during setup.

Install — manual

If you'd rather control the venv yourself:

py -3.11 -m venv .venv
.\.venv\Scripts\activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
python -m pip install -r requirements-dev.txt   # for tests
python main.py

Optional AI dependencies (only if you have an RVC ONNX model bundle):

python -m pip install -r requirements-ai.txt

First run tutorial

  1. Install VB-Cable. Download from https://vb-audio.com/Cable/, run the installer as administrator, reboot. You should now have a CABLE Input output device and a CABLE Output input device in Windows sound settings.
  2. Launch MellowMic with MellowMic.cmd.
  3. In the Routing panel:
    • Microphone Input — your real mic.
    • MellowMic Output → CABLE Input (VB-Audio Virtual Cable). The app auto-detects this on first run.
    • Headphone Monitor — your headphones (so you can hear the processed voice). Toggle the monitor switch on.
  4. Click Start Processing.
  5. Talk. You should hear your voice in your headphones.
  6. Try a preset:
    • Start with Clean Bypass to confirm routing.
    • Switch to DSP Effects → Feminine DSP (or any other) and slide the Voice Blend slider — left is dry, right is fully processed.
  7. Open the app you want to use the voice in (Discord, OBS, Teams) and set its microphone to CABLE Output (VB-Audio Virtual Cable). That app now hears your processed voice.

That's the loop. Your settings — engine mode, active effect, blend level, devices, monitor toggle — persist between runs.


Using MellowMic with Discord / OBS / Teams

Once routing is working, point each app's microphone setting at CABLE Output:

Discord

  1. Settings → Voice & Video
  2. Input Device → CABLE Output (VB-Audio Virtual Cable)
  3. Voice Processing → turn off "Echo Cancellation" and "Noise Suppression" (Discord's processing fights ours).

OBS

  1. Settings → Audio
  2. Mic / Auxiliary Audio → CABLE Output (VB-Audio Virtual Cable)

Microsoft Teams

  1. Settings → Devices
  2. Microphone → CABLE Output. Teams' built-in noise suppression will partially fight MellowMic's chain — set it to "Low" or off.

Zoom

  1. Settings → Audio
  2. Microphone → CABLE Output
  3. Disable "Automatically adjust microphone volume" and "Suppress background noise" for predictable behavior.

The eight DSP voices

Preset What it does
Bypass Clean passthrough, no processing. Use to confirm routing.
Deep Voice -1.5 semitone shift + warm/smooth EQ + tanh saturation. Lower, broader.
Bright Voice +4.0 semitone shift, mostly wet. Playful chipmunk-style.
Feminine DSP +1.8 semitone shift + 4-stage EQ (HP 120 Hz, body, air, deess) + limiting. Low-latency feminine approximation.
Robot 72 Hz ring modulation, kept readable.
Radio 300–3400 Hz bandpass plus tanh distortion — AM-radio crackle.
Echo 280 ms delay, 38 % feedback, 55 % wet. Spacious.
Demon -3.0 semitone shift + 45 Hz ring mod + sub-bass emphasis. Dramatic growl.

The Voice Blend slider mixes dry and wet for every preset, so you can dial any effect from subtle to extreme.
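Under the hood a dry/wet control is just a weighted sum of the unprocessed and processed signals. A minimal sketch (the actual mixing curve MellowMic uses may differ, e.g. it could be equal-power rather than linear):

```python
import numpy as np

def blend(dry: np.ndarray, wet: np.ndarray, amount: float) -> np.ndarray:
    """Linear dry/wet mix: amount=0.0 is fully dry, 1.0 is fully wet."""
    amount = float(np.clip(amount, 0.0, 1.0))
    return (1.0 - amount) * dry + amount * wet
```

Sliding the control left therefore never mutes you — it just fades the effect's contribution toward zero.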


AI voice models (optional)

For a true new-speaker voice you need an RVC ONNX model bundle. Drop bundles under models/ like:

models/
└── my_voice/
    ├── metadata.json     # see models/README.md for schema
    └── voice.onnx

Then in the app, switch to AI Model Voice, pick the model, and choose CUDAExecutionProvider if you have an NVIDIA GPU.

Important

AI models are NOT shipped with MellowMic. You must own or license any model you use. The app does not generate, train, or distribute speaker identities.
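Discovery follows the folder layout above. An illustrative scan — the real validation lives in core/model_registry.py and the metadata schema in models/README.md, so treat this only as a sketch of the shape:

```python
from pathlib import Path

def find_model_bundles(models_dir: str = "models") -> list:
    """Illustrative: a bundle is any subfolder containing metadata.json
    plus at least one .onnx file. Real schema checks are out of scope here."""
    bundles = []
    for sub in Path(models_dir).iterdir():
        if not sub.is_dir():
            continue
        if (sub / "metadata.json").is_file() and any(sub.glob("*.onnx")):
            bundles.append(sub.name)
    return sorted(bundles)
```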


CLI mode

Useful for testing and scripting:

python main.py --cli --list-devices
python main.py --cli --engine dsp --effect feminine_dsp
python main.py --cli --engine bypass
python main.py --cli --list-models
python main.py --cli --engine ai --model-id my_voice --ai-provider CUDAExecutionProvider

Ctrl+C stops cleanly.


Customizing effects

The built-in DSP presets ship with hand-tuned constants. You can override them without editing Python by copying presets/effect_overrides.example.json to presets/effect_overrides.json:

{
  "deep":         { "pitch_semitones": -2.0 },
  "feminine_dsp": { "pitch_semitones":  2.4, "air_freq_hz": 5000.0 },
  "chipmunk":     { "pitch_semitones":  3.5 },
  "demon":        { "pitch_semitones": -4.0 }
}

Restart the app to pick up changes. Unknown keys and fields are silently ignored, so it is safe to leave the file in place across upgrades. The live effect_overrides.json is gitignored — only the .example.json template is checked in.
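The "unknown keys are silently ignored" behavior falls out of a defaults-first merge. A sketch of that merge — the default values below are illustrative stand-ins, not the project's actual hand-tuned constants:

```python
import json
from pathlib import Path

# Illustrative defaults; the real constants ship inside the presets package.
DEFAULTS = {
    "deep":         {"pitch_semitones": -1.5},
    "feminine_dsp": {"pitch_semitones": 1.8, "air_freq_hz": 4500.0},
}

def load_overrides(path: str = "presets/effect_overrides.json") -> dict:
    """Merge user overrides onto defaults, ignoring unknown presets/fields."""
    merged = {k: dict(v) for k, v in DEFAULTS.items()}
    p = Path(path)
    if not p.is_file():
        return merged           # no override file: defaults apply
    for preset, fields in json.loads(p.read_text()).items():
        if preset not in merged:
            continue            # unknown preset name: silently ignored
        for key, value in fields.items():
            if key in merged[preset]:
                merged[preset][key] = value   # unknown field: silently ignored
    return merged
```

Because only keys that already exist in the defaults are copied over, a stale override file from an older version can never inject an unsupported parameter.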


VAD backends

Voice activity detection gates silence so the effects chain only runs while you are actually speaking. Two backends:

# config.yaml
vad:
  backend: webrtc            # default — fast, no extra deps, good enough for most voices
  # backend: silero          # better on whispers / music / crosstalk; needs an ONNX model
  silero_model_path: models/vad/silero_vad.onnx
  silero_threshold: 0.5      # 0.0 = always speech, 1.0 = never speech

If backend: silero is set but the model file or onnxruntime is missing, MellowMic silently falls back to webrtcvad so audio always works.
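The fallback logic amounts to a small factory. A sketch in the spirit of make_vad() — the real signature and return types in core/vad.py may differ:

```python
from pathlib import Path

def make_vad(backend: str = "webrtc", silero_model_path: str = ""):
    """Return the requested VAD backend, degrading to webrtc if the
    Silero model or onnxruntime is unavailable (stand-in strings here
    represent the real backend instances)."""
    if backend == "silero":
        try:
            import onnxruntime  # noqa: F401  -- optional dependency
            if Path(silero_model_path).is_file():
                return "silero"
        except ImportError:
            pass
        # model file or runtime missing: fall through so audio keeps working
    return "webrtc"
```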


Formant-preserving pitch

By default the pitch shifter uses rational resampling — fast but not formant-aware (deep voices sound a bit "tube-y"). Set this in config.yaml to use Pedalboard's PitchShift instead:

effects:
  formant_preserving_pitch: true

Pedalboard wraps Rubberband / SoundTouch under the hood, which is noticeably more natural for shifts greater than ±2 semitones at the cost of a small extra CPU hit. If Pedalboard isn't installed, MellowMic falls back to resampling automatically.
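Why the resampling path sounds "tube-y": reading the block at a different rate moves the formants along with the pitch. A naive illustrative sketch (the real shifter pairs this with a time-stretch so block length is preserved; this one does not):

```python
import numpy as np

def pitch_shift_resample(block: np.ndarray, semitones: float) -> np.ndarray:
    """Resample-based pitch shift via linear interpolation. Pitch and
    formants move together, and the output length changes by the same
    ratio -- both reasons the real chain does extra work on top."""
    ratio = 2.0 ** (semitones / 12.0)          # +12 semitones = read 2x faster
    idx = np.arange(0, len(block) - 1, ratio)  # fractional read positions
    lo = idx.astype(int)
    frac = idx - lo
    return block[lo] * (1.0 - frac) + block[lo + 1] * frac
```

A formant-preserving shifter instead separates the excitation (pitch) from the spectral envelope (formants) and moves only the former, which is what Rubberband/SoundTouch do.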


Building a single-exe

.\.venv\Scripts\activate
python -m pip install pyinstaller
python -m PyInstaller voiceforge.spec --noconfirm

Output: dist/VoiceForge.exe. CI also builds and uploads this on every v* tag — see .github/workflows/release.yml.


Troubleshooting

I hear nothing in my headphones.
  1. Did you click Start Processing? The big button must say STOP PROCESSING.
  2. Is the Headphone Monitor toggle on?
  3. Is the right output device selected as the monitor?
  4. Look at the Drops counter on the right panel — if it's racing up, your block size is too small for your CPU. Edit config.yaml: audio.block_size: 512.
Discord still hears my normal voice.

Discord caches the input device. Switch its Input Device to something else, then back to CABLE Output. Or restart Discord.

VB-Cable is not detected.

The status chip at the top right shows VB-Cable not found. Reinstall VB-Cable as administrator and reboot. After reboot, both CABLE Input and CABLE Output should appear in Windows Sound settings.

Latency is too high.

Edit config.yaml:

audio:
  block_size: 128   # default 256 (~5.3 ms), drop to 128 (~2.7 ms) on fast CPUs
  ring_buffer_depth: 4   # default 8, lower = less buffer

If drops spike, go back up. The trade-off is real on slower hardware.
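The quoted per-block latencies are just block size over sample rate (assuming the 48 kHz rate implied by the figures above):

```python
def block_latency_ms(block_size: int, sample_rate: int = 48_000) -> float:
    """One block's worth of audio, in milliseconds."""
    return block_size / sample_rate * 1000.0

print(round(block_latency_ms(256), 1))  # 5.3
print(round(block_latency_ms(128), 1))  # 2.7
```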

"Audio start failed" toast on launch.

The selected input or output device disappeared. Re-pick a device in the Routing panel, then click Start. The typed AudioStartError carries the original cause — check the bottom status bar for the underlying message.
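"Carries the original cause" refers to standard Python exception chaining. A stand-in sketch of the pattern — the real class lives in core/audio_io.py and its message text will differ:

```python
class AudioStartError(RuntimeError):
    """Raised when the duplex stream fails to start (illustrative stand-in)."""

def start_stream(open_fn):
    try:
        return open_fn()
    except Exception as exc:
        # `raise ... from ...` preserves the driver error as __cause__,
        # which is what surfaces in the status bar.
        raise AudioStartError("audio start failed") from exc
```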

I switched effects and heard a click.

You shouldn't — engine swaps run through a 40 ms crossfade. If you do hear a click, the most likely cause is changing the block size or sample rate mid-session, which forces a full audio-engine restart. Stop, change the value, then start again.
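A crossfade of that kind is a short ramp mixing the outgoing and incoming engine outputs. A minimal linear sketch — MellowMic's CrossfadeProcessor may use a different curve (e.g. equal-power):

```python
import numpy as np

def crossfade(old: np.ndarray, new: np.ndarray,
              sample_rate: int = 48_000, fade_ms: float = 40.0) -> np.ndarray:
    """Fade from the old engine's output to the new one's over fade_ms.
    Both inputs must cover the same span of audio."""
    n = int(sample_rate * fade_ms / 1000.0)   # 1920 samples at 48 kHz / 40 ms
    ramp = np.linspace(0.0, 1.0, n)
    return old[:n] * (1.0 - ramp) + new[:n] * ramp
```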


FAQ

Does MellowMic send audio anywhere? No. Every effect runs locally on your CPU (and optionally GPU for AI models). There is no cloud component, no telemetry, no API key.

Can it imitate a specific person's voice? Not with the built-in DSP presets — those are effects, not speaker identities. AI mode can, but only with an RVC ONNX model that you supply. Use only models you own or are licensed to use.

Does it work on macOS or Linux? Not yet. The audio path is Windows-only via WASAPI + VB-Cable. macOS BlackHole / Linux PulseAudio support is on the roadmap — stubbed in core/routing.py.

Why Python and not C++? Real-time audio in Python works fine when the hot path is NumPy / SciPy / sounddevice (PortAudio) — those release the GIL and run native code. The pure-Python overhead per block is well under a millisecond at 256 samples.

Can I add my own effect? Right now you have to add a method to EffectsChain in core/effects.py and register it in PRESET_INFO. A plugin manifest system is on the roadmap.
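The two steps look roughly like the following. This is a hypothetical sketch only — the real EffectsChain and PRESET_INFO in core/effects.py will have different shapes, so check the module before copying:

```python
import numpy as np

class EffectsChain:
    # 2) register the preset so the UI can list it (hypothetical registry shape)
    PRESET_INFO = {
        "bypass": "Clean passthrough",
        "quiet":  "Half-volume demo effect",
    }

    # 1) add a method implementing your effect
    def fx_quiet(self, block: np.ndarray) -> np.ndarray:
        return 0.5 * block

    def fx_bypass(self, block: np.ndarray) -> np.ndarray:
        return block

    def process(self, preset: str, block: np.ndarray) -> np.ndarray:
        """Dispatch a block to the method named after the preset."""
        return getattr(self, f"fx_{preset}")(block)
```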


Architecture

mic ──► [WASAPI duplex stream]
            │
            ▼
       [input ring buffer]
            │
            ▼
   ┌────────────────────────────────┐
   │       Processor Chain          │
   │                                │
   │  VAD ──► Noise Suppressor ──►  │
   │   │                            │
   │   ▼                            │
   │  Voice Engine (DSP / AI / Bypass)
   │   │                            │
   │   ▼                            │
   │  Output Limiter                │
   └────────────┬───────────────────┘
                ▼
       [output ring buffer]
                │
                ▼
   [WASAPI duplex stream]
            │
            ├──► VB-Cable ──► Discord / OBS / Teams
            └──► Headphone monitor (optional separate stream)

Module overview:

Module Responsibility
core/audio_io.py WASAPI duplex stream, ring buffers, RT-safe processing thread, AudioStartError. Read the module docstring before changing the audio path.
core/processor_chain.py Composable VAD → noise suppressor → engine → limiter chain, plus CrossfadeProcessor for hot-swap.
core/effects.py Eight built-in DSP presets and the EffectsChain state machine.
core/engines/ BypassVoiceEngine, DspVoiceEngine, RvcOnnxVoiceEngine — all share a process() / set_params() / set_enabled() / close() / stats() protocol.
core/vad.py + core/vad_silero.py webrtcvad and Silero ONNX backends behind a single make_vad() factory.
core/noise_suppression.py 80 Hz HPF → spectral denoise (noisereduce) → expander gate.
core/model_registry.py Discovery and validation of licensed RVC ONNX bundles under models/.
ui/main_window.py PyQt5 control-room UI — header, mode rail, effects, routing, diagnostics. Settings persistence via ui/theme.py.
presets/ effect_overrides.py user-tweakable preset constants + Phase 2 PresetConfig dataclass for future neural pipeline.
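The shared engine protocol from the table above can be expressed as a structural type. The method names come from the table; the parameter shapes are assumptions — see core/engines/ for the real signatures:

```python
from typing import Protocol, runtime_checkable
import numpy as np

@runtime_checkable
class VoiceEngine(Protocol):
    """Structural protocol all engines satisfy (assumed signatures)."""
    def process(self, block: np.ndarray) -> np.ndarray: ...
    def set_params(self, **params: object) -> None: ...
    def set_enabled(self, enabled: bool) -> None: ...
    def close(self) -> None: ...
    def stats(self) -> dict: ...

class BypassVoiceEngine:
    """Minimal conforming engine: returns audio untouched."""
    def __init__(self) -> None:
        self.enabled = True
    def process(self, block: np.ndarray) -> np.ndarray:
        return block
    def set_params(self, **params: object) -> None:
        pass
    def set_enabled(self, enabled: bool) -> None:
        self.enabled = enabled
    def close(self) -> None:
        pass
    def stats(self) -> dict:
        return {"enabled": self.enabled}
```

Because the type is structural, the hot-swap path can treat DSP, AI, and bypass engines interchangeably without a common base class.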

Development

Run the test suite:

.\.venv\Scripts\python.exe -m pytest tests -q --basetemp .pytest-tmp

Currently 56 passing tests covering: real-time safety contract (lock under stress, passthrough on processor exception, no print() on the audio thread, AudioStartError cause chain, engine close()), Tier 2 polish (crossfade ramp shape, NoiseSuppressor reset-on-toggle, UI settings merge, VAD scratch buffer reuse), and Tier 3 features (Silero fallback, override JSON parsing, EffectsChain pitch override path).

If you change core/audio_io.py, re-read the module docstring — the audio-thread safety contract (no blocking I/O, no zero-fill on error, lock around processor swap) is load-bearing and easy to regress.


License

MIT — Copyright (c) 2026 Gaurav Sood.
