MellowMic

Real-time voice styles for creators — local, low-latency, zero cloud.

License: MIT · Python 3.11+ · Platform: Windows · Tests: 56 passing


MellowMic routes your microphone through a chain of low-latency voice effects, optional AI voice conversion, and VB-Cable so any app — Discord, OBS, Teams, Zoom, Twitch — can pick up your transformed voice without sending audio to a cloud API.

  • Local-only. Audio never leaves your machine.
  • 5 ms one-way latency at the default block size — usable on a live stream.
  • Eight built-in DSP voices plus optional licensed RVC ONNX models.
  • A real desktop UI — dark / light themes, live VU meters, latency display, drop counter.
  • Hot-swappable effects with a 40 ms crossfade — switch presets while you talk without a click or pop.

Note

MellowMic does not ship celebrity voices or any speaker identity. The built-in DSP presets are effects (deeper, brighter, robotic, etc.) — they cannot make you sound like a specific person. AI voice models must be supplied by the user and must be owned or licensed for the intended use.


Table of Contents

  1. What you'll need
  2. Install — fast path
  3. Install — manual
  4. First run tutorial
  5. Using MellowMic with Discord / OBS / Teams
  6. The eight DSP voices
  7. AI voice models (optional)
  8. CLI mode
  9. Customizing effects
  10. VAD backends
  11. Formant-preserving pitch
  12. Building a single-exe
  13. Troubleshooting
  14. FAQ
  15. Architecture
  16. Development
  17. License

What you'll need

OS Windows 10 or 11
Python 3.11, 3.12, or 3.13
Routing VB-Audio Virtual Cable — free download, install once
Headphones Required for live monitoring (so you can hear yourself)
GPU (optional) NVIDIA + CUDA for AI voice mode at low latency

Install — fast path

Clone the repo, double-click the launcher. The launcher creates .venv on first run and installs everything for you.

git clone https://github.com/gauravsoodtech/Mellow-mic.git
cd Mellow-mic
.\MellowMic.cmd

That's it. Use MellowMic-Debug.cmd if you want the console window to stay open so you can read errors during setup.

Install — manual

If you'd rather control the venv yourself:

py -3.11 -m venv .venv
.\.venv\Scripts\activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
python -m pip install -r requirements-dev.txt   # for tests
python main.py

Optional AI dependencies (only if you have an RVC ONNX model bundle):

python -m pip install -r requirements-ai.txt

First run tutorial

  1. Install VB-Cable. Download from https://vb-audio.com/Cable/, run the installer as administrator, reboot. You should now have a CABLE Input output device and a CABLE Output input device in Windows sound settings.
  2. Launch MellowMic with MellowMic.cmd.
  3. In the Routing panel:
    • Microphone Input — your real mic.
    • MellowMic Output → CABLE Input (VB-Audio Virtual Cable). The app auto-detects this on first run.
    • Headphone Monitor — your headphones (so you can hear the processed voice). Toggle the monitor switch on.
  4. Click Start Processing.
  5. Talk. You should hear your voice in your headphones.
  6. Try a preset:
    • Start with Clean Bypass to confirm routing.
    • Switch to DSP Effects → Feminine DSP (or any other) and slide the Voice Blend slider — left is dry, right is fully processed.
  7. Open the app you want to use the voice in (Discord, OBS, Teams) and set its microphone to CABLE Output (VB-Audio Virtual Cable). That app now hears your processed voice.

That's the loop. Your settings — engine mode, active effect, blend level, devices, monitor toggle — persist between runs.


Using MellowMic with Discord / OBS / Teams

Once routing is working, point each app's microphone setting at CABLE Output:

Discord

  1. Settings → Voice & Video
  2. Input Device → CABLE Output (VB-Audio Virtual Cable)
  3. Voice Processing → turn off "Echo Cancellation" and "Noise Suppression" (Discord's processing fights ours).

OBS

  1. Settings → Audio
  2. Mic / Auxiliary Audio → CABLE Output (VB-Audio Virtual Cable)

Microsoft Teams

  1. Settings → Devices
  2. Microphone → CABLE Output. Teams' built-in noise suppression will partially fight MellowMic's chain — set it to "Low" or off.

Zoom

  1. Settings → Audio
  2. Microphone → CABLE Output
  3. Disable "Automatically adjust microphone volume" and "Suppress background noise" for predictable behavior.

The eight DSP voices

Preset What it does
Bypass Clean passthrough, no processing. Use to confirm routing.
Deep Voice -1.5 semitone shift + warm/smooth EQ + tanh saturation. Lower, broader.
Bright Voice +4.0 semitone shift, mostly wet. Playful chipmunk-style.
Feminine DSP +1.8 semitone shift + 4-stage EQ (HP 120 Hz, body, air, deess) + limiting. Low-latency feminine approximation.
Robot 72 Hz ring modulation, kept readable.
Radio 300–3400 Hz bandpass plus tanh distortion — AM-radio crackle.
Echo 280 ms delay, 38 % feedback, 55 % wet. Spacious.
Demon -3.0 semitone shift + 45 Hz ring mod + sub-bass emphasis. Dramatic growl.

The Voice Blend slider mixes dry and wet for every preset, so you can dial any effect from subtle to extreme.
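Under the hood a dry/wet control is just a weighted sum of the unprocessed and processed signals. A minimal sketch (the actual mixing curve MellowMic uses may differ, e.g. it could be equal-power rather than linear):

```python
import numpy as np

def blend(dry: np.ndarray, wet: np.ndarray, amount: float) -> np.ndarray:
    """Linear dry/wet mix: amount=0.0 is fully dry, 1.0 is fully wet."""
    amount = float(np.clip(amount, 0.0, 1.0))
    return (1.0 - amount) * dry + amount * wet
```

Sliding the control left therefore never mutes you — it just fades the effect's contribution toward zero.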


AI voice models (optional)

For a true new-speaker voice you need an RVC ONNX model bundle. Drop bundles under models/ like:

models/
└── my_voice/
    ├── metadata.json     # see models/README.md for schema
    └── voice.onnx

Then in the app, switch to AI Model Voice, pick the model, and choose CUDAExecutionProvider if you have an NVIDIA GPU.

Important

AI models are NOT shipped with MellowMic. You must own or license any model you use. The app does not generate, train, or distribute speaker identities.
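Discovery follows the folder layout above. An illustrative scan — the real validation lives in core/model_registry.py and the metadata schema in models/README.md, so treat this only as a sketch of the shape:

```python
from pathlib import Path

def find_model_bundles(models_dir: str = "models") -> list:
    """Illustrative: a bundle is any subfolder containing metadata.json
    plus at least one .onnx file. Real schema checks are out of scope here."""
    bundles = []
    for sub in Path(models_dir).iterdir():
        if not sub.is_dir():
            continue
        if (sub / "metadata.json").is_file() and any(sub.glob("*.onnx")):
            bundles.append(sub.name)
    return sorted(bundles)
```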


CLI mode

Useful for testing and scripting:

python main.py --cli --list-devices
python main.py --cli --engine dsp --effect feminine_dsp
python main.py --cli --engine bypass
python main.py --cli --list-models
python main.py --cli --engine ai --model-id my_voice --ai-provider CUDAExecutionProvider

Ctrl+C stops cleanly.


Customizing effects

The built-in DSP presets ship with hand-tuned constants. You can override them without editing Python by copying presets/effect_overrides.example.json to presets/effect_overrides.json:

{
  "deep":         { "pitch_semitones": -2.0 },
  "feminine_dsp": { "pitch_semitones":  2.4, "air_freq_hz": 5000.0 },
  "chipmunk":     { "pitch_semitones":  3.5 },
  "demon":        { "pitch_semitones": -4.0 }
}

Restart the app to pick up changes. Unknown keys and fields are silently ignored, so it is safe to leave the file in place across upgrades. The live effect_overrides.json is gitignored — only the .example.json template is checked in.
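The "unknown keys are silently ignored" behavior falls out of a defaults-first merge. A sketch of that merge — the default values below are illustrative stand-ins, not the project's actual hand-tuned constants:

```python
import json
from pathlib import Path

# Illustrative defaults; the real constants ship inside the presets package.
DEFAULTS = {
    "deep":         {"pitch_semitones": -1.5},
    "feminine_dsp": {"pitch_semitones": 1.8, "air_freq_hz": 4500.0},
}

def load_overrides(path: str = "presets/effect_overrides.json") -> dict:
    """Merge user overrides onto defaults, ignoring unknown presets/fields."""
    merged = {k: dict(v) for k, v in DEFAULTS.items()}
    p = Path(path)
    if not p.is_file():
        return merged           # no override file: defaults apply
    for preset, fields in json.loads(p.read_text()).items():
        if preset not in merged:
            continue            # unknown preset name: silently ignored
        for key, value in fields.items():
            if key in merged[preset]:
                merged[preset][key] = value   # unknown field: silently ignored
    return merged
```

Because only keys that already exist in the defaults are copied over, a stale override file from an older version can never inject an unsupported parameter.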


VAD backends

Voice activity detection gates silence so the effects chain only runs while you are actually speaking. Two backends:

# config.yaml
vad:
  backend: webrtc            # default — fast, no extra deps, good enough for most voices
  # backend: silero          # better on whispers / music / crosstalk; needs an ONNX model
  silero_model_path: models/vad/silero_vad.onnx
  silero_threshold: 0.5      # 0.0 = always speech, 1.0 = never speech

If backend: silero is set but the model file or onnxruntime is missing, MellowMic silently falls back to webrtcvad so audio always works.
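The fallback logic amounts to a small factory. A sketch in the spirit of make_vad() — the real signature and return types in core/vad.py may differ:

```python
from pathlib import Path

def make_vad(backend: str = "webrtc", silero_model_path: str = ""):
    """Return the requested VAD backend, degrading to webrtc if the
    Silero model or onnxruntime is unavailable (stand-in strings here
    represent the real backend instances)."""
    if backend == "silero":
        try:
            import onnxruntime  # noqa: F401  -- optional dependency
            if Path(silero_model_path).is_file():
                return "silero"
        except ImportError:
            pass
        # model file or runtime missing: fall through so audio keeps working
    return "webrtc"
```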


Formant-preserving pitch

By default the pitch shifter uses rational resampling — fast but not formant-aware (deep voices sound a bit "tube-y"). Set this in config.yaml to use Pedalboard's PitchShift instead:

effects:
  formant_preserving_pitch: true

Pedalboard wraps Rubberband / SoundTouch under the hood, which is noticeably more natural for shifts greater than ±2 semitones at the cost of a small extra CPU hit. If Pedalboard isn't installed, MellowMic falls back to resampling automatically.
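Why the resampling path sounds "tube-y": reading the block at a different rate moves the formants along with the pitch. A naive illustrative sketch (the real shifter pairs this with a time-stretch so block length is preserved; this one does not):

```python
import numpy as np

def pitch_shift_resample(block: np.ndarray, semitones: float) -> np.ndarray:
    """Resample-based pitch shift via linear interpolation. Pitch and
    formants move together, and the output length changes by the same
    ratio -- both reasons the real chain does extra work on top."""
    ratio = 2.0 ** (semitones / 12.0)          # +12 semitones = read 2x faster
    idx = np.arange(0, len(block) - 1, ratio)  # fractional read positions
    lo = idx.astype(int)
    frac = idx - lo
    return block[lo] * (1.0 - frac) + block[lo + 1] * frac
```

A formant-preserving shifter instead separates the excitation (pitch) from the spectral envelope (formants) and moves only the former, which is what Rubberband/SoundTouch do.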


Building a single-exe

.\.venv\Scripts\activate
python -m pip install pyinstaller
python -m PyInstaller voiceforge.spec --noconfirm

Output: dist/VoiceForge.exe. CI also builds and uploads this on every v* tag — see .github/workflows/release.yml.


Troubleshooting

I hear nothing in my headphones.
  1. Did you click Start Processing? The big button must say STOP PROCESSING.
  2. Is the Headphone Monitor toggle on?
  3. Is the right output device selected as the monitor?
  4. Look at the Drops counter on the right panel — if it's racing up, your block size is too small for your CPU. Edit config.yaml: audio.block_size: 512.
Discord still hears my normal voice.

Discord caches the input device. Switch its Input Device to something else, then back to CABLE Output. Or restart Discord.

VB-Cable is not detected.

The status chip at the top right shows VB-Cable not found. Reinstall VB-Cable as administrator and reboot. After reboot, both CABLE Input and CABLE Output should appear in Windows Sound settings.

Latency is too high.

Edit config.yaml:

audio:
  block_size: 128   # default 256 (~5.3 ms), drop to 128 (~2.7 ms) on fast CPUs
  ring_buffer_depth: 4   # default 8, lower = less buffer

If drops spike, go back up. The trade-off is real on slower hardware.
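The quoted per-block latencies are just block size over sample rate (assuming the 48 kHz rate implied by the figures above):

```python
def block_latency_ms(block_size: int, sample_rate: int = 48_000) -> float:
    """One block's worth of audio, in milliseconds."""
    return block_size / sample_rate * 1000.0

print(round(block_latency_ms(256), 1))  # 5.3
print(round(block_latency_ms(128), 1))  # 2.7
```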

"Audio start failed" toast on launch.

The selected input or output device disappeared. Re-pick a device in the Routing panel, then click Start. The typed AudioStartError carries the original cause — check the bottom status bar for the underlying message.
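"Carries the original cause" refers to standard Python exception chaining. A stand-in sketch of the pattern — the real class lives in core/audio_io.py and its message text will differ:

```python
class AudioStartError(RuntimeError):
    """Raised when the duplex stream fails to start (illustrative stand-in)."""

def start_stream(open_fn):
    try:
        return open_fn()
    except Exception as exc:
        # `raise ... from ...` preserves the driver error as __cause__,
        # which is what surfaces in the status bar.
        raise AudioStartError("audio start failed") from exc
```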

I switched effects and heard a click.

You shouldn't — engine swaps run through a 40 ms crossfade. If you do hear a click, the most likely cause is changing the block size or sample rate mid-session, which forces a full audio-engine restart. Stop, change the value, then start again.
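A crossfade of that kind is a short ramp mixing the outgoing and incoming engine outputs. A minimal linear sketch — MellowMic's CrossfadeProcessor may use a different curve (e.g. equal-power):

```python
import numpy as np

def crossfade(old: np.ndarray, new: np.ndarray,
              sample_rate: int = 48_000, fade_ms: float = 40.0) -> np.ndarray:
    """Fade from the old engine's output to the new one's over fade_ms.
    Both inputs must cover the same span of audio."""
    n = int(sample_rate * fade_ms / 1000.0)   # 1920 samples at 48 kHz / 40 ms
    ramp = np.linspace(0.0, 1.0, n)
    return old[:n] * (1.0 - ramp) + new[:n] * ramp
```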


FAQ

Does MellowMic send audio anywhere? No. Every effect runs locally on your CPU (and optionally GPU for AI models). There is no cloud component, no telemetry, no API key.

Can it imitate a specific person's voice? Not with the built-in DSP presets — those are effects, not speaker identities. AI mode can, but only with an RVC ONNX model that you supply. Use only models you own or are licensed to use.

Does it work on macOS or Linux? Not yet. The audio path is Windows-only via WASAPI + VB-Cable. macOS BlackHole / Linux PulseAudio support is on the roadmap — stubbed in core/routing.py.

Why Python and not C++? Real-time audio in Python works fine when the hot path is NumPy / SciPy / sounddevice (PortAudio) — those release the GIL and run native code. The pure-Python overhead per block is well under a millisecond at 256 samples.

Can I add my own effect? Right now you have to add a method to EffectsChain in core/effects.py and register it in PRESET_INFO. A plugin manifest system is on the roadmap.
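The two steps look roughly like the following. This is a hypothetical sketch only — the real EffectsChain and PRESET_INFO in core/effects.py will have different shapes, so check the module before copying:

```python
import numpy as np

class EffectsChain:
    # 2) register the preset so the UI can list it (hypothetical registry shape)
    PRESET_INFO = {
        "bypass": "Clean passthrough",
        "quiet":  "Half-volume demo effect",
    }

    # 1) add a method implementing your effect
    def fx_quiet(self, block: np.ndarray) -> np.ndarray:
        return 0.5 * block

    def fx_bypass(self, block: np.ndarray) -> np.ndarray:
        return block

    def process(self, preset: str, block: np.ndarray) -> np.ndarray:
        """Dispatch a block to the method named after the preset."""
        return getattr(self, f"fx_{preset}")(block)
```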


Architecture

mic ──► [WASAPI duplex stream]
            │
            ▼
       [input ring buffer]
            │
            ▼
   ┌────────────────────────────────┐
   │       Processor Chain          │
   │                                │
   │  VAD ──► Noise Suppressor ──►  │
   │   │                            │
   │   ▼                            │
   │  Voice Engine (DSP / AI / Bypass)
   │   │                            │
   │   ▼                            │
   │  Output Limiter                │
   └────────────┬───────────────────┘
                ▼
       [output ring buffer]
                │
                ▼
   [WASAPI duplex stream]
            │
            ├──► VB-Cable ──► Discord / OBS / Teams
            └──► Headphone monitor (optional separate stream)

Module overview:

Module Responsibility
core/audio_io.py WASAPI duplex stream, ring buffers, RT-safe processing thread, AudioStartError. Read the module docstring before changing the audio path.
core/processor_chain.py Composable VAD → noise suppressor → engine → limiter chain, plus CrossfadeProcessor for hot-swap.
core/effects.py Eight built-in DSP presets and the EffectsChain state machine.
core/engines/ BypassVoiceEngine, DspVoiceEngine, RvcOnnxVoiceEngine — all share a process() / set_params() / set_enabled() / close() / stats() protocol.
core/vad.py + core/vad_silero.py webrtcvad and Silero ONNX backends behind a single make_vad() factory.
core/noise_suppression.py 80 Hz HPF → spectral denoise (noisereduce) → expander gate.
core/model_registry.py Discovery and validation of licensed RVC ONNX bundles under models/.
ui/main_window.py PyQt5 control-room UI — header, mode rail, effects, routing, diagnostics. Settings persistence via ui/theme.py.
presets/ effect_overrides.py user-tweakable preset constants + Phase 2 PresetConfig dataclass for future neural pipeline.
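The shared engine protocol from the table above can be expressed as a structural type. The method names come from the table; the parameter shapes are assumptions — see core/engines/ for the real signatures:

```python
from typing import Protocol, runtime_checkable
import numpy as np

@runtime_checkable
class VoiceEngine(Protocol):
    """Structural protocol all engines satisfy (assumed signatures)."""
    def process(self, block: np.ndarray) -> np.ndarray: ...
    def set_params(self, **params: object) -> None: ...
    def set_enabled(self, enabled: bool) -> None: ...
    def close(self) -> None: ...
    def stats(self) -> dict: ...

class BypassVoiceEngine:
    """Minimal conforming engine: returns audio untouched."""
    def __init__(self) -> None:
        self.enabled = True
    def process(self, block: np.ndarray) -> np.ndarray:
        return block
    def set_params(self, **params: object) -> None:
        pass
    def set_enabled(self, enabled: bool) -> None:
        self.enabled = enabled
    def close(self) -> None:
        pass
    def stats(self) -> dict:
        return {"enabled": self.enabled}
```

Because the type is structural, the hot-swap path can treat DSP, AI, and bypass engines interchangeably without a common base class.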

Development

Run the test suite:

.\.venv\Scripts\python.exe -m pytest tests -q --basetemp .pytest-tmp

Currently 56 passing tests covering: real-time safety contract (lock under stress, passthrough on processor exception, no print() on the audio thread, AudioStartError cause chain, engine close()), Tier 2 polish (crossfade ramp shape, NoiseSuppressor reset-on-toggle, UI settings merge, VAD scratch buffer reuse), and Tier 3 features (Silero fallback, override JSON parsing, EffectsChain pitch override path).

If you change core/audio_io.py, re-read the module docstring — the audio-thread safety contract (no blocking I/O, no zero-fill on error, lock around processor swap) is load-bearing and easy to regress.


License

MIT — Copyright (c) 2026 Gaurav Sood.
