Voice Mode Integration: Gemini CLI

🔗 Official Repository: Gemini CLI on GitHub
📦 Download/Install: npm install -g @google/gemini-cli
🏷️ Version Requirements: Gemini CLI v0.1.0+

Overview

Gemini CLI is Google's command-line interface for its Gemini AI models, offering a Claude Code-like experience with a generous free tier (1000 requests/day, 1M token context). Voice Mode adds natural voice conversation to Gemini CLI, so you can speak your coding requests and hear Gemini's responses.

Prerequisites

- Gemini CLI installed (npm install -g @google/gemini-cli); see the npm setup guide to avoid sudo
- Python 3.10 or higher
- uv package manager (curl -LsSf https://astral.sh/uv/install.sh | sh)
- OpenAI API key (or a compatible service for STT/TTS)
- System audio dependencies installed (see main README)
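
The prerequisites above can be checked with a short script before going further. This is a minimal sketch, not part of Voice Mode: the function names are illustrative, and the tool names in `REQUIRED_TOOLS` are assumptions (`gemini` as the binary installed by the npm package, `uvx` as shipped with uv), so adjust them to match your setup.

```python
import shutil
import sys

# Assumed executable names; change these if your installation differs.
REQUIRED_TOOLS = ("gemini", "uvx")
MIN_PYTHON = (3, 10)


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (default: this interpreter) meets `minimum`."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= tuple(minimum)


def report():
    """Return a list of human-readable problems (empty when all checks pass)."""
    problems = [f"{name} not found on PATH" for name in missing_tools()]
    if not python_ok():
        problems.append("Python 3.10 or higher is required")
    return problems
```

Calling `report()` returns an empty list when both tools are on PATH and the interpreter is recent enough. It does not check the API key or the audio dependencies.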

Quick Start

# Run Voice Mode with uvx (the package is fetched on first use)
uvx voice-mode

# Set your OpenAI API key (for voice services)
export OPENAI_API_KEY="your-openai-key"

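# Configure Gemini CLI with Voice Mode:
# add the voice-mode MCP server to ~/.gemini/settings.json
# (see "2. Add Voice Mode to Gemini CLI" under Installation Steps)

# Start voice-enabled Gemini CLI (the npm package installs the `gemini` command)
gemini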

Installation Steps

1. Install Gemini CLI

macOS/Linux Users: To avoid using sudo with npm, see our npm setup guide

# Install globally via npm
npm install -g @google/gemini-cli

# Authenticate: run `gemini` and follow the sign-in prompt,
# or provide your Gemini API key through the environment
export GEMINI_API_KEY="your-gemini-key"

2. Add Voice Mode to Gemini CLI

Find Gemini CLI's configuration file:

macOS/Linux: ~/.gemini/settings.json
Windows: %USERPROFILE%\.gemini\settings.json

Add the Voice Mode MCP server to your existing configuration:

{
  "mcpServers": {
    "voice-mode": {
      "command": "uvx",
      "args": [
        "voice-mode"
      ]
    }
  }
}
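
If settings.json already contains other settings or MCP servers, editing it by hand risks overwriting them. The following is a minimal sketch of a merge that preserves existing keys. The function names are illustrative, and it assumes the file is plain JSON without comments.

```python
import copy
import json
from pathlib import Path

# The server entry shown above, as a Python dict.
VOICE_MODE_SERVER = {"command": "uvx", "args": ["voice-mode"]}


def add_voice_mode(settings: dict) -> dict:
    """Return a copy of `settings` with the voice-mode MCP server added.

    Other top-level keys and other MCP servers are left untouched.
    """
    merged = copy.deepcopy(settings)
    servers = merged.setdefault("mcpServers", {})
    servers["voice-mode"] = copy.deepcopy(VOICE_MODE_SERVER)
    return merged


def update_settings_file(path: Path) -> None:
    """Read settings.json at `path` (if it exists), merge, and write it back."""
    if path.exists():
        settings = json.loads(path.read_text(encoding="utf-8"))
    else:
        settings = {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        json.dumps(add_voice_mode(settings), indent=2) + "\n",
        encoding="utf-8",
    )
```

For the default location on macOS/Linux, this would be called as `update_settings_file(Path.home() / ".gemini" / "settings.json")`. Back up the file first if it contains settings you care about.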

3. Environment Variables (Optional)

For advanced configuration:

# Required for voice services, unless you use local or other OpenAI-compatible STT/TTS
export OPENAI_API_KEY="your-key"

# Optional - Use local services
export VOICEMODE_TTS_BASE_URL="http://127.0.0.1:8880/v1"
export VOICEMODE_STT_BASE_URL="http://127.0.0.1:2022/v1"

# Optional - Preferred voice
export VOICEMODE_TTS_VOICE="nova"
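
A quick sanity check of these variables can catch a missing key or a malformed URL before starting a session. This is a sketch under one stated assumption: the API key is treated as optional only when both local base URLs are set, which is an interpretation of the prerequisites rather than documented Voice Mode behavior. The function name is illustrative.

```python
import os

URL_VARS = ("VOICEMODE_TTS_BASE_URL", "VOICEMODE_STT_BASE_URL")


def voice_env_problems(env=None):
    """Return a list of problems found in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    problems = []

    # Assumption: a key is needed unless both TTS and STT point elsewhere.
    has_key = bool(env.get("OPENAI_API_KEY"))
    has_both_urls = all(env.get(name) for name in URL_VARS)
    if not has_key and not has_both_urls:
        problems.append(
            "Set OPENAI_API_KEY, or set both VOICEMODE_TTS_BASE_URL "
            "and VOICEMODE_STT_BASE_URL"
        )

    # Base URLs should look like http(s) endpoints.
    for name in URL_VARS:
        value = env.get(name)
        if value and not value.startswith(("http://", "https://")):
            problems.append(f"{name} should be an http(s) URL, got {value!r}")
    return problems
```

Passing a plain dict instead of the real environment makes the check easy to try out, for example `voice_env_problems({"OPENAI_API_KEY": "your-key"})`.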

Verification

Check Installation:

# Verify Gemini CLI
gemini --version

# Test Voice Mode directly
uvx voice-mode --help

Test Voice Mode:

# Start Gemini CLI
gemini

# Try voice commands:
# "Hello, can you hear me?"
# "Let's have a voice conversation"

Verify MCP Connection:
- Check Gemini CLI logs for MCP server initialization
- Look for "voice-mode" in active servers list

Usage Examples

Basic Voice Conversation

# In Gemini CLI:
"Hey Gemini, let's talk about this code"
"Can you explain this error message?"
"What's the best approach for this feature?"

Voice-Enabled Code Review

# Navigate to your project
cd my-project

# Start Gemini CLI
gemini

# Voice commands:
"Review this pull request"
"Suggest improvements for this function"
"Help me write tests for this module"

Troubleshooting

Voice Mode Not Recognized

- Ensure MCP support is enabled in Gemini CLI
- Check configuration file syntax
- Try running uvx voice-mode directly to test
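
The configuration syntax check can be automated. The sketch below parses the text of settings.json and reports either the position of a JSON syntax error or a missing voice-mode entry. It assumes plain JSON without comments and checks only the fields shown in this guide. The function name is illustrative.

```python
import json


def check_settings_text(text: str) -> list[str]:
    """Return a list of problems found in the settings.json content `text`."""
    try:
        settings = json.loads(text)
    except json.JSONDecodeError as exc:
        # lineno/colno point at where the parser gave up.
        return [f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]

    if not isinstance(settings, dict):
        return ["top level of settings.json should be an object"]

    servers = settings.get("mcpServers")
    if not isinstance(servers, dict) or "voice-mode" not in servers:
        return ['no "voice-mode" entry under "mcpServers"']

    problems = []
    server = servers["voice-mode"]
    if not isinstance(server, dict) or server.get("command") != "uvx":
        problems.append('"voice-mode" should have "command": "uvx"')
    elif "voice-mode" not in server.get("args", []):
        problems.append('"voice-mode" should have "voice-mode" in "args"')
    return problems
```

A typical call reads the file and prints each problem, for example `check_settings_text(Path.home().joinpath(".gemini", "settings.json").read_text())` with `pathlib.Path` imported. A common cause of a syntax error is a trailing comma left after the last entry.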

Audio Issues

- Grant your terminal microphone permissions
- Check system audio settings
- Run VOICEMODE_DEBUG=true gemini for detailed logs

Performance Considerations

- Gemini CLI may be slower than Claude Code
- Voice processing adds ~2-3s latency
- Consider using local STT/TTS for better performance

Platform-Specific Notes

macOS

- Grant your terminal app microphone access in System Settings (System Preferences on older macOS versions)
- Gemini CLI config may be in ~/Library/Application Support/gemini-cli/

Linux

- Ensure PulseAudio/PipeWire is running
- Check audio permissions: groups $USER | grep audio

Windows

- Best experience with WSL2
- Native Windows support may have audio limitations

Advanced Configuration

Using Local STT/TTS Services

For privacy and offline usage:

Start local services:

# Kokoro TTS
docker run -p 8880:8880 kokoro-tts

# Whisper STT
whisper-cpp-server -p 2022

Configure endpoints by adding an env block to the voice-mode server entry in settings.json:

{
  "env": {
    "VOICEMODE_TTS_BASE_URL": "http://127.0.0.1:8880/v1",
    "VOICEMODE_STT_BASE_URL": "http://127.0.0.1:2022/v1",
    "VOICEMODE_PREFER_LOCAL": "true"
  }
}
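
Before relying on local services, it can help to confirm that something is listening at each endpoint. The sketch below only opens a TCP connection to the host and port taken from each URL. It does not verify that the service implements the expected API. The function names are illustrative.

```python
import socket
from urllib.parse import urlsplit


def host_port(url: str) -> tuple[str, int]:
    """Extract (host, port) from an http(s) URL, using the scheme's default port."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host in URL: {url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def is_listening(url: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    try:
        with socket.create_connection(host_port(url), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False
```

For the endpoints above, `is_listening("http://127.0.0.1:8880/v1")` and `is_listening("http://127.0.0.1:2022/v1")` should both return True once the TTS and STT services are running.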

Optimizing for Gemini's Context Window

Take advantage of Gemini's 1M token context:

# Enable verbose mode for large codebases
export VOICEMODE_DEBUG="trace"

# Increase audio duration for longer responses
export VOICEMODE_LISTEN_DURATION="30"
