Skip to content

Latest commit

 

History

History
214 lines (159 loc) · 5.47 KB

File metadata and controls

214 lines (159 loc) · 5.47 KB

Voice Mode Integration: Gemini CLI

🔗 Official Repository: Gemini CLI on GitHub
📦 Download/Install: npm install -g @google/gemini-cli
🏷️ Version Requirements: Gemini CLI v0.1.0+

Overview

Gemini CLI is Google's command-line interface for their Gemini AI models, offering a Claude Code-like experience with generous free tier (1000 requests/day, 1M token context). Voice Mode enhances Gemini CLI by adding natural voice conversation capabilities, allowing you to speak your coding requests and hear Gemini's responses.

Prerequisites

  • Gemini CLI installed (npm install -g @google/gemini-cli) - See npm setup guide to avoid sudo
  • Python 3.10 or higher
  • uv package manager (curl -LsSf https://astral.sh/uv/install.sh | sh)
  • OpenAI API key (or compatible service for STT/TTS)
  • System audio dependencies installed (see main README)

Quick Start

# Install Voice Mode
uvx voice-mode

# Set your OpenAI API key (for voice services)
export OPENAI_API_KEY="your-openai-key"

# Configure Gemini CLI with Voice Mode
gemini-cli config set mcp.voice-mode.command "uvx voice-mode"

# Start voice-enabled Gemini CLI
gemini-cli

Installation Steps

1. Install Gemini CLI

macOS/Linux Users: To avoid using sudo with npm, see our npm setup guide

# Install globally via npm
npm install -g @google/gemini-cli

# Configure with your Gemini API key
gemini-cli auth login

2. Add Voice Mode to Gemini CLI

Find Gemini CLI's configuration file:

  • macOS/Linux: ~/.gemini/settings.json
  • Windows: %USERPROFILE%\.gemini\settings.json

Add the Voice Mode MCP server to your existing configuration:

{
  "mcpServers": {
    "voice-mode": {
      "command": "uvx",
      "args": [
        "voice-mode"
      ]
    }
  }
}

3. Environment Variables (Optional)

For advanced configuration:

# Required for voice services
export OPENAI_API_KEY="your-key"

# Optional - Use local services
export VOICEMODE_TTS_BASE_URL="http://127.0.0.1:8880/v1"
export VOICEMODE_STT_BASE_URL="http://127.0.0.1:2022/v1"

# Optional - Preferred voice
export VOICEMODE_TTS_VOICE="nova"

Verification

  1. Check Installation:

    # Verify Gemini CLI
    gemini-cli --version
    
    # Test Voice Mode directly
    uvx voice-mode --help
  2. Test Voice Mode:

    # Start Gemini CLI
    gemini-cli
    
    # Try voice commands:
    # "Hello, can you hear me?"
    # "Let's have a voice conversation"
  3. Verify MCP Connection:

    • Check Gemini CLI logs for MCP server initialization
    • Look for "voice-mode" in active servers list

Usage Examples

Basic Voice Conversation

# In Gemini CLI:
"Hey Gemini, let's talk about this code"
"Can you explain this error message?"
"What's the best approach for this feature?"

Voice-Enabled Code Review

# Navigate to your project
cd my-project

# Start Gemini CLI
gemini-cli

# Voice commands:
"Review this pull request"
"Suggest improvements for this function"
"Help me write tests for this module"

Troubleshooting

Voice Mode Not Recognized

  • Ensure MCP support is enabled in Gemini CLI
  • Check configuration file syntax
  • Try running uvx voice-mode directly to test

Audio Issues

  • Grant terminal microphone permissions
  • Check system audio settings
  • Run: VOICEMODE_DEBUG=true gemini-cli for detailed logs

Performance Considerations

  • Gemini CLI may be slower than Claude Code
  • Voice processing adds ~2-3s latency
  • Consider using local STT/TTS for better performance

Platform-Specific Notes

macOS

  • Grant terminal app microphone access in System Preferences
  • Gemini CLI config may be in ~/Library/Application Support/gemini-cli/

Linux

  • Ensure PulseAudio/PipeWire is running
  • Check audio permissions: groups $USER | grep audio

Windows

  • Best experience with WSL2
  • Native Windows support may have audio limitations

Advanced Configuration

Using Local STT/TTS Services

For privacy and offline usage:

  1. Start local services:

    # Kokoro TTS
    docker run -p 8880:8880 kokoro-tts
    
    # Whisper STT
    whisper-cpp-server -p 2022
  2. Configure endpoints:

    {
      "env": {
        "VOICEMODE_TTS_BASE_URL": "http://127.0.0.1:8880/v1",
        "VOICEMODE_STT_BASE_URL": "http://127.0.0.1:2022/v1",
        "VOICEMODE_PREFER_LOCAL": "true"
      }
    }

Optimizing for Gemini's Context Window

Take advantage of Gemini's 1M token context:

# Enable verbose mode for large codebases
export VOICEMODE_DEBUG="trace"

# Increase audio duration for longer responses
export VOICEMODE_LISTEN_DURATION="30"

See Also


Need Help? Join our Discord community or post in r/GeminiCLI