Glossary

This glossary defines key terms and concepts used throughout the Voice Mode project. It serves as a reference for both human developers and AI assistants to ensure consistent terminology.

Why This Glossary Exists

Consistency: Ensures everyone uses the same terms with the same meanings
Clarity: Reduces confusion, especially around MCP-specific terminology
Onboarding: Helps new contributors quickly understand domain-specific language
AI Assistance: Provides LLMs with precise definitions to improve their understanding

How to Use This Glossary

For Humans: Reference this when you encounter unfamiliar terms or want to ensure you're using the right terminology
For LLMs: This file should be read at the start of each session to understand project-specific terminology
In Documentation: Link to glossary entries when introducing potentially unfamiliar terms

Base Directory: The root directory for all Voice Mode data. Defaults to ~/.voicemode.
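
As a minimal sketch, the default location can be resolved in Python as shown here. The helper name `default_base_dir` is hypothetical and is not taken from the Voice Mode codebase; it only illustrates expanding `~/.voicemode` into an absolute path.

```python
from pathlib import Path


def default_base_dir() -> Path:
    """Return the default base directory, ~/.voicemode, with "~" expanded."""
    # expanduser() replaces the leading "~" with the current user's home directory.
    return Path("~/.voicemode").expanduser()


base_dir = default_base_dir()
```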

Conversation: A group of related exchanges that form a complete interaction, typically with less than 5 minutes between exchanges.
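
The grouping rule can be sketched as follows. This is an illustration only: the function name, the use of bare timestamps to represent exchanges, and treating the 5-minute figure as a hard cutoff are all assumptions, since the definition says "typically" rather than giving a strict rule.

```python
from datetime import datetime, timedelta

# Assumed threshold, based on the "less than 5 minutes" wording in the definition.
MAX_GAP = timedelta(minutes=5)


def group_into_conversations(timestamps, max_gap=MAX_GAP):
    """Group exchange timestamps into conversations.

    A new conversation starts whenever the gap since the previous
    exchange is max_gap or longer.
    """
    conversations = []
    for ts in sorted(timestamps):
        if conversations and ts - conversations[-1][-1] < max_gap:
            conversations[-1].append(ts)   # close enough: same conversation
        else:
            conversations.append([ts])     # gap too long: start a new one
    return conversations


start = datetime(2025, 1, 1, 12, 0)
exchanges = [start, start + timedelta(minutes=2), start + timedelta(minutes=10)]
groups = group_into_conversations(exchanges)
```

Here the first two exchanges are 2 minutes apart and fall into one conversation, while the third arrives 8 minutes later and starts a second one.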

Endpoint: A specific URL where a provider's API is accessible (e.g., http://127.0.0.1:8880/v1).
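
To make the parts of an endpoint concrete, the example URL above can be split with Python's standard library. This only dissects the URL; it does not contact any provider.

```python
from urllib.parse import urlsplit

endpoint = "http://127.0.0.1:8880/v1"  # example URL from the definition above
parts = urlsplit(endpoint)

# scheme, host, port, and base path of the provider's API
print(parts.scheme, parts.hostname, parts.port, parts.path)
```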

Exchange: A single call-and-response interaction in voice mode: one user utterance and one assistant response.

Event Log: Structured log of voice interaction events used for debugging and performance analysis.

MCP Client: The LLM or AI assistant that connects to MCP servers. The client uses the tools and resources provided by servers.

MCP Host: The application that manages MCP connections between clients and servers. Examples: Claude Desktop, VS Code, Cursor. These are the AI coding assistants that users install Voice Mode into.

MCP (Model Context Protocol): The protocol that enables LLMs to interact with external tools and resources through a standardized interface.

MP3: A widely supported compressed audio format that offers a good balance of file size and compatibility.

Opus: A compressed audio format optimized for voice. It compresses well but can have quality issues when streamed.

PCM: An uncompressed audio format, best suited to real-time streaming because it adds the least latency.
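
Because PCM is uncompressed, its data rate follows directly from the sampling parameters. The sketch below works through that arithmetic. The sample rate, channel count, and bit depth are illustrative assumptions, not values taken from Voice Mode's configuration.

```python
def pcm_bytes_per_second(sample_rate_hz: int, channels: int, bits_per_sample: int) -> int:
    """Raw PCM data rate: samples per second x channels x bytes per sample."""
    return sample_rate_hz * channels * (bits_per_sample // 8)


# Assumed example: 24 kHz, mono, 16-bit audio.
rate = pcm_bytes_per_second(24_000, 1, 16)
print(rate)  # 48000 bytes per second
```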

Prompt: Pre-written instructions that help guide AI assistants in using MCP tools effectively.

Provider: A service that provides TTS (text-to-speech) or STT (speech-to-text) capabilities. Examples: OpenAI, Kokoro, Whisper.

MCP Server: A program that provides tools and resources via MCP. Voice Mode is an MCP server that provides voice interaction capabilities.

Resource: Data or content exposed by an MCP server that clients can read.

STT (Speech-to-Text): Converting spoken audio into written text.

Streaming: Playing audio as it arrives rather than waiting for the complete file, reducing latency.
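
The difference between streaming and buffered playback can be sketched as follows. Everything here is a stand-in: `fake_audio_chunks` simulates a provider response, and `play` is any callable that accepts bytes, not a real audio API.

```python
from typing import Callable, Iterable, Iterator


def fake_audio_chunks(n: int = 3, size: int = 4) -> Iterator[bytes]:
    """Stand-in for a provider response that yields audio chunk by chunk."""
    for i in range(n):
        yield bytes([i]) * size


def play_streaming(chunks: Iterable[bytes], play: Callable[[bytes], None]) -> None:
    """Hand each chunk to the player as soon as it arrives."""
    for chunk in chunks:
        play(chunk)


def play_buffered(chunks: Iterable[bytes], play: Callable[[bytes], None]) -> None:
    """Wait for the complete audio, then play it in one go."""
    play(b"".join(chunks))


streamed = []
play_streaming(fake_audio_chunks(), streamed.append)

buffered = []
play_buffered(fake_audio_chunks(), buffered.append)
```

In the streaming case the player is called once per chunk, so playback can begin after the first chunk. In the buffered case it is called once, only after every chunk has been received.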

Tool: A function exposed by an MCP server that clients can invoke. Voice Mode exposes tools such as converse and listen_for_speech.

Transcription: Text version of spoken audio, saved separately from audio files when enabled.

Transport: The method used for voice communication: either "local" (direct microphone) or "livekit" (room-based).

TTFA (Time to First Audio): The time between requesting TTS and when audio playback begins.
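
One way to measure TTFA is to start a timer when the request is issued and stop it just before the first chunk is handed to the player. The sketch below assumes a chunked TTS response; `measure_ttfa` and `fake_tts` are hypothetical names, and the 50 ms delay is a simulated value, not a measurement of any real provider.

```python
import time
from typing import Iterator


def measure_ttfa(request_tts, play) -> float:
    """Return seconds from issuing the TTS request until playback begins."""
    started = time.perf_counter()  # moment the TTS request is issued
    ttfa = None
    for chunk in request_tts():
        if ttfa is None:
            # The first audio is about to be played: stop the clock here.
            ttfa = time.perf_counter() - started
        play(chunk)
    if ttfa is None:
        raise RuntimeError("the TTS request produced no audio")
    return ttfa


def fake_tts() -> Iterator[bytes]:
    """Stand-in for a TTS provider that streams audio chunks."""
    time.sleep(0.05)  # simulated synthesis and network delay before the first chunk
    yield b"\x00" * 480
    yield b"\x00" * 480


played = []
ttfa = measure_ttfa(fake_tts, played.append)
print(f"TTFA: {ttfa * 1000:.0f} ms")
```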

TTS (Text-to-Speech): Converting written text into spoken audio.
