Skip to content

Latest commit

 

History

History
62 lines (34 loc) · 3.33 KB

File metadata and controls

62 lines (34 loc) · 3.33 KB

Glossary

This glossary defines key terms and concepts used throughout the Voice Mode project. It serves as a reference for both human developers and AI assistants to ensure consistent terminology.

Why This Glossary Exists

  • Consistency: Ensures everyone uses the same terms with the same meanings
  • Clarity: Reduces confusion, especially around MCP-specific terminology
  • Onboarding: Helps new contributors quickly understand domain-specific language
  • AI Assistance: Provides LLMs with precise definitions to improve their understanding

How to Use This Glossary

  • For Humans: Reference this when you encounter unfamiliar terms or want to ensure you're using the right terminology
  • For LLMs: This file should be read at the start of each session to understand project-specific terminology
  • In Documentation: Link to glossary entries when introducing potentially unfamiliar terms

Base Directory: The root directory for all Voice Mode data, defaults to ~/.voicemode.

Conversation: A group of related exchanges that form a complete interaction, typically with less than 5 minutes between exchanges.

Endpoint: A specific URL where a provider's API is accessible (e.g., http://127.0.0.1:8880/v1).

Exchange: A single call-and-response interaction in voice mode. One user utterance and one assistant response.

Event Log: Structured log of voice interaction events used for debugging and performance analysis.

MCP Client: The LLM or AI assistant that connects to MCP servers. The client uses the tools and resources provided by servers.

MCP Host: The application that manages MCP connections between clients and servers. Examples: Claude Desktop, VS Code, Cursor. These are the AI coding assistants that users install Voice Mode into.

MCP (Model Context Protocol): The protocol that enables LLMs to interact with external tools and resources through a standardized interface.

MP3: Widely supported compressed audio format, good balance of size and compatibility.

Opus: Compressed audio format optimized for voice, good compression but can have quality issues with streaming.

PCM: Uncompressed audio format, best for real-time streaming with lowest latency.

Prompt: Pre-written instructions that help guide AI assistants in using MCP tools effectively.

Provider: A service that provides TTS (text-to-speech) or STT (speech-to-text) capabilities. Examples: OpenAI, Kokoro, Whisper.

MCP Server: A program that provides tools and resources via MCP. Voice Mode is an MCP server that provides voice interaction capabilities.

Resource: Data or content exposed by an MCP server that clients can read.

STT (Speech-to-Text): Converting spoken audio into written text.

Streaming: Playing audio as it arrives rather than waiting for the complete file, reducing latency.

Tool: A function exposed by an MCP server that clients can invoke. Voice Mode exposes tools like converse, listen_for_speech, etc.

Transcription: Text version of spoken audio, saved separately from audio files when enabled.

Transport: The method used for voice communication - either "local" (direct microphone) or "livekit" (room-based).

TTFA (Time to First Audio): The time between requesting TTS and when audio playback begins.

TTS (Text-to-Speech): Converting written text into spoken audio.