Python Voice Agent Examples

A collection of production-ready voice AI agent examples built with Plivo. Each example demonstrates a different combination of AI models and frameworks for building real-time phone-based voice agents on the Plivo voice AI platform.

How It Works

All examples follow the same general pattern:

┌─────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Phone  │────▶│   Plivo     │────▶│   Server    │────▶│  AI Agent   │
│  Call   │◀────│  (Voice AI) │◀────│  (FastAPI)  │◀────│             │
└─────────┘     └─────────────┘     └─────────────┘     └─────────────┘

1. A phone call comes in (or is initiated) through Plivo.
2. Plivo hits a webhook on your FastAPI server.
3. The server establishes a bidirectional WebSocket for audio streaming.
4. The AI agent processes speech and generates responses in real time.
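
The webhook and streaming steps above can be sketched with the standard library alone. This is a minimal illustration, not the code used by any example in this repository. It assumes the webhook answers with a Plivo XML `<Stream>` element and that audio arrives as base64 payloads inside JSON `media` events. The attribute values, the `playAudio` reply shape, and the function names are assumptions to verify against the Plivo audio streaming documentation.

```python
import base64
import json
import xml.etree.ElementTree as ET
from typing import Optional


def build_answer_xml(ws_url: str) -> str:
    """Build the XML a webhook could return to open a two-way audio stream.

    Attribute names and values are assumptions; check the Plivo docs.
    """
    response = ET.Element("Response")
    stream = ET.SubElement(
        response,
        "Stream",
        {
            "bidirectional": "true",
            "keepCallAlive": "true",
            "contentType": "audio/x-mulaw;rate=8000",
        },
    )
    stream.text = ws_url  # WebSocket endpoint on your own server
    return ET.tostring(response, encoding="unicode")


def decode_media_event(raw_message: str) -> Optional[bytes]:
    """Return the audio bytes of a 'media' event, or None for other events."""
    message = json.loads(raw_message)
    if message.get("event") != "media":
        return None
    return base64.b64decode(message["media"]["payload"])


def encode_play_audio(audio: bytes) -> str:
    """Wrap agent audio in a JSON message to send back over the WebSocket.

    The 'playAudio' event shape is an assumption about the stream protocol.
    """
    return json.dumps(
        {
            "event": "playAudio",
            "media": {
                "contentType": "audio/x-mulaw",
                "sampleRate": 8000,
                "payload": base64.b64encode(audio).decode("ascii"),
            },
        }
    )
```

In a FastAPI server, `build_answer_xml` would typically back the webhook route, while the other two helpers would run inside the WebSocket handler that passes audio to and from the AI agent.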

Examples

Each example directory is self-contained with its own dependencies, environment configuration, and documentation. Directory names follow the convention {llm}-{stt}-{tts}-{framework} (see CONTRIBUTING.md for details).
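
As a rough illustration of the `{llm}-{stt}-{tts}-{framework}` convention, the sketch below splits a directory name into its four parts. The `ExampleName` type and `parse_example_name` function are hypothetical and not part of the repository. Names with a different number of segments (for example, the speech-to-speech examples) return `None`, and CONTRIBUTING.md remains the authoritative description of the convention.

```python
from typing import NamedTuple, Optional


class ExampleName(NamedTuple):
    """Hypothetical container for the four parts of a directory name."""

    llm: str
    stt: str
    tts: str
    framework: str


def parse_example_name(name: str) -> Optional[ExampleName]:
    """Split a four-part directory name; return None if it has another shape."""
    parts = name.split("-")
    if len(parts) != 4 or not all(parts):
        return None
    return ExampleName(*parts)


parsed = parse_example_name("gemini-deepgram-cartesia-native")
if parsed is not None:
    print(parsed.llm, parsed.stt, parsed.tts, parsed.framework)
```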

Speech-to-Speech (S2S)

These examples use models that handle both speech input and output natively — the simplest architecture with the fewest moving parts.

| Example | Model | Framework | Highlights |
| --- | --- | --- | --- |
| gemini-live-native | Gemini Live | None | Direct API integration, function calling, auto-webhook config |
| gemini-live-pipecat | Gemini Live | Pipecat | Modular pipeline, built-in VAD, less code |
| gpt-realtime-native | GPT Realtime 1.5 | None | Silero VAD, barge-in support, function calling |
| grok-voice-native | Grok Voice | None | Silero VAD, barge-in support, function calling |

STT + LLM + TTS Pipeline

These examples wire up separate providers for speech-to-text, language model, and text-to-speech — offering more flexibility to mix and match.

| Example | STT | LLM | TTS | Framework |
| --- | --- | --- | --- | --- |
| gemini-deepgram-cartesia-native | Deepgram | Gemini | Cartesia | None |
| gemini-deepgram-elevenlabs-native | Deepgram | Gemini | ElevenLabs | None |
| gpt4o-deepgram-openaitts-pipecat | Deepgram | GPT-4o-mini | GPT-4o-mini-TTS | Pipecat |
| daily-plivo | Deepgram | OpenAI | Cartesia | Pipecat + Daily |
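
The mix-and-match idea behind the pipeline examples can be sketched as three small interfaces chained per conversational turn. Everything below is hypothetical: the protocol names, `VoicePipeline`, and the stub classes stand in for real provider clients and do not mirror the code in any example directory. Real implementations stream audio incrementally rather than handling one whole turn at a time.

```python
from dataclasses import dataclass
from typing import Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def reply(self, text: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class VoicePipeline:
    """Chain STT -> LLM -> TTS; any stage can be swapped independently."""

    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def handle_turn(self, audio: bytes) -> bytes:
        transcript = self.stt.transcribe(audio)
        answer = self.llm.reply(transcript)
        return self.tts.synthesize(answer)


# Stub stages for illustration only; they treat audio as UTF-8 text.
class FakeSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class EchoLLM:
    def reply(self, text: str) -> str:
        return f"You said: {text}"


class FakeTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


pipeline = VoicePipeline(stt=FakeSTT(), llm=EchoLLM(), tts=FakeTTS())
print(pipeline.handle_turn(b"hello"))
```

Because each stage only has to satisfy its protocol, replacing one provider with another means changing a single constructor argument.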

Prerequisites

Python 3.10+ (3.12 recommended)
uv package manager (recommended) or pip
ngrok for local development
A Plivo account with a phone number
API keys for the AI services used by your chosen example

Quick Start

Choose an example from the tables above and navigate to its directory:
```
cd gemini-live-native  # or any other example
```

Install dependencies:

```
uv sync
# or: uv pip install -r requirements.txt
```

Configure environment variables:
```
cp .env.example .env
```
Edit .env with your API keys and Plivo credentials.

Start ngrok (in a separate terminal):

```
ngrok http 8000  # port varies by example
```

Update PUBLIC_URL in .env with your ngrok HTTPS URL.

Run the server:

```
uv run python server.py  # entry point varies by example
```

Call your Plivo phone number to talk to the agent.

See each example's README for detailed setup and configuration.
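
For the `PUBLIC_URL` step, a server might derive its WebSocket address from the ngrok HTTPS URL along the following lines. This is a sketch under assumptions: the helper name and the `/ws` path are hypothetical, and each example may build its stream URL differently.

```python
from urllib.parse import urlsplit, urlunsplit


def websocket_url_from_public_url(public_url: str, path: str = "/ws") -> str:
    """Turn an https:// base URL into a wss:// URL for the audio stream.

    The default path "/ws" is a placeholder, not a documented route.
    """
    parts = urlsplit(public_url.strip())
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError(f"PUBLIC_URL must be an https:// URL, got {public_url!r}")
    return urlunsplit(("wss", parts.netloc, path, "", ""))


print(websocket_url_from_public_url("https://abc123.ngrok.app"))
# wss://abc123.ngrok.app/ws
```

Validating the scheme early surfaces a common local setup mistake, such as pasting the plain `http://` forwarding URL, before any call is placed.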

Contributing

Want to add a new voice agent example? See CONTRIBUTING.md for the project naming convention, required structure, and submission guidelines.

License

MIT
