This repo contains production-ready voice agent examples using Plivo telephony. Every example follows the same structure regardless of AI API or orchestration approach.
`{llm-provider+series}-{stt-provider+series}-{tts-provider+series}-{orchestration}[-{variant}]`
Every component always includes provider name + model series. The series identifies the API contract; the size class (mini/nano/pro/flash) is `.env` config, not part of the folder name. Include a size only when two different sizes are used together in the same example.
| Model | Folder component | Notes |
|---|---|---|
| gpt-5.4-mini | gpt5.4 | drop "mini" |
| gpt-4.1 | gpt4.1 | |
| gpt-4.1-mini (alone) | gpt4.1 | drop "mini" |
| gpt-4.1-mini + gpt-4.1 (dual) | gpt4.1mini-gpt4.1 | two sizes → keep both |
| gpt-4o-mini | gpt4o | drop "mini" |
| gemini-2.0-flash | gemini2 | drop "flash" |
| gemini-2.5-flash (live API) | gemini2.5-live | drop "flash"; -live = S2S API type |
| gemini-3.1-flash (live API) | gemini3.1-live | drop "flash"; -live = S2S API type |
| gpt-realtime-1.5 (S2S) | gptrealtime1.5 | "realtime" is the model name |
| grok-3-fast-voice (S2S) | grok3-voice | |
| Model | Folder component |
|---|---|
| Deepgram nova-2-phonecall | deepgramnova2 |
| Deepgram nova-3 | deepgramnova3 |
| Deepgram flux | deepgramflux |
| AssemblyAI u3-rt-pro | assemblyaiu3 |
| Sarvam STT | sarvam (no named model series) |
| Model | Folder component |
|---|---|
| ElevenLabs eleven_flash_v2_5 | elevenflashv2.5 |
| Cartesia sonic-2 | cartesiasonic2 |
| Cartesia sonic-3 | cartesiasonic3 |
| OpenAI gpt-4o-mini-tts | openaitts4o |
| Grok grok-3-fast-voice (TTS only) | groktts3 |
Examples: `gpt5.4-assemblyaiu3-cartesiasonic3-native`, `gemini2.5-live-pipecat`, `gpt4.1-deepgramnova3-elevenflashv2.5-native`
Orchestration types:
- `native` — raw websockets/SDK, custom asyncio task management, client-side Silero VAD (default)
- `pipecat` / `livekit` / `vapi` — framework-based Pipeline, framework-managed VAD
Variants:
- `-no-vad` — explicitly opts out of client-side VAD (e.g., `gemini2.5-live-native-no-vad` relies on server-side VAD)
- `-webrtcvad` — uses WebRTC VAD instead of Silero (e.g., `gemini2.5-live-native-webrtcvad`)
- All new native examples include Silero VAD by default. These suffixes are the exception, not the rule.
{example-name}/
├── inbound/
│ ├── __init__.py
│ ├── agent.py # AI-specific voice agent class (or framework pipeline)
│ ├── server.py # FastAPI: /answer, /ws, /hangup
│ └── system_prompt.md # System prompt for inbound calls
├── outbound/
│ ├── __init__.py
│ ├── agent.py # Same agent class + OutboundCallRecord, CallManager
│ ├── server.py # FastAPI: /outbound/call, /outbound/ws, etc.
│ └── system_prompt.md # System prompt for outbound calls
├── utils.py # Audio conversion, VAD (if native), phone utils
├── tests/
│ ├── __init__.py
│ ├── conftest.py # sys.path setup (copy from grok3-voice-native)
│ ├── helpers.py # ngrok, recording, transcription (copy from grok3-voice-native)
│ ├── test_integration.py # Unit + local integration tests
│ ├── test_e2e_live.py # E2E with real API (no phone call)
│ ├── test_live_call.py # Real inbound call test
│ ├── test_multiturn_voice.py # Multi-turn conversation test
│ └── test_outbound_call.py # Real outbound call test
├── pyproject.toml
├── .env.example # Leading dot (industry standard)
├── .gitignore
├── .pre-commit-config.yaml
├── Dockerfile
└── README.md
No exceptions. S2S, pipeline, and framework examples all use this structure.
Constants live where they are consumed:
`server.py` owns (duplicated in inbound/outbound — each file is self-contained):
- `SERVER_PORT`, `PLIVO_AUTH_ID`, `PLIVO_AUTH_TOKEN`, `PLIVO_PHONE_NUMBER`, `PUBLIC_URL`
`agent.py` owns:
- API keys, model names, voice names, API URLs
- `PLIVO_CHUNK_SIZE = 160` (used in `_send_to_plivo`)
- `SYSTEM_PROMPT` (loaded from `system_prompt.md`)
`utils.py` owns only what its functions consume:
- Audio sample rates: `PLIVO_SAMPLE_RATE`, `{API}_SAMPLE_RATE`, `VAD_SAMPLE_RATE`
- VAD params (native only): `VAD_START_THRESHOLD`, `VAD_END_THRESHOLD`, `VAD_MIN_SILENCE_MS`, `VAD_CHUNK_SAMPLES`
- `DEFAULT_COUNTRY_CODE`

Only utility functions and their internal constants. No server or agent config.
Required functions:
- `ulaw_to_pcm(ulaw_data: bytes) -> bytes` — G.711 decode table
- `pcm_to_ulaw(pcm_data: bytes) -> bytes` — G.711 encode
- `resample_audio(audio_data: bytes, input_rate: int, output_rate: int) -> bytes`
- `plivo_to_{api}(mulaw_8k: bytes) -> bytes` — Plivo audio to API format
- `{api}_to_plivo(pcm: bytes) -> bytes` — API audio to Plivo format
- `normalize_phone_number(phone: str, default_region: str) -> str`
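A pure-Python sketch of the G.711 μ-law decode; the repo's table-driven `ulaw_to_pcm` should be behaviorally equivalent, but this direct-computation form is illustrative only:

```python
def ulaw_to_pcm(ulaw_data: bytes) -> bytes:
    """Decode G.711 u-law bytes to 16-bit little-endian linear PCM (sketch)."""
    out = bytearray()
    for byte in ulaw_data:
        byte = ~byte & 0xFF                     # u-law stores the bitwise complement
        sign = byte & 0x80
        exponent = (byte >> 4) & 0x07
        mantissa = byte & 0x0F
        # Reconstruct magnitude, then remove the 0x84 bias added at encode time
        sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
        if sign:
            sample = -sample
        out += sample.to_bytes(2, "little", signed=True)
    return bytes(out)
```

Output is always 2 bytes per input byte; `0xFF` decodes to a zero sample, and `0x00` to the μ-law minimum of -32124.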
For native examples, also:
- `plivo_to_vad(mulaw_8k: bytes) -> np.ndarray` — float32 16kHz for Silero
- `SileroVADProcessor` class (reference: `grok3-voice-native/utils.py`)
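The full `plivo_to_vad` also μ-law-decodes and resamples 8k→16k; the final normalization step can be sketched like this (the helper name is made up here, not repo API):

```python
import numpy as np

def pcm16_to_float32(pcm: bytes) -> np.ndarray:
    # int16 little-endian PCM -> float32 in [-1, 1], the shape Silero expects
    return np.frombuffer(pcm, dtype="<i2").astype(np.float32) / 32768.0
```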
For framework examples: no VAD in utils (framework handles it).
Native examples: client-side Silero VAD (`SileroVADProcessor`).
- VAD runs in the `plivo_rx` task alongside audio forwarding
- Speech start during an AI response triggers barge-in (`response.cancel` or equivalent)
- Speech end triggers turn commit (`input_audio_buffer.commit` + `response.create` or equivalent)
- Reference: `grok3-voice-native/utils.py` (`SileroVADProcessor`), `grok3-voice-native/inbound/agent.py` (integration)
Framework examples (Pipecat/LiveKit): use `vad_enabled=True` in transport params. No separate Silero.
- `PLIVO_CHUNK_SIZE = 160` — exactly 20ms at 8kHz mono μ-law. Defined in `agent.py`, used in `_send_to_plivo()`.
- Plivo WebSocket sends/receives base64 μ-law at 8kHz
- playAudio JSON format: `{"event": "playAudio", "media": {"contentType": "audio/x-mulaw", "sampleRate": 8000, "payload": "<base64>"}}`
- Answer webhook returns `<Stream>` XML: `bidirectional=True`, `keepCallAlive=True`, `contentType="audio/x-mulaw;rate=8000"`
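A sketch of building that playAudio frame in Python (the function name is an assumption, not repo API):

```python
import base64
import json

def play_audio_message(mulaw_8k: bytes) -> str:
    """Wrap raw u-law audio in Plivo's playAudio frame."""
    return json.dumps({
        "event": "playAudio",
        "media": {
            "contentType": "audio/x-mulaw",
            "sampleRate": 8000,
            "payload": base64.b64encode(mulaw_8k).decode("ascii"),
        },
    })
```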
Native orchestration: custom agent class with these methods:
- `__init__`, `run()`, `_run_streaming_tasks()` (3 concurrent tasks)
- `_receive_from_plivo()` — plivo_rx: decode audio, run VAD, forward to API
- `_receive_from_{api}()` — api_rx: receive API events, queue audio for plivo
- `_send_to_plivo()` — plivo_tx: chunk audio to 160 bytes, send playAudio
- Public `run_agent()` function wraps class instantiation
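The plivo_tx chunking rule can be sketched as a standalone helper (name hypothetical; the real `_send_to_plivo()` wraps each frame in a playAudio message before sending):

```python
PLIVO_CHUNK_SIZE = 160  # 20 ms at 8 kHz mono u-law

def chunk_for_plivo(audio: bytes) -> list[bytes]:
    """Split API audio into 160-byte frames; the final frame may be shorter."""
    return [audio[i : i + PLIVO_CHUNK_SIZE] for i in range(0, len(audio), PLIVO_CHUNK_SIZE)]
```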
Framework orchestration: run_agent() function assembles Pipeline. No custom class needed.
Pipecat PipelineRunner signal handling:
- Use `PipelineRunner()` (default `handle_sigterm=False`) when running inside uvicorn.
- Do NOT use `PipelineRunner(handle_sigterm=True)` — it calls `loop.add_signal_handler(signal.SIGTERM, ...)` in `__init__`, which replaces uvicorn's SIGTERM handler. After the pipeline finishes, uvicorn's handler is never restored, so uvicorn never receives a shutdown signal and the process hangs indefinitely.
- `handle_sigterm=True` is only appropriate for standalone scripts where PipelineRunner owns the process lifecycle.
- PipelineRunner idle timeout is 300s, cancel timeout is 20s — relevant for shutdown timing.
- Plivo sends `{"event": "start", "start": {"callId": "...", "streamId": "..."}}` — handle first
- Plivo sends `{"event": "media", "media": {"payload": "<base64 μ-law>"}}` — audio data
- Plivo sends `{"event": "stop"}` — call ended
- Agent sends `{"event": "playAudio", "media": {...}}` — response audio
- Agent sends `{"event": "clearAudio"}` — on barge-in to stop playback
```python
# Task management — always use this pattern
tasks = [
    asyncio.create_task(self._receive_from_plivo(), name="plivo_rx"),
    asyncio.create_task(self._receive_from_{api}(ws), name="{api}_rx"),
    asyncio.create_task(self._send_to_plivo(), name="plivo_tx"),
]
try:
    done, _pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in done:
        if task.exception():
            logger.error(f"Task {task.get_name()} failed: {task.exception()}")
finally:
    self._running = False
    for task in tasks:
        if not task.done():
            task.cancel()
            with contextlib.suppress(asyncio.CancelledError):
                await task
```

Note: `_pending` with an underscore prefix avoids the RUF059 lint warning.
- Always use `uv` — never `pip`, `pip install`, or `python -m pip`
- Each example has its own virtualenv (`.venv/` inside the example directory)
- `uv sync` to install deps, `uv add {pkg}` to add new deps, `uv run` to execute commands
- All commands run through `uv run`: `uv run pytest`, `uv run ruff check .`, `uv run python -m inbound.server`
- `uv.lock` is committed to git for reproducible builds
Every example must include [project.optional-dependencies] with observability and streaming extras (reference: gpt4.1-sarvam-elevenflashv2.5-native/pyproject.toml). The Dockerfile's uv sync command must include --extra streaming so Redis is available at runtime. If pyproject.toml defines a streaming extra but the Dockerfile omits --extra streaming, the container will fail at runtime when streaming features are used.
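A sketch of the expected extras layout (the package lists here are assumptions; copy the real ones from the reference `pyproject.toml`):

```toml
[project.optional-dependencies]
# assumed contents -- mirror the reference example, not this sketch
observability = ["langfuse"]
streaming = ["redis"]
```

The matching Dockerfile step would then be along the lines of `RUN uv sync --extra streaming`, so the Redis dependency is present in the image.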
- Never commit directly to `main`. Always create a feature branch first: `git checkout -b {example-name}` (or `git checkout -b fix/{description}` for fixes)
- Push to the `fork` remote (not `origin`, which has IP restrictions): `git push -u fork {branch-name}`
- Open a PR from the fork branch to `origin/main` when ready.
- `from __future__ import annotations` at the top of every `.py` file
- `loguru` for logging (not stdlib `logging`)
- No hardcoded API keys — always `os.getenv()`
- `python-dotenv` with `load_dotenv()` at module level
- All imports lazy where heavy (e.g., `import torch` inside methods)
Ruff with: `select = ["E", "W", "F", "I", "B", "UP", "SIM", "RUF"]`, `line-length = 100`, `target-version = "py310"`.
Run: `uv run ruff check .`
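A hedged sketch of the matching `pyproject.toml` section (recent Ruff versions expect `select` under `[tool.ruff.lint]`; older versions accepted it at the top level):

```toml
[tool.ruff]
line-length = 100
target-version = "py310"

[tool.ruff.lint]
select = ["E", "W", "F", "I", "B", "UP", "SIM", "RUF"]
```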
Unit tests (`-k "unit"`): offline, no API keys needed
- `TestUnitAudioConversion`: ulaw↔pcm roundtrip, silence detection
- `TestUnitPhoneNormalization`: E.164 formatting
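The E.164 behavior those unit tests pin down, sketched naively (the real `normalize_phone_number` presumably uses `DEFAULT_COUNTRY_CODE` and a proper phone library; the country-code mapping below is an assumption):

```python
def normalize_phone_number(phone: str, default_region: str = "US") -> str:
    """Naive E.164 normalization -- illustrative only, not the repo's implementation."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    if phone.strip().startswith("+"):
        return "+" + digits
    country_codes = {"US": "1", "IN": "91"}  # assumed mapping
    return f"+{country_codes[default_region]}{digits}"
```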
Local integration (`-k "local"`): starts server subprocess, tests WebSocket flow with real API
- `TestLocalIntegration`: health check, answer webhook XML, WebSocket audio flow
E2E live call tests: real Plivo calls, recording, transcription
- `test_live_call.py`: inbound call → greeting verification
- `test_outbound_call.py`: outbound call → greeting verification
- `test_multiturn_voice.py`: multi-turn + barge-in verification
Test infra: `conftest.py` sets `sys.path`; `helpers.py` has ngrok/recording/transcription utils.
Server subprocess teardown in the `server_process` fixture — always use SIGTERM with SIGKILL fallback:

```python
os.kill(proc.pid, signal.SIGTERM)
try:
    proc.wait(timeout=5)
except subprocess.TimeoutExpired:
    proc.kill()
    proc.wait()
```

Pipecat servers may not exit on SIGTERM alone when a PipelineRunner has been active (see "Pipecat PipelineRunner signal handling" above). Native servers typically exit cleanly on SIGTERM, but the fallback pattern is safe for all examples.
Run: `uv run pytest tests/test_integration.py -v -k "unit"` (offline)
- Primary reference: `grok3-voice-native/` — complete native example with Silero VAD
- `grok3-voice-native/utils.py` — SileroVADProcessor class, audio conversion
- `grok3-voice-native/inbound/agent.py` — native agent pattern with VAD + barge-in
- `grok3-voice-native/outbound/agent.py` — OutboundCallRecord, CallManager pattern
- `grok3-voice-native/tests/` — full test suite to replicate
- `gemini2.5-live-native-no-vad/` — alternative native pattern (SDK-based, server-side VAD, no client-side VAD)
- `gemini2.5-live-pipecat/inbound/agent.py` — framework Pipeline reference
The text between H1 (#) and the first H2 (##) in each README is displayed as the demo description in the hosting app. It must be 5 lines or fewer but pack maximum technical detail. Use gpt4.1mini-sarvam-elevenlabs-native/README.md as the reference.
Text and bullet lists only — no tables, no diagrams, no code blocks between H1 and first H2. Write a dense description (≤5 lines) that traces the full pipeline from telephony input to audio output, naming every component along the way. Include:
- Orchestration approach (native/framework)
- Each component: service name, model/engine, protocol (WS/HTTP), audio format, sample rate, region
- VAD: engine, frame size, threshold values with empirical tuning rationale (echo vs speech probability ranges)
- Barge-in: what gets cancelled and what event is sent
- Any notable audio conversions (resample or no-resample)
- No vague descriptions ("production-ready", "best-in-class") — every word should be a technical fact
- No tables, diagrams, or code blocks — the hosting app doesn't render them properly. Bullet lists are fine
- Do not include observed latency — that belongs in the detailed sections below
- The rest of the README (after the first H2) can use tables, diagrams, and full detail
```
/scaffold-example {name} {description}   # Phase 1: directory structure + boilerplate
/implement-agent {name} {api-docs-url}   # Phase 2: write agent.py + utils.py
/test-example {name}                     # Phase 3: create tests + run them
/review-example {name}                   # Phase 4: quality gate checklist
/document-example {name}                 # Phase 5: README + .env.example validation
```
Each phase gets a fresh context window. Run sequentially.
`./scripts/validate-example.sh {example-name}` — exit 0 = pass, exit 1 = fail. Checks structure, lint, unit tests, config placement.