
Commit 0be79be

Update SDK to latest agent API: new event types, prebuilt tools, provider enhancements
- Migrate all event type strings to new dotted convention (e.g. session.started, tool.called, user.transcription, agent.speech_interrupted)
- Add 12 new event types: DtmfSent, AgentToolStarted/Completed/Failed, UserStateChanged, AgentStateChanged, AgentSpeechCreated/Started/Completed, AgentFalseInterruption, ToolExecuted, LlmAvailabilityChanged
- Unify transfer_to_number/transfer_to_sip into single transfer() method
- Add send_dtmf() and handoff() session methods
- Add semantic_vad eagerness presets (high/medium/low/auto) with expansion
- Add prebuilt tools module: EndCall, SendDtmf, WarmTransfer + 7 agent tools (CollectEmail, CollectAddress, CollectPhone, CollectName, CollectDOB, CollectDigits, CollectCreditCard)
- Update tool.result/tool.error message types
- Add provider validation docs, base_url/region support, updated provider lists
- Add 3 new examples: full_pipeline_fallback, prebuilt_tools, pipeline_mcp
- Update all 15 existing examples to new event types and API
- Expand test suite from 73 to 87 tests covering all new functionality
1 parent 757ad02 commit 0be79be

26 files changed

Lines changed: 2854 additions & 338 deletions

CLAUDE.md

Lines changed: 82 additions & 4 deletions
@@ -16,24 +16,25 @@ src/plivo_agentstack/
   utils.py # Webhook signature validation (v3)
   agent/ # Voice AI Agent stack
     app.py # VoiceApp WebSocket server
-    client.py # Agent REST client (agents, calls, numbers, sessions)
-    events.py # 25+ typed event dataclasses + parse_event()
+    client.py # Agent REST client (agents, calls, numbers, sessions) + semantic_vad presets
+    events.py # 38 typed event dataclasses + parse_event()
     session.py # Per-connection Session handle
+    tools.py # Prebuilt tools: EndCall, SendDtmf, WarmTransfer, Collect* agent tools
   messaging/ # SMS/MMS/WhatsApp
     client.py # MessagesClient
     templates.py # WhatsApp Template builder
     interactive.py # InteractiveMessage + Location builders
   numbers/ # Phone number management
     client.py # NumbersClient + LookupResource
 tests/ # pytest + pytest-asyncio + respx
-examples/ # 15 runnable scripts
+examples/ # 18 runnable scripts
 ```

 ## Build & run

 ```bash
 pip install -e ".[dev]" # install in dev mode
-pytest tests/ -v # run all tests (~70)
+pytest tests/ -v # run all tests (~87)
 ruff check src/ tests/ # lint
 ```

@@ -49,6 +50,81 @@ ruff check src/ tests/ # lint
 - **No bare `except`**: always catch specific exceptions
 - **asyncio_mode = "auto"**: all async test functions run without explicit markers

+## Event type strings
+
+All WebSocket events use dotted naming convention:
+
+| Event | Type string | Dataclass |
+|---|---|---|
+| Session started | `session.started` | `AgentSessionStarted` |
+| Session ended | `session.ended` | `AgentSessionEnded` |
+| Session error | `session.error` | `Error` |
+| Tool called | `tool.called` | `ToolCall` |
+| Tool executed (MCP) | `tool.executed` | `ToolExecuted` |
+| Turn completed | `turn.completed` | `TurnCompleted` |
+| Turn metrics | `turn.metrics` | `TurnMetrics` |
+| User transcription | `user.transcription` | `Prompt` |
+| User DTMF | `user.dtmf` | `Dtmf` |
+| DTMF sent | `dtmf.sent` | `DtmfSent` |
+| User idle | `user.idle` | `UserIdle` |
+| User speech started | `user.speech_started` | `VadSpeechStarted` |
+| User speech stopped | `user.speech_stopped` | `VadSpeechStopped` |
+| User turn completed | `user.turn_completed` | `TurnDetected` |
+| User state changed | `user.state_changed` | `UserStateChanged` |
+| Agent handoff | `agent.handoff` | `AgentHandoff` |
+| Agent speech interrupted | `agent.speech_interrupted` | `Interruption` |
+| Agent speech created | `agent.speech_created` | `AgentSpeechCreated` |
+| Agent speech started | `agent.speech_started` | `AgentSpeechStarted` |
+| Agent speech completed | `agent.speech_completed` | `AgentSpeechCompleted` |
+| Agent false interruption | `agent.false_interruption` | `AgentFalseInterruption` |
+| Agent state changed | `agent.state_changed` | `AgentStateChanged` |
+| Agent tool started | `agent_tool.started` | `AgentToolStarted` |
+| Agent tool completed | `agent_tool.completed` | `AgentToolCompleted` |
+| Agent tool failed | `agent_tool.failed` | `AgentToolFailed` |
+| LLM availability | `llm.availability_changed` | `LlmAvailabilityChanged` |
+| Voicemail detected | `voicemail.detected` | `VoicemailDetected` |
+| Voicemail beep | `voicemail.beep` | `VoicemailBeep` |
+| Participant added | `participant.added` | `ParticipantAdded` |
+| Participant removed | `participant.removed` | `ParticipantRemoved` |
+| Call transferred | `call.transferred` | `CallTransferred` |
+| Play completed | `play.completed` | `PlayCompleted` |
+
+Audio stream events use the Plivo protocol: `start`, `media`, `dtmf`, `playedStream`, `clearedAudio`, `stop`.
+
+## Session methods
+
+- **Managed mode**: `send_tool_result()`, `send_tool_error()`
+- **Text mode (BYOLLM)**: `send_text()`, `extend_wait()`, `send_raw()`
+- **Audio stream**: `send_media()`, `send_checkpoint()`, `clear_audio()`
+- **Control**: `update()`, `inject()`, `handoff()`, `speak()`, `play()`, `transfer()`, `send_dtmf()`, `hangup()`
+- **Background audio**: `play_background()`, `stop_background()`
+
+## Prebuilt tools
+
+Simple tools (customer-side): `EndCall`, `SendDtmf`, `WarmTransfer` — each has `.tool` (schema), `.instructions` (prompt hint), `.match(event)`, `.handle(session, event)`.
+
+Agent tools (server-side sub-agents): `CollectEmail`, `CollectAddress`, `CollectPhone`, `CollectName`, `CollectDOB`, `CollectDigits`, `CollectCreditCard` — each has `.definition` (for `agent_tools=[]`), `.prompt_hint` (for system prompt).
+
+## Supported providers
+
+| Component | Providers |
+|---|---|
+| STT | deepgram, google, azure, assemblyai, groq, openai |
+| LLM | openai, anthropic, groq, google, azure, together, fireworks, perplexity, mistral |
+| TTS | elevenlabs, cartesia, google, azure, openai, deepgram |
+| S2S | openai_realtime, gemini_live, azure_openai |
+
+Provider names are case-insensitive. BYOK API keys are validated at agent creation time.
+
+All provider configs (STT, LLM, TTS) accept optional `base_url` for custom endpoints. ElevenLabs TTS also accepts `region` (`"us"` default, `"in"` for India residency). Azure OpenAI uses `azure_deployment`, `azure_endpoint`, `api_version`.
+
+## Semantic VAD
+
+Agent creation supports `semantic_vad` as a string preset or dict:
+- `"high"` / `"medium"` / `"low"` / `"auto"` — eagerness presets
+- `{"eagerness": "high", "min_interruption_duration_ms": 200}` — preset + overrides
+- Raw dict — full manual control
+
 ## Testing patterns

 - **HTTP mocking**: use `respx` (not `unittest.mock` for HTTP). Fixture `mock_api` provides a router scoped to `https://api.plivo.com`
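The Semantic VAD section added in this hunk lists three accepted shapes for `semantic_vad`: a preset name, a preset plus overrides, and a raw dict. A minimal sketch of how such expansion might work is below. The function name `expand_semantic_vad`, the `_PRESETS` table, and every numeric value are assumptions for illustration; only the preset names and the `eagerness` and `min_interruption_duration_ms` keys come from the commit.

```python
from __future__ import annotations

from typing import Any

# Hypothetical preset table: the numbers are placeholders, not SDK defaults.
_PRESETS: dict[str, dict[str, Any]] = {
    "high": {"eagerness": "high", "min_interruption_duration_ms": 100},
    "medium": {"eagerness": "medium", "min_interruption_duration_ms": 300},
    "low": {"eagerness": "low", "min_interruption_duration_ms": 500},
    "auto": {"eagerness": "auto", "min_interruption_duration_ms": 300},
}


def expand_semantic_vad(value: str | dict[str, Any]) -> dict[str, Any]:
    """Expand a preset name or a partial dict into a full config dict."""
    if isinstance(value, str):
        # Shape 1: bare preset name.
        if value not in _PRESETS:
            raise ValueError(f"unknown semantic_vad preset: {value!r}")
        return dict(_PRESETS[value])
    if isinstance(value, dict):
        preset = value.get("eagerness")
        if isinstance(preset, str) and preset in _PRESETS:
            # Shape 2: preset plus overrides; explicit keys win.
            return {**_PRESETS[preset], **value}
        # Shape 3: raw dict, passed through untouched.
        return dict(value)
    raise TypeError("semantic_vad must be a str or a dict")
```

Copying the preset before returning keeps callers from mutating the shared table by accident.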
@@ -64,6 +140,7 @@ ruff check src/ tests/ # lint
 - VoiceApp auto-detects sync vs async handlers — sync runs in thread pool via `asyncio.to_thread()`
 - Unknown WebSocket events parse to raw `dict` (forward-compatible)
 - HttpTransport retries on 429 (respects `Retry-After`) and 5xx with exponential backoff
+- Agent REST client auto-expands `semantic_vad` presets to full config dicts

 ## Git & commit rules
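The first context line of this hunk says sync handlers run in a thread pool via `asyncio.to_thread()`. A minimal sketch of that dispatch pattern follows, using only the standard library. The helper name `dispatch` is an assumption and is not taken from the SDK.

```python
import asyncio
import inspect
from typing import Any, Callable


async def dispatch(handler: Callable[..., Any], *args: Any) -> Any:
    """Await coroutine handlers directly; run plain functions in a worker thread."""
    if inspect.iscoroutinefunction(handler):
        return await handler(*args)
    # A blocking sync handler must not stall the event loop.
    return await asyncio.to_thread(handler, *args)
```

Under these assumptions, `asyncio.run(dispatch(handler, session, event))` works the same way for `def` and `async def` handlers.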

@@ -78,5 +155,6 @@ ruff check src/ tests/ # lint

 - New REST resources: add to the appropriate sub-client (`agent/client.py`, `messaging/client.py`, `numbers/client.py`), wire into the parent client, add tests with `respx` mocks
 - New WebSocket events: add a `@dataclass` to `agent/events.py`, register in `_EVENT_REGISTRY`, add parse test in `test_events.py`
+- New prebuilt tools: add to `agent/tools.py`, export in `agent/__init__.py`
 - New examples: add to `examples/`, use `from plivo_agentstack import AsyncClient` and `from plivo_agentstack.agent import VoiceApp, ...` pattern. Update README Quick start section
 - Keep dependencies minimal — core deps are `httpx`, `websockets`, `starlette` only
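The contributor rules above say new events get a `@dataclass` registered in `_EVENT_REGISTRY`, and an earlier hunk notes that unknown events parse to a raw `dict`. A rough sketch of that registry pattern is below. It assumes the payload carries its type string under a `"type"` key and that `DtmfSent` has a `digits` field; neither is confirmed by the diff. The `ToolCall` fields `id`, `name` and `arguments` are the ones the example handlers read.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Any


@dataclass
class ToolCall:
    id: str = ""
    name: str = ""
    arguments: dict[str, Any] | None = None


@dataclass
class DtmfSent:
    digits: str = ""  # assumed field name, for illustration only


# Maps the dotted type string to the dataclass that represents it.
_EVENT_REGISTRY: dict[str, type] = {
    "tool.called": ToolCall,
    "dtmf.sent": DtmfSent,
}


def parse_event(payload: dict[str, Any]) -> Any:
    """Return a typed event, or the raw dict when the type is unknown."""
    cls = _EVENT_REGISTRY.get(payload.get("type", ""))
    if cls is None:
        return payload  # forward-compatible fallback
    known = {f.name for f in fields(cls)}
    # Drop unknown keys so newer server fields do not break construction.
    return cls(**{k: v for k, v in payload.items() if k in known})
```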

README.md

Lines changed: 17 additions & 8 deletions
@@ -29,8 +29,8 @@ Inbound/Outbound Call
       |
   WebSocket ──────► Your VoiceApp server
       |                   |
-  Audio stream      @app.on("tool_call")
-  VAD / Turn        @app.on("prompt") ← BYOLLM
+  Audio stream      @app.on("tool.called")
+  VAD / Turn        @app.on("user.transcription") ← BYOLLM
   STT → LLM → TTS   @app.on("turn.completed")
       |                   |
   Caller hears      session.send_tool_result()
@@ -42,15 +42,20 @@ Inbound/Outbound Call
 ### Agent capabilities

 - **Tool calling** - LLM invokes tools, you handle them and return results
-- **Mid-call model switching** - swap LLM model/prompt/tools via `session.update()` for agent handoff
+- **Agent tools** - server-side sub-agents for multi-turn data collection (email, address, phone, name, DOB, digits, credit card)
+- **Prebuilt tools** - ready-to-use EndCall, SendDtmf, WarmTransfer with `match()` + `handle()` patterns
+- **Mid-call model switching** - swap LLM model/prompt/tools via `session.update()` or `session.handoff()` for agent handoff
 - **Multi-party conferences** - add participants with `calls.dial()`, warm transfer patterns
 - **Voicemail detection** - async AMD with beep detection for outbound calls
 - **Background audio** - ambient sounds (office, typing, call-center) mixed with agent speech
-- **DTMF handling** - detect keypress events for IVR flows
-- **Interruption (barge-in)** - caller can interrupt the agent mid-speech
+- **DTMF handling** - receive keypress events and send DTMF tones for IVR navigation
+- **Interruption (barge-in)** - caller can interrupt the agent mid-speech, false interruption detection
+- **Semantic VAD** - unified VAD + turn detection + interruption with eagerness presets (high/medium/low/auto)
 - **User idle detection** - configurable reminders and auto-hangup on silence
 - **Per-turn metrics** - latency breakdown (STT, LLM TTFT, TTS) for monitoring
 - **Audio streaming** - raw audio relay with `send_media()`, checkpoints, and `clear_audio()`
+- **Provider fallback** - automatic failover chains for STT, LLM, and TTS providers
+- **MCP integration** - connect MCP servers (HTTP/stdio) for external tool discovery
 - **BYOK (Bring Your Own Keys)** - pass API keys for Deepgram, OpenAI, ElevenLabs, Cartesia, etc.

 ## SDK features
@@ -60,7 +65,8 @@ Inbound/Outbound Call
 - **Standalone mode** - `app.run(port=9000)` starts a WebSocket server with graceful shutdown
 - **Sync + async handlers** - sync handlers run in a thread pool automatically
 - **Automatic retries** - exponential backoff on 429 (respects `Retry-After`) and 5xx
-- **Typed events** - 25 dataclasses for all WebSocket events (`ToolCall`, `TurnMetrics`, `StreamMedia`, ...)
+- **Typed events** - 38 dataclasses for all WebSocket events (`ToolCall`, `TurnMetrics`, `AgentToolCompleted`, ...)
+- **Prebuilt tools** - EndCall, SendDtmf, WarmTransfer, CollectEmail, CollectAddress, CollectPhone, CollectName, CollectDOB, CollectDigits, CollectCreditCard
 - **Per-session state** - `session.data` dict persists across events within a call
 - **Messaging** - SMS, MMS, WhatsApp with template and interactive message builders
 - **Numbers** - search, buy, manage, and carrier lookup
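The "Automatic retries" bullet describes exponential backoff on 429 (respecting `Retry-After`) and on 5xx. One way to compute the wait time is sketched below. The function name `retry_delay` and the `base` and `cap` defaults are assumptions, not the SDK's actual values. The sketch handles only the delta-seconds form of `Retry-After` and falls back to backoff when the header is an HTTP date.

```python
from __future__ import annotations


def retry_delay(
    status: int,
    attempt: int,
    retry_after: str | None = None,
    base: float = 0.5,   # hypothetical first delay, in seconds
    cap: float = 30.0,   # hypothetical upper bound, in seconds
) -> float | None:
    """Return seconds to wait before retrying, or None if not retryable."""
    if status != 429 and not 500 <= status < 600:
        return None
    if status == 429 and retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not parsed in this sketch
    # attempt is zero-based: base, 2*base, 4*base, ... up to cap.
    return min(cap, base * (2 ** attempt))
```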
@@ -80,14 +86,17 @@ Requires Python 3.10+.

 Sign up at [cx.plivo.com/signup](https://cx.plivo.com/signup) to get your `PLIVO_AUTH_ID` and `PLIVO_AUTH_TOKEN`, set them as environment variables, then see the [`examples/`](examples/) directory for runnable scripts:

-- [**Full AI pipeline**](examples/full_pipeline.py) - tool calls, model switching, voicemail detection, transfers
+- [**Full AI pipeline**](examples/full_pipeline.py) - tool calls, agent tools, model switching, voicemail detection, transfers, handoff, metrics
+- [**Provider fallback**](examples/full_pipeline_fallback.py) - resilient multi-provider STT/LLM/TTS fallback chains
+- [**Prebuilt tools**](examples/prebuilt_tools.py) - all 10 prebuilt tools: 7 agent tools + 3 simple tools
 - [**BYOLLM**](examples/byollm.py) - bring your own LLM with OpenAI streaming, per-session conversation history
 - [**BYOLLM echo**](examples/byollm_echo.py) - minimal echo agent for testing, no external dependencies
 - [**Multi-party conference**](examples/multi_party.py) - MPC with mid-call dial, warm transfer to human agents
 - [**Speech-to-speech**](examples/s2s_agent.py) - OpenAI Realtime / Gemini Live integration
 - [**Raw audio streaming**](examples/audio_stream.py) - bidirectional audio relay with checkpoints and pacing
 - [**Background audio**](examples/background_audio.py) - ambient office/typing sounds mixed with agent speech
 - [**Pipeline modes**](examples/pipeline_modes.py) - all five config combinations in one file
+- [**MCP integration**](examples/pipeline_mcp.py) - connect external MCP tool servers
 - [**Metrics & observability**](examples/metrics.py) - per-turn latency breakdown, VAD and turn events
 - [**SMS & MMS**](examples/send_sms.py) - text messages and MMS with media attachments
 - [**WhatsApp**](examples/whatsapp.py) - text, media, templates, buttons, lists, CTA, location
@@ -102,7 +111,7 @@ git clone https://github.com/plivo/plivo-agentstack-python.git
 cd plivo-agentstack-python
 python -m venv .venv && source .venv/bin/activate
 pip install -e ".[dev]"
-pytest tests/ -v # 70 tests
+pytest tests/ -v # 87 tests
 ruff check src/ tests/ # lint
 ```

examples/agent_basic.py

Lines changed: 5 additions & 5 deletions
@@ -69,8 +69,8 @@ async def create_agent() -> str:
             ],
         },
         tts={
-            "provider": "eleven_labs",
-            "voice": "rachel",
+            "provider": "elevenlabs",
+            "voice": "EXAVITQu4vr4xnSDxMaL",
         },
     )
     agent_uuid = resp["agent_uuid"]
@@ -83,7 +83,7 @@ async def create_agent() -> str:
 app = VoiceApp()


-@app.on("agent_session.started")
+@app.on("session.started")
 def on_session_started(session, event: AgentSessionStarted):
     logger.info(
         "Session started: session_id=%s call_id=%s caller=%s",
@@ -95,7 +95,7 @@ def on_session_started(session, event: AgentSessionStarted):
     session.data["caller"] = event.caller


-@app.on("tool_call")
+@app.on("tool.called")
 def on_tool_call(session, event: ToolCall):
     """Handle tool calls from the LLM."""
     logger.info("Tool call: name=%s args=%s", event.name, event.arguments)
@@ -114,7 +114,7 @@ def on_tool_call(session, event: ToolCall):
         session.send_tool_error(event.id, f"Unknown tool: {event.name}")


-@app.on("agent_session.ended")
+@app.on("session.ended")
 def on_session_ended(session, event: AgentSessionEnded):
     logger.info(
         "Session ended: duration=%ds turns=%s",

examples/agent_fastapi.py

Lines changed: 3 additions & 3 deletions
@@ -33,12 +33,12 @@
 voice = VoiceApp()


-@voice.on("agent_session.started")
+@voice.on("session.started")
 async def on_session_started(session, event: AgentSessionStarted):
     logger.info("Session started: %s", event.agent_session_id)


-@voice.on("tool_call")
+@voice.on("tool.called")
 async def on_tool_call(session, event: ToolCall):
     """Handle tool calls -- async handlers work natively with FastAPI."""
     logger.info("Tool call: %s(%s)", event.name, event.arguments)
@@ -55,7 +55,7 @@ async def on_tool_call(session, event: ToolCall):
         session.send_tool_error(event.id, f"Unknown tool: {event.name}")


-@voice.on("agent_session.ended")
+@voice.on("session.ended")
 async def on_session_ended(session, event: AgentSessionEnded):
     logger.info(
         "Session ended: duration=%ds turns=%s",
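The example diffs above all make the same mechanical change: the event name passed to `.on(...)` is swapped for its dotted equivalent. A small sketch of a helper that applies those renames to source text is below. The names `RENAMED_EVENTS`, `migrate_event_name` and `migrate_source` are hypothetical, and the table holds only the four renames visible in this excerpt; the commit may rename others that are not shown here.

```python
import re

# Old -> new event names, limited to the renames visible in the diffs above.
RENAMED_EVENTS = {
    "agent_session.started": "session.started",
    "agent_session.ended": "session.ended",
    "tool_call": "tool.called",
    "prompt": "user.transcription",
}

# Matches .on("name") or .on('name'), capturing the quote and the name.
_ON_CALL = re.compile(r"""(\.on\(\s*)(["'])([^"']+)\2""")


def migrate_event_name(name: str) -> str:
    """Return the dotted name for a legacy event, or the name unchanged."""
    return RENAMED_EVENTS.get(name, name)


def migrate_source(text: str) -> str:
    """Rewrite legacy event names inside .on(...) decorator calls."""
    def _swap(m: re.Match) -> str:
        prefix, quote, name = m.group(1), m.group(2), m.group(3)
        return f"{prefix}{quote}{migrate_event_name(name)}{quote}"

    return _ON_CALL.sub(_swap, text)
```

Restricting the rewrite to `.on(...)` calls avoids touching unrelated strings such as a variable named `prompt`.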
