fix(inworld): Default to inworld-tts-2 by Nash0x7E2 · Pull Request #531 · GetStream/Vision-Agents

Nash0x7E2 · 2026-05-05T16:19:05Z

Why

The Inworld TTS plugin only decoded the first streaming chunk via av.open(). Inworld's default audio encoding is MP3, where mid-stream chunks aren't self-contained — they fail to parse with Invalid data found when processing input. The bug was silent for short replies that fit in one chunk and only surfaced once a reply was long enough to span multiple chunks, so the demo "worked" right up until it didn't.

Forcing audioConfig.audioEncoding=LINEAR16 makes Inworld emit each chunk as a self-contained RIFF WAV, which the existing per-chunk decode path already handles cleanly. No decoder rewrite needed.
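To illustrate why self-contained chunks remove the need for a stateful decoder, here is a minimal sketch. It assumes each LINEAR16 chunk is a complete RIFF WAV, as described above. Only `audioConfig.audioEncoding` comes from this PR; the `text` and `modelId` field names and all three helper functions are hypothetical. The plugin itself decodes with `av.open()`; the standard-library `wave` module is swapped in here so the example has no third-party dependencies.

```python
import io
import wave


def build_payload(text: str, model_id: str = "inworld-tts-2") -> dict:
    """Sketch of a streaming request body (hypothetical helper)."""
    return {
        "text": text,          # assumed field name
        "modelId": model_id,   # assumed field name
        # From the PR: force WAV-wrapped PCM instead of the MP3 default.
        "audioConfig": {"audioEncoding": "LINEAR16"},
    }


def decode_wav_chunk(chunk: bytes) -> tuple[int, int, bytes]:
    """Decode one self-contained RIFF WAV chunk.

    Returns (sample_rate, channels, raw_pcm). No state is carried between
    calls, which is what makes per-chunk decoding safe.
    """
    with wave.open(io.BytesIO(chunk), "rb") as wav:
        pcm = wav.readframes(wav.getnframes())
        return wav.getframerate(), wav.getnchannels(), pcm


def make_wav_chunk(pcm: bytes, sample_rate: int = 48000) -> bytes:
    """Stand-in for a server chunk: wrap 16-bit mono PCM in a WAV header."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples, i.e. LINEAR16
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


# Every chunk parses independently, including those after the first.
chunks = [make_wav_chunk(b"\x01\x00" * 4), make_wav_chunk(b"\x02\x00" * 4)]
decoded = [decode_wav_chunk(c) for c in chunks]
```

With MP3, the equivalent of `decode_wav_chunk` would succeed only on the first chunk, which matches the failure described above.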

While in here: Inworld TTS v2 (currently in pre-release, API-compatible with v1) is added to the model Literal and made the default — anyone constructing inworld.TTS() without arguments now gets the newer model. The example's LLM is also swapped from gemini-3.1-pro-preview to gemini-3.1-flash-lite-preview because the pro variant's ~3-5s per-turn latency drowned out the expressive-TTS demo; flash-lite lands replies in <1s and still picks the right Inworld steering tags ([whisper], [laugh], [sigh], [shout]) from the audio guide.

Changes

Force LINEAR16 audio encoding in the streaming TTS payload
Add inworld-tts-2 to the model Literal and use it as the default
Switch the example's LLM to gemini-3.1-flash-lite-preview

Inworld TTS streams default to MP3, but the plugin only decodes the first chunk via av.open() — subsequent mid-stream MP3 chunks fail to parse ('Invalid data found when processing input'). The bug is silent for short inputs that fit in one chunk but breaks any reply long enough to span multiple chunks. Setting audioConfig.audioEncoding=LINEAR16 makes Inworld return each chunk as a self-contained RIFF WAV, which the existing per-chunk decode path handles cleanly.

Inworld TTS v2 (currently in pre-release) is API-compatible with v1 — same streaming endpoint, same payload format, plus a new per-chunk 'usage' field ({processedCharactersCount, modelId}). Adding it to the Literal lets users opt in; flipping the default exposes the new model to anyone constructing inworld.TTS() without arguments.
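The model opt-in and the new per-chunk field can be sketched as follows. This is an illustration under stated assumptions, not the plugin's code: `TTSConfig` and `read_usage` are hypothetical names, the list of model IDs is taken from the review suggestion on this PR and may not match the actual Literal, and the assumption that `usage` sits at the top level of each chunk's JSON is not confirmed by the source. The field names `processedCharactersCount` and `modelId` are from the description above.

```python
from typing import Literal, Optional, get_args

# Model IDs as listed in the review comment on this PR (may be incomplete).
InworldModel = Literal[
    "inworld-tts-1",
    "inworld-tts-1-max",
    "inworld-tts-1.5-mini",
    "inworld-tts-1.5-max",
    "inworld-tts-2",
]


class TTSConfig:
    """Hypothetical stand-in for inworld.TTS; models only the model choice."""

    def __init__(self, model_id: InworldModel = "inworld-tts-2") -> None:
        # Literal is not enforced at runtime, so check explicitly.
        if model_id not in get_args(InworldModel):
            raise ValueError(f"unknown model_id: {model_id!r}")
        self.model_id = model_id


def read_usage(chunk: dict) -> Optional[tuple[int, str]]:
    """Return (processed characters, model ID), or None if usage is absent.

    Assumes 'usage' is a top-level key of the parsed chunk. v1 responses
    lack it, so callers must tolerate None.
    """
    usage = chunk.get("usage")
    if usage is None:
        return None
    return usage["processedCharactersCount"], usage["modelId"]


default_tts = TTSConfig()                        # now gets inworld-tts-2
pinned_tts = TTSConfig("inworld-tts-1.5-max")    # explicit opt-out
```

Callers who need the previous behavior can pin `model_id` explicitly, as `pinned_tts` does.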

Default Gemini in the example was gemini-3.1-pro-preview which has ~3-5s latency per turn — slow enough to break the conversational feel of an expressive-TTS demo. Flash-lite consistently lands replies in <1s while still picking the right Inworld steering tags ([whisper], [laugh], [sigh], [shout]) from the audio guide.

coderabbitai · 2026-05-05T16:20:28Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c96be6ce-f67f-4ca5-bc5c-8938149107ca

📥 Commits

Reviewing files that changed from the base of the PR and between 38ef975 and 0b460a4.

📒 Files selected for processing (1)

CHANGELOG.md

✅ Files skipped from review due to trivial changes (1)

CHANGELOG.md

📝 Walkthrough


Changelog updated for Inworld TTS v2. Built-in HTTP session routes moved under /calls/{call_id}/..., call_id moved to path, session delete/close return 202 and close is async, and permission callbacks now accept call_id: str. FunctionRegistry rejects sync functions; call_function and LLM.call_function made async. Agent.create_user/EdgeTransport.create_user renamed to authenticate and authentication runs during Agent.start(). Testing helpers removed. AgentLauncher renamed cleanup_interval→maintenance_interval, removed created_by, added call_id validation and optional registry, and new methods get_session_info/request_close_session. Inworld TTS default set to inworld-tts-2, default voice_id changed, and audio encoding forced to LINEAR16; examples/docs updated.


coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

plugins/inworld/vision_agents/plugins/inworld/tts.py (1)
52-53: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

model_id docstring is stale after changing the default.
The docstring still says default is "inworld-tts-1.5-max", but code default is now "inworld-tts-2" (Line 43). Update options/default text to match runtime behavior.
Proposed fix
-            model_id: The model ID to use for synthesis. Options: "inworld-tts-1.5-max",
-                     "inworld-tts-1.5-mini" (default: "inworld-tts-1.5-max").
+            model_id: The model ID to use for synthesis. Options: "inworld-tts-1.5-max",
+                     "inworld-tts-1.5-mini", "inworld-tts-1", "inworld-tts-1-max",
+                     "inworld-tts-2" (default: "inworld-tts-2").

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0893cced-59fe-49f2-9093-9b57706e28de

📥 Commits

Reviewing files that changed from the base of the PR and between 3df4dd3 and 0439f5e.

📒 Files selected for processing (3)

CHANGELOG.md
plugins/inworld/example/inworld_tts_example.py
plugins/inworld/vision_agents/plugins/inworld/tts.py


coderabbitai

Actionable comments posted: 1

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 72a09d48-73d3-4f61-bdb6-09c81df3a2bf

📥 Commits

Reviewing files that changed from the base of the PR and between 0439f5e and 2d4b1a4.

📒 Files selected for processing (3)

plugins/inworld/README.md
plugins/inworld/example/inworld-audio-guide.md
plugins/inworld/vision_agents/plugins/inworld/tts.py

✅ Files skipped from review due to trivial changes (1)

plugins/inworld/README.md

🚧 Files skipped from review as they are similar to previous changes (1)

plugins/inworld/vision_agents/plugins/inworld/tts.py

coderabbitai · 2026-05-05T16:27:35Z

+```
+[say sadly with deliberate pauses in a low voice and hushed style] I'm sorry, that didn't work.
+```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language identifiers to fenced code blocks.

These fences trigger markdownlint MD040. Add a language (for example text) to each fenced block.

Proposed fix

-```
+```text
 [say sadly with deliberate pauses in a low voice and hushed style] I'm sorry, that didn't work.
@@
-```
+```text
 [say warmly and a little excited] I'd be glad to help with that. [breathe] Here's what you need to know...
@@
-```
+```text
 [say sadly with deliberate pauses in a low voice] Unfortunately, that's not possible. [sigh] Let me explain why...
@@
-```
+```text
 [say excitedly with a high pitch and fast pace] Oh, that's fascinating — I just realized something important.
@@
-```
+```text
 [say slowly and thoughtfully] Let me think about this... [breathe] Yes, I believe the solution is...
@@
-```
+```text
 [clear throat] [say crisply with a measured pace] Actually, there's been a misunderstanding. Let me clarify...
@@
-```
+```text
 [whisper in a hushed style] Between you and me, the real answer is simpler than it looks.

Also applies to: 56-58, 61-63, 66-68, 71-73, 76-78, 81-83

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 24-24: Fenced code blocks should have a language specified (MD040, fenced-code-language)

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

plugins/inworld/example/inworld_tts_example.py (1)
10-10: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove stale Smart Turn claim from module docstring.

Line 10 says Smart Turn is part of this example, but the plugin import and turn_detection wiring were removed. Update the feature list to match current behavior.
Proposed fix
-- Smart Turn for turn detection

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9d744a9f-77d8-4ff1-b4ca-d83cf5817024

📥 Commits

Reviewing files that changed from the base of the PR and between 2d4b1a4 and 00fb5dd.

📒 Files selected for processing (1)

plugins/inworld/example/inworld_tts_example.py

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

plugins/inworld/example/inworld_tts_example.py (1)
6-10: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale docstring — remove the "Smart Turn" line.

smart_turn was removed from both imports and the agent config, but the module docstring still lists it as a component.
Proposed fix
 This example creates an agent that uses:
 - Inworld AI for text-to-speech (TTS)
 - Stream for edge/real-time communication
 - Deepgram for speech-to-text (STT)
-- Smart Turn for turn detection

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b52d7323-dacb-41d6-bfe2-a0d21e51484d

📥 Commits

Reviewing files that changed from the base of the PR and between 00fb5dd and 38ef975.

📒 Files selected for processing (1)

plugins/inworld/example/inworld_tts_example.py

Nash0x7E2 added 3 commits April 30, 2026 12:08

github-actions Bot added the plugins label May 5, 2026

Nash0x7E2 changed the title ~~fix(inworld): make multi-chunk TTS playback work; default to inworld-tts-2~~ fix(inworld): Default to inworld-tts-2 May 5, 2026

docs(changelog): add inworld TTS v2 + LINEAR16 fix entries

0439f5e

github-actions Bot added docs project-info labels May 5, 2026

coderabbitai Bot reviewed May 5, 2026


docs(inworld): document TTS-2 capabilities, switch default voice to Sarah, rewrite audio guide for TTS-2

2d4b1a4

Nash0x7E2 marked this pull request as ready for review May 5, 2026 16:26

coderabbitai Bot reviewed May 5, 2026


Nash0x7E2 added 2 commits May 5, 2026 10:32

Fix turn detection in example

00fb5dd

remove agent.llm.simple_response

38ef975

coderabbitai Bot reviewed May 5, 2026


Merge branch 'main' into nash/inworld-2

0b460a4

coderabbitai Bot reviewed May 5, 2026


Nash0x7E2 merged commit fb7a015 into main May 5, 2026
6 checks passed

Nash0x7E2 deleted the nash/inworld-2 branch May 5, 2026 16:52

Nash0x7E2 merged 8 commits into main from nash/inworld-2
