fix(inworld): Default to inworld-tts-2 #531

Merged
Nash0x7E2 merged 8 commits into main from nash/inworld-2 on May 5, 2026

Conversation

@Nash0x7E2
Member

Why

The Inworld TTS plugin only decoded the first streaming chunk via av.open(). Inworld's default audio encoding is MP3, where mid-stream chunks aren't self-contained — they fail to parse with Invalid data found when processing input. The bug was silent for short replies that fit in one chunk and only surfaced once a reply was long enough to span multiple chunks, so the demo "worked" right up until it didn't.

Forcing audioConfig.audioEncoding=LINEAR16 makes Inworld emit each chunk as a self-contained RIFF WAV, which the existing per-chunk decode path already handles cleanly. No decoder rewrite needed.
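To illustrate why self-contained chunks matter, here is a minimal sketch using only the Python standard library. It is not the plugin's code: the plugin decodes with av.open(), while this sketch substitutes the stdlib `wave` module, and `make_wav_chunk` and `decode_chunk` are hypothetical helper names. It shows that when every chunk carries its own RIFF header, each one can be decoded in isolation and the PCM simply concatenated.

```python
import io
import wave


def make_wav_chunk(pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Wrap raw 16-bit mono PCM in a RIFF WAV container (stand-in for one streamed chunk)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)  # LINEAR16 means 16-bit samples
        writer.setframerate(sample_rate)
        writer.writeframes(pcm)
    return buf.getvalue()


def decode_chunk(chunk: bytes) -> bytes:
    """Decode one chunk on its own, with no state carried over from earlier chunks."""
    with wave.open(io.BytesIO(chunk), "rb") as reader:
        return reader.readframes(reader.getnframes())


# Two independent chunks, each with its own header (sample rate is an assumption).
chunks = [make_wav_chunk(b"\x01\x00" * 160), make_wav_chunk(b"\x02\x00" * 160)]
pcm = b"".join(decode_chunk(c) for c in chunks)
```

Every chunk starts with the `RIFF` magic bytes, so the second chunk decodes exactly like the first. A mid-stream MP3 chunk offers no such guarantee, which is the failure described above.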

While in here: Inworld TTS v2 (currently in pre-release, API-compatible with v1) is added to the model Literal and made the default — anyone constructing inworld.TTS() without arguments now gets the newer model. The example's LLM is also swapped from gemini-3.1-pro-preview to gemini-3.1-flash-lite-preview because the pro variant's ~3-5s per-turn latency drowned out the expressive-TTS demo; flash-lite lands replies in <1s and still picks the right Inworld steering tags ([whisper], [laugh], [sigh], [shout]) from the audio guide.
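A rough sketch of what "added to the model Literal and made the default" could look like. The model names are the ones listed in the review comment on this PR; `InworldModel` and `SketchTTS` are hypothetical names, and the real `inworld.TTS` signature may differ.

```python
from typing import Literal, get_args

# Model names as listed in the review comment on this PR; the plugin's Literal may differ.
InworldModel = Literal[
    "inworld-tts-1",
    "inworld-tts-1-max",
    "inworld-tts-1.5-mini",
    "inworld-tts-1.5-max",
    "inworld-tts-2",
]


class SketchTTS:
    """Hypothetical stand-in for inworld.TTS, showing only the model_id default."""

    def __init__(self, model_id: InworldModel = "inworld-tts-2") -> None:
        # Literal is only enforced by type checkers, so validate at runtime too.
        if model_id not in get_args(InworldModel):
            raise ValueError(f"unknown model_id: {model_id!r}")
        self.model_id = model_id
```

Under this sketch, `SketchTTS()` picks up the new default, while callers who want the previous behavior pin it explicitly, for example `SketchTTS(model_id="inworld-tts-1.5-max")`.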

Changes

  • Force LINEAR16 audio encoding in the streaming TTS payload
  • Add inworld-tts-2 to the model Literal and use it as the default
  • Switch the example's LLM to gemini-3.1-flash-lite-preview
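
The first change can be pictured as a single extra key in the request body. In the sketch below, only `audioConfig.audioEncoding` set to `LINEAR16` comes from this PR; the function name and the `text`, `voiceId`, and `modelId` request fields are assumptions made for illustration and should be checked against the Inworld API reference.

```python
import json


def build_stream_payload(text: str, voice_id: str, model_id: str = "inworld-tts-2") -> dict:
    """Build a streaming TTS request body (field names other than audioConfig are assumed)."""
    return {
        "text": text,         # assumed field name
        "voiceId": voice_id,  # assumed field name
        "modelId": model_id,  # assumed field name
        # From the PR: force LINEAR16 so each streamed chunk is a self-contained RIFF WAV.
        "audioConfig": {"audioEncoding": "LINEAR16"},
    }


payload = build_stream_payload("Hello there.", "example-voice")
body = json.dumps(payload)
```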

Inworld TTS streams default to MP3, but the plugin only decodes the first
chunk via av.open() — subsequent mid-stream MP3 chunks fail to parse
('Invalid data found when processing input'). The bug is silent for short
inputs that fit in one chunk but breaks any reply long enough to span
multiple chunks.

Setting audioConfig.audioEncoding=LINEAR16 makes Inworld return each chunk
as a self-contained RIFF WAV, which the existing per-chunk decode path
handles cleanly.

Inworld TTS v2 (currently in pre-release) is API-compatible with v1 — same
streaming endpoint, same payload format, plus a new per-chunk 'usage' field
({processedCharactersCount, modelId}). Adding it to the Literal lets users
opt in; flipping the default exposes the new model to anyone constructing
inworld.TTS() without arguments.

Default Gemini in the example was gemini-3.1-pro-preview which has ~3-5s
latency per turn — slow enough to break the conversational feel of an
expressive-TTS demo. Flash-lite consistently lands replies in <1s while
still picking the right Inworld steering tags ([whisper], [laugh], [sigh],
[shout]) from the audio guide.
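
The per-chunk 'usage' field mentioned in the v2 commit message can be read without breaking v1 responses, as sketched below. The field names `processedCharactersCount` and `modelId` come from the commit message; the assumption that `usage` sits at the top level of each parsed chunk, and the helper name `read_usage`, are hypothetical.

```python
from typing import Any, Mapping, Optional, Tuple


def read_usage(chunk: Mapping[str, Any]) -> Optional[Tuple[int, str]]:
    """Return (processedCharactersCount, modelId) if the chunk carries usage, else None."""
    usage = chunk.get("usage")  # assumed location of the field within a chunk
    if not usage:
        return None  # v1-style chunk without usage: nothing to report
    return int(usage["processedCharactersCount"]), str(usage["modelId"])


v1_chunk = {}
v2_chunk = {"usage": {"processedCharactersCount": 12, "modelId": "inworld-tts-2"}}
```

Because the field is optional in this sketch, the same parsing path covers both model versions.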
Nash0x7E2 changed the title from "fix(inworld): make multi-chunk TTS playback work; default to inworld-tts-2" to "fix(inworld): Default to inworld-tts-2" on May 5, 2026

coderabbitai Bot commented May 5, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c96be6ce-f67f-4ca5-bc5c-8938149107ca

📥 Commits

Reviewing files that changed from the base of the PR and between 38ef975 and 0b460a4.

📒 Files selected for processing (1)
  • CHANGELOG.md
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md

📝 Walkthrough

Changelog updated for Inworld TTS v2. Built-in HTTP session routes moved under /calls/{call_id}/..., with call_id now in the path; session delete/close return 202 and close is async; permission callbacks now accept call_id: str. FunctionRegistry rejects sync functions, and call_function and LLM.call_function are now async. Agent.create_user/EdgeTransport.create_user were renamed to authenticate, and authentication runs during Agent.start(). Testing helpers were removed. AgentLauncher renamed cleanup_interval to maintenance_interval, removed created_by, added call_id validation and an optional registry, and gained the new methods get_session_info/request_close_session. The Inworld TTS default is set to inworld-tts-2, the default voice_id changed, and audio encoding is forced to LINEAR16; examples/docs updated.

coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugins/inworld/vision_agents/plugins/inworld/tts.py (1)

52-53: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

model_id docstring is stale after changing the default.
The docstring still says default is "inworld-tts-1.5-max", but code default is now "inworld-tts-2" (Line 43). Update options/default text to match runtime behavior.

Proposed fix
-            model_id: The model ID to use for synthesis. Options: "inworld-tts-1.5-max",
-                     "inworld-tts-1.5-mini" (default: "inworld-tts-1.5-max").
+            model_id: The model ID to use for synthesis. Options: "inworld-tts-1.5-max",
+                     "inworld-tts-1.5-mini", "inworld-tts-1", "inworld-tts-1-max",
+                     "inworld-tts-2" (default: "inworld-tts-2").

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0893cced-59fe-49f2-9093-9b57706e28de

📥 Commits

Reviewing files that changed from the base of the PR and between 3df4dd3 and 0439f5e.

📒 Files selected for processing (3)
  • CHANGELOG.md
  • plugins/inworld/example/inworld_tts_example.py
  • plugins/inworld/vision_agents/plugins/inworld/tts.py

Nash0x7E2 marked this pull request as ready for review on May 5, 2026 at 16:26

coderabbitai Bot left a comment

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 72a09d48-73d3-4f61-bdb6-09c81df3a2bf

📥 Commits

Reviewing files that changed from the base of the PR and between 0439f5e and 2d4b1a4.

📒 Files selected for processing (3)
  • plugins/inworld/README.md
  • plugins/inworld/example/inworld-audio-guide.md
  • plugins/inworld/vision_agents/plugins/inworld/tts.py
✅ Files skipped from review due to trivial changes (1)
  • plugins/inworld/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugins/inworld/vision_agents/plugins/inworld/tts.py

Comment on lines +24 to +26
```
[say sadly with deliberate pauses in a low voice and hushed style] I'm sorry, that didn't work.
```

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language identifiers to fenced code blocks.

These fences trigger markdownlint MD040. Add a language (for example text) to each fenced block.

Proposed fix
-```
+```text
 [say sadly with deliberate pauses in a low voice and hushed style] I'm sorry, that didn't work.

@@
-```
+```text
 [say warmly and a little excited] I'd be glad to help with that. [breathe] Here's what you need to know...

@@
-```
+```text
 [say sadly with deliberate pauses in a low voice] Unfortunately, that's not possible. [sigh] Let me explain why...

@@
-```
+```text
 [say excitedly with a high pitch and fast pace] Oh, that's fascinating — I just realized something important.

@@
-```
+```text
 [say slowly and thoughtfully] Let me think about this... [breathe] Yes, I believe the solution is...

@@
-```
+```text
 [clear throat] [say crisply with a measured pace] Actually, there's been a misunderstanding. Let me clarify...

@@
-```
+```text
 [whisper in a hushed style] Between you and me, the real answer is simpler than it looks.

Also applies to: 56-58, 61-63, 66-68, 71-73, 76-78, 81-83
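
For a guide with many bare fences, the same fix could be applied mechanically. The sketch below is one hedged way to do it, not part of the PR: `add_fence_language` is a hypothetical helper, and it only handles plain triple-backtick fences (no tilde fences, no longer fences, and no fence-like lines nested inside a code block).

```python
FENCE = "`" * 3  # built programmatically so this example contains no literal fence


def add_fence_language(markdown: str, language: str = "text") -> str:
    """Add a language to opening fences that lack one; leave closing fences alone."""
    out = []
    in_fence = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE):
            # Only a bare opening fence gets the language appended.
            if not in_fence and stripped == FENCE:
                line = line.replace(FENCE, FENCE + language, 1)
            in_fence = not in_fence
        out.append(line)
    return "\n".join(out)


source = "\n".join(["intro", FENCE, "[sigh] Example.", FENCE, FENCE + "python", "x = 1", FENCE])
fixed = add_fence_language(source)
```

Fences that already name a language, such as the `python` one in the sample input, pass through unchanged.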

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 24-24: Fenced code blocks should have a language specified (MD040, fenced-code-language)

coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugins/inworld/example/inworld_tts_example.py (1)

10-10: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove stale Smart Turn claim from module docstring.

Line 10 says Smart Turn is part of this example, but the plugin import and turn_detection wiring were removed. Update the feature list to match current behavior.

Proposed fix
-- Smart Turn for turn detection

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9d744a9f-77d8-4ff1-b4ca-d83cf5817024

📥 Commits

Reviewing files that changed from the base of the PR and between 2d4b1a4 and 00fb5dd.

📒 Files selected for processing (1)
  • plugins/inworld/example/inworld_tts_example.py

coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugins/inworld/example/inworld_tts_example.py (1)

6-10: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stale docstring — remove the "Smart Turn" line.

smart_turn was removed from both imports and the agent config, but the module docstring still lists it as a component.

Proposed fix
 This example creates an agent that uses:
 - Inworld AI for text-to-speech (TTS)
 - Stream for edge/real-time communication
 - Deepgram for speech-to-text (STT)
-- Smart Turn for turn detection

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b52d7323-dacb-41d6-bfe2-a0d21e51484d

📥 Commits

Reviewing files that changed from the base of the PR and between 00fb5dd and 38ef975.

📒 Files selected for processing (1)
  • plugins/inworld/example/inworld_tts_example.py

Nash0x7E2 merged commit fb7a015 into main on May 5, 2026
6 checks passed
Nash0x7E2 deleted the nash/inworld-2 branch on May 5, 2026 at 16:52