feat(inworld): add STT plugin with voice profiling by karan-dhir · Pull Request #1516 · livekit/agents-js

karan-dhir · 2026-05-15T19:47:06Z

Description

Ports the Python livekit-plugins-inworld STT implementation to TypeScript, adding the missing STT capability to the existing Inworld plugin (which previously only had TTS).

Changes Made

plugins/inworld/src/stt.ts — new STT class with both streaming (bidirectional WebSocket to wss://api.inworld.ai/stt/v1/transcribe:streamBidirectional) and batch (REST POST /stt/v1/transcribe) modes; includes word-level timestamp mapping, periodic audio duration reporting, and exponential-backoff reconnection
plugins/inworld/src/index.ts — exports the new STT and SpeechStream classes
agents/src/inference/stt.ts — adds InworldSTTModels = 'inworld/inworld-stt-1' to the inference STT type union
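
For reviewers unfamiliar with the reconnection pattern mentioned above, here is a minimal sketch of exponential backoff around a connect call. The names `BackoffOptions`, `backoffDelay`, and `withReconnect`, and any delay values passed in, are hypothetical and not taken from `stt.ts`; the actual implementation may differ.

```typescript
// Illustrative only: these names and parameters are assumptions, not the PR's code.
interface BackoffOptions {
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number; // upper bound on any single delay
  maxRetries: number; // retries allowed after the initial attempt
}

// attempt is 0-based: base, 2*base, 4*base, ... capped at maxDelayMs.
function backoffDelay(attempt: number, opts: BackoffOptions): number {
  return Math.min(opts.baseDelayMs * Math.pow(2, attempt), opts.maxDelayMs);
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Calls connect() until it succeeds or the retry budget is exhausted,
// sleeping for an exponentially growing delay between attempts.
async function withReconnect<T>(
  connect: () => Promise<T>,
  opts: BackoffOptions,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      if (attempt === opts.maxRetries) break;
      await sleep(backoffDelay(attempt, opts));
    }
  }
  throw lastError;
}
```

The `sleep` parameter is injectable so that tests can run without real delays.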

When enableVoiceProfile is true (default), each transcript includes an acoustic VoiceProfile in SpeechData.metadata.voice_profile with typed fields for emotion, accent, age, pitch, and vocalStyle.
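
As a rough illustration of how a consumer might read the profile, the sketch below declares a `VoiceProfile` shape and a helper that pulls it out of a metadata object. The field names come from this description; the value types (left as `unknown`) and the helper `getVoiceProfile` are assumptions, and the plugin's real types may be narrower.

```typescript
// Assumed shape: dimension names are from the PR description, value types are not known here.
interface VoiceProfile {
  emotion?: unknown;
  accent?: unknown;
  age?: unknown;
  pitch?: unknown;
  vocalStyle?: unknown;
  [key: string]: unknown; // tolerate fields not listed above
}

// Hypothetical helper: returns the profile if metadata.voice_profile is a plain object.
function getVoiceProfile(
  metadata: Record<string, unknown> | undefined,
): VoiceProfile | undefined {
  const candidate = metadata ? metadata['voice_profile'] : undefined;
  if (typeof candidate !== 'object' || candidate === null || Array.isArray(candidate)) {
    return undefined;
  }
  return candidate as VoiceProfile;
}
```

A caller would pass the transcript's metadata object and check for `undefined` before reading any dimension, since the profile is absent when `enableVoiceProfile` is off.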

Pre-Review Checklist

Build passes: All builds (lint, typecheck, tests) pass locally
AI-generated code reviewed: Removed unnecessary comments and ensured code quality
Changes explained: All changes are properly documented and justified above
Scope appropriate: All changes relate to the PR title

Testing

Build passes (pnpm build, pnpm --filter @livekit/agents-plugin-inworld build)
Lint passes (pnpm -w lint)
Format passes (pnpm -w format:write)
Automated tests added/updated (voice profiling requires a live API key; unit test mocks pending)

Additional Notes

The VoiceProfile response schema is not publicly documented by Inworld. The interface uses known dimension names (emotion, accent, age, pitch, vocalStyle) based on their API resources page, with an index signature ([key: string]: unknown) to handle any undocumented fields. Word timestamps handle both startTime/endTime (streaming, seconds) and startTimeMs/endTimeMs (REST, milliseconds) naming conventions.
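
The two timestamp conventions can be reconciled with a small normalization step. The sketch below is one possible approach, not the code in `stt.ts`: the `RawWord` and `TimedWord` types, the `word` field name, and the fallback to `0` when no timestamp is present are all assumptions made for illustration.

```typescript
// Assumed input shape covering both naming conventions described above.
interface RawWord {
  word: string; // hypothetical field name for the token text
  startTime?: number; // streaming, seconds
  endTime?: number; // streaming, seconds
  startTimeMs?: number; // REST, milliseconds
  endTimeMs?: number; // REST, milliseconds
}

interface TimedWord {
  text: string;
  startSec: number;
  endSec: number;
}

// Prefer the seconds field; otherwise convert the milliseconds field; otherwise use 0.
function toSeconds(sec: number | undefined, ms: number | undefined): number {
  if (sec !== undefined) return sec;
  if (ms !== undefined) return ms / 1000;
  return 0;
}

function normalizeWord(raw: RawWord): TimedWord {
  return {
    text: raw.word,
    startSec: toSeconds(raw.startTime, raw.startTimeMs),
    endSec: toSeconds(raw.endTime, raw.endTimeMs),
  };
}
```

Normalizing to a single unit at the boundary keeps downstream code from having to know which transport produced a word.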

Note to reviewers: Please ensure the pre-review checklist is completed before starting your review.

Ports the Python livekit-plugins-inworld STT implementation to TypeScript. Adds streaming (WebSocket) and batch (REST) modes, word-level timestamps, and typed VoiceProfile with emotion/accent/age/pitch/vocalStyle dimensions. Also registers InworldSTTModels in the inference STT type union.

changeset-bot · 2026-05-15T19:47:10Z

⚠️ No Changeset found

Latest commit: 33955ea

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

devin-ai-integration

Devin Review found 1 potential issue.

View 5 additional findings in Devin Review.

devin-ai-integration · 2026-05-15T19:53:34Z

+
+    await Promise.race([
+      this.#resetWS.await,
+      Promise.all([sendTask(), listenTask.result, wsMonitor]),


🔴 wsMonitor Task object is not thenable, making WebSocket close detection a no-op in Promise.all

On line 479, wsMonitor (a Task<void> object) is passed directly to Promise.all instead of wsMonitor.result (a Promise<void>). The Task class at agents/src/utils.ts:492 does not implement a .then() method, so it is not thenable. Promise.all treats non-thenable values as immediately resolved, meaning the WebSocket close monitor never actually participates in error propagation.

This causes the stream to hang if the WebSocket closes unexpectedly while sendTask is blocked waiting for audio input on this.input.next() (line 344). Neither sendTask nor listenTask will detect the closure until new audio data arrives and ws.send() fails. In a silence scenario (no audio input), the stream hangs indefinitely.

Suggested change

- Promise.all([sendTask(), listenTask.result, wsMonitor]),
+ Promise.all([sendTask(), listenTask.result, wsMonitor.result]),
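
The behavior described in this finding can be reproduced in isolation. The sketch below uses `FakeTask`, a stand-in class with a `result` promise and no `.then()` method; it is not the real `Task` from `agents/src/utils.ts`, and `settlesWithin` and `demo` are hypothetical helpers.

```typescript
// Stand-in for a Task-like wrapper: exposes .result but is not itself thenable.
class FakeTask<T> {
  constructor(public readonly result: Promise<T>) {}
}

// Resolves to true if p settles (fulfills or rejects) within ms, false otherwise.
async function settlesWithin(p: Promise<unknown>, ms: number): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<boolean>((resolve) => {
    timer = setTimeout(() => resolve(false), ms);
  });
  try {
    return await Promise.race([p.then(() => true, () => true), timeout]);
  } finally {
    clearTimeout(timer);
  }
}

async function demo(): Promise<{ withTask: boolean; withResult: boolean }> {
  // A promise that never settles, like a monitor on a socket that stays open.
  const never = new Promise<void>(() => {});
  const task = new FakeTask(never);
  // Non-thenable value: Promise.all treats it as already resolved.
  const withTask = await settlesWithin(Promise.all([task]), 20);
  // Real promise: Promise.all waits on it, so nothing settles in time.
  const withResult = await settlesWithin(Promise.all([task.result]), 20);
  return { withTask, withResult };
}
```

With the wrapper object, `Promise.all` fulfills right away and the monitor's outcome is ignored; with `.result`, the monitor's promise takes part in the wait, so a rejection from it would propagate.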

karan-dhir added 2 commits May 15, 2026 15:44

devin-ai-integration Bot reviewed May 15, 2026

mike-r-mclaughlin requested a review from toubatbrian May 15, 2026 20:32

karan-dhir wants to merge 2 commits into livekit:main from karan-dhir:inworld-stt-voice-profiling
