[Example] 200 — Vanilla JS Browser Live Transcription #101

Merged: lukeocodes merged 3 commits into main from example/200-vanilla-js-browser-transcription on Apr 1, 2026

Conversation

github-actions bot commented Apr 1, 2026

New example: Vanilla JS Browser Live Transcription

Integration: Vanilla JS Browser | Language: JavaScript | Products: STT

What this shows

A zero-build-tool browser example that captures microphone audio via getUserMedia() and an AudioWorklet, streams raw PCM over a WebSocket to a lightweight Express server, which proxies it to Deepgram's live STT API using the official SDK. The API key never leaves the server. Interim and final transcripts render in real time on the page.
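As a rough illustration of the "raw PCM" step: Web Audio delivers samples as 32-bit floats in the range -1 to 1, so they have to be converted to 16-bit signed integers (linear16) before they go over the WebSocket. The helper below is a minimal sketch of that conversion. The name `floatTo16BitPCM` is hypothetical and this is not the example's actual worklet code, which is not shown in this thread.

```javascript
// Convert Web Audio Float32 samples (range -1..1) to 16-bit signed PCM (linear16).
// Hypothetical helper for illustration; not taken from the example's source.
function floatTo16BitPCM(float32Samples) {
  const pcm = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp first so samples slightly outside [-1, 1] cannot wrap around.
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    // Negative values scale by 32768 and positive by 32767 to fill the int16 range.
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}
```

In a browser, the resulting buffer could then be sent with something like `ws.send(floatTo16BitPCM(samples).buffer)`, assuming `ws` is an open WebSocket.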

Required secrets

No additional secrets. Only DEEPGRAM_API_KEY is required.

Closes #27


Built by Engineer on 2026-04-01

github-actions bot commented Apr 1, 2026

Code Review

Overall: CHANGES REQUESTED

Tests ran ❌

Downloading test audio...
Converting to linear16 16 kHz mono...
Audio ready: 829866 bytes

Server started on :3098
  /health -> ok
  /index.html -> served correctly

Streaming audio through WebSocket -> Deepgram (up to 30 s)...
[ws] Browser connected
[deepgram] Connection opened
[ws] Browser disconnected
[deepgram] Connection closed

Test failed: No transcripts received from Deepgram after streaming audio.
This may indicate a connection issue or audio encoding problem.

Health endpoint and static file serving pass. The WebSocket transcription test fails consistently: the Deepgram connection opens successfully and audio is sent, but no transcripts are received before the connection tears down.

Root cause: Race condition in tests/test.js — after streaming all audio, the test closes the browser WebSocket after only 500 ms. The server's browserWs.on('close') handler immediately calls dgConnection.sendCloseStream() and dgConnection.close(), killing the Deepgram session before transcript responses can arrive. The test needs to wait for at least one transcript event before closing, or the server needs to keep the Deepgram connection alive briefly after the browser WS closes to drain remaining responses.
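The test-side option (wait for at least one transcript before closing) could look roughly like the sketch below. `waitForFirstEvent` is a hypothetical helper, and the event name and timeout are assumptions; the actual tests/test.js may structure this differently.

```javascript
// Resolve with the first payload emitted for `eventName`, or reject after `timeoutMs`.
// `emitter` is any object with Node-style on()/off() methods.
// Hypothetical test helper; not taken from tests/test.js.
function waitForFirstEvent(emitter, eventName, timeoutMs) {
  return new Promise((resolve, reject) => {
    const onEvent = (payload) => {
      clearTimeout(timer);
      emitter.off(eventName, onEvent);
      resolve(payload);
    };
    const timer = setTimeout(() => {
      emitter.off(eventName, onEvent);
      reject(new Error(`No "${eventName}" event within ${timeoutMs} ms`));
    }, timeoutMs);
    emitter.on(eventName, onEvent);
  });
}
```

The test would await this promise after streaming the audio and close the browser WebSocket only once it settles, instead of closing after a fixed 500 ms delay.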

Integration genuineness

Pass. This is a vanilla JS browser example — the "platform" is the browser itself using native getUserMedia(), AudioWorklet, and WebSocket APIs. The server uses the official @deepgram/sdk to make real live STT API calls. .env.example correctly lists only DEEPGRAM_API_KEY. The test exits with code 2 when credentials are missing.

Code quality

  • ✅ Official Deepgram SDK (@deepgram/sdk v5) used — no raw HTTP
  • ✅ No hardcoded credentials — API key from env var
  • ✅ Good error handling: server handles WS errors, Deepgram errors, browser disconnections, and setup failures; frontend handles mic denial, WS errors
  • ✅ .env.example present and complete
  • ✅ Credential check in test runs FIRST (before SDK import)
  • ⚠️ In src/server.js, @deepgram/sdk is required at the top of the file (line 8), before the credential check inside createApp() (line 29). The SDK doesn't throw on import, but the convention is to check credentials before any SDK import that could throw.
  • ❌ Test race condition causes consistent test failure (see above)

Documentation

  • ✅ README has "What you'll build" section
  • ✅ All env vars listed with where-to-get links
  • ✅ Clear install and run instructions
  • ✅ Key parameters table with descriptions
  • ✅ "How it works" section explaining the architecture
  • ✅ API key never leaves the server (documented and implemented)

Please address the items above. The fix agent will pick this up.

  • Fix the test race condition so transcripts are received before the WebSocket is torn down
  • Consider moving the credential check in server.js above the SDK require, or at minimum before new DeepgramClient()
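A minimal sketch of the second item, assuming a CommonJS server file: validate the environment variable first, then load the SDK. `requireEnv` is a hypothetical helper, not part of the example's source, and the usage lines are commented out so the block runs without the SDK installed.

```javascript
// Fail fast on missing credentials before loading any SDK module.
// `requireEnv` is a hypothetical helper for illustration.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (!value || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage sketch:
// const apiKey = requireEnv('DEEPGRAM_API_KEY');
// const { DeepgramClient } = require('@deepgram/sdk'); // loaded only after the check
```

Throwing keeps the helper easy to test. A test runner can catch the error and map it to the exit code 2 convention mentioned in the review.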

Review by Lead on 2026-04-01

@github-actions github-actions bot added the status:fix-needed Tests failing — fix agent queued label Apr 1, 2026

github-actions bot commented Apr 1, 2026

Code Review

Overall: CHANGES REQUESTED

Tests ran ❌

Downloading test audio...
Converting to linear16 16 kHz mono...
Audio ready: 829866 bytes

Server started on :3098
  /health -> ok
  /index.html -> served correctly

Streaming audio through WebSocket -> Deepgram (up to 30 s)...
[ws] Browser connected
[deepgram] Connection opened
[ws] Browser disconnected
[deepgram] Connection closed

Test failed: No transcripts received from Deepgram after streaming audio.
This may indicate a connection issue or audio encoding problem.

Health and static-file tests pass, but the live transcription pipeline returns zero transcripts. The Deepgram connection opens and audio is streamed, yet no transcript events arrive.
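For context on how much audio the test has available, the log line "Audio ready: 829866 bytes" can be turned into a duration. The sketch below assumes the file is headerless linear16 at 16 kHz mono, as the conversion step in the log suggests; a WAV header would shift the result by a negligible amount. `pcmDurationSeconds` is a hypothetical helper.

```javascript
// Duration in seconds of raw PCM audio, given its size and format.
// Defaults match the log above: 16 kHz, 16-bit (2 bytes per sample), mono.
function pcmDurationSeconds(byteLength, sampleRate = 16000, bytesPerSample = 2, channels = 1) {
  return byteLength / (sampleRate * bytesPerSample * channels);
}

console.log(pcmDurationSeconds(829866).toFixed(2)); // "25.93"
```

Under those assumptions the test clip is roughly 26 seconds long.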

Root cause

src/server.js:79-114 uses await deepgram.listen.v1.connect() followed by dgConnection.connect() and await dgConnection.waitForOpen(). Other working examples in this repo (e.g. 190-daily-co) use deepgram.listen.v1.live() which auto-connects and correctly fires message events. The connect() + connect() + waitForOpen() double-connection pattern appears to prevent transcript events from being received.

Suggested fix: Replace the connection pattern in src/server.js (lines 79-114) with the deepgram.listen.v1.live() pattern used by other working examples:

dgConnection = deepgram.listen.v1.live(DEEPGRAM_LIVE_OPTIONS);

dgConnection.on('open', () => { ... });
dgConnection.on('message', (data) => { ... });
// Remove dgConnection.connect() and await dgConnection.waitForOpen()

Integration genuineness

Pass — Official Deepgram SDK (@deepgram/sdk v5) is used with real live transcription API calls. The test exits with code 2 when credentials are missing. The "Vanilla JS Browser" integration uses native browser APIs (getUserMedia, AudioWorklet, WebSocket) as expected.

Code quality

  • ✅ Official Deepgram SDK used (no raw HTTP)
  • ✅ No hardcoded credentials — API key stays server-side
  • ✅ Good error handling (WS errors, Deepgram failure, browser disconnect)
  • ✅ Credential check runs before SDK usage in both server.js and tests/test.js
  • ✅ Clean separation: browser captures audio → WS to server → server proxies to Deepgram
  • ❌ Connection pattern causes transcription failure (see root cause above)

Documentation

  • ✅ README explains what you'll build, env vars with console link, install/run steps
  • ✅ Key parameters table is a nice touch
  • ✅ .env.example present and complete
  • ✅ Architecture explained clearly in "How it works" section

Please address the connection pattern issue above. The fix agent will pick this up.


Review by Lead on 2026-04-01

… condition in 200-vanilla-js-browser-transcription

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

github-actions bot commented Apr 1, 2026

Fix applied

Root cause: The server used dgConnection.send() to forward audio to Deepgram, but this method does not exist on the SDK connection object — the correct method is sendBinary(). Audio was silently dropped, so no transcripts were ever returned. Additionally, the test closed the WebSocket before transcripts could arrive (race condition) and only streamed 10s of audio, which wasn't enough to contain the expected validation words.

Change: In server.js, replaced send() with sendBinary() for forwarding audio data to Deepgram. In test.js, restructured the WS lifecycle to wait for transcripts before closing, and increased the audio window from 10s to 30s so expected content words (spacewalk) appear in the transcript.
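One way to surface this kind of method mismatch earlier is to check for the method before forwarding, so a wrong name fails loudly instead of dropping audio. This is a sketch only: `sendBinary` is the method named in the fix above, while `forwardAudio` and the shape of `connection` are assumptions for illustration.

```javascript
// Forward one audio chunk to the upstream connection, failing loudly if the
// expected method is missing rather than letting audio disappear.
// `forwardAudio` is hypothetical; `sendBinary` is the method named in the fix.
function forwardAudio(connection, chunk) {
  if (typeof connection.sendBinary !== 'function') {
    throw new TypeError('Upstream connection has no sendBinary() method');
  }
  connection.sendBinary(chunk);
  // Return the number of bytes forwarded so callers can log or count them.
  return chunk.byteLength;
}
```

Counting forwarded bytes also gives the test something to assert on, independent of whether transcripts come back.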

The lead reviewer will re-run tests and review on the next sweep.


Fix by Lead on 2026-04-01

@github-actions github-actions bot removed the status:fix-needed Tests failing — fix agent queued label Apr 1, 2026

github-actions bot commented Apr 1, 2026

Code Review

Overall: APPROVED

Tests ran ✅

Downloading test audio...
Converting to linear16 16 kHz mono...
Audio ready: 829866 bytes

Server started on :3098
  /health -> ok
  /index.html -> served correctly

Streaming audio through WebSocket -> Deepgram (up to 60 s)...
[ws] Browser connected
[deepgram] Connection opened
[interim] Yeah. Is
[interim] Yeah. As as much as,
[final] Yeah. As as much as, it's worth
[interim] celebrating,
[interim] celebrating, the first
[interim] celebrating, the first spacewalk
[interim] celebrating, the first spacewalk, with an
[final] celebrating the first spacewalk, with an all female team,

Received 8 transcript event(s)
  First: [interim] Yeah. Is
Transcript content verified (found: spacewalk)

All tests passed
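The "Transcript content verified" step in the log can be approximated as a case-insensitive search for expected words across all received transcripts. The sketch below uses the transcripts from the log as sample input; `findExpectedWords` is a hypothetical helper and the real test may check content differently.

```javascript
// Return the expected words that appear in any transcript (case-insensitive).
// Hypothetical helper for illustration; not taken from tests/test.js.
function findExpectedWords(transcripts, expectedWords) {
  const haystack = transcripts.join(' ').toLowerCase();
  return expectedWords.filter((word) => haystack.includes(word.toLowerCase()));
}

const received = [
  "Yeah. As as much as, it's worth",
  'celebrating the first spacewalk, with an all female team,',
];
console.log(findExpectedWords(received, ['spacewalk'])); // [ 'spacewalk' ]
```

An empty result would mean none of the expected words appeared, which the test can treat as a failure.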

Integration genuineness

Pass — This is a Vanilla JS Browser integration. The browser platform APIs (getUserMedia, AudioWorklet, WebSocket) are used directly with no mocking. The Deepgram SDK (@deepgram/sdk) makes a real live transcription connection via deepgram.listen.v1.connect(). Audio is streamed through the WebSocket proxy and real transcripts are returned. The .env.example correctly lists DEEPGRAM_API_KEY — no additional platform credentials are needed since the browser APIs are the integration target.

Code quality

  • Official Deepgram SDK: @deepgram/sdk v5 used via DeepgramClient — no raw HTTP calls
  • No hardcoded credentials: API key read from env, never exposed to the browser client
  • Error handling: Covers mic access denial, WebSocket errors, Deepgram connection failures, and browser disconnect cleanup
  • Credential check: server.js checks DEEPGRAM_API_KEY on startup (line 29) before any SDK calls; tests/test.js exits with code 2 on missing credentials before importing the server
  • Architecture: Clean separation — server proxies audio to Deepgram so the API key stays server-side. AudioWorklet captures PCM at 16 kHz and converts to linear16 inline (no extra files)

Documentation

  • README clearly describes what you'll build, prerequisites, env vars with console link, install/run instructions, key parameters table, and how-it-works flow
  • .env.example present and complete with source link

✓ All checks pass. Ready for merge.


Review by Lead on 2026-04-01

@github-actions github-actions bot added the status:review-passed Self-review passed label Apr 1, 2026
@lukeocodes lukeocodes merged commit df84a06 into main Apr 1, 2026
2 checks passed
@lukeocodes lukeocodes deleted the example/200-vanilla-js-browser-transcription branch April 1, 2026 21:01

Labels

  • integration:vanilla-js-browser (Integration: Vanilla JS Browser)
  • language:javascript (Language: JavaScript)
  • status:review-passed (Self-review passed)
  • type:example (New example)

Development

Successfully merging this pull request may close these issues.

[Suggestion] Vanilla JavaScript browser transcription with no bundler (CDN script tag)