Commit a9ee201

github-actions[bot] and examples-bot authored
[Example] 051 — Next.js Streaming STT + TTS with Deepgram via Vercel AI SDK (#103)
## New example: Next.js Streaming STT + TTS with Deepgram via Vercel AI SDK

<!-- metadata type: example number: 051 slug: nextjs-vercel-ai-sdk-streaming language: TypeScript products: stt|tts integrations: vercel-ai-sdk -->

**Integration:** Vercel AI SDK + Next.js | **Language:** TypeScript | **Products:** STT, TTS

### What this shows

A full-stack Next.js 15 App Router application that captures microphone audio in the browser and streams it to Deepgram for real-time transcription (nova-3), with live interim results displayed as the user speaks. Includes text-to-speech playback using Deepgram Aura 2 via the Vercel AI SDK's provider-agnostic `generateSpeech()` function. Demonstrates secure temporary API key provisioning so the main key never reaches the browser.

### Required secrets

None beyond `DEEPGRAM_API_KEY`.

Closes #24

---

*Built by Engineer on 2026-04-01*

Co-authored-by: examples-bot <noreply@deepgram.com>
1 parent bcca797 commit a9ee201

10 files changed

Lines changed: 578 additions & 0 deletions

File tree

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
# Deepgram — https://console.deepgram.com/
DEEPGRAM_API_KEY=
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
# Next.js Streaming STT + TTS with Deepgram via the Vercel AI SDK

A full-stack Next.js 15 application that captures microphone audio in the browser and streams it to Deepgram for real-time transcription using nova-3, then reads the transcript back using Deepgram Aura 2 text-to-speech through the Vercel AI SDK's `generateSpeech()` interface. Builds on [050-vercel-ai-sdk-node](../050-vercel-ai-sdk-node/) by showing the complete browser-to-server streaming pattern.

## What you'll build

A Next.js App Router application where users click "Start Listening", speak into their microphone, and see a live transcript appear word-by-word. Interim (partial) results show in gray as Deepgram processes speech in real time. Once done, users can click "Read Back" to hear the transcript spoken aloud via Deepgram's Aura 2 TTS — powered by the Vercel AI SDK's provider-agnostic `generateSpeech()` function.

## Prerequisites

- Node.js 18+
- Deepgram account — [get a free API key](https://console.deepgram.com/)
- A browser with microphone access (Chrome, Firefox, Edge)

## Environment variables

Copy `.env.example` to `.env` and fill in your key:

| Variable | Where to find it |
|----------|-----------------|
| `DEEPGRAM_API_KEY` | [Deepgram console → API Keys](https://console.deepgram.com/) |

## Install and run

```bash
cp .env.example .env
# Add your DEEPGRAM_API_KEY to .env

npm install
npm run dev
```

Open [http://localhost:3000](http://localhost:3000) in your browser.

## Key parameters

| Parameter | Value | Description |
|-----------|-------|-------------|
| `model` | `nova-3` | Deepgram's latest and most accurate STT model |
| `interim_results` | `true` | Returns partial transcripts for low-latency display |
| `smart_format` | `true` | Adds punctuation, capitalization, and number formatting |
| `encoding` | `linear16` | Raw PCM audio format sent from the browser to Deepgram |
| `sample_rate` | `16000` | 16 kHz for STT (sufficient for speech, keeps bandwidth low) |
| TTS voice | `aura-2-helena-en` | Natural-sounding female English voice for text-to-speech |
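Taken together, the STT parameters above form the query string of Deepgram's streaming endpoint. A minimal sketch (variable names are illustrative; the `["token", ...]` subprotocol is how a browser authenticates a WebSocket, since it cannot set an `Authorization` header):

```typescript
// Build the /v1/listen WebSocket URL from the parameters in the table above.
const params = new URLSearchParams({
  model: "nova-3",
  interim_results: "true",
  smart_format: "true",
  encoding: "linear16",
  sample_rate: "16000",
});
const listenUrl = `wss://api.deepgram.com/v1/listen?${params.toString()}`;

// In the browser, the temporary key goes in the WebSocket subprotocol:
// const ws = new WebSocket(listenUrl, ["token", temporaryKey]);
```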
## How it works

1. **Temporary key** — The browser calls `GET /api/deepgram-key`, which uses the Deepgram SDK to mint a short-lived API key (10-second TTL) so the main key never reaches the client
2. **WebSocket connection** — The browser opens a WebSocket directly to `wss://api.deepgram.com/v1/listen` using the temporary key, with nova-3, linear16 encoding, and interim results enabled
3. **Microphone capture** — `getUserMedia()` captures mono audio at 16 kHz; a `ScriptProcessorNode` converts float32 samples to int16 PCM and sends them over the WebSocket
4. **Live transcript** — Deepgram returns JSON messages flagged with `is_final`; final results accumulate into the transcript, while interim results show as gray preview text
5. **TTS playback** — "Read Back" sends the transcript to `POST /api/speak`, which calls the Vercel AI SDK's `generateSpeech()` with `deepgram.speech('aura-2-helena-en')` and returns raw linear16 PCM audio
6. **Audio playback** — The browser decodes the linear16 PCM into a float32 AudioBuffer and plays it through the Web Audio API
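
Step 3's float-to-int conversion is the one piece of real signal processing in the browser. A sketch of what it looks like (the function name is illustrative, not taken from this example's source):

```typescript
// Convert Web Audio Float32 samples in [-1, 1] to 16-bit signed PCM,
// the linear16 format the WebSocket expects.
function floatTo16BitPCM(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i])); // clamp out-of-range samples
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;      // int16 range is asymmetric
  }
  return out;
}

// Inside the ScriptProcessorNode's onaudioprocess handler:
// ws.send(floatTo16BitPCM(e.inputBuffer.getChannelData(0)).buffer);
```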

## Architecture

```
Browser                              Next.js Server           Deepgram
   │                                      │                      │
   ├─ GET /api/deepgram-key ─────────────►│                      │
   │                                      ├─ createKey() ───────►│
   │◄── { key: "tmp_..." } ───────────────┤◄── temporary key ────┤
   │                                      │                      │
   ├─ WebSocket wss://api.deepgram.com/v1/listen ───────────────►│
   ├─ send(pcm audio) ──────────────────────────────────────────►│
   │◄── { transcript, is_final } ────────────────────────────────┤
   │                                      │                      │
   ├─ POST /api/speak { text } ──────────►│                      │
   │                                      ├─ generateSpeech() ──►│
   │◄── audio/pcm ────────────────────────┤◄── TTS audio ────────┤
```

## Starter templates

[deepgram-starters](https://github.com/orgs/deepgram-starters/repositories)
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
import type { NextConfig } from "next";

const nextConfig: NextConfig = {};

export default nextConfig;
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
{
  "name": "deepgram-nextjs-vercel-ai-sdk-streaming",
  "version": "1.0.0",
  "private": true,
  "description": "Next.js app with real-time streaming transcription and TTS using Deepgram via the Vercel AI SDK",
  "scripts": {
    "dev": "next dev",
    "build": "next build",
    "start": "next start",
    "test": "node tests/test.js"
  },
  "dependencies": {
    "@ai-sdk/deepgram": "^2.0.0",
    "@deepgram/sdk": "^3.11.0",
    "ai": "^6.0.0",
    "next": "^15.0.0",
    "react": "^19.0.0",
    "react-dom": "^19.0.0"
  },
  "devDependencies": {
    "@types/node": "^22.0.0",
    "@types/react": "^19.0.0",
    "@types/react-dom": "^19.0.0",
    "typescript": "^5.7.0"
  },
  "engines": {
    "node": ">=18"
  }
}
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
import { NextResponse } from "next/server";
import { createClient } from "@deepgram/sdk";

// Returns a short-lived Deepgram API key so the browser can open a
// WebSocket to Deepgram directly. This avoids exposing the main key
// in client-side code. The temporary key expires after 10 seconds —
// long enough to establish a connection but useless if leaked later.
export async function GET() {
  const apiKey = process.env.DEEPGRAM_API_KEY;
  if (!apiKey) {
    return NextResponse.json(
      { error: "DEEPGRAM_API_KEY is not configured" },
      { status: 500 },
    );
  }

  try {
    const client = createClient(apiKey);

    // Keys are scoped to a project, so resolve the project that owns
    // this API key first, then mint the temporary key inside it.
    const projectId = await getProjectId(client);

    // createProjectKey() mints a temporary key scoped to the project
    const { result, error } = await client.manage.createProjectKey(projectId, {
      comment: "temporary browser key",
      scopes: ["usage:write"],
      time_to_live_in_seconds: 10,
    });
    if (error || !result) throw error ?? new Error("No key returned");

    return NextResponse.json({ key: result.key });
  } catch (err: unknown) {
    const message = err instanceof Error ? err.message : "Unknown error";
    console.error("Failed to create temporary Deepgram key:", message);
    return NextResponse.json({ error: message }, { status: 500 });
  }
}

async function getProjectId(client: ReturnType<typeof createClient>) {
  const { result, error } = await client.manage.getProjects();
  if (error || !result) throw error ?? new Error("Could not list projects");
  const project = result.projects[0];
  if (!project) throw new Error("No Deepgram projects found");
  return project.project_id;
}
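
On the client side, a route like this is typically fetched immediately before connecting, since the 10-second TTL makes caching pointless. A sketch of the consuming code (not part of this diff; it assumes the route returns `{ key: string }` on success):

```typescript
// Fetch a temporary key from /api/deepgram-key right before opening
// the WebSocket connection to Deepgram.
async function getTemporaryKey(): Promise<string> {
  const res = await fetch("/api/deepgram-key");
  if (!res.ok) throw new Error(`Key request failed: ${res.status}`);
  const { key } = (await res.json()) as { key: string };
  return key;
}

// const ws = new WebSocket(listenUrl, ["token", await getTemporaryKey()]);
```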
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
import { NextRequest, NextResponse } from "next/server";
import { deepgram } from "@ai-sdk/deepgram";
import { experimental_generateSpeech as generateSpeech } from "ai";

// POST /api/speak { text: "Hello world" }
// Returns raw linear16 PCM audio (24 kHz, mono) as application/octet-stream.
// Uses the Vercel AI SDK's generateSpeech() with the @ai-sdk/deepgram
// provider so the same code pattern works with any AI SDK speech provider.
export async function POST(req: NextRequest) {
  const apiKey = process.env.DEEPGRAM_API_KEY;
  if (!apiKey) {
    return NextResponse.json(
      { error: "DEEPGRAM_API_KEY is not configured" },
      { status: 500 },
    );
  }

  const { text } = await req.json();
  if (!text || typeof text !== "string") {
    return NextResponse.json({ error: "text is required" }, { status: 400 });
  }

  // generateSpeech() is provider-agnostic; deepgram.speech() routes to Deepgram Aura TTS
  const speech = await generateSpeech({
    model: deepgram.speech("aura-2-helena-en"),
    text,
    providerOptions: {
      deepgram: {
        // linear16 is raw PCM — easier for the browser to decode via AudioContext
        encoding: "linear16",
        sample_rate: 24000,
      },
    },
  });

  return new NextResponse(Buffer.from(speech.audio.uint8Array), {
    headers: {
      "Content-Type": "application/octet-stream",
      "X-Audio-Encoding": "linear16",
      "X-Audio-Sample-Rate": "24000",
    },
  });
}
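
On the receiving end, the linear16 body has to be turned back into Float32 samples before the Web Audio API can play it. A sketch of that inverse conversion (the function name is illustrative; the sample rate matches the `X-Audio-Sample-Rate` header above):

```typescript
// Decode 16-bit signed PCM into Float32 samples in [-1, 1] for an
// AudioBuffer: the inverse of the capture-side conversion.
function pcm16ToFloat32(buffer: ArrayBuffer): Float32Array {
  const ints = new Int16Array(buffer);
  const floats = new Float32Array(ints.length);
  for (let i = 0; i < ints.length; i++) {
    floats[i] = ints[i] / (ints[i] < 0 ? 0x8000 : 0x7fff); // back to [-1, 1]
  }
  return floats;
}

// In the browser (ctx is an AudioContext):
// const floats = pcm16ToFloat32(await res.arrayBuffer());
// const audioBuffer = ctx.createBuffer(1, floats.length, 24000);
// audioBuffer.copyToChannel(floats, 0);
```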
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
import type { Metadata } from "next";

export const metadata: Metadata = {
  title: "Deepgram Streaming STT + TTS — Next.js",
  description:
    "Real-time speech-to-text and text-to-speech with Deepgram via the Vercel AI SDK",
};

export default function RootLayout({
  children,
}: {
  children: React.ReactNode;
}) {
  return (
    <html lang="en">
      <body style={{ fontFamily: "system-ui, sans-serif", margin: "2rem" }}>
        {children}
      </body>
    </html>
  );
}
