Commit a9ee201

github-actions[bot] and examples-bot authored
[Example] 051 — Next.js Streaming STT + TTS with Deepgram via Vercel AI SDK (#103)
## New example: Next.js Streaming STT + TTS with Deepgram via Vercel AI SDK

<!-- metadata type: example number: 051 slug: nextjs-vercel-ai-sdk-streaming language: TypeScript products: stt|tts integrations: vercel-ai-sdk -->

**Integration:** Vercel AI SDK + Next.js | **Language:** TypeScript | **Products:** STT, TTS

### What this shows

A full-stack Next.js 15 App Router application that captures microphone audio in the browser and streams it to Deepgram for real-time transcription (nova-3), with live interim results displayed as the user speaks. Includes text-to-speech playback using Deepgram Aura 2 via the Vercel AI SDK's provider-agnostic `generateSpeech()` function. Demonstrates secure temporary API key provisioning so the main key never reaches the browser.

### Required secrets

None beyond `DEEPGRAM_API_KEY`.

Closes #24

---

*Built by Engineer on 2026-04-01*

Co-authored-by: examples-bot <noreply@deepgram.com>
1 parent bcca797 commit a9ee201

10 files changed

Lines changed: 578 additions & 0 deletions

File tree

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
# Deepgram — https://console.deepgram.com/
DEEPGRAM_API_KEY=
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
# Next.js Streaming STT + TTS with Deepgram via the Vercel AI SDK

A full-stack Next.js 15 application that captures microphone audio in the browser and streams it to Deepgram for real-time transcription using nova-3, then reads the transcript back using Deepgram Aura 2 text-to-speech through the Vercel AI SDK's `generateSpeech()` interface. Builds on [050-vercel-ai-sdk-node](../050-vercel-ai-sdk-node/) by showing the complete browser-to-server streaming pattern.

## What you'll build

A Next.js App Router application where users click "Start Listening", speak into their microphone, and see a live transcript appear word-by-word. Interim (partial) results show in gray as Deepgram processes speech in real time. Once done, users can click "Read Back" to hear the transcript spoken aloud via Deepgram's Aura 2 TTS — powered by the Vercel AI SDK's provider-agnostic `generateSpeech()` function.

## Prerequisites

- Node.js 18+
- Deepgram account — [get a free API key](https://console.deepgram.com/)
- A browser with microphone access (Chrome, Firefox, Edge)

## Environment variables

Copy `.env.example` to `.env` and fill in your key:

| Variable | Where to find it |
|----------|-----------------|
| `DEEPGRAM_API_KEY` | [Deepgram console → API Keys](https://console.deepgram.com/) |

## Install and run

```bash
cp .env.example .env
# Add your DEEPGRAM_API_KEY to .env

npm install
npm run dev
```

Open [http://localhost:3000](http://localhost:3000) in your browser.

## Key parameters

| Parameter | Value | Description |
|-----------|-------|-------------|
| `model` | `nova-3` | Deepgram's latest and most accurate STT model |
| `interim_results` | `true` | Returns partial transcripts for low-latency display |
| `smart_format` | `true` | Adds punctuation, capitalization, and number formatting |
| `encoding` | `linear16` | Raw PCM audio format sent from the browser to Deepgram |
| `sample_rate` | `16000` | 16 kHz for STT (sufficient for speech, keeps bandwidth low) |
| TTS voice | `aura-2-helena-en` | Natural-sounding female English voice for text-to-speech |
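Taken together, the STT parameters above form the query string of Deepgram's streaming endpoint. A minimal sketch (variable names are illustrative; the `["token", ...]` subprotocol is how a browser authenticates a WebSocket, since it cannot set an `Authorization` header):

```typescript
// Build the /v1/listen WebSocket URL from the parameters in the table above.
const params = new URLSearchParams({
  model: "nova-3",
  interim_results: "true",
  smart_format: "true",
  encoding: "linear16",
  sample_rate: "16000",
});
const listenUrl = `wss://api.deepgram.com/v1/listen?${params.toString()}`;

// In the browser, the temporary key goes in the WebSocket subprotocol:
// const ws = new WebSocket(listenUrl, ["token", temporaryKey]);
```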
## How it works

1. **Temporary key** — The browser calls `GET /api/deepgram-key`, which uses the Deepgram SDK to mint a short-lived API key (10-second TTL) so the main key never reaches the client
2. **WebSocket connection** — The browser opens a WebSocket directly to `wss://api.deepgram.com/v1/listen` using the temporary key, with nova-3, linear16 encoding, and interim results enabled
3. **Microphone capture** — `getUserMedia()` captures mono audio at 16 kHz; a `ScriptProcessorNode` converts float32 samples to int16 PCM and sends them over the WebSocket
4. **Live transcript** — Deepgram returns JSON messages flagged with `is_final`; final results accumulate into the transcript, while interim results show as gray preview text
5. **TTS playback** — "Read Back" sends the transcript to `POST /api/speak`, which calls the Vercel AI SDK's `generateSpeech()` with `deepgram.speech('aura-2-helena-en')` and returns raw linear16 PCM audio
6. **Audio playback** — The browser decodes the linear16 PCM into a float32 AudioBuffer and plays it through the Web Audio API
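
Step 3's float-to-int conversion is the one piece of real signal processing in the browser. A sketch of what it looks like (the function name is illustrative, not taken from this example's source):

```typescript
// Convert Web Audio Float32 samples in [-1, 1] to 16-bit signed PCM,
// the linear16 format the WebSocket expects.
function floatTo16BitPCM(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i])); // clamp out-of-range samples
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;      // int16 range is asymmetric
  }
  return out;
}

// Inside the ScriptProcessorNode's onaudioprocess handler:
// ws.send(floatTo16BitPCM(e.inputBuffer.getChannelData(0)).buffer);
```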

## Architecture

```
Browser                              Next.js Server           Deepgram
   │                                      │                      │
   ├─ GET /api/deepgram-key ─────────────►│                      │
   │                                      ├─ createKey() ───────►│
   │◄── { key: "tmp_..." } ───────────────┤◄── temporary key ────┤
   │                                      │                      │
   ├─ WebSocket wss://api.deepgram.com/v1/listen ───────────────►│
   ├─ send(pcm audio) ──────────────────────────────────────────►│
   │◄── { transcript, is_final } ────────────────────────────────┤
   │                                      │                      │
   ├─ POST /api/speak { text } ──────────►│                      │
   │                                      ├─ generateSpeech() ──►│
   │◄── audio/pcm ────────────────────────┤◄── TTS audio ────────┤
```

## Starter templates

[deepgram-starters](https://github.com/orgs/deepgram-starters/repositories)
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
import type { NextConfig } from "next";

const nextConfig: NextConfig = {};

export default nextConfig;
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
{
  "name": "deepgram-nextjs-vercel-ai-sdk-streaming",
  "version": "1.0.0",
  "private": true,
  "description": "Next.js app with real-time streaming transcription and TTS using Deepgram via the Vercel AI SDK",
  "scripts": {
    "dev": "next dev",
    "build": "next build",
    "start": "next start",
    "test": "node tests/test.js"
  },
  "dependencies": {
    "@ai-sdk/deepgram": "^2.0.0",
    "@deepgram/sdk": "^3.11.0",
    "ai": "^6.0.0",
    "next": "^15.0.0",
    "react": "^19.0.0",
    "react-dom": "^19.0.0"
  },
  "devDependencies": {
    "@types/node": "^22.0.0",
    "@types/react": "^19.0.0",
    "@types/react-dom": "^19.0.0",
    "typescript": "^5.7.0"
  },
  "engines": {
    "node": ">=18"
  }
}
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
import { NextResponse } from "next/server";
import { createClient } from "@deepgram/sdk";

// Returns a short-lived Deepgram API key so the browser can open a
// WebSocket to Deepgram directly. This avoids exposing the main key
// in client-side code. The temporary key expires after 10 seconds —
// long enough to establish a connection but useless if leaked later.
export async function GET() {
  const apiKey = process.env.DEEPGRAM_API_KEY;
  if (!apiKey) {
    return NextResponse.json(
      { error: "DEEPGRAM_API_KEY is not configured" },
      { status: 500 },
    );
  }

  try {
    const client = createClient(apiKey);

    // Keys are scoped to a project, so resolve the project that owns
    // this API key first, then mint the temporary key inside it.
    const projectId = await getProjectId(client);

    // createProjectKey() mints a temporary key scoped to the project
    const { result, error } = await client.manage.createProjectKey(projectId, {
      comment: "temporary browser key",
      scopes: ["usage:write"],
      time_to_live_in_seconds: 10,
    });
    if (error || !result) throw error ?? new Error("No key returned");

    return NextResponse.json({ key: result.key });
  } catch (err: unknown) {
    const message = err instanceof Error ? err.message : "Unknown error";
    console.error("Failed to create temporary Deepgram key:", message);
    return NextResponse.json({ error: message }, { status: 500 });
  }
}

async function getProjectId(client: ReturnType<typeof createClient>) {
  const { result, error } = await client.manage.getProjects();
  if (error || !result) throw error ?? new Error("Could not list projects");
  const project = result.projects[0];
  if (!project) throw new Error("No Deepgram projects found");
  return project.project_id;
}
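
On the client side, a route like this is typically fetched immediately before connecting, since the 10-second TTL makes caching pointless. A sketch of the consuming code (not part of this diff; it assumes the route returns `{ key: string }` on success):

```typescript
// Fetch a temporary key from /api/deepgram-key right before opening
// the WebSocket connection to Deepgram.
async function getTemporaryKey(): Promise<string> {
  const res = await fetch("/api/deepgram-key");
  if (!res.ok) throw new Error(`Key request failed: ${res.status}`);
  const { key } = (await res.json()) as { key: string };
  return key;
}

// const ws = new WebSocket(listenUrl, ["token", await getTemporaryKey()]);
```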
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
import { NextRequest, NextResponse } from "next/server";
import { deepgram } from "@ai-sdk/deepgram";
import { experimental_generateSpeech as generateSpeech } from "ai";

// POST /api/speak { text: "Hello world" }
// Returns raw linear16 PCM audio (24 kHz, mono) as application/octet-stream.
// Uses the Vercel AI SDK's generateSpeech() with the @ai-sdk/deepgram
// provider so the same code pattern works with any AI SDK speech provider.
export async function POST(req: NextRequest) {
  const apiKey = process.env.DEEPGRAM_API_KEY;
  if (!apiKey) {
    return NextResponse.json(
      { error: "DEEPGRAM_API_KEY is not configured" },
      { status: 500 },
    );
  }

  const { text } = await req.json();
  if (!text || typeof text !== "string") {
    return NextResponse.json({ error: "text is required" }, { status: 400 });
  }

  // generateSpeech() is provider-agnostic; deepgram.speech() routes to Deepgram Aura TTS
  const speech = await generateSpeech({
    model: deepgram.speech("aura-2-helena-en"),
    text,
    providerOptions: {
      deepgram: {
        // linear16 is raw PCM — easier for the browser to decode via AudioContext
        encoding: "linear16",
        sample_rate: 24000,
      },
    },
  });

  return new NextResponse(Buffer.from(speech.audio.uint8Array), {
    headers: {
      "Content-Type": "application/octet-stream",
      "X-Audio-Encoding": "linear16",
      "X-Audio-Sample-Rate": "24000",
    },
  });
}
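
On the receiving end, the linear16 body has to be turned back into Float32 samples before the Web Audio API can play it. A sketch of that inverse conversion (the function name is illustrative; the sample rate matches the `X-Audio-Sample-Rate` header above):

```typescript
// Decode 16-bit signed PCM into Float32 samples in [-1, 1] for an
// AudioBuffer: the inverse of the capture-side conversion.
function pcm16ToFloat32(buffer: ArrayBuffer): Float32Array {
  const ints = new Int16Array(buffer);
  const floats = new Float32Array(ints.length);
  for (let i = 0; i < ints.length; i++) {
    floats[i] = ints[i] / (ints[i] < 0 ? 0x8000 : 0x7fff); // back to [-1, 1]
  }
  return floats;
}

// In the browser (ctx is an AudioContext):
// const floats = pcm16ToFloat32(await res.arrayBuffer());
// const audioBuffer = ctx.createBuffer(1, floats.length, 24000);
// audioBuffer.copyToChannel(floats, 0);
```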
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
import type { Metadata } from "next";

export const metadata: Metadata = {
  title: "Deepgram Streaming STT + TTS — Next.js",
  description:
    "Real-time speech-to-text and text-to-speech with Deepgram via the Vercel AI SDK",
};

export default function RootLayout({
  children,
}: {
  children: React.ReactNode;
}) {
  return (
    <html lang="en">
      <body style={{ fontFamily: "system-ui, sans-serif", margin: "2rem" }}>
        {children}
      </body>
    </html>
  );
}
