2 changes: 2 additions & 0 deletions examples/220-django-channels-live-stt-python/.env.example
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
# Deepgram — https://console.deepgram.com/
DEEPGRAM_API_KEY=
86 changes: 86 additions & 0 deletions examples/220-django-channels-live-stt-python/README.md
@@ -0,0 +1,86 @@
# Django Channels Real-Time Transcription with Deepgram Live STT

Build a Django 5 application that captures browser microphone audio, streams it through Django Channels WebSockets to Deepgram's Live STT API, and displays transcription results on the page in real time. The Deepgram API key stays server-side, so the browser never sees it.

## What you'll build

A Django web application that uses Django Channels to handle WebSocket connections from the browser. When a user clicks "Start Listening", the page captures microphone audio and streams it over a WebSocket to a Django Channels consumer, which forwards it to Deepgram's Live STT API (Nova-3). Transcription results flow back through the same WebSocket and appear on the page as they arrive.

## Prerequisites

- Python 3.11+
- Deepgram account — [get a free API key](https://console.deepgram.com/)

## Environment variables

| Variable | Where to find it |
|----------|-----------------|
| `DEEPGRAM_API_KEY` | [Deepgram console](https://console.deepgram.com/) |

Copy `.env.example` to `.env` and fill in your values.
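
A missing key otherwise surfaces only when the first WebSocket connects, so it can help to check for it at startup. The sketch below is one way to do that; `require_env` is a hypothetical helper, not part of this example's code, and it assumes the `.env` file has already been loaded into the process environment.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error.

    Hypothetical helper for illustration; the example itself reads
    os.environ["DEEPGRAM_API_KEY"] directly in the consumer.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; copy .env.example to .env and fill it in"
        )
    return value
```

Calling `require_env("DEEPGRAM_API_KEY")` once during startup would turn an empty or missing value into an immediate, readable failure.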

## Install and run

```bash
cd examples/220-django-channels-live-stt-python

pip install -r requirements.txt

cp .env.example .env
# Edit .env and add your DEEPGRAM_API_KEY

python src/manage.py runserver
```

Then open http://127.0.0.1:8000 in your browser and click **Start Listening**.

## Key parameters

| Parameter | Value | Description |
|-----------|-------|-------------|
| `model` | `nova-3` | Deepgram's Nova-3 speech recognition model |
| `smart_format` | `True` | Adds punctuation, capitalization, and number formatting |
| `interim_results` | `True` | Returns partial transcripts while you're still speaking |
| `encoding` | `linear16` | Raw 16-bit PCM audio format from the browser |
| `sample_rate` | `16000` | 16 kHz sample rate, which must match the rate of the audio the browser sends |
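
To make the bandwidth cost of these settings concrete, here is a small arithmetic sketch. The function name `pcm_chunk_stats` is hypothetical; the 4096-frame chunk size is taken from the `createScriptProcessor(4096, 1, 1)` call in the page template, and the rest follows from `linear16` (2 bytes per sample) and one channel.

```python
def pcm_chunk_stats(sample_rate: int, frames_per_chunk: int,
                    bytes_per_sample: int = 2, channels: int = 1) -> dict:
    """Compute duration and size of one raw PCM chunk, plus the stream bitrate."""
    chunk_bytes = frames_per_chunk * bytes_per_sample * channels
    return {
        # How much audio one chunk covers, in milliseconds
        "chunk_ms": 1000 * frames_per_chunk / sample_rate,
        # Size of one binary WebSocket frame, in bytes
        "chunk_bytes": chunk_bytes,
        # Sustained upload rate, in bytes per second
        "bytes_per_second": sample_rate * bytes_per_sample * channels,
    }


stats = pcm_chunk_stats(sample_rate=16000, frames_per_chunk=4096)
print(stats)
```

With the values used in this example, each frame carries 256 ms of audio in 8192 bytes, and the stream uploads 32000 bytes per second.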

## How it works

1. **Browser**: the HTML page uses `getUserMedia` to capture microphone audio, runs an `AudioContext` at 16 kHz, converts each buffer from float32 to 16-bit PCM (linear16) in a `ScriptProcessorNode` callback, and sends the result as binary frames over a WebSocket to `/ws/transcribe/`. `ScriptProcessorNode` is deprecated in favor of `AudioWorklet` but keeps the example short.
2. **Django Channels consumer** (`consumer.py`): an `AsyncWebsocketConsumer` that, on connect, opens a Deepgram live STT WebSocket using the official Python SDK (`client.listen.v1.connect()`)
3. **Audio forwarding**: each binary WebSocket frame from the browser is forwarded to Deepgram via `connection.send_media()`
4. **Transcript delivery**: the SDK dispatches `EventType.MESSAGE` events carrying `ListenV1Results`; the consumer sends each non-empty transcript back to the browser as JSON, with `is_final` indicating whether the result is finalized
5. **Browser display**: interim results appear greyed out and are replaced by the next result; final results are appended permanently
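
Steps 4 and 5 can be sketched in plain Python without any network or SDK calls. This is an illustration of the message shape and the interim/final display rule only; `build_message` and `TranscriptView` are hypothetical names, and the real display logic lives in the page's JavaScript.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


def build_message(transcript: str, is_final: bool) -> Optional[str]:
    """Serialize one transcript the way the consumer does; skip empty ones."""
    if not transcript.strip():
        return None
    return json.dumps({"transcript": transcript, "is_final": is_final})


@dataclass
class TranscriptView:
    """Mirrors the page logic: one replaceable interim line, permanent finals."""
    final_lines: List[str] = field(default_factory=list)
    interim: str = ""

    def apply(self, raw: str) -> None:
        data = json.loads(raw)
        if data["is_final"]:
            # A final result is appended and clears the pending interim text
            self.final_lines.append(data["transcript"])
            self.interim = ""
        else:
            # An interim result replaces the previous interim text
            self.interim = data["transcript"]


view = TranscriptView()
for text, final in [("hello", False), ("hello wor", False), ("Hello world.", True)]:
    message = build_message(text, final)
    if message is not None:
        view.apply(message)
```

After the loop, `view.final_lines` holds the single finalized sentence and `view.interim` is empty, which matches what the page shows once Deepgram finalizes an utterance.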

## Architecture

```
Browser Microphone
|
| WebSocket (binary PCM audio)
v
Django Channels Consumer
|
| Deepgram Python SDK (WebSocket)
v
Deepgram Live STT (nova-3)
|
| transcript JSON
v
Django Channels Consumer
|
| WebSocket (JSON)
v
Browser Display
```
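
The relay shape in the diagram can be tried out with `asyncio` queues and no network at all. In this sketch `fake_stt` is a stand-in for the upstream service and `relay` plays the consumer's role; both names and the message text are assumptions made for illustration, not Deepgram or Channels APIs.

```python
import asyncio
import json


async def fake_stt(audio_in: asyncio.Queue, events_out: asyncio.Queue) -> None:
    """Stand-in for the upstream STT service: emits one event per audio chunk."""
    received = 0
    while True:
        chunk = await audio_in.get()
        if chunk is None:  # sentinel: the audio stream was closed
            break
        received += len(chunk)
        await events_out.put(
            {"transcript": f"{received} bytes heard", "is_final": False}
        )
    await events_out.put(None)  # signal that no more events will follow


async def relay(chunks: list) -> list:
    """Forward audio chunks upstream and collect the JSON sent back downstream."""
    audio_q: asyncio.Queue = asyncio.Queue()
    event_q: asyncio.Queue = asyncio.Queue()
    # Run the upstream receive loop in the background, as the consumer does
    upstream = asyncio.create_task(fake_stt(audio_q, event_q))
    for chunk in chunks:
        await audio_q.put(chunk)
    await audio_q.put(None)
    sent = []
    while True:
        event = await event_q.get()
        if event is None:
            break
        sent.append(json.dumps(event))
    await upstream
    return sent


messages = asyncio.run(relay([b"\x00\x01" * 4, b"\x00\x01" * 2]))
```

The background task mirrors the consumer's listener task: audio goes in on one path while events come back on another, so neither direction blocks the other.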

## Related

- [Deepgram Live STT docs](https://developers.deepgram.com/docs/getting-started-with-live-streaming-audio)
- [Deepgram Python SDK](https://github.com/deepgram/deepgram-python-sdk)
- [Django Channels documentation](https://channels.readthedocs.io/)
- [Daphne ASGI server](https://github.com/django/daphne)

## Starter templates

If you want a ready-to-run base for your own project, check the [deepgram-starters](https://github.com/orgs/deepgram-starters/repositories) org, which hosts starter repos for a range of languages and Deepgram products.
5 changes: 5 additions & 0 deletions examples/220-django-channels-live-stt-python/requirements.txt
@@ -0,0 +1,5 @@
django>=5.0,<6.0
channels>=4.0,<5.0
daphne>=4.0,<5.0
deepgram-sdk>=4.0.0
python-dotenv>=1.0.0
16 changes: 16 additions & 0 deletions examples/220-django-channels-live-stt-python/src/asgi.py
@@ -0,0 +1,16 @@
import os

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "settings")

from channels.auth import AuthMiddlewareStack
from channels.routing import ProtocolTypeRouter, URLRouter
from django.core.asgi import get_asgi_application

import urls

application = ProtocolTypeRouter(
    {
        "http": get_asgi_application(),
        "websocket": AuthMiddlewareStack(URLRouter(urls.websocket_urlpatterns)),
    }
)
75 changes: 75 additions & 0 deletions examples/220-django-channels-live-stt-python/src/consumer.py
@@ -0,0 +1,75 @@
"""Django Channels WebSocket consumer that bridges browser audio to Deepgram Live STT.

Audio flows: browser microphone -> Django Channels WebSocket -> Deepgram Live STT -> transcript back to browser.
The DEEPGRAM_API_KEY stays server-side — the browser never sees it.
"""

import asyncio
import json
import os

from channels.generic.websocket import AsyncWebsocketConsumer
from deepgram import AsyncDeepgramClient
from deepgram.core.events import EventType
from deepgram.listen.v1.types import ListenV1Results


class TranscriptionConsumer(AsyncWebsocketConsumer):
    """Receives raw audio bytes from the browser, streams them to Deepgram, and
    sends transcription results back as JSON messages."""

    async def connect(self):
        await self.accept()
        self._dg_client = AsyncDeepgramClient(
            api_key=os.environ["DEEPGRAM_API_KEY"]
        )
        # ← connect() returns a live WebSocket connection to Deepgram's STT API
        self._dg_connection = await self._dg_client.listen.v1.connect(
            model="nova-3",
            smart_format=True,
            interim_results=True,
            encoding="linear16",
            sample_rate=16000,
            channels=1,
        )

        async def on_message(message) -> None:
            if isinstance(message, ListenV1Results):
                # message.channel.alternatives[0].transcript — the transcribed text
                transcript = message.channel.alternatives[0].transcript
                if transcript.strip():
                    await self.send(
                        text_data=json.dumps(
                            {
                                "transcript": transcript,
                                "is_final": message.is_final,
                            }
                        )
                    )

        async def on_error(error) -> None:
            await self.send(
                text_data=json.dumps({"error": str(error)})
            )

        self._dg_connection.on(EventType.MESSAGE, on_message)
        self._dg_connection.on(EventType.ERROR, on_error)

        # Runs the Deepgram receive loop in the background so events dispatch
        self._listener_task = asyncio.create_task(
            self._dg_connection.start_listening()
        )

    async def disconnect(self, close_code):
        if hasattr(self, "_dg_connection"):
            try:
                await self._dg_connection.send_close_stream()
            except Exception:
                pass
        if hasattr(self, "_listener_task"):
            self._listener_task.cancel()

    async def receive(self, text_data=None, bytes_data=None):
        # Browser sends raw PCM audio as binary WebSocket frames
        if bytes_data:
            await self._dg_connection.send_media(bytes_data)
18 changes: 18 additions & 0 deletions examples/220-django-channels-live-stt-python/src/manage.py
@@ -0,0 +1,18 @@
#!/usr/bin/env python
import os
import sys

from dotenv import load_dotenv

load_dotenv(os.path.join(os.path.dirname(__file__), "..", ".env"))


def main():
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "settings")
    from django.core.management import execute_from_command_line

    execute_from_command_line(sys.argv)


if __name__ == "__main__":
    main()
41 changes: 41 additions & 0 deletions examples/220-django-channels-live-stt-python/src/settings.py
@@ -0,0 +1,41 @@
import os
from pathlib import Path

BASE_DIR = Path(__file__).resolve().parent

SECRET_KEY = os.environ.get("DJANGO_SECRET_KEY", "dev-only-insecure-key")

DEBUG = True

ALLOWED_HOSTS = ["*"]

INSTALLED_APPS = [
    "daphne",
    "django.contrib.staticfiles",
]

ROOT_URLCONF = "urls"

TEMPLATES = [
    {
        "BACKEND": "django.template.backends.django.DjangoTemplates",
        "DIRS": [BASE_DIR / "templates"],
        "APP_DIRS": False,
        "OPTIONS": {
            "context_processors": [],
        },
    },
]

ASGI_APPLICATION = "asgi.application"

CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "channels.layers.InMemoryChannelLayer",
    },
}

STATIC_URL = "static/"
STATICFILES_DIRS = []

DATABASES = {}
133 changes: 133 additions & 0 deletions examples/220-django-channels-live-stt-python/src/templates/index.html
@@ -0,0 +1,133 @@
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Deepgram Live Transcription — Django Channels</title>
  <style>
    * { box-sizing: border-box; margin: 0; padding: 0; }
    body { font-family: system-ui, sans-serif; max-width: 720px; margin: 2rem auto; padding: 0 1rem; color: #1a1a2e; }
    h1 { margin-bottom: 0.5rem; }
    p.subtitle { color: #555; margin-bottom: 1.5rem; }
    button { font-size: 1rem; padding: 0.6rem 1.4rem; border: none; border-radius: 6px; cursor: pointer; }
    #start { background: #13ef93; color: #1a1a2e; }
    #start:disabled { opacity: 0.5; cursor: not-allowed; }
    #stop { background: #e74c3c; color: #fff; margin-left: 0.5rem; }
    #stop:disabled { opacity: 0.5; cursor: not-allowed; }
    .controls { margin-bottom: 1.5rem; display: flex; align-items: center; gap: 0.5rem; }
    #status { font-size: 0.85rem; color: #555; }
    #transcript-box { background: #f5f5f5; border-radius: 8px; padding: 1rem; min-height: 200px; white-space: pre-wrap; font-size: 0.95rem; line-height: 1.6; }
    .interim { color: #888; }
  </style>
</head>
<body>
  <h1>Deepgram Live Transcription</h1>
  <p class="subtitle">Powered by Django Channels + Deepgram Nova-3</p>

  <div class="controls">
    <button id="start">Start Listening</button>
    <button id="stop" disabled>Stop</button>
    <span id="status"></span>
  </div>

  <div id="transcript-box"></div>

  <script>
    const startBtn = document.getElementById('start');
    const stopBtn = document.getElementById('stop');
    const status = document.getElementById('status');
    const box = document.getElementById('transcript-box');

    let ws, mediaStream, processor, audioCtx;

    startBtn.addEventListener('click', async () => {
      startBtn.disabled = true;
      status.textContent = 'Requesting microphone…';

      try {
        mediaStream = await navigator.mediaDevices.getUserMedia({ audio: true });
      } catch (err) {
        status.textContent = 'Microphone access denied';
        startBtn.disabled = false;
        return;
      }

      const protocol = location.protocol === 'https:' ? 'wss' : 'ws';
      ws = new WebSocket(`${protocol}://${location.host}/ws/transcribe/`);

      ws.onopen = () => {
        status.textContent = 'Connected — speak now';
        stopBtn.disabled = false;

        audioCtx = new AudioContext({ sampleRate: 16000 });
        const source = audioCtx.createMediaStreamSource(mediaStream);

        // ScriptProcessorNode sends raw PCM chunks to the server.
        // bufferSize 4096 at 16 kHz ≈ 256 ms per chunk — good balance of latency vs overhead.
        processor = audioCtx.createScriptProcessor(4096, 1, 1);
        processor.onaudioprocess = (e) => {
          if (ws.readyState !== WebSocket.OPEN) return;
          const float32 = e.inputBuffer.getChannelData(0);
          // Convert float32 [-1,1] to int16 for Deepgram's linear16 encoding
          const int16 = new Int16Array(float32.length);
          for (let i = 0; i < float32.length; i++) {
            int16[i] = Math.max(-32768, Math.min(32767, Math.round(float32[i] * 32767)));
          }
          ws.send(int16.buffer);
        };

        source.connect(processor);
        processor.connect(audioCtx.destination);
      };

      ws.onmessage = (event) => {
        const data = JSON.parse(event.data);
        if (data.error) {
          status.textContent = `Error: ${data.error}`;
          return;
        }
        if (data.is_final) {
          // Append finalized transcript as permanent text
          const existing = box.querySelector('.interim');
          if (existing) existing.remove();
          box.textContent += data.transcript + '\n';
        } else {
          // Show interim result as greyed-out text that gets replaced
          let interim = box.querySelector('.interim');
          if (!interim) {
            interim = document.createElement('span');
            interim.className = 'interim';
            box.appendChild(interim);
          }
          interim.textContent = data.transcript;
        }
        box.scrollTop = box.scrollHeight;
      };

      ws.onclose = () => {
        status.textContent = 'Disconnected';
        cleanup();
      };

      ws.onerror = () => {
        status.textContent = 'WebSocket error';
        cleanup();
      };
    });

    stopBtn.addEventListener('click', () => {
      if (ws) ws.close();
      cleanup();
      status.textContent = 'Stopped';
    });

    function cleanup() {
      startBtn.disabled = false;
      stopBtn.disabled = true;
      if (processor) { processor.disconnect(); processor = null; }
      if (audioCtx) { audioCtx.close(); audioCtx = null; }
      if (mediaStream) { mediaStream.getTracks().forEach(t => t.stop()); mediaStream = null; }
    }
  </script>
</body>
</html>
12 changes: 12 additions & 0 deletions examples/220-django-channels-live-stt-python/src/urls.py
@@ -0,0 +1,12 @@
from django.urls import path, re_path

from views import index
from consumer import TranscriptionConsumer

urlpatterns = [
    path("", index),
]

websocket_urlpatterns = [
    re_path(r"ws/transcribe/$", TranscriptionConsumer.as_asgi()),
]