
Commit ff0748a

github-actions[bot], examples-bot, and lukeocodes authored
[Example] 160 — LlamaIndex Audio Document Loader (Python) (#90)
## New example: LlamaIndex Audio Document Loader (Python)

<!-- metadata type: example number: 160 slug: llamaindex-audio-loader-python language: python products: stt,intelligence integrations: llamaindex -->

**Integration:** LlamaIndex | **Language:** Python | **Products:** STT, Audio Intelligence

### What this shows

A custom LlamaIndex `BaseReader` that transcribes audio via Deepgram nova-3 and turns recordings into LlamaIndex Documents. Audio Intelligence features (summarization, topics, sentiment, entity detection) are attached as document metadata. Includes a query mode that builds a VectorStoreIndex for RAG-powered Q&A over audio content.

### Required secrets

`OPENAI_API_KEY` — needed only for query mode (LlamaIndex default LLM and embeddings). The core Deepgram transcription and document loading require only `DEEPGRAM_API_KEY`.

---

*Built by Engineer on 2026-03-31*

Co-authored-by: examples-bot <noreply@deepgram.com>
Co-authored-by: Luke Oliff <luke@lukeoliff.com>
1 parent e1588e7 commit ff0748a

5 files changed

Lines changed: 401 additions & 0 deletions


.env.example
Lines changed: 6 additions & 0 deletions

@@ -0,0 +1,6 @@
# Deepgram — https://console.deepgram.com/
DEEPGRAM_API_KEY=

# OpenAI — used by LlamaIndex default LLM and embeddings for querying the index
# https://platform.openai.com/api-keys
OPENAI_API_KEY=
README.md
Lines changed: 71 additions & 0 deletions

@@ -0,0 +1,71 @@
# LlamaIndex Audio Document Loader — Transcribe Audio into RAG Pipelines

Use Deepgram speech-to-text and Audio Intelligence to turn audio files into LlamaIndex Documents. Load podcasts, meetings, or lectures into a vector index and query them with natural language — all in a few lines of Python.

## What you'll build

A custom LlamaIndex `BaseReader` that transcribes audio URLs via Deepgram nova-3, enriches each Document with Audio Intelligence metadata (summary, topics, sentiment, entities), and feeds everything into a `VectorStoreIndex` for RAG-powered Q&A.

## Prerequisites

- Python 3.10+
- Deepgram account — [get a free API key](https://console.deepgram.com/)
- OpenAI account (for query mode) — [get an API key](https://platform.openai.com/api-keys)

## Environment variables

| Variable | Where to find it | Required for |
|----------|-----------------|-------------|
| `DEEPGRAM_API_KEY` | [Deepgram console](https://console.deepgram.com/) | Both modes |
| `OPENAI_API_KEY` | [OpenAI dashboard](https://platform.openai.com/api-keys) | Query mode only |

Copy `.env.example` to `.env` and fill in your values.

## Install and run

```bash
pip install -r requirements.txt

# Load audio into Documents — prints transcript and metadata
python src/audio_loader.py https://dpgr.am/spacewalk.wav

# Query mode — ask a question about the audio content
python src/audio_loader.py --query "What was the main topic discussed?" https://dpgr.am/spacewalk.wav
```

## Key parameters

| Parameter | Value | Description |
|-----------|-------|-------------|
| `model` | `nova-3` | Deepgram's latest and most accurate STT model |
| `smart_format` | `True` | Adds punctuation, capitalisation, and number formatting |
| `summarize` | `"v2"` | Generates a short summary of the audio content |
| `topics` | `True` | Detects topics discussed in the audio |
| `sentiment` | `True` | Analyses overall sentiment of the content |
| `detect_entities` | `True` | Extracts named entities (people, places, orgs) |

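These map one-to-one onto the keyword arguments of the transcription call in `src/audio_loader.py`. A condensed, standalone sketch of that call, using the same SDK call shape as the source file:

```python
from deepgram import DeepgramClient

# DeepgramClient picks up DEEPGRAM_API_KEY from the environment.
client = DeepgramClient()

# Audio Intelligence features are parameters on the same transcription
# call, not separate endpoints; Deepgram fetches the URL server-side.
response = client.listen.v1.media.transcribe_url(
    url="https://dpgr.am/spacewalk.wav",
    model="nova-3",
    smart_format=True,
    summarize="v2",
    topics=True,
    sentiment=True,
    detect_entities=True,
    language="en",
)
print(response.results.channels[0].alternatives[0].transcript)
```
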
## How it works

1. `DeepgramAudioReader` implements LlamaIndex's `BaseReader` interface with a `load_data()` method
2. For each audio URL, it calls Deepgram's pre-recorded API (`transcribe_url`) with Audio Intelligence features enabled — Deepgram fetches the audio server-side
3. The transcript becomes `Document.text`; intelligence results (summary, topics, sentiment, entities) become `Document.metadata`
4. In query mode, the Documents are embedded via OpenAI and stored in a `VectorStoreIndex` for similarity search and LLM-powered answers

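In code, those four steps condense to a few lines. A minimal sketch using the classes from this example (assumes you run from the repo root so `src.audio_loader` is importable, with both API keys set):

```python
from llama_index.core import VectorStoreIndex

from src.audio_loader import DeepgramAudioReader

# Steps 1-3: transcribe each URL and wrap it in a Document
reader = DeepgramAudioReader()
documents = reader.load_data(["https://dpgr.am/spacewalk.wav"])

# Step 4: embed, index, and answer questions (OpenAI defaults)
index = VectorStoreIndex.from_documents(documents)
print(index.as_query_engine().query("What was the main topic discussed?"))
```
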
## Extending this example

- **Multiple audio files** — pass several URLs to build an index across many recordings
- **Custom metadata filters** — use LlamaIndex metadata filters to query only documents with specific topics or sentiment (see the sketch after this list)
- **Swap the vector store** — replace the in-memory default with Chroma, Pinecone, or Weaviate
- **Speaker diarization** — add `diarize=True` to split transcripts by speaker

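For the metadata-filter idea above, a minimal sketch with LlamaIndex's `MetadataFilters` (the exact filter classes can vary across llama-index-core versions, so treat this as a starting point rather than the definitive API):

```python
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

# Restrict retrieval to documents whose overall sentiment was positive;
# the reader stores this under the "average_sentiment" metadata key.
filters = MetadataFilters(
    filters=[ExactMatchFilter(key="average_sentiment", value="positive")]
)
query_engine = index.as_query_engine(filters=filters)
print(query_engine.query("What went well in these recordings?"))
```

This reuses the `index` built in the query-mode sketch above.
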
## Related

- [Deepgram pre-recorded STT docs](https://developers.deepgram.com/docs/pre-recorded-audio)
- [Deepgram Audio Intelligence docs](https://developers.deepgram.com/docs/audio-intelligence)
- [Deepgram Python SDK](https://github.com/deepgram/deepgram-python-sdk)
- [LlamaIndex custom data loaders](https://docs.llamaindex.ai/en/stable/module_guides/loading/connector/)
- [LlamaIndex VectorStoreIndex](https://docs.llamaindex.ai/en/stable/module_guides/indexing/vector_store_index/)

## Starter templates

If you want a ready-to-run base for your own project, check the [deepgram-starters](https://github.com/orgs/deepgram-starters/repositories) org — there are starter repos for every language and every Deepgram product.
requirements.txt
Lines changed: 5 additions & 0 deletions

@@ -0,0 +1,5 @@
deepgram-sdk>=3.0.0
llama-index-core>=0.12.0
llama-index-llms-openai>=0.4.0
llama-index-embeddings-openai>=0.3.0
python-dotenv>=1.0.0
src/audio_loader.py
Lines changed: 201 additions & 0 deletions

@@ -0,0 +1,201 @@
"""LlamaIndex reader that transcribes audio via Deepgram and returns Documents.

Usage:
    # Load audio into LlamaIndex Documents — prints transcripts and metadata
    python src/audio_loader.py https://dpgr.am/spacewalk.wav

    # Query mode — ask a question about the audio content
    python src/audio_loader.py --query "What is the main topic?" https://dpgr.am/spacewalk.wav
"""

import os
import sys
from typing import List, Optional

from dotenv import load_dotenv

load_dotenv()

# SDK v5 Python: DeepgramClient reads DEEPGRAM_API_KEY from env automatically.
from deepgram import DeepgramClient

# LlamaIndex core: Document is the atomic unit of data, BaseReader defines
# the load_data() contract that all readers/loaders implement.
from llama_index.core import VectorStoreIndex
from llama_index.core.readers.base import BaseReader
from llama_index.core.schema import Document


class DeepgramAudioReader(BaseReader):
    """Transcribes audio files using Deepgram and returns LlamaIndex Documents.

    Each audio URL becomes one Document whose text is the transcript.
    Deepgram Audio Intelligence results (summary, topics, sentiment, entities)
    are attached as document metadata for filtering and enrichment in RAG
    pipelines.
    """

    def __init__(
        self,
        model: str = "nova-3",
        smart_format: bool = True,
        summarize: Optional[str] = "v2",
        topics: bool = True,
        sentiment: bool = True,
        detect_entities: bool = True,
        language: str = "en",
    ) -> None:
        self.model = model
        self.smart_format = smart_format
        self.summarize = summarize
        self.topics = topics
        self.sentiment = sentiment
        self.detect_entities = detect_entities
        self.language = language
        self._client = DeepgramClient()

    def load_data(self, audio_urls: List[str]) -> List[Document]:
        """Transcribe each audio URL and return a list of Documents.

        This follows the same pattern as llama-index-readers-assemblyai:
        audio in → transcription API → Document objects out.
        """
        documents = []
        for url in audio_urls:
            doc = self._transcribe_url(url)
            documents.append(doc)
        return documents

    def _transcribe_url(self, url: str) -> Document:
        """Transcribe a single audio URL and build a Document with metadata."""
        # transcribe_url has Deepgram fetch the audio server-side
        response = self._client.listen.v1.media.transcribe_url(
            url=url,
            model=self.model,
            smart_format=self.smart_format,
            # Audio Intelligence features run on the same transcription call —
            # they are parameters, not separate endpoints.
            summarize=self.summarize,
            topics=self.topics,
            sentiment=self.sentiment,
            detect_entities=self.detect_entities,
            language=self.language,
        )

        # response.results.channels[0].alternatives[0].transcript
        channel = response.results.channels[0]
        alt = channel.alternatives[0]
        transcript = alt.transcript
        confidence = alt.confidence
        words = alt.words
        duration = words[-1].end if words else 0.0

        metadata = {
            "source": url,
            "duration_seconds": duration,
            "confidence": confidence,
            "model": self.model,
            "language": self.language,
        }

        # Audio Intelligence results live at response.results.{feature}
        summary = getattr(response.results, "summary", None)
        if summary and hasattr(summary, "short"):
            metadata["summary"] = summary.short

        topics_result = getattr(response.results, "topics", None)
        if topics_result and hasattr(topics_result, "segments"):
            topic_list = []
            for segment in topics_result.segments:
                for topic in getattr(segment, "topics", []):
                    if hasattr(topic, "topic"):
                        topic_list.append(topic.topic)
            metadata["topics"] = list(dict.fromkeys(topic_list))

        sentiments_result = getattr(response.results, "sentiments", None)
        if sentiments_result and hasattr(sentiments_result, "average"):
            metadata["average_sentiment"] = sentiments_result.average.sentiment

        # Entity detection results live on the alternative
        # (results.channels[0].alternatives[0].entities), each entity
        # carrying a label (e.g. NAME, LOCATION) and the matched value.
        entities_result = getattr(alt, "entities", None)
        if entities_result:
            entity_list = []
            for entity in entities_result:
                if hasattr(entity, "value"):
                    entity_list.append(f"{entity.label}: {entity.value}")
            metadata["entities"] = list(dict.fromkeys(entity_list))

        return Document(text=transcript, metadata=metadata)


def run_load(audio_urls: List[str]) -> None:
    """Load audio into Documents and print their content and metadata."""
    reader = DeepgramAudioReader()
    documents = reader.load_data(audio_urls)

    for i, doc in enumerate(documents):
        print(f"\n{'='*60}")
        print(f"Document {i+1}")
        print(f"{'='*60}")
        print(f"Source: {doc.metadata.get('source', 'unknown')}")
        print(f"Duration: {doc.metadata.get('duration_seconds', 0):.1f}s")
        print(f"Confidence: {doc.metadata.get('confidence', 0):.0%}")
        if "summary" in doc.metadata:
            print(f"Summary: {doc.metadata['summary']}")
        if "topics" in doc.metadata:
            print(f"Topics: {', '.join(doc.metadata['topics'][:5])}")
        if "entities" in doc.metadata:
            print(f"Entities: {', '.join(doc.metadata['entities'][:5])}")
        print(f"\nTranscript preview:\n {doc.text[:300]}...")


def run_query(audio_urls: List[str], question: str) -> None:
    """Load audio, build a VectorStoreIndex, and query it.

    This demonstrates the full RAG pipeline: audio → Deepgram → Documents →
    embeddings → vector index → LLM-powered query.
    Requires OPENAI_API_KEY for LlamaIndex default LLM and embeddings.
    """
    if not os.environ.get("OPENAI_API_KEY"):
        print("Error: OPENAI_API_KEY is not set.", file=sys.stderr)
        print("The query engine needs an LLM. Get a key at https://platform.openai.com/api-keys", file=sys.stderr)
        sys.exit(1)

    reader = DeepgramAudioReader()
    documents = reader.load_data(audio_urls)

    print(f"Loaded {len(documents)} document(s), building index...")

    # VectorStoreIndex embeds the documents and stores them for similarity search.
    # Default uses OpenAI text-embedding-ada-002 for embeddings and gpt-3.5-turbo for queries.
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine()

    response = query_engine.query(question)

    print(f"\n{'='*60}")
    print(f"Question: {question}")
    print(f"{'='*60}")
    print(f"\n{response}")


def main() -> None:
    if len(sys.argv) < 2:
        print("Usage:")
        print("  python src/audio_loader.py <audio-url> [<audio-url> ...]")
        print("  python src/audio_loader.py --query 'Your question' <audio-url> [<audio-url> ...]")
        sys.exit(1)

    if sys.argv[1] == "--query":
        if len(sys.argv) < 4:
            print("Error: provide a question and at least one audio URL", file=sys.stderr)
            sys.exit(1)
        question = sys.argv[2]
        audio_urls = sys.argv[3:]
        run_query(audio_urls, question)
    else:
        audio_urls = sys.argv[1:]
        run_load(audio_urls)


if __name__ == "__main__":
    main()
