"""LlamaIndex reader that transcribes audio via Deepgram and returns Documents.

Usage:
    # Load audio into LlamaIndex Documents and query them
    python src/audio_loader.py https://dpgr.am/spacewalk.wav

    # Query mode: ask a question about the audio content
    python src/audio_loader.py --query "What is the main topic?" https://dpgr.am/spacewalk.wav
"""
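
# Assumed dependencies (PyPI package names; versions not pinned here):
#   pip install deepgram-sdk llama-index python-dotenv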

import os
import sys
from typing import List, Optional

from dotenv import load_dotenv

load_dotenv()
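
# A minimal .env for this script (DEEPGRAM_API_KEY is required;
# OPENAI_API_KEY is only needed for --query mode):
#   DEEPGRAM_API_KEY=your-deepgram-key
#   OPENAI_API_KEY=your-openai-key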

# SDK v5 Python: DeepgramClient reads DEEPGRAM_API_KEY from the environment
# automatically.
from deepgram import DeepgramClient

# LlamaIndex core: Document is the atomic unit of data; BaseReader defines
# the load_data() contract that all readers/loaders implement.
from llama_index.core import VectorStoreIndex
from llama_index.core.readers.base import BaseReader
from llama_index.core.schema import Document


class DeepgramAudioReader(BaseReader):
    """Transcribes audio files using Deepgram and returns LlamaIndex Documents.

    Each audio URL becomes one Document whose text is the transcript.
    Deepgram Audio Intelligence results (summary, topics, sentiment) are
    attached as document metadata for filtering and enrichment in RAG pipelines.
    """

    def __init__(
        self,
        model: str = "nova-3",
        smart_format: bool = True,
        summarize: Optional[str] = "v2",
        topics: bool = True,
        sentiment: bool = True,
        detect_entities: bool = True,
        language: str = "en",
    ) -> None:
        self.model = model
        self.smart_format = smart_format
        self.summarize = summarize
        self.topics = topics
        self.sentiment = sentiment
        self.detect_entities = detect_entities
        self.language = language
        self._client = DeepgramClient()

    def load_data(self, audio_urls: List[str]) -> List[Document]:
        """Transcribe each audio URL and return a list of Documents.

        This follows the same pattern as llama-index-readers-assemblyai:
        audio in → transcription API → Document objects out.
        """
        documents = []
        for url in audio_urls:
            doc = self._transcribe_url(url)
            documents.append(doc)
        return documents

    def _transcribe_url(self, url: str) -> Document:
        """Transcribe a single audio URL and build a Document with metadata."""
        # transcribe_url has Deepgram fetch the audio server-side, so nothing
        # is downloaded locally.
        response = self._client.listen.v1.media.transcribe_url(
            url=url,
            model=self.model,
            smart_format=self.smart_format,
            # Audio Intelligence features run on the same transcription call;
            # they are parameters, not separate endpoints.
            summarize=self.summarize,
            topics=self.topics,
            sentiment=self.sentiment,
            detect_entities=self.detect_entities,
            language=self.language,
        )

        # The transcript lives at response.results.channels[0].alternatives[0].
        channel = response.results.channels[0]
        alt = channel.alternatives[0]
        transcript = alt.transcript
        confidence = alt.confidence
        words = alt.words
        # Approximate the duration as the end timestamp of the last word.
        duration = words[-1].end if words else 0.0

        metadata = {
            "source": url,
            "duration_seconds": duration,
            "confidence": confidence,
            "model": self.model,
            "language": self.language,
        }

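        # Illustrative shape of the metadata built above (values are invented
        # for the example, not real output):
        #   {"source": "https://dpgr.am/spacewalk.wav", "duration_seconds": 25.3,
        #    "confidence": 0.99, "model": "nova-3", "language": "en"}
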
        # Audio Intelligence results live at response.results.<feature>.
        summary = getattr(response.results, "summary", None)
        if summary and hasattr(summary, "short"):
            metadata["summary"] = summary.short

        topics_result = getattr(response.results, "topics", None)
        if topics_result and hasattr(topics_result, "segments"):
            topic_list = []
            for segment in topics_result.segments:
                for topic in getattr(segment, "topics", []):
                    if hasattr(topic, "topic"):
                        topic_list.append(topic.topic)
            # Deduplicate while preserving order.
            metadata["topics"] = list(dict.fromkeys(topic_list))

        sentiments_result = getattr(response.results, "sentiments", None)
        if sentiments_result and hasattr(sentiments_result, "average"):
            metadata["average_sentiment"] = sentiments_result.average.sentiment

        # Unlike the features above, detected entities are attached to the
        # alternative rather than to response.results; each entity carries a
        # label and a value.
        entities = getattr(alt, "entities", None)
        if entities:
            entity_list = [f"{entity.label}: {entity.value}" for entity in entities]
            metadata["entities"] = list(dict.fromkeys(entity_list))

        return Document(text=transcript, metadata=metadata)


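# Example sketch: using the reader directly in another pipeline. The URL is the
# Deepgram sample file from the module docstring; any reachable audio URL works.
#
#     reader = DeepgramAudioReader(model="nova-3", sentiment=False)
#     docs = reader.load_data(["https://dpgr.am/spacewalk.wav"])
#     print(docs[0].metadata.get("summary"))

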
def run_load(audio_urls: List[str]) -> None:
    """Load audio into Documents and print their content and metadata."""
    reader = DeepgramAudioReader()
    documents = reader.load_data(audio_urls)

    for i, doc in enumerate(documents):
        print(f"\n{'='*60}")
        print(f"Document {i+1}")
        print(f"{'='*60}")
        print(f"Source: {doc.metadata.get('source', 'unknown')}")
        print(f"Duration: {doc.metadata.get('duration_seconds', 0):.1f}s")
        print(f"Confidence: {doc.metadata.get('confidence', 0):.0%}")
        if "summary" in doc.metadata:
            print(f"Summary: {doc.metadata['summary']}")
        if "topics" in doc.metadata:
            print(f"Topics: {', '.join(doc.metadata['topics'][:5])}")
        if "entities" in doc.metadata:
            print(f"Entities: {', '.join(doc.metadata['entities'][:5])}")
        print(f"\nTranscript preview:\n  {doc.text[:300]}...")


def run_query(audio_urls: List[str], question: str) -> None:
    """Load audio, build a VectorStoreIndex, and query it.

    This demonstrates the full RAG pipeline: audio → Deepgram → Documents →
    embeddings → vector index → LLM-powered query.
    Requires OPENAI_API_KEY for LlamaIndex's default LLM and embeddings.
    """
    if not os.environ.get("OPENAI_API_KEY"):
        print("Error: OPENAI_API_KEY is not set.", file=sys.stderr)
        print(
            "The query engine needs an LLM. Get a key at https://platform.openai.com/api-keys",
            file=sys.stderr,
        )
        sys.exit(1)

    reader = DeepgramAudioReader()
    documents = reader.load_data(audio_urls)

    print(f"Loaded {len(documents)} document(s), building index...")

    # VectorStoreIndex embeds the documents and stores them for similarity
    # search. By default, LlamaIndex uses OpenAI models for both embeddings
    # and the query LLM (the exact models depend on the installed version).
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine()

    response = query_engine.query(question)

    print(f"\n{'='*60}")
    print(f"Question: {question}")
    print(f"{'='*60}")
    print(f"\n{response}")


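# Sketch: pinning specific OpenAI models instead of LlamaIndex's defaults
# (assumes the llama-index OpenAI integrations are installed; the model names
# are illustrative, not required by this script):
#
#     from llama_index.core import Settings
#     from llama_index.llms.openai import OpenAI
#     from llama_index.embeddings.openai import OpenAIEmbedding
#
#     Settings.llm = OpenAI(model="gpt-4o-mini")
#     Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

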
def main() -> None:
    if len(sys.argv) < 2:
        print("Usage:")
        print("  python src/audio_loader.py <audio-url> [<audio-url> ...]")
        print("  python src/audio_loader.py --query 'Your question' <audio-url> [<audio-url> ...]")
        sys.exit(1)

    if sys.argv[1] == "--query":
        if len(sys.argv) < 4:
            print("Error: provide a question and at least one audio URL", file=sys.stderr)
            sys.exit(1)
        question = sys.argv[2]
        audio_urls = sys.argv[3:]
        run_query(audio_urls, question)
    else:
        audio_urls = sys.argv[1:]
        run_load(audio_urls)


if __name__ == "__main__":
    main()