**chapters/audio-intelligence/audio-to-llm.mdx** (98 additions & 17 deletions)
---
title: Audio to LLM
description: "Run your own prompts on a pre-recorded transcript with an LLM - summaries, Q&A, extraction, and more."
---

import PrerecordedBadge from "/snippets/badges/prerecorded.mdx"

<PrerecordedBadge />
<Note>
This feature is in **Alpha** state.

We're looking for feedback to improve this feature; [share yours on Discord](https://discord.gg/T22a4ETUQp).
</Note>
**Audio to LLM** runs once the transcription is generated. You provide **one or more prompts**; each prompt is executed against the **transcript text** from the same job using the configured model, yielding **one LLM response per prompt**. Use it to extract action items, answer questions about the recording, or run any text analysis you express in natural language.

Unlike the built-in [Summarization](/chapters/audio-intelligence/summarization) feature — which produces a fixed-format summary — Audio to LLM lets you write **your own instructions**: ask for a summary in the exact format, tone, and level of detail your product needs, or combine a summary with other analyses (action items, compliance checks) in a single request.

## Usage
Enable Audio to LLM by setting `audio_to_llm: true` and providing your prompts:

1. Include `audio_to_llm: true` and an `audio_to_llm_config` object (at minimum, a `prompts` array) in your [pre-recorded transcription request](/chapters/pre-recorded-stt/quickstart).
2. Gladia transcribes the audio and runs any other audio-intelligence options you enabled on that request.

3. Each prompt is then run on the resulting transcript by the configured LLM.
4. The API returns **one result object per prompt** (same order as `prompts`), each containing the original `prompt` and the model `response`.

<Note>
Audio to LLM sends **plain transcript text** to the model. Raw audio and other fields from the transcription response are **not** added to the LLM prompt context.
</Note>
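As a sketch, the request body from the steps above can be assembled like this. The `audio_url` field and the helper itself are illustrative assumptions; see the [pre-recorded quickstart](/chapters/pre-recorded-stt/quickstart) for the exact request shape, endpoint, and authentication:

```python
import json

def build_audio_to_llm_request(audio_url, prompts, model=None):
    """Assemble a pre-recorded payload with Audio to LLM enabled.

    `audio_url` is an assumed input field; check the pre-recorded
    quickstart for the exact request shape.
    """
    config = {"prompts": list(prompts)}
    if model is not None:
        # Optional override; omit to use the default model.
        config["model"] = model
    return {
        "audio_url": audio_url,
        "audio_to_llm": True,
        "audio_to_llm_config": config,
    }

payload = build_audio_to_llm_request(
    "https://example.com/meeting.wav",
    ["Summarize the transcript in three bullet points."],
)
print(json.dumps(payload, indent=2))
```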

## Model selection

By default, the model used to execute your prompts is **[GPT 5.4 Nano](https://openrouter.ai/openai/gpt-5.4-nano)** (`openai/gpt-5.4-nano`), a fast option suited to high-volume summaries and extraction. Set the `model` key when you need stronger reasoning, richer analysis, longer outputs, or the behavior of a specific model.

You can use **any model listed on [OpenRouter](https://openrouter.ai/models)** by setting the `model` key. Prices reflect the public OpenRouter rate plus a platform fee added by Gladia.


## Example

A single prompt is enough to get started (you can omit `model` to use the default):

```json Pre-recorded
{
  "audio_to_llm": true,
  "audio_to_llm_config": {
    "prompts": [
      "Summarize the transcript in three bullet points."
    ]
  }
}
```

Example response shape for one prompt:

```json Pre-recorded
{
  "success": true,
  "is_empty": false,
  "results": [
    {
      "success": true,
      "is_empty": false,
      "results": {
        "prompt": "Summarize the transcript in three bullet points.",
        "response": "- Intro and context\n- Main discussion\n- Conclusion and next steps"
      },
      "exec_time": 1.4122809978485107,
      "error": null
    }
  ],
  "exec_time": 4.521103805541992,
  "error": null
}
```

## Example: post-meeting workflow

For a **post-meeting** pass, you might ask for bullet takeaways, a short summary, and follow-up actions for the next meeting:

```json Pre-recorded
{
  "audio_to_llm": true,
  "audio_to_llm_config": {
    "model": "openai/gpt-5.4",
    "prompts": [
      "Summarize the meeting as bullet points: main topics, decisions, and open questions.",
      "Give a concise paragraph summarizing what this meeting was about and the outcome.",
      "List action items and follow-ups to prepare for the next meeting; include owners if they were mentioned."
    ]
  }
}
```

With this configuration, your output might look like this:

```json Pre-recorded
{
  "success": true,
  "is_empty": false,
  "results": [
    {
      "success": true,
      "is_empty": false,
      "results": {
        "prompt": "Summarize the meeting as bullet points: main topics, decisions, and open questions.",
        "response": "- **Roadmap Q2**: Team aligned on shipping the billing integration first.\n- **Decision**: Weekly sync moved to Tuesday.\n- **Open question**: Whether to support SSO in v1 is still TBD."
      },
      "exec_time": 1.7726809978485107,
      "error": null
    },
    {
      "success": true,
      "is_empty": false,
      "results": {
        "prompt": "Give a concise paragraph summarizing what this meeting was about and the outcome.",
        "response": "The group reviewed Q2 priorities, agreed to prioritize billing, and rescheduled the standing meeting. SSO scope was left for a follow-up once design signs off."
      },
      "exec_time": 1.5122809978485107,
      "error": null
    },
    {
      "success": true,
      "is_empty": false,
      "results": {
        "prompt": "List action items and follow-ups to prepare for the next meeting; include owners if they were mentioned.",
        "response": "- **Alex**: Finalize SSO requirements doc by Friday.\n- **Jamie**: Share billing API cutover checklist with the team.\n- **Everyone**: Review the updated roadmap draft before next sync."
      },
      "exec_time": 1.8932809978258485,
      "error": null
    }
  ],
  "exec_time": 6.267103805541992,
  "error": null
}
```

## Response shape

- Top-level `results` is an **array** with **one entry per prompt**, in the **same order** as `audio_to_llm_config.prompts`.
- Each entry includes `success`, optional `error`, timing fields, and nested `results.prompt` / `results.response` with the LLM output for that prompt.
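Since results come back in prompt order, a small sketch like this (illustrative, not an official client) pairs each prompt with its response and surfaces per-prompt failures:

```python
def pair_prompts_with_responses(audio_to_llm_result):
    """Return (prompt, response) pairs from an Audio to LLM result object.

    Entries with success == False are returned as (None, error) so the
    caller can decide how to handle partial failures.
    """
    pairs = []
    for entry in audio_to_llm_result.get("results", []):
        if entry.get("success"):
            inner = entry["results"]
            pairs.append((inner["prompt"], inner["response"]))
        else:
            pairs.append((None, entry.get("error")))
    return pairs

# Minimal sample mirroring the documented response shape.
sample = {
    "success": True,
    "results": [
        {
            "success": True,
            "results": {"prompt": "Summarize.", "response": "- A\n- B"},
            "error": None,
        }
    ],
}
print(pair_prompts_with_responses(sample))  # [('Summarize.', '- A\n- B')]
```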

## Pricing

The input provided to the LLM is the full transcription. All prices are per 1M tokens and include platform fees (30%).

| Model | `model` config | Context Window | Input | Output |
|-------|----------------|----------------|----------------|-----------------|
| [OpenAI: GPT-5.4 Nano](https://openrouter.ai/openai/gpt-5.4-nano) | `openai/gpt-5.4-nano` | 400k | $0.26 | $1.76 |
| [OpenAI: GPT-5.4](https://openrouter.ai/openai/gpt-5.4) | `openai/gpt-5.4` | 1.1M | $3.25 | $19.50 |
| [Anthropic: Claude Opus 4.7](https://openrouter.ai/anthropic/claude-opus-4.7) | `anthropic/claude-opus-4.7` | 1M | $6.50 | $32.50 |
| [Google: Gemini 3.1 Pro Preview](https://openrouter.ai/google/gemini-3.1-pro-preview) | `google/gemini-3.1-pro-preview` | 1M | $2.60 | $15.60 |
| [xAI: Grok 4.20](https://openrouter.ai/x-ai/grok-4.20) | `x-ai/grok-4.20` | 2M | $2.60 | $7.80 |
| [Meta: Llama 4 Maverick](https://openrouter.ai/meta-llama/llama-4-maverick) | `meta-llama/llama-4-maverick` | 1M | $0.20 | $0.78 |
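As a rough back-of-the-envelope sketch (token counts are estimates you supply; the per-1M prices are copied from the table above), the cost of one job can be approximated as:

```python
# Per-1M-token (input, output) prices in USD, from the table above.
PRICES = {
    "openai/gpt-5.4-nano": (0.26, 1.76),
    "openai/gpt-5.4": (3.25, 19.50),
    "anthropic/claude-opus-4.7": (6.50, 32.50),
}

def estimate_cost(model, input_tokens, output_tokens):
    """Approximate USD cost: tokens divided by 1M, times the per-1M rate."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# e.g. a ~50k-token transcript with a ~1k-token answer on the default model:
print(round(estimate_cost("openai/gpt-5.4-nano", 50_000, 1_000), 4))  # 0.0148
```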
**chapters/audio-intelligence/summarization.mdx** (21 additions & 0 deletions)

- **general**: Balanced summary for most use cases; good readability and coverage.
- **concise**: Shorter output for quick overviews or previews; fewer details.
- **bullet_points**: Lists key takeaways; ideal for action items, meeting notes, or highlights.
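The three types above plug straight into `summarization_config`; a minimal sketch (the helper name is ours, field names are from this page) that validates the choice before sending:

```python
VALID_SUMMARY_TYPES = {"general", "concise", "bullet_points"}

def summarization_options(summary_type="general", model=None):
    """Build the summarization portion of a transcription request."""
    if summary_type not in VALID_SUMMARY_TYPES:
        raise ValueError(f"unknown summarization type: {summary_type!r}")
    config = {"type": summary_type}
    if model is not None:
        config["model"] = model  # optional override, e.g. "openai/gpt-5.4"
    return {"summarization": True, "summarization_config": config}

print(summarization_options("bullet_points"))
```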


## Model selection

By default, the model used for summarization is **[GPT 5.4 Nano](https://openrouter.ai/openai/gpt-5.4-nano)** (`openai/gpt-5.4-nano`), a fast option suited to high-volume summaries and extraction. Use the **`model`** field inside **`summarization_config`** to override the default when you need stronger reasoning, richer analysis, longer outputs, or another provider or model family.

You can use **any model listed on [OpenRouter](https://openrouter.ai/models)** by setting the `model` key. Prices reflect the public OpenRouter rate plus a platform fee added by Gladia.


## Usage
To enable summarization, set the `summarization` parameter to `true`:
<CodeGroup>
```json Pre-recorded
{
  "summarization": true,
  "summarization_config": {
    "type": "concise",
    "model": "openai/gpt-5.4"
  }
}
```
```json Live
{
  "summarization": true,
  "summarization_config": {
    "type": "concise",
    "model": "openai/gpt-5.4"
  },
  "messages_config": {
    ...
  }
}
```
</CodeGroup>

You'll find the summarization of your audio under the `results` key.

## Pricing

The input provided to the LLM is the full transcription. All prices are per 1M tokens and include platform fees (30%).

| Model | `model` config | Context Window | Input | Output |
|-------|----------------|----------------|----------------|-----------------|
| [OpenAI: GPT-5.4 Nano](https://openrouter.ai/openai/gpt-5.4-nano) | `openai/gpt-5.4-nano` | 400k | $0.26 | $1.76 |
| [OpenAI: GPT-5.4](https://openrouter.ai/openai/gpt-5.4) | `openai/gpt-5.4` | 1.1M | $3.25 | $19.50 |
| [Anthropic: Claude Opus 4.7](https://openrouter.ai/anthropic/claude-opus-4.7) | `anthropic/claude-opus-4.7` | 1M | $6.50 | $32.50 |
| [Google: Gemini 3.1 Pro Preview](https://openrouter.ai/google/gemini-3.1-pro-preview) | `google/gemini-3.1-pro-preview` | 1M | $2.60 | $15.60 |
| [xAI: Grok 4.20](https://openrouter.ai/x-ai/grok-4.20) | `x-ai/grok-4.20` | 2M | $2.60 | $7.80 |
| [Meta: Llama 4 Maverick](https://openrouter.ai/meta-llama/llama-4-maverick) | `meta-llama/llama-4-maverick` | 1M | $0.20 | $0.78 |