diff --git a/chapters/audio-intelligence/audio-to-llm.mdx b/chapters/audio-intelligence/audio-to-llm.mdx
index 220e06e..7c45644 100644
--- a/chapters/audio-intelligence/audio-to-llm.mdx
+++ b/chapters/audio-intelligence/audio-to-llm.mdx
@@ -1,35 +1,90 @@
 ---
 title: Audio to LLM
-description: "Ask any question or analysis, as you would do with an assistant"
+description: "Run your own prompts on a pre-recorded transcript with an LLM: summaries, Q&A, extraction, and more."
 ---
 
 import PrerecordedBadge from "/snippets/badges/prerecorded.mdx"
 
-
-This feature is in **Alpha** state.
-We're looking for feedback to improve this feature, [share yours on Discord](https://discord.gg/T22a4ETUQp).
-
+**Audio to LLM** runs once the transcription is generated. You provide **one or more prompts**; each prompt is executed against the **transcript text** from the same job using the configured model, yielding **one LLM response per prompt**. Use it to extract action items, answer questions about the recording, or run any text analysis you can express in natural language.
 
-The **Audio to LLM** feature applies your own prompts to the audio transcription. For now, it is only available for **pre-recorded** audio.
+Unlike the built-in [Summarization](/chapters/audio-intelligence/summarization) feature, which produces a fixed-format summary, Audio to LLM lets you write **your own instructions**: ask for a summary in the exact format, tone, and level of detail your product needs, or combine a summary with other analyses (action items, compliance checks) in a single request.
 
 ## Usage
 
-Enable Audio to LLM by setting the appropriate flag and providing your prompts:
+
+1. Include `audio_to_llm: true` and an `audio_to_llm_config` object (at minimum, a `prompts` array) in your [pre-recorded transcription request](/chapters/pre-recorded-stt/quickstart).
+2. Gladia transcribes the audio and runs any other audio-intelligence features you enabled on that request.
+3. Each prompt is run against the resulting transcript by the configured LLM.
+4. The API returns **one result object per prompt** (same order as `prompts`), each containing the original `prompt` and the model `response`.
+
+<Note>
+  Audio to LLM sends **plain transcript text** to the model. Raw audio and other fields from the transcription response are **not** added to the LLM prompt context.
+</Note>
+
+## Model selection
+
+By default, the model used to execute your prompts is **[GPT 5.4 Nano](https://openrouter.ai/openai/gpt-5.4-nano)** (`openai/gpt-5.4-nano`), a fast option suited to high-volume summaries and extraction. Override `model` when you need stronger reasoning, richer analysis, longer outputs, or another provider or model family.
+
+You can use **any model listed on [OpenRouter](https://openrouter.ai/models)** by setting the `model` key. Prices reflect the public OpenRouter rate plus a platform fee added by Gladia.
+
+## Example
+
+A single prompt is enough to get started (you can omit `model` to use the default):
+
+```json Pre-recorded
+{
+  "audio_to_llm": true,
+  "audio_to_llm_config": {
+    "prompts": [
+      "Summarize the transcript in three bullet points."
+ ] + } +} +``` + +Example response shape for one prompt: + +```json Pre-recorded +{ + "success": true, + "is_empty": false, + "results": [ + { + "success": true, + "is_empty": false, + "results": { + "prompt": "Summarize the transcript in three bullet points.", + "response": "- Intro and context\n- Main discussion\n- Conclusion and next steps" + }, + "exec_time": 1.4122809978485107, + "error": null + } + ], + "exec_time": 4.521103805541992, + "error": null +} +``` + +## Example: post-meeting workflow + +For a **post-meeting** pass, you might ask for bullet takeaways, a short summary, and follow-up actions for the next meeting: ```json Pre-recorded { "audio_to_llm": true, "audio_to_llm_config": { + "model": "openai/gpt-5.4", "prompts": [ - "Extract the key points from the transcription as bullet points", - "Generate a title from this transcription" + "Summarize the meeting as bullet points: main topics, decisions, and open questions.", + "Give a concise paragraph summarizing what this meeting was about and the outcome.", + "List action items and follow-ups to prepare for the next meeting; include owners if they were mentioned." ] } } ``` -With this code, your output will look like this: +With this configuration, your output might look like this: ```json Pre-recorded { @@ -40,8 +95,8 @@ With this code, your output will look like this: "success": true, "is_empty": false, "results": { - "prompt": "Extract the key points from the transcription as bullet points", - "response": "The main entities key points from the transcription are:\n- ..." + "prompt": "Summarize the meeting as bullet points: main topics, decisions, and open questions.", + "response": "- **Roadmap Q2**: Team aligned on shipping the billing integration first.\n- **Decision**: Weekly sync moved to Tuesday.\n- **Open question**: Whether to support SSO in v1 is still TBD." 
}, "exec_time": 1.7726809978485107, "error": null @@ -50,16 +105,42 @@ With this code, your output will look like this: "success": true, "is_empty": false, "results": { - "prompt": "Generate a title from this transcription", - "response": "The Great Title" + "prompt": "Give a concise paragraph summarizing what this meeting was about and the outcome.", + "response": "The group reviewed Q2 priorities, agreed to prioritize billing, and rescheduled the standing meeting. SSO scope was left for a follow-up once design signs off." }, - "exec_time": 1.7832809978258485, + "exec_time": 1.5122809978485107, + "error": null + }, + { + "success": true, + "is_empty": false, + "results": { + "prompt": "List action items and follow-ups to prepare for the next meeting; include owners if they were mentioned.", + "response": "- **Alex**: Finalize SSO requirements doc by Friday.\n- **Jamie**: Share billing API cutover checklist with the team.\n- **Everyone**: Review the updated roadmap draft before next sync." + }, + "exec_time": 1.8932809978258485, "error": null } ], - "exec_time": 6.127103805541992, + "exec_time": 6.267103805541992, "error": null } ``` -You'll find the results for each prompt under the `results` key. +## Response shape + +- Top-level `results` is an **array** with **one entry per prompt**, in the **same order** as `audio_to_llm_config.prompts`. +- Each entry includes `success`, optional `error`, timing fields, and nested `results.prompt` / `results.response` with the LLM output for that prompt. + +## Pricing + +The input provided to the LLM is the full transcription. All prices are per 1M tokens and include platform fees (30%). 
+
+| Model | `model` config | Context Window | Input | Output |
+|-------|----------------|----------------|----------------|-----------------|
+| [OpenAI: GPT-5.4 Nano](https://openrouter.ai/openai/gpt-5.4-nano) | `openai/gpt-5.4-nano` | 400k | $0.26 | $1.76 |
+| [OpenAI: GPT-5.4](https://openrouter.ai/openai/gpt-5.4) | `openai/gpt-5.4` | 1.1M | $3.25 | $19.50 |
+| [Anthropic: Claude Opus 4.7](https://openrouter.ai/anthropic/claude-opus-4.7) | `anthropic/claude-opus-4.7` | 1M | $6.50 | $32.50 |
+| [Google: Gemini 3.1 Pro Preview](https://openrouter.ai/google/gemini-3.1-pro-preview) | `google/gemini-3.1-pro-preview` | 1M | $2.60 | $15.60 |
+| [xAI: Grok 4.20](https://openrouter.ai/x-ai/grok-4.20) | `x-ai/grok-4.20` | 2M | $2.60 | $7.80 |
+| [Meta: Llama 4 Maverick](https://openrouter.ai/meta-llama/llama-4-maverick) | `meta-llama/llama-4-maverick` | 1M | $0.20 | $0.78 |
diff --git a/chapters/audio-intelligence/summarization.mdx b/chapters/audio-intelligence/summarization.mdx
index b1b2ca3..fbab505 100644
--- a/chapters/audio-intelligence/summarization.mdx
+++ b/chapters/audio-intelligence/summarization.mdx
@@ -27,6 +33,12 @@ If no `summarization_config` is provided, `general` type will be used by default
 - **concise**: Shorter output for quick overviews or previews; fewer details.
 - **bullet_points**: Lists key takeaways; ideal for action items, meeting notes, or highlights.
 
+## Model selection
+
+By default, the model used for summarization is **[GPT 5.4 Nano](https://openrouter.ai/openai/gpt-5.4-nano)** (`openai/gpt-5.4-nano`), a fast option suited to high-volume summarization. Use the **`model`** field inside **`summarization_config`** to override the default when stronger reasoning, richer analysis, longer outputs, or another provider or model family is needed.
+
+You can use **any model listed on [OpenRouter](https://openrouter.ai/models)** by setting the `model` key. Prices reflect the public OpenRouter rate plus a platform fee added by Gladia.
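As a quick sketch of how the `summarization_config.model` override fits into a request, the body can be assembled client-side before sending; field names follow this page, while the audio URL is a placeholder, not a real file:

```python
import json

# Sketch: building a pre-recorded transcription request body that
# overrides the summarization model. Field names follow the docs;
# "audio_url" here is a placeholder value.
payload = {
    "audio_url": "https://example.com/recordings/meeting.wav",
    "summarization": True,
    "summarization_config": {
        "type": "concise",
        "model": "openai/gpt-5.4",  # any OpenRouter model id can go here
    },
}

# Serialize for the HTTP request; json.dumps always emits valid JSON,
# which avoids hand-written mistakes like trailing commas.
body = json.dumps(payload)
```

Building the body programmatically rather than by hand keeps the JSON valid as you add or remove audio-intelligence options.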
+
 ## Usage
 
 To enable summarization simply set the `"summarization"` parameter to true
 
@@ -35,6 +41,7 @@ To enable summarization simply set the `"summarization"` parameter to true
   "summarization": true,
   "summarization_config": {
     "type": "concise",
+    "model": "openai/gpt-5.4"
   }
 }
 ```
@@ -44,6 +51,7 @@ To enable summarization simply set the `"summarization"` parameter to true
     "summarization": true,
     "summarization_config": {
       "type": "concise",
+      "model": "openai/gpt-5.4"
     }
   },
   "messages_config": {
@@ -72,3 +80,16 @@ The transcription result will contain a ```"summarization"``` key with the outpu
 
 You'll find the summarization of your audio under the `results` key.
+
+## Pricing
+
+The input provided to the LLM is the full transcription. All prices are per 1M tokens and include platform fees (30%).
+
+| Model | `model` config | Context Window | Input | Output |
+|-------|----------------|----------------|----------------|-----------------|
+| [OpenAI: GPT-5.4 Nano](https://openrouter.ai/openai/gpt-5.4-nano) | `openai/gpt-5.4-nano` | 400k | $0.26 | $1.76 |
+| [OpenAI: GPT-5.4](https://openrouter.ai/openai/gpt-5.4) | `openai/gpt-5.4` | 1.1M | $3.25 | $19.50 |
+| [Anthropic: Claude Opus 4.7](https://openrouter.ai/anthropic/claude-opus-4.7) | `anthropic/claude-opus-4.7` | 1M | $6.50 | $32.50 |
+| [Google: Gemini 3.1 Pro Preview](https://openrouter.ai/google/gemini-3.1-pro-preview) | `google/gemini-3.1-pro-preview` | 1M | $2.60 | $15.60 |
+| [xAI: Grok 4.20](https://openrouter.ai/x-ai/grok-4.20) | `x-ai/grok-4.20` | 2M | $2.60 | $7.80 |
+| [Meta: Llama 4 Maverick](https://openrouter.ai/meta-llama/llama-4-maverick) | `meta-llama/llama-4-maverick` | 1M | $0.20 | $0.78 |
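To make the pricing table concrete, a back-of-envelope cost estimate can be derived from the per-1M-token rates; this is a sketch using the figures above, not a billing guarantee:

```python
# Rough per-call cost estimate from the pricing table above.
# Rates are USD per 1M tokens (30% platform fee already included).
PRICES_PER_MILLION = {
    "openai/gpt-5.4-nano": (0.26, 1.76),
    "openai/gpt-5.4": (3.25, 19.50),
    "anthropic/claude-opus-4.7": (6.50, 32.50),
    "google/gemini-3.1-pro-preview": (2.60, 15.60),
    "x-ai/grok-4.20": (2.60, 7.80),
    "meta-llama/llama-4-maverick": (0.20, 0.78),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one LLM call: input charge plus output charge."""
    input_rate, output_rate = PRICES_PER_MILLION[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Example: a ~20k-token transcript summarized into ~500 tokens on the default model.
default_cost = estimate_cost("openai/gpt-5.4-nano", 20_000, 500)
```

Because the whole transcription is fed to the model as input, long recordings are dominated by the input rate, which is why the default Nano model is the economical choice for high-volume jobs.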