
API Reference

All workflows accept an optional credentials object for runtime credential injection. This is inherited from the base MuxAIOptions interface and is not repeated for each workflow below.

getSummaryAndTags(assetId, options?)

Analyzes a Mux video or audio asset and returns AI-generated metadata.

Parameters:

  • assetId (string) - Mux asset ID (video or audio-only)
  • options (optional) - Configuration options

Options:

  • provider?: 'openai' | 'anthropic' | 'google' - AI provider (default: 'openai')
  • tone?: 'neutral' | 'playful' | 'professional' - Analysis tone (default: 'neutral')
  • model?: string - AI model to use (defaults: gpt-5.1, claude-sonnet-4-5, or gemini-3-flash-preview)
  • languageCode?: string - Language code for transcript track selection (e.g., 'en', 'fr'). When omitted, prefers English if available.
  • outputLanguageCode?: string - BCP 47 language code (e.g., 'en', 'fr', 'ja') for the generated title, description, and tags. When omitted or set to 'auto', auto-detects from the selected transcript track's language. Falls back to unconstrained (LLM decides) if no language metadata is available.
  • includeTranscript?: boolean - Include transcript in analysis (default: true)
  • cleanTranscript?: boolean - Remove VTT timestamps and formatting from transcript (default: true)
  • imageSubmissionMode?: 'url' | 'base64' - How to submit storyboard to AI providers (default: 'url')
  • imageDownloadOptions?: object - Options for image download when using base64 mode
    • timeout?: number - Request timeout in milliseconds (default: 10000)
    • retries?: number - Maximum retry attempts (default: 3)
    • retryDelay?: number - Base delay between retries in milliseconds (default: 1000)
    • maxRetryDelay?: number - Maximum delay between retries in milliseconds (default: 10000)
    • exponentialBackoff?: boolean - Whether to use exponential backoff (default: true)
  • promptOverrides?: object - Override specific sections of the prompt for custom use cases
    • task?: string - Override the main task instruction
    • title?: string - Override title generation guidance
    • description?: string - Override description generation guidance
    • keywords?: string - Override keywords generation guidance
    • qualityGuidelines?: string - Override quality guidelines

Returns:

interface SummaryAndTagsResult {
  assetId: string;
  title: string; // Short title
  description: string; // Detailed description
  tags: string[]; // Up to 10 relevant keywords
  storyboardUrl?: string; // Video storyboard URL (undefined for audio-only assets)
  usage?: TokenUsage; // Token usage from the AI provider
  transcriptText?: string; // Raw transcript text (when includeTranscript is true)
}
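A minimal usage sketch ("your-asset-id" is a placeholder; provider credentials are assumed to be configured via environment variables or the credentials option):

```typescript
import { getSummaryAndTags } from "@mux/ai/workflows";

// Analyze an asset with a professional tone, forcing English output.
const result = await getSummaryAndTags("your-asset-id", {
  tone: "professional",
  outputLanguageCode: "en",
});

console.log(result.title);       // Short title
console.log(result.description); // Detailed description
console.log(result.tags);        // Up to 10 keywords
```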

getModerationScores(assetId, options?)

Analyzes a Mux asset for inappropriate content using OpenAI's Moderation API or Hive's Moderation API.

  • For video assets, this moderates storyboard thumbnails (image moderation).
  • For audio-only assets, this moderates the underlying transcript text (text moderation).

Parameters:

  • assetId (string) - Mux asset ID (video or audio-only)
  • options (optional) - Configuration options

Options:

  • provider?: 'openai' | 'hive' - Moderation provider (default: 'openai')
  • model?: string - OpenAI moderation model to use (default: omni-moderation-latest)
  • languageCode?: string - Transcript language code when moderating audio-only assets (optional)
  • thresholds?: { sexual?: number; violence?: number } - Custom thresholds (default: {sexual: 0.7, violence: 0.8})
  • thumbnailInterval?: number - Seconds between thumbnails for long videos (default: 10)
  • thumbnailWidth?: number - Thumbnail width in pixels (default: 640)
  • maxSamples?: number - Maximum number of thumbnails to sample. Acts as a cap: if thumbnailInterval produces fewer samples than this limit, the interval is respected; otherwise samples are evenly distributed with the first and last frames pinned. (default: unlimited)
  • maxConcurrent?: number - Maximum concurrent API requests (default: 5)
  • imageSubmissionMode?: 'url' | 'base64' - How to submit images to AI providers (default: 'url')
  • imageDownloadOptions?: object - Options for image download when using base64 mode
    • timeout?: number - Request timeout in milliseconds (default: 10000)
    • retries?: number - Maximum retry attempts (default: 3)
    • retryDelay?: number - Base delay between retries in milliseconds (default: 1000)
    • maxRetryDelay?: number - Maximum delay between retries in milliseconds (default: 10000)
    • exponentialBackoff?: boolean - Whether to use exponential backoff (default: true)

Hive note (audio-only): transcript moderation submits text_data and requires a Hive Text Moderation project/API key. If you use a Visual Moderation key, Hive will reject the request (see Hive Text Moderation docs).
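The maxSamples capping rule can be illustrated with a small sketch (a hypothetical helper mirroring the description above, not the library's actual implementation):

```typescript
// Hypothetical sketch of the maxSamples capping rule: sample every
// `interval` seconds, but if that exceeds `maxSamples`, distribute
// samples evenly with the first and last frames pinned.
function pickSampleTimes(
  durationSeconds: number,
  interval: number,
  maxSamples?: number
): number[] {
  const byInterval: number[] = [];
  for (let t = 0; t <= durationSeconds; t += interval) byInterval.push(t);

  if (maxSamples === undefined || byInterval.length <= maxSamples) {
    return byInterval; // Interval produces few enough samples: respect it
  }

  // Too many samples: spread maxSamples evenly, pinning first and last frames.
  const times: number[] = [];
  const step = durationSeconds / (maxSamples - 1);
  for (let i = 0; i < maxSamples; i++) times.push(Math.round(i * step));
  return times;
}

// A 100s video at 10s intervals yields 11 samples; capped at 5 it
// pins the first and last frames and spreads the rest.
console.log(pickSampleTimes(100, 10));    // 11 sample times
console.log(pickSampleTimes(100, 10, 5)); // [0, 25, 50, 75, 100]
```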

Returns:

{
  assetId: string;
  mode: 'thumbnails' | 'transcript';
  isAudioOnly: boolean;
  thumbnailScores: Array<{ // Individual thumbnail results
    url: string;
    time?: number; // Time in seconds of the thumbnail within the video
    sexual: number; // 0-1 score
    violence: number; // 0-1 score
    error: boolean;
    errorMessage?: string;
  }>;
  maxScores: { // Highest scores across all thumbnails (or transcript chunks for audio-only)
    sexual: number;
    violence: number;
  };
  coverage: {
    requestedSampleCount: number;
    successfulSampleCount: number;
    failedSampleCount: number;
    sampleCoverage: number; // 0-1 fraction of requested samples that succeeded
    isPartial: boolean; // true when some samples failed but the workflow still returned a result
    isLowConfidence: boolean; // true when coverage is thin and thresholds should be interpreted cautiously
  };
  exceedsThreshold: boolean; // true if content should be flagged
  thresholds: { // Threshold values used
    sexual: number;
    violence: number;
  };
  usage?: TokenUsage; // Workflow usage metadata
}
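A usage sketch for a long video (placeholder asset ID; the bucket values shown for thresholds are examples, not recommendations):

```typescript
import { getModerationScores } from "@mux/ai/workflows";

// Sample every 30 seconds, capped at 20 thumbnails, with stricter
// custom thresholds than the defaults.
const result = await getModerationScores("your-asset-id", {
  thumbnailInterval: 30,
  maxSamples: 20,
  thresholds: { sexual: 0.5, violence: 0.6 },
});

if (result.exceedsThreshold) {
  console.log("Flagged:", result.maxScores);
}
if (result.coverage.isLowConfidence) {
  console.log("Thin sample coverage - interpret scores cautiously");
}
```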

hasBurnedInCaptions(assetId, options?)

Analyzes video frames to detect burned-in captions (hardcoded subtitles) that are permanently embedded in the video image.

Parameters:

  • assetId (string) - Mux video asset ID
  • options (optional) - Configuration options

Options:

  • provider?: 'openai' | 'anthropic' | 'google' - AI provider (default: 'openai')
  • model?: string - AI model to use (defaults: gpt-5.1, claude-sonnet-4-5, or gemini-3-flash-preview)
  • imageSubmissionMode?: 'url' | 'base64' - How to submit storyboard to AI providers (default: 'url')
  • imageDownloadOptions?: object - Options for image download when using base64 mode
    • timeout?: number - Request timeout in milliseconds (default: 10000)
    • retries?: number - Maximum retry attempts (default: 3)
    • retryDelay?: number - Base delay between retries in milliseconds (default: 1000)
    • maxRetryDelay?: number - Maximum delay between retries in milliseconds (default: 10000)
    • exponentialBackoff?: boolean - Whether to use exponential backoff (default: true)
  • promptOverrides?: object - Override specific sections of the detection prompt
    • task?: string - Override the main analysis task instruction
    • analysisSteps?: string - Override the step-by-step analysis procedure
    • positiveIndicators?: string - Override criteria for classifying text as captions
    • negativeIndicators?: string - Override criteria for ruling out captions

Returns:

{
  assetId: string;
  hasBurnedInCaptions: boolean; // Whether burned-in captions were detected
  confidence: number; // Confidence score (0.0-1.0)
  detectedLanguage: string | null; // Language of detected captions, or null
  storyboardUrl: string; // URL to analyzed storyboard
  usage?: TokenUsage; // Token usage from the AI provider
}

Detection Logic:

  • Analyzes video storyboard frames to identify text overlays
  • Distinguishes between actual captions and marketing/end-card text
  • Text appearing only in final 1-2 frames is classified as marketing copy
  • Caption text must appear across multiple frames throughout the timeline
  • Optimized prompts minimize false positives
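A usage sketch (placeholder asset ID; a common use is deciding whether to skip generating a separate caption track):

```typescript
import { hasBurnedInCaptions } from "@mux/ai/workflows";

// Check whether the video already carries hardcoded subtitles.
const result = await hasBurnedInCaptions("your-asset-id", {
  provider: "anthropic",
});

if (result.hasBurnedInCaptions && result.confidence > 0.8) {
  console.log(`Burned-in captions detected (${result.detectedLanguage})`);
}
```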

askQuestions(assetId, questions, options?)

Answer questions about asset content by analyzing storyboard frames and optional transcripts. For audio-only assets, this workflow analyzes transcript text only. By default, answers are "yes"/"no", but you can override the allowed responses.

Parameters:

  • assetId (string) - Mux asset ID (video or audio-only)
  • questions (array) - Array of question objects
    • Each question object must have a question field (string)
    • Each question may optionally include answerOptions?: string[] (defaults to ["yes", "no"])
    • Example: [{ question: "What is the production quality?", answerOptions: ["amateur", "semi-pro", "professional"] }]
  • options (optional) - Configuration options

Options:

  • provider?: 'openai' | 'anthropic' | 'google' - AI provider (default: 'openai')
  • model?: string - AI model to use (defaults: gpt-5.1, claude-sonnet-4-5, or gemini-3-flash-preview)
  • languageCode?: string - Language code for transcript track selection (e.g., 'en', 'fr'). When omitted, prefers English if available.
  • includeTranscript?: boolean - Include transcript in analysis (default: true, required for audio-only assets)
  • cleanTranscript?: boolean - Remove VTT timestamps and formatting from transcript (default: true)
  • imageSubmissionMode?: 'url' | 'base64' - How to submit storyboard to AI providers (default: 'url')
  • imageDownloadOptions?: object - Options for image download when using base64 mode
    • timeout?: number - Request timeout in milliseconds (default: 10000)
    • retries?: number - Maximum retry attempts (default: 3)
    • retryDelay?: number - Base delay between retries in milliseconds (default: 1000)
    • maxRetryDelay?: number - Maximum delay between retries in milliseconds (default: 10000)
    • exponentialBackoff?: boolean - Whether to use exponential backoff (default: true)
  • storyboardWidth?: number - Storyboard resolution in pixels (default: 640)

Returns:

interface AskQuestionsResult {
  assetId: string;
  answers: Array<{
    question: string; // The original question
    answer: string | null; // Answer from allowed options (null when skipped)
    confidence: number; // Confidence score (0.0-1.0)
    reasoning: string; // AI's explanation based on observable evidence or why the question was skipped
    skipped: boolean; // True when the question was not answerable from the asset content
  }>;
  storyboardUrl?: string; // URL to analyzed storyboard (undefined for audio-only assets)
  usage?: TokenUsage; // Token usage from the AI provider
  transcriptText?: string; // Raw transcript (when includeTranscript is true)
}

Examples:

// Single question
const result = await askQuestions("asset-id", [
  { question: "Does this video contain cooking?" }
]);

console.log(result.answers[0].answer); // "yes" or "no" by default
console.log(result.answers[0].confidence); // 0.95
console.log(result.answers[0].reasoning); // "A chef prepares ingredients..."

// Multiple questions (efficient single API call)
const result = await askQuestions("asset-id", [
  { question: "Does this video contain people?" },
  { question: "Is this video in color?" },
  { question: "Does this video contain violence?" }
]);

// Without transcript (visual-only analysis)
const result = await askQuestions("asset-id", questions, {
  includeTranscript: false
});

// Per-question answer options — mix yes/no with classification scales
const result = await askQuestions("asset-id", [
  { question: "Does this contain cooking?" }, // answer options default to yes/no
  { question: "What is the production quality?", answerOptions: ["amateur", "semi-pro", "professional"] },
  { question: "What is the primary content type?", answerOptions: ["tutorial", "entertainment", "news", "advertisement"] },
  { question: "What is the overall sentiment?", answerOptions: ["positive", "neutral", "negative"] },
]);

Tips for Effective Questions:

  • Be specific and focused on observable evidence
  • Frame questions positively (prefer "Is X present?" over "Is X not present?")
  • Avoid ambiguous or subjective questions
  • Questions should have clear answers that map to your allowed options
  • The AI prioritizes visual evidence when transcript and visuals conflict

generateEngagementInsights(assetId, options?)

Generate AI-powered insights explaining viewer engagement patterns by analyzing hotspot data, heatmap statistics, visual frames, and transcripts.

Parameters:

  • assetId (string) - Mux asset ID
  • options (optional) - Configuration options

Options:

  • provider?: 'openai' | 'anthropic' | 'google' - AI provider (default: 'openai')

  • model?: string - AI model to use (defaults: gpt-5.1, claude-sonnet-4-5, or gemini-3-flash-preview)

  • hotspotLimit?: number - Number of engagement moments to analyze per direction (default: 5, range: 1-10). Note: actual moment count may be up to 2x this value since both peaks and valleys are fetched.

  • timeframe?: string - Engagement data timeframe (default: '7:days')

    • Examples: '60:minutes', '24:hours', '7:days', '30:days'
  • skipShots?: boolean - Skip shots integration, use thumbnails instead (default: false). Recommended for latency-sensitive use cases.

Returns:

interface EngagementInsightsResult {
  assetId: string;
  momentInsights: Array<{
    startMs: number; // Start time in milliseconds
    endMs: number; // End time in milliseconds
    timestamp: string; // Human-readable timestamp (e.g., "2:15")
    engagementScore: number; // Normalized score (0.0-1.0)
    insight: string; // Explanation of engagement pattern
  }>;
  overallInsight: {
    summary: string; // Overall engagement summary
    trends: string[]; // Key trends identified
  };
  usage?: { // Token usage statistics
    inputTokens: number;
    outputTokens: number;
    totalTokens: number;
  };
}

Examples:

// Basic usage - informational insights
const result = await generateEngagementInsights("asset-id");

result.momentInsights.forEach(m => {
  console.log(`${m.timestamp}: ${m.insight}`);
});

// Custom timeframe
const result = await generateEngagementInsights("asset-id", {
  timeframe: "30:days",
  hotspotLimit: 5,
});

console.log(result.overallInsight.summary);
console.log("Trends:", result.overallInsight.trends);

// Low-latency mode (skip shots polling)
const result = await generateEngagementInsights("asset-id", {
  skipShots: true,
});

Requirements:

  • Sufficient engagement data must exist; newer or low-view videos may not have enough
  • Works with both video and audio-only assets (audio-only skips visual analysis)

Use Cases:

  • Content optimization based on viewer behavior
  • Understanding what drives re-watching and engagement
  • Identifying pacing issues and drop-off points
  • A/B testing video variations
  • Providing engagement feedback to content creators

translateCaptions(assetId, trackId, toLanguageCode, options?)

Translates existing captions from one language to another and optionally adds them as a new track to the Mux asset. The source language is inferred from the track's metadata.

Parameters:

  • assetId (string) - Mux asset ID (video or audio-only)
  • trackId (string) - ID of the source caption track to translate
  • toLanguageCode (string) - Target language code (e.g., 'es', 'fr', 'de')
  • options - Configuration options

Options:

  • provider: 'openai' | 'anthropic' | 'google' - AI provider (required)
  • model?: string - Model to use (defaults to the provider's chat model if omitted)
  • uploadToMux?: boolean - Whether to upload translated track to Mux (default: true)
  • s3Endpoint?: string - S3-compatible storage endpoint
  • s3Region?: string - S3 region (default: 'auto')
  • s3Bucket?: string - S3 bucket name
  • storageAdapter?: StorageAdapter - Optional adapter with putObject and createPresignedGetUrl methods
  • s3SignedUrlExpirySeconds?: number - Expiry duration in seconds for S3 presigned GET URLs (default: 86400 / 24 hours)
  • chunking?: object - Optional VTT-aware chunking controls for large caption translations
    • enabled?: boolean - Set to false to translate all cues in a single structured request (default: true)
    • minimumAssetDurationSeconds?: number - Prefer a single request until the asset is at least this long (default: 1800)
    • targetChunkDurationSeconds?: number - Soft target for chunk duration once chunking starts (default: 1800)
    • maxConcurrentTranslations?: number - Max number of concurrent translation requests when chunking (default: 4)
    • maxCuesPerChunk?: number - Hard cap for cues included in a single AI translation chunk (default: 80)
    • maxCueTextTokensPerChunk?: number - Approximate cap for cue text tokens included in a single AI translation chunk (default: 2000)

Returns:

interface TranslationResult {
  assetId: string;
  trackId: string; // Source track ID
  sourceLanguageCode: string; // Inferred from track metadata
  targetLanguageCode: string;
  sourceLanguage: LanguageCodePair; // { iso639_1: string; iso639_3: string }
  targetLanguage: LanguageCodePair; // { iso639_1: string; iso639_3: string }
  originalVtt: string; // Original VTT content
  translatedVtt: string; // Translated VTT content
  uploadedTrackId?: string; // Mux track ID (if uploaded)
  presignedUrl?: string; // S3 presigned URL (default expiry: 24 hours)
  usage?: TokenUsage; // Token usage from the AI provider
}

Supported Languages: All ISO 639-1 language codes are automatically supported using Intl.DisplayNames. Examples: Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Japanese (ja), Korean (ko), Chinese (zh), Russian (ru), Arabic (ar), Hindi (hi), Thai (th), Swahili (sw), and many more.

Chunking Behavior:

  • Chunking is enabled by default for translateCaptions
  • Shorter assets are translated in a single request until minimumAssetDurationSeconds is reached
  • When chunking is active, requests stay aligned to VTT cues and the final VTT is rebuilt locally
  • Chunk size is bounded by both cue count and approximate cue text token budget
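A usage sketch (all IDs are placeholders; the S3 settings are illustrative and only needed because uploadToMux defaults to true):

```typescript
import { translateCaptions } from "@mux/ai/workflows";

// Translate an existing caption track to Spanish and upload the
// result back to the asset as a new text track.
const result = await translateCaptions("your-asset-id", "source-track-id", "es", {
  provider: "openai",
  s3Bucket: "my-captions-bucket",
  s3Endpoint: "https://example-endpoint.example.com",
});

console.log(result.sourceLanguageCode, "->", result.targetLanguageCode);
console.log(result.uploadedTrackId); // New Mux text track (when uploaded)
```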

editCaptions(assetId, trackId, options)

Edits a caption track using LLM-powered profanity censorship, static find/replace, or both. Optionally uploads the edited track to Mux.

Parameters:

  • assetId (string) - Mux asset ID (video or audio-only)
  • trackId (string) - ID of the caption track to edit
  • options - Configuration options

Options:

  • provider?: 'openai' | 'anthropic' | 'google' - AI provider (required when autoCensorProfanity is set)
  • model?: string - Model to use (defaults to the provider's chat model if omitted)
  • autoCensorProfanity?: object - LLM-powered profanity censorship (optional)
    • mode?: 'blank' | 'remove' | 'mask' - Replacement strategy (default: 'blank')
      • 'blank': "shit" → "[____]" (bracketed underscores matching word length)
      • 'remove': word removed entirely
      • 'mask': "shit" → "????" (question marks matching word length)
    • alwaysCensor?: string[] - Words to always censor regardless of LLM output
    • neverCensor?: string[] - Words to never censor even if the LLM flags them (takes precedence over alwaysCensor)
  • replacements?: Array<{ find: string; replace: string }> - Static find/replace pairs (optional, no LLM needed)
  • uploadToMux?: boolean - Whether to upload edited track to Mux (default: true)
  • deleteOriginalTrack?: boolean - Whether to delete the original track after uploading the edited one (default: true)
  • s3Endpoint?: string - S3-compatible storage endpoint
  • s3Region?: string - S3 region (default: 'auto')
  • s3Bucket?: string - S3 bucket name
  • trackNameSuffix?: string - Suffix appended to the original track name in parentheses (default: 'edited', e.g. "Subtitles (edited)")
  • storageAdapter?: StorageAdapter - Optional adapter with putObject and createPresignedGetUrl methods
  • s3SignedUrlExpirySeconds?: number - Expiry duration in seconds for S3 presigned GET URLs (default: 86400 / 24 hours)

At least one of autoCensorProfanity or replacements must be provided.
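The three replacement strategies can be illustrated with a small sketch (a hypothetical helper mirroring the mode descriptions above, not the library's implementation):

```typescript
// Hypothetical sketch of the three replacement strategies.
type CensorMode = "blank" | "remove" | "mask";

function censorWord(word: string, mode: CensorMode): string {
  switch (mode) {
    case "blank":
      // Bracketed underscores matching the word length
      return `[${"_".repeat(word.length)}]`;
    case "remove":
      return ""; // Word removed entirely
    case "mask":
      return "?".repeat(word.length); // Question marks matching the word length
  }
}

console.log(censorWord("shit", "blank")); // "[____]"
console.log(censorWord("shit", "mask"));  // "????"
```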

Returns:

interface ReplacementRecord {
  cueStartTime: number; // Start time of the cue where the replacement occurred (seconds)
  before: string; // Original word/phrase
  after: string; // Replacement text
}

interface EditCaptionsResult {
  assetId: string;
  trackId: string;
  originalVtt: string; // Original VTT content
  editedVtt: string; // Edited VTT content
  totalReplacementCount: number; // Total replacements across all operations
  autoCensorProfanity?: { // Present when autoCensorProfanity was used
    replacements: ReplacementRecord[]; // Each censored word with cue timing
  };
  replacements?: { // Present when replacements were used
    replacements: ReplacementRecord[]; // Each static replacement with cue timing
  };
  uploadedTrackId?: string; // Mux track ID (if uploaded)
  presignedUrl?: string; // S3 presigned URL (default expiry: 24 hours)
  usage?: TokenUsage; // Token usage (only present if LLM was used)
}
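A usage sketch combining both edit types in one pass (placeholder IDs; the find/replace pair is an example):

```typescript
import { editCaptions } from "@mux/ai/workflows";

// Censor profanity with the LLM and apply a static fix in one call.
const result = await editCaptions("your-asset-id", "caption-track-id", {
  provider: "openai",
  autoCensorProfanity: {
    mode: "mask",
    neverCensor: ["hell"], // e.g. keep "hell" even if the LLM flags it
  },
  replacements: [{ find: "Acme Corp", replace: "AcmeCorp" }],
});

console.log(result.totalReplacementCount);
result.autoCensorProfanity?.replacements.forEach((r) => {
  console.log(`${r.cueStartTime}s: "${r.before}" -> "${r.after}"`);
});
```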

generateChapters(assetId, options?)

Generates AI-powered chapter markers by analyzing video or audio transcripts. Creates logical chapter breaks based on topic changes and content transitions.

Parameters:

  • assetId (string) - Mux asset ID (video or audio-only)
  • options (optional) - Configuration options

Options:

  • languageCode?: string - Language code for captions (e.g., 'en', 'es', 'fr'). When omitted, prefers English if available.
  • outputLanguageCode?: string - BCP 47 language code (e.g., 'en', 'fr', 'ja') for the generated chapter titles. When omitted or set to 'auto', auto-detects from the selected transcript track's language. Falls back to unconstrained (LLM decides) if no language metadata is available.
  • provider?: 'openai' | 'anthropic' | 'google' - AI provider (default: 'openai')
  • model?: string - AI model to use (defaults: gpt-5.1, claude-sonnet-4-5, or gemini-3-flash-preview)
  • promptOverrides?: object - Override specific sections of the chaptering prompt
    • task?: string - Override the main task instruction
    • outputFormat?: string - Override the expected output format description
    • chapterGuidelines?: string - Override chapter count and formatting guidelines
    • titleGuidelines?: string - Override chapter title style guidelines
  • minChaptersPerHour?: number - Minimum chapters to generate per hour of content (default: 3)
  • maxChaptersPerHour?: number - Maximum chapters to generate per hour of content (default: 8)

Returns:

{
  assetId: string;
  languageCode?: string; // Resolved from input or track metadata
  chapters: Array<{
    startTime: number; // Chapter start time in seconds
    title: string; // Descriptive chapter title
  }>;
  usage?: TokenUsage; // Token usage from the AI provider
}

Requirements:

  • Asset must have a ready caption/transcript track
  • When languageCode is omitted, prefers an English track if available
  • Uses existing auto-generated or uploaded captions/transcripts

Example Output:

// Perfect format for Mux Player
player.addChapters([
  { startTime: 0, title: "Introduction and Setup" },
  { startTime: 45, title: "Main Content Discussion" },
  { startTime: 120, title: "Conclusion" }
]);
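A sketch of generating chapters and handing them straight to the player (placeholder asset ID; `player` is assumed to be a Mux Player element):

```typescript
import { generateChapters } from "@mux/ai/workflows";

// Generate chapters from the asset's English transcript track.
const result = await generateChapters("your-asset-id", {
  languageCode: "en",
  maxChaptersPerHour: 6,
});

// The chapters array is already in the shape Mux Player expects.
player.addChapters(result.chapters);
```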

translateAudio(assetId, toLanguageCode, options?)

Creates AI-dubbed audio tracks from existing media content using ElevenLabs voice cloning and translation. Uses the default audio track on your asset. Source language is auto-detected unless fromLanguageCode is provided.

Parameters:

  • assetId (string) - Mux asset ID (video or audio-only; must have audio.m4a static rendition)
  • toLanguageCode (string) - Target language code (e.g., 'es', 'fr', 'de')
  • options (optional) - Configuration options

Options:

  • provider?: 'elevenlabs' - AI provider (default: 'elevenlabs')
  • fromLanguageCode?: string - Optional source language code passed to ElevenLabs source_lang (ISO 639-1 or ISO 639-3, default: auto-detect)
  • numSpeakers?: number - Number of speakers (default: 0 for auto-detect)
  • uploadToMux?: boolean - Whether to upload dubbed track to Mux (default: true)
  • s3Endpoint?: string - S3-compatible storage endpoint
  • s3Region?: string - S3 region (default: 'auto')
  • s3Bucket?: string - S3 bucket name
  • storageAdapter?: StorageAdapter - Optional adapter with putObject and createPresignedGetUrl methods
  • s3SignedUrlExpirySeconds?: number - Expiry duration in seconds for S3 presigned GET URLs (default: 86400 / 24 hours)

Returns:

interface TranslateAudioResult {
  assetId: string;
  targetLanguageCode: string;
  targetLanguage: LanguageCodePair; // { iso639_1: string; iso639_3: string }
  dubbingId: string; // ElevenLabs dubbing job ID
  uploadedTrackId?: string; // Mux audio track ID (if uploaded)
  presignedUrl?: string; // S3 presigned URL (default expiry: 24 hours)
  usage?: TokenUsage; // Workflow usage metadata
}

Requirements:

  • Asset must have an audio.m4a static rendition (auto-requested if missing)
  • ElevenLabs API key with Creator plan or higher
  • S3-compatible storage for Mux ingestion

Supported Languages: ElevenLabs supports 32+ languages with automatic language name detection via Intl.DisplayNames. Supported languages include English, Spanish, French, German, Italian, Portuguese, Polish, Japanese, Korean, Chinese, Russian, Arabic, Hindi, Thai, and many more. Track names are automatically generated (e.g., "Polish (auto-dubbed)").
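A usage sketch (placeholder asset ID; the S3 settings are illustrative and required because the dubbed audio is staged for Mux ingestion):

```typescript
import { translateAudio } from "@mux/ai/workflows";

// Dub the default audio track into Polish with two detected speakers.
const result = await translateAudio("your-asset-id", "pl", {
  numSpeakers: 2,
  s3Bucket: "my-dubbing-bucket",
  s3Endpoint: "https://example-endpoint.example.com",
});

console.log(result.dubbingId);       // ElevenLabs job ID
console.log(result.uploadedTrackId); // New Mux audio track (when uploaded)
```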

generateEmbeddings(assetId, options?)

Generate vector embeddings for transcript chunks from video or audio assets for semantic search.

Deprecated: generateVideoEmbeddings is deprecated. Use generateEmbeddings instead.

Parameters:

  • assetId (string) - Mux asset ID (video or audio-only)
  • options (optional) - Configuration options

Options:

  • provider?: 'openai' | 'google' - Embedding provider (default: 'openai')
  • model?: string - Model to use (defaults: text-embedding-3-small for OpenAI, gemini-embedding-001 for Google)
  • chunkingStrategy?: object - How to chunk the transcript
    • type: 'token' | 'vtt' - Chunking method
    • maxTokens?: number - Maximum tokens per chunk (default: 500)
    • overlap?: number - Token overlap between chunks (for type: 'token', default: 100)
    • overlapCues?: number - VTT cue overlap between chunks (for type: 'vtt', default: 2)
  • languageCode?: string - Language code for transcript track selection. When omitted, prefers English if available.
  • batchSize?: number - Maximum number of chunks to process concurrently (default: 5)

Returns:

{
  assetId: string;
  chunks: Array<{
    chunkId: string;
    embedding: number[]; // Vector embedding
    metadata: {
      startTime?: number; // Chunk start time in seconds
      endTime?: number; // Chunk end time in seconds
      tokenCount: number;
    };
  }>;
  averagedEmbedding: number[]; // Single embedding for entire transcript
  provider: string;
  model: string;
  metadata: {
    totalChunks: number;
    totalTokens: number;
    chunkingStrategy: string;
    embeddingDimensions: number;
    generatedAt: string;
  };
  usage?: TokenUsage; // Workflow usage metadata
}
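Once chunk embeddings exist, semantic search reduces to ranking chunks by cosine similarity against a query embedding. A minimal self-contained sketch with toy vectors (in practice the vectors come from generateEmbeddings, and the query is embedded with the same provider and model):

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 3-dimensional vectors standing in for real embeddings.
const chunks = [
  { chunkId: "c1", embedding: [1, 0, 0] },
  { chunkId: "c2", embedding: [0.9, 0.1, 0] },
  { chunkId: "c3", embedding: [0, 0, 1] },
];
const query = [1, 0, 0]; // Embedding of the user's search query

// Rank chunks by similarity to the query, highest first.
const ranked = chunks
  .map((c) => ({ chunkId: c.chunkId, score: cosineSimilarity(c.embedding, query) }))
  .sort((x, y) => y.score - x.score);

console.log(ranked[0].chunkId); // "c1" is the closest chunk
```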

Custom Prompts with promptOverrides

Customize specific sections of the summarization prompt for different use cases like SEO, social media, or technical analysis. See the Prompt Customization guide for a full overview of the prompt builder pattern.

Tip: Before adding overrides, read through the default summarization prompt template in src/workflows/summarization.ts (the summarizationPromptBuilder config) so that you have clear context on what each section does and what you're changing.

import { getSummaryAndTags } from "@mux/ai/workflows";

// SEO-optimized metadata
const seoResult = await getSummaryAndTags(assetId, {
  tone: "professional",
  promptOverrides: {
    task: "Generate SEO-optimized metadata that maximizes discoverability.",
    title: "Create a search-optimized title (50-60 chars) with primary keyword front-loaded.",
    keywords: "Focus on high search volume terms and long-tail keywords.",
  },
});

// Social media optimized for engagement
const socialResult = await getSummaryAndTags(assetId, {
  promptOverrides: {
    title: "Create a scroll-stopping headline using emotional triggers or curiosity gaps.",
    description: "Write shareable copy that creates FOMO and works without watching the video.",
    keywords: "Generate hashtag-ready keywords for trending and niche community tags.",
  },
});

// Technical/production analysis
const technicalResult = await getSummaryAndTags(assetId, {
  tone: "professional",
  promptOverrides: {
    task: "Analyze cinematography, lighting, and production techniques.",
    title: "Describe the production style or filmmaking technique.",
    description: "Provide a technical breakdown of camera work, lighting, and editing.",
    keywords: "Use industry-standard production terminology.",
  },
});

Available override sections:

  • task - Main instruction for what to analyze
  • title - Guidance for generating the title
  • description - Guidance for generating the description
  • keywords - Guidance for generating keywords/tags
  • qualityGuidelines - General quality instructions

Each override can be a simple string (replaces the section content) or a full PromptSection object for advanced control over XML tag names and attributes.

Common Types

TokenUsage

Returned by all workflows in the usage field:

interface TokenUsage {
  inputTokens?: number; // Tokens in the input prompt
  outputTokens?: number; // Tokens generated in the output
  totalTokens?: number; // Total tokens consumed
  reasoningTokens?: number; // Chain-of-thought reasoning tokens
  cachedInputTokens?: number; // Input tokens served from cache
  metadata?: {
    assetDurationSeconds?: number;
    thumbnailCount?: number;
  };
}

LanguageCodePair

Returned by translateCaptions and translateAudio:

interface LanguageCodePair {
  iso639_1: string; // Two-letter code (e.g., "en", "es") — use for Mux/browser players
  iso639_3: string; // Three-letter code (e.g., "eng", "spa") — use for ElevenLabs
}