Add Support for Audio Generation Orchestrator #825

@nickna

Description

Extend the newly refactored MediaGenerationOrchestrator base class to support audio generation workflows.

Background

The recent refactoring created a generic MediaGenerationOrchestrator<TRequest, TResponse, TEventRequest> base class that successfully handles both image and video generation. This same pattern can be extended to support audio generation.

Requirements

1. Create AudioGenerationOrchestrator

  • Extend MediaGenerationOrchestrator base class
  • Implement abstract methods for audio-specific logic
  • Handle audio-specific formats (MP3, WAV, OGG, etc.)
  • Support both text-to-speech (TTS) and music generation workflows
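
To make the shape of this requirement concrete, here is a minimal, language-neutral sketch written in Python. It is not the project's code: the base class below is a hypothetical stand-in for MediaGenerationOrchestrator, the method names `validate`, `generate`, and `handle` and the `AudioRequest`/`AudioResponse` types are assumptions made for illustration, and the third type parameter (TEventRequest) is omitted for brevity.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generic, TypeVar

TRequest = TypeVar("TRequest")
TResponse = TypeVar("TResponse")


class MediaGenerationOrchestrator(ABC, Generic[TRequest, TResponse]):
    """Hypothetical stand-in for the shared base class (assumed shape)."""

    def handle(self, request: TRequest) -> TResponse:
        # Shared workflow: validate first, then delegate to the subclass.
        self.validate(request)
        return self.generate(request)

    @abstractmethod
    def validate(self, request: TRequest) -> None: ...

    @abstractmethod
    def generate(self, request: TRequest) -> TResponse: ...


@dataclass
class AudioRequest:  # hypothetical request type
    text: str
    output_format: str = "mp3"
    kind: str = "tts"  # "tts" or "music"


@dataclass
class AudioResponse:  # hypothetical response type
    data: bytes
    output_format: str


class AudioGenerationOrchestrator(
    MediaGenerationOrchestrator[AudioRequest, AudioResponse]
):
    SUPPORTED_FORMATS = {"mp3", "wav", "ogg"}
    SUPPORTED_KINDS = {"tts", "music"}

    def validate(self, request: AudioRequest) -> None:
        if request.output_format.lower() not in self.SUPPORTED_FORMATS:
            raise ValueError(f"unsupported format: {request.output_format}")
        if request.kind not in self.SUPPORTED_KINDS:
            raise ValueError(f"unsupported workflow: {request.kind}")
        if not request.text.strip():
            raise ValueError("request text must not be empty")

    def generate(self, request: AudioRequest) -> AudioResponse:
        # Placeholder: a real implementation would call a provider here.
        return AudioResponse(data=b"", output_format=request.output_format.lower())
```

The point of the sketch is that format and workflow checks live in the audio subclass while the request flow stays in the base class, which is the split the issue describes for the image and video orchestrators.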

2. Audio-Specific Processing

  • Implement audio file validation
  • Support streaming audio generation
  • Handle audio metadata (duration, bitrate, sample rate)
  • Implement audio transcoding if needed
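
As an illustration of validation and metadata extraction, the sketch below handles WAV data only, using Python's standard `wave` module. The `AudioMetadata` type and the function names are hypothetical, and MP3 or OGG would need a different parser; treat this as one possible shape rather than the intended implementation.

```python
import io
import wave
from dataclasses import dataclass


@dataclass
class AudioMetadata:  # hypothetical container for the fields listed above
    duration_seconds: float
    sample_rate_hz: int
    channels: int
    bitrate_bps: int


def read_wav_metadata(data: bytes) -> AudioMetadata:
    """Validate that `data` is a readable WAV file and return its metadata."""
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
            channels = wav.getnchannels()
            sample_width = wav.getsampwidth()  # bytes per sample
    except (wave.Error, EOFError) as exc:
        raise ValueError(f"not a valid WAV file: {exc}") from exc
    if frames == 0 or rate == 0:
        raise ValueError("WAV file contains no audio")
    return AudioMetadata(
        duration_seconds=frames / rate,
        sample_rate_hz=rate,
        channels=channels,
        # Uncompressed PCM bitrate: samples/s * channels * bits per sample.
        bitrate_bps=rate * channels * sample_width * 8,
    )


def make_silent_wav(seconds: float, sample_rate: int = 8000) -> bytes:
    """Build a mono 16-bit silent WAV in memory, useful for tests."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buf.getvalue()
```

A helper like `make_silent_wav` keeps unit tests free of binary fixtures, since valid input can be generated on the fly.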

3. Event Support

  • AudioGenerationRequested
  • AudioGenerationStarted
  • AudioGenerationProgress
  • AudioGenerationCompleted
  • AudioGenerationFailed
  • AudioGenerationCancelled
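
One possible way to model these six events is sketched below. The event names come from the list above; the `AudioGenerationEvent` fields and the notion of a "terminal" event (completed, failed, or cancelled) are assumptions added for illustration, not something the issue specifies.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class AudioGenerationEventType(Enum):
    REQUESTED = "AudioGenerationRequested"
    STARTED = "AudioGenerationStarted"
    PROGRESS = "AudioGenerationProgress"
    COMPLETED = "AudioGenerationCompleted"
    FAILED = "AudioGenerationFailed"
    CANCELLED = "AudioGenerationCancelled"


# Assumption: once one of these is published, the request is finished.
TERMINAL_EVENTS = frozenset(
    {
        AudioGenerationEventType.COMPLETED,
        AudioGenerationEventType.FAILED,
        AudioGenerationEventType.CANCELLED,
    }
)


@dataclass(frozen=True)
class AudioGenerationEvent:  # hypothetical event payload
    request_id: str
    type: AudioGenerationEventType
    progress_percent: Optional[int] = None  # only meaningful for PROGRESS
    occurred_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    @property
    def is_terminal(self) -> bool:
        return self.type in TERMINAL_EVENTS
```

Keeping the wire names as enum values means a consumer can map an incoming event name back to its type with `AudioGenerationEventType("AudioGenerationFailed")`.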

4. Provider Integration

  • Support ElevenLabs provider (already in ProviderType enum)
  • Support Ultravox provider (already in ProviderType enum)
  • Consider adding OpenAI TTS support
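
Provider selection could be handled by a small registry keyed on the provider type, sketched below. The `ProviderType` enum here is a two-member stand-in for the project's real enum, and `AudioProviderRegistry` and the client callable signature are hypothetical; no real provider SDK calls are shown.

```python
from enum import Enum
from typing import Callable, Dict


class ProviderType(Enum):  # stand-in for the project's existing enum
    ELEVEN_LABS = "ElevenLabs"
    ULTRAVOX = "Ultravox"


# Assumed client shape: takes input text, returns encoded audio bytes.
AudioClient = Callable[[str], bytes]


class AudioProviderRegistry:  # hypothetical helper
    def __init__(self) -> None:
        self._clients: Dict[ProviderType, AudioClient] = {}

    def register(self, provider: ProviderType, client: AudioClient) -> None:
        self._clients[provider] = client

    def synthesize(self, provider: ProviderType, text: str) -> bytes:
        try:
            client = self._clients[provider]
        except KeyError:
            raise LookupError(
                f"no audio client registered for {provider.value}"
            ) from None
        return client(text)
```

With this shape, adding another provider later (for example OpenAI TTS) would mean one new enum member and one `register` call, with no change to the orchestrator itself.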

Implementation Notes

  • Follow the same patterns used in ImageGenerationOrchestrator and VideoGenerationOrchestrator
  • Reuse existing media storage infrastructure
  • Consider audio-specific cost calculation (per character or per second)
  • Add appropriate unit tests following MediaGenerationOrchestratorTestBase pattern
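
The per-character and per-second cost idea from the notes above might look like the following sketch. The function names and all rates are made up for illustration; real pricing would come from provider configuration. `Decimal` is used so that currency arithmetic does not pick up binary floating-point error.

```python
from decimal import Decimal


def tts_cost(character_count: int, rate_per_1k_chars: Decimal) -> Decimal:
    """Cost of a TTS request billed per 1,000 input characters."""
    if character_count < 0:
        raise ValueError("character_count must be non-negative")
    return (Decimal(character_count) / Decimal(1000)) * rate_per_1k_chars


def duration_cost(duration_seconds: float, rate_per_second: Decimal) -> Decimal:
    """Cost of generated audio billed per second of output."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    # Convert via str so a float like 0.1 is not expanded to its binary value.
    return Decimal(str(duration_seconds)) * rate_per_second


# Example with hypothetical rates:
speech = tts_cost(2500, Decimal("0.30"))       # 2.5 thousand chars at 0.30 each
music = duration_cost(30, Decimal("0.002"))    # 30 seconds at 0.002 per second
```

Which of the two functions applies would depend on the workflow: character-based billing fits TTS input, while duration-based billing fits music or other generated audio.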

Benefits

  • Unified architecture for all media generation
  • Consistent error handling and retry logic
  • Reusable infrastructure and patterns
  • Easy to maintain and extend
