## What problem does this solve?
RubyLLM provides a beautiful, unified interface for LLM capabilities — `chat`, `paint`, `embed`, `transcribe`. But audio is only half the story: we can turn speech into text (`transcribe`), but not text into speech.
Developers building voice-enabled apps, accessibility features, or content pipelines currently have to drop out of RubyLLM's DSL to wire up TTS manually — choosing an HTTP client, handling binary responses, managing provider auth separately. This breaks the "one gem, consistent interface" experience that makes RubyLLM great.
This issue proposes two related features:

1. `RubyLLM.speak` — core TTS API (primary focus)
2. SSML Builder DSL — Ruby DSL for building SSML documents (future phase, inspired by `RubyLLM::Schema`)
## Proposed solution

### API
Simple usage:

```ruby
speech = RubyLLM.speak("Hello, world!")
speech.save("hello.mp3")
```

With options:

```ruby
speech = RubyLLM.speak("Hello, world!", model: "tts-1-hd", format: "wav")
speech.save("hello.wav")
```

Per-call context:

```ruby
context = RubyLLM.context { |c| c.openai_api_key = "sk-..." }
speech = context.speak("Hello!")
```
### Response Object: `RubyLLM::Speech`

Following the pattern of `Transcription`, `Image`, `Embedding`:

```ruby
class RubyLLM::Speech
  attr_reader :data    # Audio bytes (binary string)
  attr_reader :model   # Model ID used
  attr_reader :format  # Output format (mp3, wav, aac, flac, pcm)

  def save(path)  # Write audio to file
  def to_blob     # Raw binary data
  def mime_type   # e.g. "audio/mpeg"
end
```
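As a sketch of what that interface might look like once filled in (the format-to-MIME mapping and the `initialize` signature here are assumptions for illustration, not part of the proposal):

```ruby
module RubyLLM
  class Speech
    # Assumed mapping for illustration; "audio/pcm" in particular is
    # not an IANA-registered type and would need a deliberate choice.
    MIME_TYPES = {
      "mp3"  => "audio/mpeg",
      "wav"  => "audio/wav",
      "aac"  => "audio/aac",
      "flac" => "audio/flac",
      "pcm"  => "audio/pcm"
    }.freeze

    attr_reader :data, :model, :format

    def initialize(data:, model:, format:)
      @data = data
      @model = model
      @format = format
    end

    # Write the raw bytes in binary mode so audio data isn't mangled
    def save(path)
      File.binwrite(path, data)
      path
    end

    def to_blob
      data
    end

    def mime_type
      MIME_TYPES.fetch(format, "application/octet-stream")
    end
  end
end
```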
### Provider Examples

OpenAI (`/v1/audio/speech`):

```ruby
# lib/ruby_llm/providers/openai/speech.rb
module RubyLLM::Providers::OpenAI::Speech
  def speak(text, model:, format: "mp3", **options)
    response = connection.post("/v1/audio/speech", {
      model: model,
      input: text,
      voice: "alloy", # Single default voice for now
      response_format: format
    })
    { audio_data: response.body, format: format }
  end
end
```
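The provider returns a plain hash; the core `RubyLLM.speak` entry point would wrap it into the response object. A rough sketch of that glue, mirroring how `paint` and `transcribe` work — `SpeechResult` and `provider_speak` are stand-ins here, not RubyLLM API:

```ruby
# Stand-in for the Speech response object proposed above
SpeechResult = Struct.new(:data, :model, :format, keyword_init: true)

# Stand-in for a provider module's speak; the real one posts to an HTTP API
def provider_speak(text, model:, format: "mp3")
  { audio_data: "bytes-for-#{text}".b, format: format }
end

# Core entry point: resolve options, call the provider, wrap the raw result
def speak(text, model: "tts-1", format: "mp3")
  raw = provider_speak(text, model: model, format: format)
  SpeechResult.new(data: raw[:audio_data], model: model, format: raw[:format])
end
```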
Azure (`/cognitiveservices/v1`):

```ruby
# lib/ruby_llm/providers/azure/speech.rb
module RubyLLM::Providers::Azure::Speech
  def speak(input, model:, format: "mp3", **options)
    ssml = ssml?(input) ? input : wrap_in_ssml(input, voice: "en-US-AvaMultilingualNeural")
    response = connection.post("cognitiveservices/v1") do |req|
      req.headers["Content-Type"] = "application/ssml+xml"
      req.headers["X-Microsoft-OutputFormat"] = audio_format(format)
      req.headers["Ocp-Apim-Subscription-Key"] = config.azure_speech_api_key
      req.body = ssml
    end
    { audio_data: response.body, format: format }
  end

  private

  def ssml?(input)
    input.strip.start_with?("<speak")
  end

  def wrap_in_ssml(text, voice:)
    <<~SSML
      <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
        <voice name="#{voice}">#{text}</voice>
      </speak>
    SSML
  end
end
```
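The Azure snippet calls an `audio_format` helper that isn't shown. A possible sketch, translating the gem's generic format names into Azure `X-Microsoft-OutputFormat` identifiers — the exact strings and bitrates below are illustrative defaults, and the supported set would need checking against Azure's output-format list:

```ruby
# Assumed mapping from generic format names to Azure output identifiers;
# values chosen for illustration, not prescriptive defaults.
AZURE_OUTPUT_FORMATS = {
  "mp3" => "audio-24khz-96kbitrate-mono-mp3",
  "wav" => "riff-24khz-16bit-mono-pcm",
  "pcm" => "raw-24khz-16bit-mono-pcm"
}.freeze

def audio_format(format)
  AZURE_OUTPUT_FORMATS.fetch(format) do
    # Fail loudly rather than sending an unsupported header value
    raise ArgumentError, "unsupported Azure output format: #{format}"
  end
end
```

Failing fast on an unmapped format surfaces provider capability gaps (e.g. a format OpenAI supports but Azure doesn't) at the call site instead of as an opaque HTTP error.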
### Configuration

```ruby
RubyLLM.configure do |config|
  config.default_speech_model = "tts-1" # New config attribute
end
```
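How the new attribute could plumb through to `speak` — a minimal stand-in `Configuration`, where only `default_speech_model` itself comes from the proposal and the rest is assumed for illustration:

```ruby
# Minimal stand-in for RubyLLM's Configuration, illustration only
class Configuration
  attr_accessor :default_speech_model

  def initialize
    @default_speech_model = "tts-1" # configured default
  end
end

CONFIG = Configuration.new

# A per-call model: argument wins over the configured default
def resolve_speech_model(explicit = nil)
  explicit || CONFIG.default_speech_model
end
```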
## Files to Add/Modify

| File | Change |
| --- | --- |
| `lib/ruby_llm.rb` | Add `self.speak` method |
| `lib/ruby_llm/speech.rb` | New `Speech` class |
| `lib/ruby_llm/configuration.rb` | Add `default_speech_model` |
| `lib/ruby_llm/providers/openai.rb` | Include `Speech` module |
| `lib/ruby_llm/providers/openai/speech.rb` | New provider implementation |
| `lib/ruby_llm/providers/openai/capabilities.rb` | Add `speech: true` |
| `lib/ruby_llm/providers/azure/speech.rb` | New provider implementation |
| Model registry | Register TTS models (`tts-1`, `tts-1-hd`) |
## Why this belongs in RubyLLM
TTS isn't a simple API wrapper you'd write in application code. It requires:

- Model resolution across providers — the same model name might map to different endpoints on OpenAI vs Azure vs Google. RubyLLM's `Models.resolve` already handles this.
- Provider abstraction — each TTS provider has different auth mechanisms, endpoints, request/response formats, and audio output options. App code shouldn't know these details.
- Configuration management — API keys, defaults, per-call overrides. RubyLLM's `Configuration` and `Context` system already solves this.
- Binary response handling — TTS returns audio bytes, not JSON. This needs different connection/parsing logic that belongs in the provider layer.
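The binary-handling point can be made concrete: audio bytes must round-trip through binary-mode I/O, never text-mode writes or JSON parsing. The byte values below are arbitrary stand-ins for audio data:

```ruby
require "tempfile"

# Arbitrary bytes standing in for an MP3 payload; real audio
# would come from the provider's response body.
fake_audio = [0xFF, 0xF3, 0x44, 0x00].pack("C*")

Tempfile.create(["speech", ".mp3"]) do |f|
  File.binwrite(f.path, fake_audio) # binary mode: no encoding/newline munging
  raise "bytes corrupted" unless File.binread(f.path) == fake_audio
end
```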
Most importantly: `transcribe` (audio → text) is already in RubyLLM. `speak` (text → audio) is its natural counterpart. Leaving it out means developers use RubyLLM for 90% of their LLM needs but have to roll their own for this one capability — exactly the fragmentation RubyLLM was built to eliminate.
## Non-Goals (for initial PR)
- Multiple voices / voice selection — each provider uses a single sensible default voice for now
- SSML support — separate issue
- Streaming audio — can layer on later
- Other providers beyond OpenAI + Azure — can be added incrementally
## Related