[FEATURE] Expose stop_reason from streaming responses

## Summary

When RubyLLM streams a response from Anthropic, the `stop_reason` field arrives in a `message_delta` SSE event, but RubyLLM neither captures nor exposes it. Callers need this field to tell why a response ended.

## Background

Anthropic's streaming API sends `stop_reason` in the final `message_delta` event:

```text
event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":42}}
```

Possible values include:
- `end_turn` - Normal completion
- `tool_use` - Model wants to call a tool
- `max_tokens` - Response truncated due to token limit
- `stop_sequence` - A custom stop sequence was generated
- `pause_turn` - Server paused for native tool execution (web_search, etc.)
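
For illustration, here is a minimal sketch of pulling `stop_reason` out of one raw SSE event like the one above. The helper name `extract_stop_reason` is hypothetical and not part of RubyLLM; it assumes the event arrives as a text block with a single `data:` line.

```ruby
require "json"

# Hypothetical helper (not part of RubyLLM): returns the stop_reason carried
# by one raw SSE event, or nil if the event is not a message_delta.
def extract_stop_reason(raw_event)
  data_line = raw_event.each_line.find { |line| line.start_with?("data:") }
  return nil unless data_line

  payload = JSON.parse(data_line.delete_prefix("data:").strip)
  return nil unless payload["type"] == "message_delta"

  payload.dig("delta", "stop_reason")
end

event = <<~SSE
  event: message_delta
  data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":42}}
SSE

puts extract_stop_reason(event)
```

Events of any other type return `nil`, so the helper can be called on every event in the stream.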

## Current Behavior

`StreamAccumulator` doesn't capture `stop_reason`, so the final `Message` has no way to indicate why the response ended.

## Proposed Solution

1. Add `stop_reason` attribute to `Message` class
2. Capture from `message_delta` events in `StreamAccumulator`
3. Attach to final `Message` in `to_message`

```ruby
# In StreamAccumulator
# Assumes Message gains a writable stop_reason attribute (step 1) and that
# chunks expose raw_sse_data as proposed in #567.
def add(chunk)
  # ... existing logic ...

  if chunk.respond_to?(:raw_sse_data) && chunk.raw_sse_data
    data = chunk.raw_sse_data
    if data["type"] == "message_delta"
      # Only overwrite when the event actually carries a stop_reason.
      stop_reason = data.dig("delta", "stop_reason")
      @stop_reason = stop_reason if stop_reason
    end
  end
end

def to_message(response = nil)
  # ... existing logic that builds `message` ...
  message.stop_reason = @stop_reason if @stop_reason
  message
end
```
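
To try the capture logic in isolation, here is a rough, self-contained sketch. `DemoChunk`, `DemoMessage`, and `DemoAccumulator` are stand-ins invented for this example; they are not RubyLLM's real classes and only mirror the shape of the proposal.

```ruby
# Stand-ins for illustration only; these are not RubyLLM's real classes.
DemoChunk = Struct.new(:content, :raw_sse_data)
DemoMessage = Struct.new(:content, :stop_reason)

class DemoAccumulator
  def initialize
    @content = String.new
    @stop_reason = nil
  end

  # Appends chunk text and remembers stop_reason from a message_delta event.
  def add(chunk)
    @content << chunk.content.to_s
    data = chunk.raw_sse_data
    return unless data.is_a?(Hash) && data["type"] == "message_delta"

    reason = data.dig("delta", "stop_reason")
    @stop_reason = reason if reason
  end

  # Builds the final message with the captured stop_reason (nil if none seen).
  def to_message
    DemoMessage.new(@content, @stop_reason)
  end
end

acc = DemoAccumulator.new
acc.add(DemoChunk.new("Hello", { "type" => "content_block_delta" }))
acc.add(DemoChunk.new(nil, { "type" => "message_delta", "delta" => { "stop_reason" => "max_tokens" } }))
message = acc.to_message
```

After both chunks are added, `message.stop_reason` holds `"max_tokens"` and `message.content` holds the accumulated text.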

## Use Cases

1. **Truncation detection**: `max_tokens` indicates the response was cut off, so the caller may need to request a continuation
2. **Native tool handling**: `pause_turn` indicates server-side tool execution, which may need special handling
3. **Observability**: Log non-standard stop reasons for debugging
4. **Flow control**: Branch on why the model stopped
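
As a sketch of how a caller might act on these cases once `stop_reason` is exposed: the helper `next_action_for` and the symbols it returns are hypothetical names chosen for this example, not RubyLLM API.

```ruby
# Hypothetical caller-side helper; the returned symbols are illustrative only.
def next_action_for(stop_reason)
  case stop_reason
  when "end_turn"   then :done
  when "max_tokens" then :continue_generation # response was truncated
  when "pause_turn" then :resume_turn         # server-side tool still running
  when "tool_use"   then :run_tools
  when nil          then :unknown             # provider did not report a reason
  else :log_unexpected                        # surface anything unrecognized
  end
end
```

Unrecognized values fall through to a logging branch rather than raising, so new stop reasons added by the provider would not break existing callers.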

## Prerequisites

This requires access to raw SSE data on chunks. Issue #567 proposes exposing `raw_sse_data` for server tool filtering - the same infrastructure would enable this feature.

## Current Workaround

We're using a monkey patch:

```ruby
module RubyLLM
  class StreamAccumulator
    attr_accessor :stop_reason

    alias_method :add_before_stop_reason, :add
    def add(chunk)
      # Preserve the original return value so existing callers are unaffected.
      result = add_before_stop_reason(chunk)
      if chunk.respond_to?(:raw_sse_data) && chunk.raw_sse_data
        data = chunk.raw_sse_data
        if data["type"] == "message_delta" && data.dig("delta", "stop_reason")
          @stop_reason = data.dig("delta", "stop_reason")
        end
      end
      result
    end

    alias_method :to_message_before_stop_reason, :to_message
    def to_message(response = nil)
      message = response ? to_message_before_stop_reason(response) : to_message_before_stop_reason
      if @stop_reason
        message.instance_variable_set(:@stop_reason, @stop_reason)
        message.define_singleton_method(:stop_reason) { @stop_reason }
      end
      message
    end
  end
end
```

## Related

- #567 - Server tools support (proposes `raw_sse_data` exposure)
- #414 - Observability middleware (could provide hooks for this)

## References

- [Anthropic Streaming Messages](https://docs.anthropic.com/en/docs/build-with-claude/streaming)
