
[FEATURE] Anthropic Server Tools (web_search, web_fetch) - Native Support + Streaming Fix #567

@erenth

Description


Summary

Anthropic's API includes server-executed tools (web_search, web_fetch) that run on Anthropic's infrastructure. These are distinct from regular tools—they require beta headers, special payload format, and introduce a streaming edge case that corrupts regular tool call arguments.

This is core LLM communication, not an application-level feature. The recent extended thinking implementation (#551 → PR #552, merged in v1.10.0) establishes the pattern for provider-specific API features. Server tools follow the same pattern.

Background: Issue #205 and Remaining Gaps

Issue #205 ("Claude server tools support") was closed via commit 10eaba3, which improved with_params to prioritize user params. However, practical usage reveals three remaining gaps:

  1. Array replacement: Utils.deep_merge replaces arrays instead of concatenating them. Using with_params({ tools: [native_tools] }) therefore replaces all existing tools instead of adding to them. (Related: #317 reported a similar override of tool_choice when using with_params.)

  2. Beta headers: Server tools require anthropic-beta headers that aren't automatically added.

  3. Streaming corruption: StreamAccumulator appends server_tool_use JSON deltas to regular tool calls, causing JSON::ParserError. (Different from #228, which is about buffering timing; this is about deltas being assigned to the wrong tool ID.)

Problem 1: Streaming Tool Call Corruption

When using server tools with streaming, content_block_delta events with input_json_delta data are processed by StreamAccumulator.accumulate_tool_calls. The issue:

# lib/ruby_llm/stream_accumulator.rb:66-83
def accumulate_tool_calls(new_tool_calls)
  new_tool_calls.each_value do |tool_call|
    if tool_call.id
      # ... stores tool with ID
      @latest_tool_call_id = tool_call.id
    else
      # Server tool deltas have nil IDs - they get appended here!
      existing = @tool_calls[@latest_tool_call_id]
      existing.arguments << tool_call.arguments if existing
    end
  end
end

Root cause: Server tool (server_tool_use) streaming deltas have nil tool IDs because the tools are executed server-side. RubyLLM appends their JSON fragments to the tool call stored under @latest_tool_call_id, corrupting the arguments of a regular tool.

Example error:

{
  "error_class": "JSON::ParserError",
  "error_message": "unexpected token at end of stream '{\"query\":' at line 1 column 47"
}

Stream format showing the issue:

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"tool_use","id":"toolu_01ABC","name":"my_tool","input":{}}}

event: content_block_start
data: {"type":"content_block_start","index":1,"content_block":{"type":"server_tool_use","id":"srvtoolu_01XYZ","name":"web_search","input":{}}}

event: content_block_delta
data: {"type":"content_block_delta","index":1,"delta":{"type":"input_json_delta","partial_json":"{\"query\":"}}

The index:1 delta (server tool) gets wrongly appended to toolu_01ABC (regular tool).
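The corruption is easy to reproduce with a few lines that model the accumulator logic above (a simplified standalone sketch, not the actual RubyLLM class; the tool names and JSON fragments are illustrative):

```ruby
require "json"

tool_calls = {}
latest_tool_call_id = nil

deltas = [
  { id: "toolu_01ABC", arguments: +"" },     # regular tool starts
  { id: nil, arguments: "{\"location\":" }, # regular tool's own delta (fine)
  { id: nil, arguments: "{\"query\":" }     # server_tool_use delta, nil ID (wrong target!)
]

deltas.each do |tc|
  if tc[:id]
    tool_calls[tc[:id]] = { arguments: +"" }
    latest_tool_call_id = tc[:id]
  else
    # nil-ID deltas always land on the last tool that had an ID
    tool_calls[latest_tool_call_id][:arguments] << tc[:arguments]
  end
end

corrupted = tool_calls["toolu_01ABC"][:arguments]
# corrupted is now "{\"location\":{\"query\":" -- invalid JSON,
# matching the reported JSON::ParserError
```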

Problem 2: No Native Way to Add Server Tools

The with_params approach has a fundamental limitation:

# This REPLACES existing tools instead of adding to them
chat.with_params(tools: [
  { type: "web_search_20250305", name: "web_search", max_uses: 10 }
])

This happens because Utils.deep_merge uses replacement semantics for arrays:

# lib/ruby_llm/utils.rb:35-43
def deep_merge(original, overrides)
  original.merge(overrides) do |_key, original_value, overrides_value|
    if original_value.is_a?(Hash) && overrides_value.is_a?(Hash)
      deep_merge(original_value, overrides_value)
    else
      overrides_value  # Arrays REPLACE, not concatenate
    end
  end
end

Proposed Solutions

Solution 1: Filter Server Tool Deltas in StreamAccumulator (Bug Fix)

Track server_tool_use content block indices and skip their input_json_delta events. This follows the same pattern as the extended thinking implementation (#552) which modified StreamAccumulator.add to handle thinking_delta events:

class StreamAccumulator
  def add(chunk)
    # If chunk has raw SSE data (from streaming)
    if chunk.respond_to?(:raw_sse_data) && chunk.raw_sse_data
      data = chunk.raw_sse_data

      case data["type"]
      when "content_block_start"
        if data.dig("content_block", "type") == "server_tool_use"
          @server_tool_indices ||= Set.new
          @server_tool_indices.add(data["index"])
        end
      when "content_block_delta"
        # Skip server tool JSON deltas - they'd corrupt regular tools
        if @server_tool_indices&.include?(data["index"]) &&
           data.dig("delta", "type") == "input_json_delta"
          return
        end
      when "message_stop"
        @server_tool_indices = nil
      end
    end

    # Continue with normal processing
    # ... existing add logic
  end
end

Why this is safe: Server tools are executed by Anthropic. Their results are injected back into Claude's context as web_search_tool_result blocks. The client doesn't need to track their arguments—Claude's response already reflects seeing the results.

Prerequisite: This requires raw_sse_data on chunks. The extended thinking implementation (#552) already added this for Anthropic streaming to capture thinking_delta events. If that's not yet exposed, the same pattern applies.

Solution 2: Native Server Tools Method (Feature)

Following the pattern from the #343 discussion, where the maintainer preferred integrating options into existing methods, server tools could be added via a dedicated method:

chat = RubyLLM.chat(model: "claude-sonnet-4-20250514")
  .with_server_tools(:web_search, :web_fetch)
  .with_tool(MyCustomTool)

response = chat.ask("What's the latest news about Ruby?")

This would:

  1. Add beta headers automatically (anthropic-beta: web-search-2025-03-05,web-fetch-2025-09-10)
  2. Inject server tool definitions into the payload's tools array (concatenating, not replacing)
  3. Enable the streaming filter for server_tool_use events
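For illustration, a hypothetical sketch of the lookup table with_server_tools might use internally. The tool type strings and beta header versions are the ones quoted in this issue; the method and constant names are invented:

```ruby
# Hypothetical internals for with_server_tools -- names and versions are assumptions.
SERVER_TOOLS = {
  web_search: { tool: { type: "web_search_20250305", name: "web_search" },
                beta: "web-search-2025-03-05" },
  web_fetch:  { tool: { type: "web_fetch_20250910", name: "web_fetch" },
                beta: "web-fetch-2025-09-10" }
}.freeze

def server_tool_config(*names)
  entries = names.map { |n| SERVER_TOOLS.fetch(n) }
  {
    # One comma-joined anthropic-beta header covering all requested tools
    headers: { "anthropic-beta" => entries.map { |e| e[:beta] }.join(",") },
    # Definitions to concatenate onto (not replace in) the payload's tools array
    tools: entries.map { |e| e[:tool] }
  }
end

config = server_tool_config(:web_search, :web_fetch)
```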

Alternative: Integrate into with_tools similar to #343's proposed pattern:

chat.with_tools(MyTool, server: [:web_search, :web_fetch])

Solution 3 (Alternative): Add add_params Method

For more general use cases beyond server tools, add an additive variant of with_params:

# In RubyLLM::Chat
def add_params(**params)
  @params = Utils.deep_merge_additive(@params || {}, params)
  self
end

# In RubyLLM::Utils
def deep_merge_additive(original, overrides)
  original.merge(overrides) do |_key, orig_val, override_val|
    if orig_val.is_a?(Hash) && override_val.is_a?(Hash)
      deep_merge_additive(orig_val, override_val)
    elsif orig_val.is_a?(Array) && override_val.is_a?(Array)
      orig_val + override_val  # Concatenate instead of replace
    else
      override_val
    end
  end
end

Usage:

chat
  .with_tool(MyTool)  # Regular tools
  .add_params(tools: [{ type: "web_search_20250305", name: "web_search" }])

This addresses the general array-replacement limitation that affects multiple use cases.
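A standalone check of the proposed additive semantics (same logic as above, runnable in isolation outside RubyLLM):

```ruby
# Standalone copy of the proposed deep_merge_additive, for illustration.
def deep_merge_additive(original, overrides)
  original.merge(overrides) do |_key, orig_val, override_val|
    if orig_val.is_a?(Hash) && override_val.is_a?(Hash)
      deep_merge_additive(orig_val, override_val)
    elsif orig_val.is_a?(Array) && override_val.is_a?(Array)
      orig_val + override_val # Concatenate instead of replace
    else
      override_val
    end
  end
end

existing = { tools: [{ name: "my_tool" }] }
merged = deep_merge_additive(existing, { tools: [{ type: "web_search_20250305", name: "web_search" }] })
# merged[:tools] now holds both the regular tool and the server tool
```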

Why This Belongs in RubyLLM (Not Application Code)

Per the CONTRIBUTING guidelines, RubyLLM focuses on "core LLM communication." Server tools are:

  1. An Anthropic API feature - They're part of the Messages API, not application-level tooling
  2. Provider-specific streaming behavior - The corruption bug is in RubyLLM's StreamAccumulator, not application code
  3. Precedent: Extended thinking (#551, implemented in #552) - Same pattern: a provider-specific API feature that required streaming modifications, message handling, and configuration methods

The streaming fix in particular cannot be solved in application code without monkey-patching StreamAccumulator.

Implementation Considerations

  1. Backward compatibility: All solutions are additive—existing code continues to work
  2. Minimal scope: The streaming fix is ~20 lines; server tools config follows existing patterns
  3. No external dependencies: Uses existing RubyLLM patterns (capabilities, config, params)
  4. Testing: Can follow the VCR cassette pattern from #552

Related Issues

Issue | Relevance
--- | ---
#205 | Original server tools request; closed, but gaps remain
#228 | Streaming tool accumulation (different issue: buffering timing vs. wrong assignment)
#317 | tool_choice override; same deep_merge limitation
#343 | Tool control parameters; relevant for the API design pattern
#551 / #552 | Extended thinking; precedent for provider-specific streaming features

Cross-Provider Precedent

The community-built ruby_llm-responses_api gem demonstrates a similar need for OpenAI's native tools (web_search_preview, code_interpreter, file_search). It uses with_params(tools: [...]), which works but has the same array-replacement limitation.

This suggests native/server-executed tools are a cross-provider pattern that would benefit from first-class RubyLLM support, potentially with a unified API abstracting provider differences.


Current Workaround

We're currently using monkey patches in production:

# 1. Add beta headers to Anthropic provider
module RubyLLM::Providers
  class Anthropic
    alias_method :original_headers, :headers
    def headers
      original_headers.merge("anthropic-beta" => "web-search-2025-03-05,web-fetch-2025-09-10")
    end
  end
end

# 2. Inject native tools into payload
# (NATIVE_TOOLS is our own constant, e.g.
#  [{ type: "web_search_20250305", name: "web_search", max_uses: 10 }].freeze)
module RubyLLM::Providers::Anthropic::Chat
  alias_method :original_add_optional_fields, :add_optional_fields
  def add_optional_fields(payload, system_content:, tools:, temperature:)
    original_add_optional_fields(payload, system_content:, tools:, temperature:)
    payload[:tools] ||= []
    payload[:tools].concat(NATIVE_TOOLS)
  end
end

# 3. Filter streaming corruption in StreamAccumulator
class RubyLLM::StreamAccumulator
  alias_method :add_before_filter, :add
  def add(chunk)
    # Track server_tool_use indices, skip their input_json_delta events
    if chunk.respond_to?(:raw_sse_data) && chunk.raw_sse_data
      data = chunk.raw_sse_data
      case data["type"]
      when "content_block_start"
        if data.dig("content_block", "type") == "server_tool_use"
          @server_tool_indices ||= Set.new
          @server_tool_indices.add(data["index"])
        end
      when "content_block_delta"
        return if @server_tool_indices&.include?(data["index"]) &&
                  data.dig("delta", "type") == "input_json_delta"
      when "message_stop"
        @server_tool_indices = nil
      end
    end
    add_before_filter(chunk)
  end
end

This works but requires maintaining patches across RubyLLM updates.
