
[BOT ISSUE] Anthropic beta.messages.toolRunner() agentic loop not instrumented as parent span #1827

@braintrust-bot

Description


Summary

The Anthropic TypeScript SDK provides a documented client.beta.messages.toolRunner() helper that runs an automated multi-turn tool execution loop. While individual beta.messages.create calls within the loop ARE traced (because the wrapper proxies create), the overall agentic execution has no parent TASK span. Users see individual LLM calls but cannot group them as a single agent execution.

This is inconsistent with how other agentic surfaces in this repo are instrumented:

  • Google ADK Runner.runAsync → parent TASK span
  • Claude Agent SDK query → parent span with tool sub-spans
  • AI SDK Agent.generate / Agent.stream → parent span
  • OpenAI Agents integration → full agent tracing with hierarchy

What instrumentation is missing

  • Wrapper (js/src/wrappers/anthropic.ts): betaProxy() only intercepts beta.messages, and messagesProxy() only intercepts create (line 79). The toolRunner method passes through unproxied with no parent span.
  • Channels (js/src/instrumentation/plugins/anthropic-channels.ts): Only messagesCreate and betaMessagesCreate channels are defined. No channel for toolRunner.
  • Plugin (js/src/instrumentation/plugins/anthropic-plugin.ts): No handler for toolRunner calls.
  • Vendor types (js/src/vendor-sdk-types/anthropic.ts): AnthropicMessages interface only declares create. No toolRunner method.
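Here is a minimal sketch of the missing wrapper piece. It assumes a simplified `messages` resource whose `toolRunner` returns a plain Promise and calls `create` through `this`. The in-memory `withSpan` recorder, `MessagesLike`, and `wrapMessages` are hypothetical stand-ins, not Braintrust or Anthropic SDK APIs. The real helper returns a runner object rather than a plain Promise, so a production wrapper would also need to preserve that object's interface instead of replacing it with an async function.

```typescript
type SpanType = "task" | "llm" | "tool";

interface SpanRecord {
  id: number;
  parentId: number | null;
  name: string;
  type: SpanType;
}

// Hypothetical stand-in for the Braintrust logger: records spans in memory.
const recordedSpans: SpanRecord[] = [];
let nextSpanId = 1;
// A module-level "current span" is only safe for sequential code; a real
// implementation would rely on AsyncLocalStorage or the SDK's own context.
let currentSpanId: number | null = null;

// Runs fn inside a new span; spans opened during fn record it as their parent.
async function withSpan<T>(name: string, type: SpanType, fn: () => Promise<T>): Promise<T> {
  const span: SpanRecord = { id: nextSpanId++, parentId: currentSpanId, name, type };
  recordedSpans.push(span);
  const previous = currentSpanId;
  currentSpanId = span.id;
  try {
    return await fn();
  } finally {
    currentSpanId = previous;
  }
}

// Simplified view of the messages resource (assumption: toolRunner returns a Promise).
interface MessagesLike {
  create(params: unknown): Promise<unknown>;
  toolRunner(params: unknown): Promise<unknown>;
}

function wrapMessages(messages: MessagesLike): MessagesLike {
  return new Proxy(messages, {
    get(target, prop, receiver) {
      const value = Reflect.get(target, prop, receiver);
      if (prop === "create") {
        // Existing behavior: one LLM span per create call.
        return (params: unknown) =>
          withSpan("anthropic.messages.create", "llm", () => value.call(target, params));
      }
      if (prop === "toolRunner") {
        // Missing piece: a parent TASK span around the whole loop. Binding
        // `this` to the proxy routes internal create calls through the trap above.
        return (params: unknown) =>
          withSpan("anthropic.beta.messages.toolRunner", "task", () => value.call(receiver, params));
      }
      return value;
    },
  });
}

// Fake resource: toolRunner runs a two-iteration loop through this.create.
const fakeMessages: MessagesLike = {
  async create(params) {
    return { echoed: params };
  },
  async toolRunner(params) {
    await this.create(params); // iteration 1: model requests a tool
    return this.create(params); // iteration 2: model returns the final answer
  },
};

const wrapperDone = wrapMessages(fakeMessages).toolRunner({ messages: [] });
wrapperDone.then(() => console.log(recordedSpans));
```

With the proxy in place, both `create` spans record the `toolRunner` TASK span as their parent instead of appearing as unrelated roots. A channel-based version (the `anthropic-channels.ts` / `anthropic-plugin.ts` path) could follow the same shape: publish start and end events around the whole loop and open the parent span in the subscriber.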

A grep for toolRunner, tool_runner, and runTools across js/src/ returns zero matches.

Concrete impact

When a user calls:

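```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// weatherTool and calculatorTool are tool definitions with run callbacks (not shown).
const result = await client.beta.messages.toolRunner({
  model: 'claude-sonnet-4-5-20250929',
  max_tokens: 1024,
  tools: [weatherTool, calculatorTool],
  messages: [{ role: 'user', content: 'What is the temperature in NYC in Celsius?' }],
});
```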

They see N individual beta.messages.create spans (one per loop iteration), but:

  • No parent span grouping all iterations
  • No tool execution sub-spans for the run callbacks
  • No aggregate metrics (total tokens, total duration across all iterations)
  • No way to distinguish a toolRunner agentic execution from independent create calls
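The missing data listed above (a grouping parent, tool sub-spans, and aggregate metrics) can be sketched with a scripted loop. This assumes simplified response shapes modeled on the Messages API (`stop_reason`, `usage.input_tokens`, `usage.output_tokens`, `tool_use` content blocks). `tracedToolLoop`, `LoopTool`, and the span objects are hypothetical, and the scripted turns stand in for real model calls, so tool results are not actually sent back to the model.

```typescript
// Simplified shapes modeled on Anthropic Messages API responses.
interface Usage { input_tokens: number; output_tokens: number }
interface ToolUseBlock { type: "tool_use"; id: string; name: string; input: Record<string, unknown> }
interface TextBlock { type: "text"; text: string }
interface ModelTurn { stop_reason: "tool_use" | "end_turn"; content: Array<ToolUseBlock | TextBlock>; usage: Usage }

// Hypothetical tool shape: a name plus the run callback the loop executes.
interface LoopTool { name: string; run(input: Record<string, unknown>): string }

// Hypothetical span records: one TASK parent holding LLM and tool children.
interface ChildSpan { name: string; type: "llm" | "tool"; output?: string }
interface TaskSpan {
  name: string;
  children: ChildSpan[];
  metrics: { iterations: number; input_tokens: number; output_tokens: number; duration_ms: number };
}

async function tracedToolLoop(
  callModel: (turn: number) => Promise<ModelTurn>,
  tools: LoopTool[],
  maxIterations = 10, // hypothetical safety cap for the sketch
): Promise<{ final: ModelTurn; span: TaskSpan }> {
  const span: TaskSpan = {
    name: "anthropic.beta.messages.toolRunner",
    children: [],
    metrics: { iterations: 0, input_tokens: 0, output_tokens: 0, duration_ms: 0 },
  };
  const start = Date.now();
  for (let turn = 0; turn < maxIterations; turn++) {
    const msg = await callModel(turn);
    // One LLM child per create call, with running token totals on the parent.
    span.children.push({ name: "anthropic.messages.create", type: "llm" });
    span.metrics.iterations += 1;
    span.metrics.input_tokens += msg.usage.input_tokens;
    span.metrics.output_tokens += msg.usage.output_tokens;
    if (msg.stop_reason !== "tool_use") {
      span.metrics.duration_ms = Date.now() - start;
      return { final: msg, span };
    }
    // One tool child per run callback requested in this turn.
    for (const block of msg.content) {
      if (block.type !== "tool_use") continue;
      const toolName = block.name;
      const tool = tools.find((t) => t.name === toolName);
      if (!tool) throw new Error(`unknown tool: ${toolName}`);
      span.children.push({ name: `tool:${toolName}`, type: "tool", output: tool.run(block.input) });
    }
  }
  throw new Error(`no final answer after ${maxIterations} iterations`);
}

// Scripted turns stand in for real model calls (Fahrenheit lookup, then conversion).
const scriptedTurns: ModelTurn[] = [
  { stop_reason: "tool_use", content: [{ type: "tool_use", id: "t1", name: "get_weather", input: { city: "NYC" } }], usage: { input_tokens: 100, output_tokens: 20 } },
  { stop_reason: "tool_use", content: [{ type: "tool_use", id: "t2", name: "calculator", input: { fahrenheit: 68 } }], usage: { input_tokens: 150, output_tokens: 15 } },
  { stop_reason: "end_turn", content: [{ type: "text", text: "It is about 20°C in NYC." }], usage: { input_tokens: 200, output_tokens: 12 } },
];
const loopTools: LoopTool[] = [
  { name: "get_weather", run: (input) => `68F in ${String(input.city)}` },
  { name: "calculator", run: (input) => String(((Number(input.fahrenheit) - 32) * 5) / 9) },
];

const loopDone = tracedToolLoop(async (turn) => scriptedTurns[turn], loopTools);
loopDone.then(({ span }) => console.log(span.metrics, span.children.map((c) => c.name)));
```

Folding per-iteration usage into the parent keeps the existing per-call LLM spans intact while adding the totals. Whether a real plugin should read usage from each `create` result or from the runner's final state is left open here.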

Braintrust docs status

not_found — The Braintrust Anthropic integration docs at https://www.braintrust.dev/docs/integrations/ai-providers/anthropic document messages.create only. The toolRunner helper is not mentioned.

Upstream references

Local files inspected

  • js/src/wrappers/anthropic.ts: betaProxy and messagesProxy only intercept create
  • js/src/vendor-sdk-types/anthropic.ts: AnthropicMessages only declares create
  • js/src/instrumentation/plugins/anthropic-channels.ts: only messagesCreate and betaMessagesCreate
  • js/src/instrumentation/plugins/anthropic-plugin.ts: only subscribes to message create channels
  • e2e/scenarios/anthropic-instrumentation/: no toolRunner test scenarios
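Because the e2e scenarios have no toolRunner coverage, a check along these lines could assert the expected hierarchy. The `SpanRow` shape (`span_id`, `span_parents`, `span_attributes`) is an assumption loosely modeled on logged Braintrust spans, and `checkToolRunnerTrace` is a hypothetical helper, not part of the existing test harness.

```typescript
// Row shape is an assumption loosely modeled on logged Braintrust spans.
interface SpanRow {
  span_id: string;
  span_parents: string[];
  span_attributes: { name: string; type: "task" | "llm" | "tool" };
}

// Hypothetical e2e check: returns the problems found; an empty array means the
// trace has exactly one toolRunner TASK root and every other span sits under it.
function checkToolRunnerTrace(rows: SpanRow[]): string[] {
  const roots = rows.filter(
    (r) => r.span_attributes.type === "task" && r.span_attributes.name.includes("toolRunner"),
  );
  if (roots.length !== 1) return [`expected 1 toolRunner task span, found ${roots.length}`];
  const rootId = roots[0].span_id;
  const problems: string[] = [];
  for (const row of rows) {
    if (row.span_id === rootId) continue;
    if (!row.span_parents.includes(rootId)) {
      problems.push(`${row.span_attributes.name} (${row.span_id}) is not under the toolRunner span`);
    }
  }
  if (!rows.some((r) => r.span_attributes.type === "llm")) problems.push("no llm spans recorded");
  return problems;
}

// Shape described in the issue today: create spans with no parent.
const currentTrace: SpanRow[] = [
  { span_id: "a", span_parents: [], span_attributes: { name: "anthropic.messages.create", type: "llm" } },
  { span_id: "b", span_parents: [], span_attributes: { name: "anthropic.messages.create", type: "llm" } },
];

// Desired shape: one TASK root grouping the iterations and the tool call.
const desiredTrace: SpanRow[] = [
  { span_id: "r", span_parents: [], span_attributes: { name: "anthropic.beta.messages.toolRunner", type: "task" } },
  { span_id: "a", span_parents: ["r"], span_attributes: { name: "anthropic.messages.create", type: "llm" } },
  { span_id: "t", span_parents: ["r"], span_attributes: { name: "tool:get_weather", type: "tool" } },
  { span_id: "b", span_parents: ["r"], span_attributes: { name: "anthropic.messages.create", type: "llm" } },
];

console.log(checkToolRunnerTrace(currentTrace)); // reports the missing task span
console.log(checkToolRunnerTrace(desiredTrace)); // empty: hierarchy as expected
```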

Metadata


Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
