
🤖 feat: emit OpenTelemetry traces/spans for agent activity #3483

Open
dcieslak19973 wants to merge 2 commits into coder:main from dcieslak19973:feat/otel-tracing


@dcieslak19973

Summary

Adds an opt-in OpenTelemetry tracing path so mux agent activity can be observed in any OTLP-compatible backend (Jaeger, Grafana Tempo, SigNoz, Honeycomb, ...), in the same spirit as codex-cli and opencode. Each agent turn becomes one trace rooted at a mux.stream span, with the AI SDK's built-in telemetry contributing the nested LLM/tool spans (ai.streamText, ai.streamText.doStream, ai.toolCall) carrying standard gen_ai.* attributes.

Background

codex-cli (service.name=codex_cli_rs) and opencode both ship OTEL exporters that emit per-session/per-request spans for observability. mux already has anonymous product telemetry (PostHog) but no distributed tracing for inspecting agent turns. This adds that, using the standard OpenTelemetry SDK rather than re-implementing those projects' code.

Implementation

  • TracingService (src/node/services/tracingService.ts): registers a global NodeTracerProvider with an OTLP/HTTP exporter. Opt-in and OFF by default — enabled only when a standard OTEL env var is set (OTEL_EXPORTER_OTLP_ENDPOINT / OTEL_EXPORTER_OTLP_TRACES_ENDPOINT / OTEL_SERVICE_NAME), and never when MUX_DISABLE_TELEMETRY=1 or OTEL_SDK_DISABLED=true. Startup-safe (all setup wrapped in try/catch → no-op fallback). Wired through ServiceContainer (init + shutdown) and mux server via createCoreServices.
  • Stream instrumentation (streamManager.ts): each turn opens a mux.stream span carrying mux.workspace.id, mux.workspace.name, mux.agent.mode, mux.thinking_level, gen_ai.request.model. streamText() is invoked inside that span's context (and retried requests reuse it), so the AI SDK's spans nest beneath it. The span is closed in the stream's guaranteed-cleanup path (and the abort-before-start bail path) to avoid leaks.
  • Privacy: prompt/response bodies are redacted by default (matching codex's log_user_prompt=false); opt in with MUX_OTEL_RECORD_IO=1.
  • Docs: new docs/reference/tracing.mdx (+ nav/redirect) and a cross-link from the existing telemetry page.
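
The gating rules in the TracingService and Privacy bullets can be sketched as a pure function over the environment. This is only an illustration under stated assumptions, not the code in tracingService.ts: the helper names (`shouldEnableTracing`, `shouldRecordIo`, `isSet`) are hypothetical, and the case-insensitive match on `OTEL_SDK_DISABLED` and the treatment of whitespace-only values as unset are assumptions.

```typescript
// Hypothetical helpers; only the env var names come from the PR description.
type Env = Record<string, string | undefined>;

// Any one of these standard OTEL variables opts the process in.
const OPT_IN_VARS = [
  "OTEL_EXPORTER_OTLP_ENDPOINT",
  "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT",
  "OTEL_SERVICE_NAME",
];

// Assumption: unset and whitespace-only values are treated the same way.
function isSet(value: string | undefined): boolean {
  return value !== undefined && value.trim() !== "";
}

function shouldEnableTracing(env: Env): boolean {
  // Opt-outs always win over any opt-in variable.
  if (env["MUX_DISABLE_TELEMETRY"] === "1") return false;
  // Assumption: the "true" comparison is case-insensitive.
  if ((env["OTEL_SDK_DISABLED"] ?? "").trim().toLowerCase() === "true") return false;
  return OPT_IN_VARS.some((name) => isSet(env[name]));
}

// Prompt/response bodies stay redacted unless explicitly opted in.
function shouldRecordIo(env: Env): boolean {
  return env["MUX_OTEL_RECORD_IO"] === "1";
}
```

Keeping the decision in a function that takes the environment as an argument, rather than reading `process.env` directly, makes a gating matrix like the one described under Validation straightforward to test.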

Licensing

This is an original implementation built directly on the upstream OpenTelemetry SDK and the AI SDK's experimental_telemetry hook — no code is vendored from codex-cli or opencode, and the span/attribute names targeted are open OpenTelemetry semantic conventions. The new dependencies (@opentelemetry/*) are all Apache-2.0, compatible with mux's AGPL-3.0.

Validation

  • New behavioral tests in tracingService.test.ts cover the env gating matrix (opt-in, opt-outs, blank values) and the disabled-instance no-op contracts, plus an enabled round-trip asserting our span is the active context (the mechanism that lets AI SDK spans nest). Confirmed a single hoisted @opentelemetry/api@1.9.0 instance (shared with ai), so context propagation/nesting holds.
  • lint, typecheck (both tsconfigs), check_eager_imports, prettier, and the streamManager suite all pass.

Risks

Touches the streaming hot path in streamManager.ts, but every tracing call is guarded with optional chaining (this.tracingService?.) and becomes a no-op when tracing is disabled (the default), so behavior is unchanged unless an OTEL endpoint is configured.
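
The guard pattern described here can be sketched without any OpenTelemetry dependency. This is a simplified stand-in, not the streamManager.ts code: `TurnSpan`, `TracingServiceLike`, `StreamRunner`, and `startTurnSpan` are hypothetical names, and the real implementation additionally runs streamText() inside the span's context so the AI SDK spans nest beneath it.

```typescript
// Hypothetical minimal shapes; the real service wraps OpenTelemetry spans.
interface TurnSpan {
  end(): void;
}

interface TracingServiceLike {
  startTurnSpan(name: string, attributes: Record<string, string>): TurnSpan;
}

class StreamRunner {
  // Left undefined when tracing is disabled (the default).
  private readonly tracingService?: TracingServiceLike;

  constructor(tracingService?: TracingServiceLike) {
    this.tracingService = tracingService;
  }

  async runTurn<T>(
    attributes: Record<string, string>,
    work: () => Promise<T>,
  ): Promise<T> {
    // Optional chaining: with no tracing service, span is simply undefined.
    const span = this.tracingService?.startTurnSpan("mux.stream", attributes);
    try {
      return await work();
    } finally {
      // Cleanup runs on success, error, and abort alike, so the span cannot leak.
      span?.end();
    }
  }
}
```

With no service injected, `runTurn` reduces to `await work()`, which is why the disabled path leaves streaming behavior unchanged.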


Generated with mux • Model: claude-opus-4-8

dcieslak19973 and others added 2 commits June 7, 2026 14:26
Add an opt-in OTLP tracing path so mux agent turns can be observed in any
OpenTelemetry backend, similar to codex-cli and opencode. New TracingService
(built directly on the upstream OTEL SDK; no vendored code) registers a global
tracer provider gated by standard OTEL env vars. StreamManager wraps each turn
in a mux.stream span and enables the AI SDK's experimental_telemetry so
ai.streamText/ai.toolCall spans (gen_ai.* attributes) nest beneath it. Prompts
redacted by default (MUX_OTEL_RECORD_IO to opt in). Off unless an OTLP endpoint
is configured; disabled by MUX_DISABLE_TELEMETRY / OTEL_SDK_DISABLED.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@dcieslak19973

@codex review

@chatgpt-codex-connector

To use Codex here, create a Codex account and connect to github.
