feat(telemetry): opt-in OpenTelemetry tracing for agents #2069

Draft
Mgczacki wants to merge 1 commit into dimensionalOS:main from Mgczacki:agent_observability

@Mgczacki Mgczacki commented May 13, 2026

Closes #2053.

Why

When iterating on agent changes I kept reaching for trace visibility — to see
how tool calls nested, where time was going, when the LLM was getting confused
mid-turn. I built a quick Langfuse-specific integration locally for my own dev
loop. It worked well enough that contributing it back seemed worth doing —
hierarchical-trace tooling is becoming a standard best practice for agent
development, and it's the kind of thing nobody wants to re-roll per project.

Rather than ship the Langfuse-specific version, I generalized it to an
OpenTelemetry pipeline. One integration covers all four backends named in
#2053 — Langfuse, Arize Phoenix, LangSmith, Opik — plus any other
OTLP-compatible backend (Jaeger, Tempo, Honeycomb, ...) for free. Vendor
selection is by env var, not code change. The agent itself never imports a
vendor SDK.

An example of how this looks in practice can be seen here: https://us.cloud.langfuse.com/project/cmp23t80n09ooad08jnw1lksy/traces/d1b1d6c95ec9107fdcc4df5fa094edef?observation=trace-d1b1d6c95ec9107fdcc4df5fa094edef&timestamp=2026-05-13T05:33:39.063Z

| Backend | OTLP endpoint | Session/thread grouping |
| --- | --- | --- |
| Langfuse | https://cloud.langfuse.com/api/public/otel (EU) or https://us.cloud.langfuse.com/api/public/otel (US) | ✓ via session.id (OpenInference) |
| Arize Phoenix / AX | local Phoenix collector, or arize.com | ✓ via session.id (OpenInference; Arize owns this convention) |
| LangSmith | LangSmith OTEL endpoint | ✓ via langsmith.trace.session_id (LangSmith namespace) |
| Opik | comet.com OTEL endpoint | ✓ via thread_id (added in comet-ml/opik#3441) |

What's in this PR

dimos/telemetry/ — new self-contained module

  • __init__.py: public API (span, enable, configure_tracing, session_attributes, DimosInstrumentor).
  • _manager.py: single module-global TracerManager (tracer + _export_enabled).
  • _api.py: the span(name, **attrs) context manager. No-op when off.
  • instrumentor.py: DimosInstrumentor(BaseInstrumentor), with a top-level try/except fallback to a stub class when dimos[otel] isn't installed.
  • test_telemetry.py: 6 pytest tests covering the no-op default, dotted-key attribute pass-through, session attribute shape, enable() short-circuits, lazy DimosInstrumentor resolution, and the strict-opt-in import contract (subprocess test).
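
As an illustration of the strict-opt-in import contract test mentioned in the last bullet, here is a minimal sketch of a subprocess check. The helper name `leaked_modules` is hypothetical and the code is not taken from test_telemetry.py; it only shows the general technique of importing a module in a fresh interpreter and inspecting `sys.modules` afterwards.

```python
import subprocess
import sys


def leaked_modules(module: str, top_level: str) -> list[str]:
    """Import `module` in a fresh interpreter and return every loaded
    module whose top-level package is `top_level`.

    Hypothetical helper for illustration; the PR's real test may differ.
    """
    probe = (
        "import importlib, sys\n"
        f"importlib.import_module({module!r})\n"
        f"leaked = sorted(m for m in sys.modules if m.split('.')[0] == {top_level!r})\n"
        "print('\\n'.join(leaked))\n"
    )
    # check=True makes a failed import raise instead of looking like a clean run.
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


# Stand-in usage with a stdlib module. In this PR the check would be
# along the lines of leaked_modules("dimos.telemetry", "opentelemetry") == [].
print(leaked_modules("json", "opentelemetry"))
```

Running the probe in a subprocess matters because the test process itself may already have imported opentelemetry through other tests, which would hide a leak.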

Agent integration

McpClient._process_message wraps the per-turn state_graph.stream(...) loop in dimos.telemetry.span("agent.turn", ...). Each McpClient instance generates a UUID at construction; every turn's root span is tagged via the session_attributes() helper (sets session.id, langsmith.trace.session_id, and thread_id) so all turns from one agent instance appear under a single session/thread in the backend UI.
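
The per-instance session tagging described above can be sketched roughly as follows. The function signature and the `Client` class are assumptions made for illustration (the real `session_attributes()` and `McpClient` may look different); only the three attribute keys come from this PR.

```python
import uuid


def session_attributes(session_id: str) -> dict[str, str]:
    """Return one session ID under each backend's grouping key.

    Assumed signature; the keys are the ones named in this PR.
    """
    return {
        "session.id": session_id,                  # OpenInference: Langfuse, Phoenix
        "langsmith.trace.session_id": session_id,  # LangSmith
        "thread_id": session_id,                   # Opik
    }


class Client:
    """Hypothetical stand-in for McpClient."""

    def __init__(self) -> None:
        # One UUID per instance, so every turn shares the same session.
        self.session_id = str(uuid.uuid4())

    def turn_attributes(self) -> dict[str, str]:
        # Attributes that would be attached to each "agent.turn" root span.
        return session_attributes(self.session_id)


client = Client()
print(client.turn_attributes())
```

Because the ID is created once in the constructor, two turns from the same instance carry identical values, while two separate instances get distinct sessions.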

Extras

pyproject.toml adds [project.optional-dependencies] otel = [...] with the OTEL API/SDK, OTLP HTTP exporter, opentelemetry-instrumentation, and openinference-instrumentation-langchain for LangChain auto-spans.

Strict opt-in — what "core path unchanged" means here

  • The base install ships no opentelemetry packages.
  • import dimos.telemetry triggers zero opentelemetry imports, even when dimos[otel] is installed. Pure stdlib at import time. Locked in by a subprocess test.
  • The span() helper short-circuits on a single boolean check when tracing is off.
  • Cost for users who don't enable tracing is one os.environ.get() call at package import.

Three ways to turn it on

1. Env-driven (recommended). Install the extra and set OTEL_EXPORTER_OTLP_ENDPOINT. enable() runs automatically on first import of dimos.telemetry. Auth via OTEL_EXPORTER_OTLP_HEADERS; service name via OTEL_SERVICE_NAME.

2. Caller-owned provider. Call dimos.telemetry.configure_tracing(my_provider) when the host app already runs OTEL.

3. Standard OTEL idiom. DimosInstrumentor().instrument(tracer_provider=...).
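
For option 3, the commit message in this PR notes that DimosInstrumentor is resolved lazily through a module-level `__getattr__`. The sketch below shows that general pattern (PEP 562) with a stand-in module and a placeholder class; `telemetry_demo`, `DemoInstrumentor`, and `load_calls` are hypothetical names, not part of dimos.telemetry.

```python
import types

load_calls: list[str] = []  # records each time the "heavy" load path runs


def _load_instrumentor() -> type:
    """Stand-in for importing a class that pulls in a heavy dependency."""
    load_calls.append("loaded")

    class DemoInstrumentor:  # hypothetical placeholder class
        pass

    return DemoInstrumentor


def _module_getattr(name: str) -> type:
    # Python calls this only when normal lookup on the module fails.
    if name == "DemoInstrumentor":
        cls = _load_instrumentor()
        # Cache on the module so later lookups never reach this hook again.
        setattr(telemetry_demo, name, cls)
        return cls
    raise AttributeError(
        f"module {telemetry_demo.__name__!r} has no attribute {name!r}"
    )


# In a real package this function would simply be named __getattr__
# at the top level of the module's __init__.py.
telemetry_demo = types.ModuleType("telemetry_demo")
telemetry_demo.__getattr__ = _module_getattr
```

Nothing is loaded until `telemetry_demo.DemoInstrumentor` is first accessed, and the cached attribute means the load runs at most once.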

Concrete usage

Langfuse:

OTEL_EXPORTER_OTLP_ENDPOINT='https://us.cloud.langfuse.com/api/public/otel' \
OTEL_EXPORTER_OTLP_HEADERS='Authorization=Basic <base64(public_key:secret_key)>' \
OTEL_SERVICE_NAME='dimos' \
uv run dimos ...
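
The `<base64(public_key:secret_key)>` placeholder above is standard HTTP Basic credentials encoding. A small helper to build the header value might look like this; the function name and the example keys are made up for illustration.

```python
import base64


def langfuse_otlp_headers(public_key: str, secret_key: str) -> str:
    """Build a value for OTEL_EXPORTER_OTLP_HEADERS using HTTP Basic auth.

    Hypothetical helper; the keys passed in below are dummy values.
    """
    credentials = f"{public_key}:{secret_key}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Authorization=Basic {token}"


# Dummy keys only; real keys come from the Langfuse project settings.
print(langfuse_otlp_headers("pk-lf-example", "sk-lf-example"))
```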

Phoenix (local):

phoenix serve &
OTEL_EXPORTER_OTLP_ENDPOINT='http://localhost:6006' uv run dimos ...

LangSmith:

OTEL_EXPORTER_OTLP_ENDPOINT='https://api.smith.langchain.com/otel' \
OTEL_EXPORTER_OTLP_HEADERS='x-api-key=<your_key>' \
uv run dimos ...

When the env var is unset (the default), behavior is unchanged from main.

Out of scope (intentionally) — happy to follow up

  • Vendor-native pretty rendering. Langfuse's own langfuse.langchain.CallbackHandler renders messages/tool-calls more richly than generic OpenInference attributes. Adding it would conflict with the OpenInference auto-instrumentor (duplicate spans), so it belongs in a follow-up [langfuse] extra that swaps the LangChain instrumentation path.
  • Prompt versioning, datasets, evals. These use vendor-native SDKs rather than OTLP. Separate scope.
  • Tracing other DimOS modules. Only the agent is wrapped for now; dimos.telemetry.span() is the shared helper any module can adopt.

Repo checks

  • ruff format --check
  • ruff check
  • mypy dimos/telemetry/
  • pytest dimos/telemetry/ → 6 passed in 0.16s

Design choices reviewers may ask about

  • Why three session/thread attribute keys? Verified against each backend's current docs. There's no single convention yet: session.id (OpenInference) covers Langfuse + Phoenix; langsmith.trace.session_id covers LangSmith; thread_id covers Opik. Setting all three covers every backend named in the issue without runtime detection. Documented in session_attributes().
  • Why does auto-enable run at import time when the env var is set? Setting OTEL_EXPORTER_OTLP_ENDPOINT is the user's opt-in signal — eager setup ensures the first agent turn is already traced. When unset, the cost is a single os.environ.get() call.
  • uv.lock noise. Five new packages (the OTEL stack — openinference-instrumentation-langchain, openinference-instrumentation, openinference-semantic-conventions, opentelemetry-exporter-otlp-proto-http, opentelemetry-instrumentation) plus a one-line transitive patch bump (reportlab 4.5.0 → 4.5.1) that uv lock picked up when regenerating.
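
The import-time auto-enable discussed in the second bullet can be pictured as a gate like the one below. This is a sketch under assumptions: `auto_enable` and its `setup` callback are hypothetical, and treating an empty value as unset is a choice made here, not something the PR states.

```python
import os
from typing import Callable, Mapping


def auto_enable(environ: Mapping[str, str], setup: Callable[[], None]) -> bool:
    """Run `setup` only when the OTLP endpoint variable is set.

    Hypothetical helper; an empty value is treated as unset (assumption).
    """
    if not environ.get("OTEL_EXPORTER_OTLP_ENDPOINT"):
        # The unset path costs one dictionary lookup and nothing else.
        return False
    # The heavy imports and exporter wiring would happen inside setup().
    setup()
    return True


# At package import this would be called with os.environ.
enabled = auto_enable(os.environ, lambda: None)
```

Keeping the expensive work behind the callback is what lets the unset path stay free of any opentelemetry imports.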

Add `dimos.telemetry`, a self-contained module that wraps the agent's
per-turn execution in an OTEL span. The base install is unaffected:
this package imports no opentelemetry packages at module load, and
`dimos.telemetry.span` is a silent no-op until tracing is wired up.

Wiring options:

  * Env-driven (recommended). Install the extra and set
    `OTEL_EXPORTER_OTLP_ENDPOINT`. `enable()` runs automatically on first
    import and configures the OTLP HTTP exporter plus, when available,
    `openinference-instrumentation-langchain` for LangChain auto-spans.

  * Caller-owned provider: `configure_tracing(my_provider)`.

  * Standard OTEL idiom: `DimosInstrumentor().instrument(...)`. The class
    is resolved lazily via module-level `__getattr__` so the heavy
    `opentelemetry.instrumentation` import only runs on attribute access.

Vendor-agnostic via OTLP: Langfuse, Arize Phoenix, LangSmith, and Opik
all accept the same pipeline; selection is by env var, not code.

Each McpClient instance now generates a UUID at construction and stamps
it on every `agent.turn` span via `session_attributes()`, which sets
the OpenInference `session.id` (Langfuse, Phoenix),
`langsmith.trace.session_id` (LangSmith), and `thread_id` (Opik, per
comet-ml/opik#3441). Backends group all per-turn traces from one
instance into a single session or thread in their UI.

@Mgczacki force-pushed the agent_observability branch from 0c9395d to 2419dfc on May 13, 2026 06:19

Development

Successfully merging this pull request may close these issues.

Optional LangSmith / Langfuse integration for agent observability
