
feat(otel): Add OpenTelemetry GenAI instrumentation to Copilot Chat#3917

Draft
zhichli wants to merge 20 commits into main from zhichli/otel

Conversation

@zhichli (Member) commented Feb 21, 2026

Summary

Adds opt-in OpenTelemetry instrumentation to Copilot Chat following the OTel GenAI semantic conventions. Emits traces, metrics, and events for LLM calls, tool executions, agent orchestration, and embeddings. Existing telemetry (ITelemetryService) is unchanged.

What's included

Phase 0 — Foundation

  • IOTelService interface + OTelServiceImpl (Node) with DI registration
  • NoopOtelService for disabled/test/web paths
  • Config resolver with layered env precedence (COPILOT_OTEL_* > OTEL_* > VS Code settings > defaults)
  • GenAI semantic convention constants (genAiAttributes.ts)
  • Message formatters for OTel GenAI JSON schema
  • Metric instruments (gen_ai.client.token.usage, gen_ai.client.operation.duration, copilot_chat.*)
  • Event emitters (gen_ai.client.inference.operation.details, session/tool/agent events)
  • File exporters (JSON-lines fallback for CI/offline)
  • OTLP HTTP + gRPC + console exporter support
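The layered env precedence above can be sketched as a simple chain of fallbacks. This is an illustrative model only: the function name `resolveOtlpEndpoint` and the settings shape are assumptions, not the actual `resolveOTelConfig` API in the PR.

```typescript
// Hypothetical sketch of the precedence order described above:
// COPILOT_OTEL_* beats OTEL_*, which beats VS Code settings, which beat defaults.

interface OTelSettings {
  otlpEndpoint?: string; // e.g. github.copilot.chat.otel.otlpEndpoint
}

function resolveOtlpEndpoint(
  env: Record<string, string | undefined>,
  settings: OTelSettings,
  fallback = "http://localhost:4318",
): string {
  return (
    env["COPILOT_OTEL_EXPORTER_OTLP_ENDPOINT"] ?? // Copilot-specific override
    env["OTEL_EXPORTER_OTLP_ENDPOINT"] ??         // standard OTel env var
    settings.otlpEndpoint ??                      // VS Code setting
    fallback                                      // built-in default
  );
}
```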

Phase 1 — Wiring into chat extension

  • Inference spans (chat {model}) in chatMLFetcher.ts — model, tokens, TTFT, finish reasons
  • Tool spans (execute_tool {name}) in toolsService.ts — tool name/type/id, args/results (opt-in)
  • Agent spans (invoke_agent {participant}) in toolCallingLoop.ts — parent span for full hierarchy
  • Embeddings spans (embeddings {model}) in remoteEmbeddingsComputer.ts
  • Content capture — full messages, responses, system instructions, tool definitions (opt-in via COPILOT_OTEL_CAPTURE_CONTENT=true)
  • Metrics recording at all instrumentation points
  • Diagnostic exporter logs first successful export for easy verification
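The content-capture opt-in above amounts to a gate in front of message attributes. A minimal sketch, assuming a helper like `inputMessageAttrs` (hypothetical; the real gating lives in the formatters and span-wiring code):

```typescript
// Sketch: message content is attached to spans only when the capture flag is on.
// The env var name matches the PR; the helper names are illustrative.

function shouldCaptureContent(env: Record<string, string | undefined>): boolean {
  return env["COPILOT_OTEL_CAPTURE_CONTENT"] === "true";
}

function inputMessageAttrs(
  messages: { role: string; content: string }[],
  env: Record<string, string | undefined>,
): Record<string, string> {
  if (!shouldCaptureContent(env)) {
    return {}; // timing and token usage are still recorded; content is omitted
  }
  return { "gen_ai.input.messages": JSON.stringify(messages) };
}
```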

Activation

Off by default. Enable via env vars:

COPILOT_OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

Respects telemetry.telemetryLevel — globally disabled when telemetry is off.

Span hierarchy (Agent mode)

invoke_agent copilot                    [INTERNAL]
  ├── chat gpt-4o                       [CLIENT]
  ├── execute_tool readFile             [INTERNAL]
  ├── execute_tool runCommand           [INTERNAL]
  ├── chat gpt-4o                       [CLIENT]
  └── ...
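The hierarchy above comes from active-span context propagation: each child span records the currently active span as its parent. The toy model below (not the OTel SDK) shows the mechanism:

```typescript
// Toy model of parent/child span nesting; the real code uses the OTel tracer.

type Span = { name: string; kind: "INTERNAL" | "CLIENT"; parent?: string };

const spans: Span[] = [];
let active: string | undefined;

function withSpan<T>(name: string, kind: Span["kind"], fn: () => T): T {
  spans.push({ name, kind, parent: active }); // parent = whatever is active now
  const previous = active;
  active = name;
  try {
    return fn();
  } finally {
    active = previous; // restore the enclosing span on exit
  }
}

withSpan("invoke_agent copilot", "INTERNAL", () => {
  withSpan("chat gpt-4o", "CLIENT", () => {});
  withSpan("execute_tool readFile", "INTERNAL", () => {});
});
```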

Testing

  • 63 unit tests across 6 test files covering config, formatters, events, metrics, file exporters, noop service
  • Verified E2E with local Jaeger (OTLP HTTP on :4318)

Risk

  • Bundle size: OTel deps added (~200KB budget)
  • Zero overhead when disabled (noop providers)
  • No changes to existing ITelemetryService code paths

Commits

Phase 0 complete:
- spec.md: Full spec with decisions, GenAI semconv, dual-write, eval signals,
  lessons from Gemini CLI + Claude Code
- plan.md: E2E demo plan (chat ext + eval repo + Azure backend)
- src/platform/otel/: IOTelService, config, attributes, metrics, events,
  message formatters, NodeOTelService, file exporters
- package.json: Added @opentelemetry/* dependencies

OTel opt-in behind OTEL_EXPORTER_OTLP_ENDPOINT env var.
- Register IOTelService in DI (NodeOTelService when enabled, NoopOTelService when disabled)
- Add OTelContrib lifecycle contribution for OTel init/shutdown
- Add `chat {model}` inference span in ChatMLFetcherImpl._doFetchAndStreamChat()
- Add `execute_tool {name}` span in ToolsService.invokeTool()
- Add `invoke_agent {participant}` parent span in ToolCallingLoop.run()
- Record gen_ai.client.operation.duration, tool call count/duration, agent metrics
- Thread IOTelService through all ToolCallingLoop subclasses
- Update test files with NoopOTelService
- Zero overhead when OTel is disabled (noop providers, no dynamic imports)
- Add `embeddings {model}` span in RemoteEmbeddingsComputer.computeEmbeddings()
- Add VS Code settings under github.copilot.chat.otel.* in package.json
  (enabled, exporterType, otlpEndpoint, captureContent, outfile)
- Wire VS Code settings into resolveOTelConfig in services.ts
- Add unit tests for:
  - resolveOTelConfig: env precedence, kill switch, all config paths (16 tests)
  - NoopOTelService: zero-overhead noop behavior (8 tests)
  - GenAiMetrics: metric recording with correct attributes (7 tests)
…porters

- messageFormatters: 18 tests covering toInputMessages, toOutputMessages,
  toSystemInstructions, toToolDefinitions (edge cases, empty inputs, invalid JSON)
- genAiEvents: 9 tests covering all 4 event emitters, content capture on/off
- fileExporters: 5 tests covering write/read round-trip for span, log, metric
  exporters plus aggregation temporality

Total OTel test suite: 63 tests across 6 files
Add gen_ai.client.token.usage (input/output) and copilot_chat.time_to_first_token
histogram metrics at the fetchMany success path where token counts and TTFT
are available from the processSuccessfulResponse result.
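Per the GenAI semconv, input and output token counts share one histogram and are distinguished by the gen_ai.token.type attribute, so each response produces two data points. A stand-in sketch (the recorder and helper names are illustrative, not the real metrics code):

```typescript
// Illustrative recorder standing in for an OTel histogram instrument.

type Attrs = Record<string, string>;
const recorded: { metric: string; value: number; attrs: Attrs }[] = [];

function recordHistogram(metric: string, value: number, attrs: Attrs): void {
  recorded.push({ metric, value, attrs });
}

// One call per response: two points on gen_ai.client.token.usage,
// split by gen_ai.token.type as the semconv specifies.
function recordTokenUsage(model: string, promptTokens: number, completionTokens: number): void {
  const base = { "gen_ai.request.model": model };
  recordHistogram("gen_ai.client.token.usage", promptTokens, { ...base, "gen_ai.token.type": "input" });
  recordHistogram("gen_ai.client.token.usage", completionTokens, { ...base, "gen_ai.token.type": "output" });
}

recordTokenUsage("gpt-4o", 1200, 80);
```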
… token usage

Wire emitInferenceDetailsEvent into fetchMany success path where full
token usage (prompt_tokens, completion_tokens), resolved model, request ID,
and finish reasons are available from processSuccessfulResponse.

This follows the OTel GenAI spec pattern:
- Spans: timing + hierarchy + error tracking
- Events: full request/response details including token counts

The data mirrors what RequestLogger captures for chat-export-logs.json.
Per the OTel GenAI agent spans spec, add gen_ai.usage.input_tokens and
gen_ai.usage.output_tokens as Recommended attributes on the invoke_agent span.

Tokens are accumulated across all LLM turns by listening to onDidReceiveResponse
events during the agent loop, then set on the span before it ends.

Ref: https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/
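The accumulation described above can be sketched as a small listener that sums usage across turns; `onDidReceiveResponse` is simulated here and the usage field names are assumptions:

```typescript
// Sketch: sum token usage across all LLM turns in one agent loop,
// then set the totals on the invoke_agent span before it ends.

type Usage = { input_tokens: number; output_tokens: number };

class TokenAccumulator {
  input = 0;
  output = 0;
  // In the real code this would be wired to onDidReceiveResponse events.
  onResponse(usage: Usage): void {
    this.input += usage.input_tokens;
    this.output += usage.output_tokens;
  }
}

const acc = new TokenAccumulator();
// Simulate two LLM turns during the loop:
acc.onResponse({ input_tokens: 1200, output_tokens: 80 });
acc.onResponse({ input_tokens: 1400, output_tokens: 120 });
// Before the invoke_agent span ends:
// span.setAttribute("gen_ai.usage.input_tokens", acc.input);
// span.setAttribute("gen_ai.usage.output_tokens", acc.output);
```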
Defer the `chat {model}` span completion from _doFetchAndStreamChat to
fetchMany where processSuccessfulResponse has extracted token counts.

The chat span now carries:
- gen_ai.usage.input_tokens (prompt_tokens)
- gen_ai.usage.output_tokens (completion_tokens)
- gen_ai.response.model (resolved model)

The span handle is returned from _doFetchAndStreamChat via the result
object so fetchMany can set attributes and end it after tokens are known.

This matches the chat-export-logs.json pattern where each request entry
carries full usage data alongside the response.
…gs/results)

- Chat spans: add copilot.debug_name attribute for identifying orphan spans
- Chat spans: capture gen_ai.input.messages and gen_ai.output.messages when captureContent enabled
- Tool spans: capture gen_ai.tool.call.arguments and gen_ai.tool.call.result when captureContent enabled
- Extension chat endpoint: capture input/output messages when captureContent enabled
- Add CopilotAttr.DEBUG_NAME constant
