
# Distributed Tracing

This guide shows how to enable OpenTelemetry distributed tracing across llm-d components.

## Components

| Component | Chart / Config | What gets traced |
| --- | --- | --- |
| vLLM (prefill + decode) | ModelService `tracing:` | Inference engine spans |
| Routing proxy (P/D sidecar) | ModelService `tracing:` | KV transfer coordination |
| EPP / Inference Scheduler | GAIE `inferenceExtension.tracing:` | Request routing, endpoint scoring, KV-cache indexing |

All components export traces via OTLP gRPC to an OpenTelemetry Collector, which filters out noise (e.g., /metrics scraping spans), batches traces, and forwards them to a backend (Jaeger, Tempo, etc.).
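As a sketch of that filter/batch/forward pipeline, a minimal collector configuration might look like the following (illustrative only; the shipped configuration lives in otel-collector.yaml, and the Jaeger endpoint name here is an assumption):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  # Drop spans produced by Prometheus scraping /metrics
  filter/drop-metrics-scrapes:
    traces:
      span:
        - 'attributes["http.target"] == "/metrics"'
  batch: {}

exporters:
  otlp/jaeger:
    endpoint: jaeger-collector:4317  # backend endpoint; adjust for Tempo, etc.
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [filter/drop-metrics-scrapes, batch]
      exporters: [otlp/jaeger]
```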

## Quick Start: Deploy OTel Collector + Jaeger

The install script deploys both an OTel Collector and Jaeger all-in-one into a namespace of your choice. Deploy them into the same namespace as your llm-d workload so components can reach the collector at `http://otel-collector:4317`.

```shell
# Deploy OTel Collector + Jaeger into your llm-d namespace
../scripts/install-otel-collector-jaeger.sh -n <your-namespace>

# Uninstall
../scripts/install-otel-collector-jaeger.sh -u -n <your-namespace>
```

If the OpenTelemetry Operator is installed, the script deploys the collector as an `OpenTelemetryCollector` CR (`otel-collector-operator.yaml`). Otherwise it uses a standalone Deployment (`otel-collector.yaml`).

Access the Jaeger UI:

```shell
kubectl port-forward -n <your-namespace> svc/jaeger-collector 16686:16686
# Open http://localhost:16686
```

> **Note:** Jaeger all-in-one stores traces in memory and is intended for development and testing; traces are lost when the pod restarts.

### Manual deployment

If you prefer to apply manifests directly:

```shell
kubectl apply -n <your-namespace> -f jaeger-all-in-one.yaml -f otel-collector.yaml  # standalone
# or, if the OTel Operator is installed:
kubectl apply -n <your-namespace> -f otel-collector-operator.yaml  # operator CR
```

## Enable Tracing

By default, all chart values point to `http://otel-collector:4317` (same namespace). When tracing is enabled and the OTel Collector is deployed alongside your workload, traces flow automatically.
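Before enabling tracing, you can sanity-check that the collector's OTLP gRPC port is reachable from the workload namespace. One way is a throwaway pod (the pod name and image here are illustrative):

```shell
kubectl run otel-check --rm -it --restart=Never -n <your-namespace> \
  --image=busybox -- nc -zv otel-collector 4317
```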

### ModelService (vLLM + routing proxy)

Uncomment the `tracing:` section in your `ms-*/values.yaml`:

```yaml
tracing:
  enabled: true
  otlpEndpoint: "http://otel-collector:4317"
  sampling:
    sampler: "parentbased_traceidratio"
    samplerArg: "1.0"  # 100% for dev; use "0.1" (10%) in production
```

This injects `--otlp-traces-endpoint` and `--collect-detailed-traces` args into vLLM, and `OTEL_*` environment variables into both vLLM and routing-proxy containers.

### GAIE / EPP (Inference Scheduler)

Add or uncomment the `tracing:` section under `inferenceExtension:` in your `gaie-*/values.yaml`:

```yaml
inferenceExtension:
  tracing:
    enabled: true
    otelExporterEndpoint: "http://otel-collector:4317"
    sampling:
      sampler: "parentbased_traceidratio"
      samplerArg: "1.0"
```

### Kustomize / Raw Manifests

For guides that use raw manifests (e.g., wide-ep-lws, recipes/vllm), add tracing flags to your `vllm serve` command and `OTEL_*` env vars to the container:

```yaml
# Add to your vllm serve command:
#   --otlp-traces-endpoint http://otel-collector:4317
#   --collect-detailed-traces all

# Add to the container's env:
env:
- name: OTEL_SERVICE_NAME
  value: "vllm-decode"  # or "vllm-prefill"
- name: OTEL_EXPORTER_OTLP_ENDPOINT
  value: "http://otel-collector:4317"
- name: OTEL_TRACES_SAMPLER
  value: "parentbased_traceidratio"
- name: OTEL_TRACES_SAMPLER_ARG
  value: "1.0"
```

## OpenTelemetry Collector

The OTel Collector sits between llm-d components and Jaeger, providing:

- **Noise filtering**: drops trace spans generated by Prometheus `/metrics` scraping
- **Batching**: groups spans for efficient export
- **Multi-backend export**: can forward traces to multiple backends simultaneously

The collector is deployed by `install-otel-collector-jaeger.sh` and all chart defaults already point to it. See the collector config in `otel-collector.yaml` or `otel-collector-operator.yaml`.

## Verifying Traces

1. Send an inference request through llm-d
2. Open the Jaeger UI (http://localhost:16686)
3. Select a service (e.g., `vllm-decode`, `llm-d-inference-scheduler`) and click **Find Traces**
4. You should see spans for inference requests, routing decisions, and KV cache operations
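If you prefer the command line, you can also confirm that services are reporting spans via Jaeger's HTTP query API (this assumes the port-forward from the Quick Start is active):

```shell
# List services that have reported at least one span
curl -s http://localhost:16686/api/services
```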

If you only see generic `GET` spans, check that:

- `collectDetailedTraces` is set to `"all"` for vLLM
- The EPP/inference-scheduler image includes tracing instrumentation (`llm-d-inference-scheduler`, not upstream `epp`)

## Production Recommendations

- **Sampling**: set `samplerArg` to `"0.1"` (10%) or lower to reduce overhead
- **Collector**: use a collector to batch, filter, and route traces to a persistent backend
- **Backend**: use Jaeger with Elasticsearch/Cassandra storage, or Grafana Tempo for long-term retention
- **Service names**: customize via `tracing.serviceNames` in ModelService values to distinguish clusters/environments
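As an illustration of that last point, a values override might look like the following (the keys under `serviceNames` are hypothetical; check the ModelService chart's values schema for the exact names):

```yaml
tracing:
  enabled: true
  serviceNames:
    vllm: "vllm-decode-us-east-prod"          # hypothetical key; see chart schema
    routingProxy: "routing-proxy-us-east-prod" # hypothetical key; see chart schema
```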

## Reference: Environment Variables

When tracing is enabled, these environment variables are set on vLLM and routing-proxy containers (automatically by the ModelService chart, or manually for raw manifests):

| Variable | Description |
| --- | --- |
| `OTEL_SERVICE_NAME` | Service identifier (e.g., `vllm-decode`, `routing-proxy`) |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | Collector endpoint (`http://otel-collector:4317`) |
| `OTEL_TRACES_SAMPLER` | Sampler type (e.g., `parentbased_traceidratio`) |
| `OTEL_TRACES_SAMPLER_ARG` | Sampling ratio (`1.0` = 100%, `0.1` = 10%) |