This guide shows how to enable OpenTelemetry distributed tracing across llm-d components.
| Component | Chart / Config | What gets traced |
|---|---|---|
| vLLM (prefill + decode) | ModelService `tracing:` | Inference engine spans |
| Routing proxy (P/D sidecar) | ModelService `tracing:` | KV transfer coordination |
| EPP / Inference Scheduler | GAIE `inferenceExtension.tracing:` | Request routing, endpoint scoring, KV-cache indexing |
All components export traces via OTLP gRPC to an OpenTelemetry Collector, which filters out noise (e.g., /metrics scraping spans), batches traces, and forwards them to a backend (Jaeger, Tempo, etc.).
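That pipeline can be sketched as a collector configuration like the following. This is an illustration only, not the shipped `otel-collector.yaml`; the filter condition, processor names, and exporter endpoint are assumptions:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # llm-d components export here via OTLP gRPC

processors:
  # Drop spans produced by Prometheus /metrics scraping (illustrative condition)
  filter:
    traces:
      span:
        - 'attributes["http.target"] == "/metrics"'
  # Group spans before export to reduce network overhead
  batch: {}

exporters:
  otlp/jaeger:
    endpoint: jaeger-collector:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [filter, batch]
      exporters: [otlp/jaeger]
```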
The install script deploys both an OTel Collector and Jaeger all-in-one into a namespace of your choice. Deploy them into the same namespace as your llm-d workload so components can reach the collector at http://otel-collector:4317.
```bash
# Deploy OTel Collector + Jaeger into your llm-d namespace
../scripts/install-otel-collector-jaeger.sh -n <your-namespace>

# Uninstall
../scripts/install-otel-collector-jaeger.sh -u -n <your-namespace>
```

If the OpenTelemetry Operator is installed, the script deploys the collector as an `OpenTelemetryCollector` CR (`otel-collector-operator.yaml`). Otherwise it uses a standalone Deployment (`otel-collector.yaml`).
Access the Jaeger UI:
```bash
kubectl port-forward -n <your-namespace> svc/jaeger-collector 16686:16686
# Open http://localhost:16686
```

Note: Jaeger all-in-one is an in-memory deployment for development and testing.
If you prefer to apply manifests directly:
```bash
kubectl apply -n <your-namespace> -f jaeger-all-in-one.yaml -f otel-collector.yaml  # standalone
# or, if the OTel Operator is installed:
kubectl apply -n <your-namespace> -f otel-collector-operator.yaml  # operator CR
```

By default, all chart values point to `http://otel-collector:4317` (same namespace). When tracing is enabled and the OTel Collector is deployed alongside your workload, traces flow automatically.
Uncomment the `tracing:` section in your `ms-*/values.yaml`:
```yaml
tracing:
  enabled: true
  otlpEndpoint: "http://otel-collector:4317"
  sampling:
    sampler: "parentbased_traceidratio"
    samplerArg: "1.0"  # 100% for dev; use "0.1" (10%) in production
```

This injects `--otlp-traces-endpoint` and `--collect-detailed-traces` args into vLLM, and `OTEL_*` environment variables into both the vLLM and routing-proxy containers.
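For orientation, the rendered decode container then looks roughly like this. This is a hand-written sketch of the chart output, not a verbatim render; field placement and exact flag spelling may differ across chart versions:

```yaml
containers:
  - name: vllm
    args:
      - "--otlp-traces-endpoint=http://otel-collector:4317"
      - "--collect-detailed-traces=all"
    env:
      - name: OTEL_SERVICE_NAME
        value: "vllm-decode"
      - name: OTEL_EXPORTER_OTLP_ENDPOINT
        value: "http://otel-collector:4317"
```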
Add or uncomment the `tracing:` section under `inferenceExtension:` in your `gaie-*/values.yaml`:
```yaml
inferenceExtension:
  tracing:
    enabled: true
    otelExporterEndpoint: "http://otel-collector:4317"
    sampling:
      sampler: "parentbased_traceidratio"
      samplerArg: "1.0"
```

For guides that use raw manifests (e.g., wide-ep-lws, recipes/vllm), add tracing flags to your `vllm serve` command and `OTEL_*` env vars to the container:
```yaml
# Add to your vllm serve command:
#   --otlp-traces-endpoint http://otel-collector:4317
#   --collect-detailed-traces all

# Add to the container's env:
env:
  - name: OTEL_SERVICE_NAME
    value: "vllm-decode"  # or "vllm-prefill"
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://otel-collector:4317"
  - name: OTEL_TRACES_SAMPLER
    value: "parentbased_traceidratio"
  - name: OTEL_TRACES_SAMPLER_ARG
    value: "1.0"
```

The OTel Collector sits between llm-d components and Jaeger, providing:
- Noise filtering: Drops trace spans generated by Prometheus `/metrics` scraping
- Batching: Groups spans for efficient export
- Multi-backend export: Can forward traces to multiple backends simultaneously
The collector is deployed by `install-otel-collector-jaeger.sh`, and all chart defaults already point to it. See the collector config in `otel-collector.yaml` or `otel-collector-operator.yaml`.
- Send an inference request through llm-d
- Open the Jaeger UI (http://localhost:16686)
- Select a service (e.g., `vllm-decode`, `llm-d-inference-scheduler`) and click Find Traces
- You should see spans for inference requests, routing decisions, and KV cache operations
If you only see generic GET spans, check that:

- `collectDetailedTraces` is set to `"all"` for vLLM
- The EPP/inference-scheduler image includes tracing instrumentation (`llm-d-inference-scheduler`, not upstream `epp`)
- Sampling: Set `samplerArg` to `"0.1"` (10%) or lower to reduce overhead
- Collector: Use a collector to batch, filter, and route traces to a persistent backend
- Backend: Use Jaeger with Elasticsearch/Cassandra storage, or Grafana Tempo for long-term retention
- Service names: Customize via `tracing.serviceNames` in ModelService values to distinguish clusters/environments
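Putting those recommendations together, a production-leaning ModelService values fragment might look like this. The `serviceNames` keys and values shown are illustrative assumptions; check your chart's values schema before using them:

```yaml
tracing:
  enabled: true
  otlpEndpoint: "http://otel-collector:4317"
  sampling:
    sampler: "parentbased_traceidratio"
    samplerArg: "0.1"  # sample 10% of traces to limit overhead
  serviceNames:        # hypothetical keys to distinguish environments
    decode: "prod-east-vllm-decode"
    prefill: "prod-east-vllm-prefill"
```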
When tracing is enabled, these environment variables are set on vLLM and routing-proxy containers (automatically by the ModelService chart, or manually for raw manifests):
| Variable | Description |
|---|---|
| `OTEL_SERVICE_NAME` | Service identifier (e.g., `vllm-decode`, `routing-proxy`) |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | Collector endpoint (`http://otel-collector:4317`) |
| `OTEL_TRACES_SAMPLER` | Sampler type (e.g., `parentbased_traceidratio`) |
| `OTEL_TRACES_SAMPLER_ARG` | Sampling ratio (`1.0` = 100%, `0.1` = 10%) |