Context
In production mode, the OTel Collector architecture evolves from a simple DaemonSet to a DaemonSet + Gateway pattern. The DaemonSet collects node-level telemetry and forwards it to the Gateway, which handles centralized processing, enrichment, and routing.
Architecture
Nodes:
Pod → OTLP → OTel DaemonSet ─┐
Pod → OTLP → OTel DaemonSet ─┼──→ OTel Gateway (2 replicas) ──┬──→ Prometheus
Pod → OTLP → OTel DaemonSet ─┘    (batch, filter, route)      ├──→ Loki
                                                              └──→ Tempo (future)
Requirements
Gateway deployment
- 2 replicas with pod anti-affinity
- Receives OTLP from DaemonSets (internal) and from apps (external via Service)
- Processors: memory_limiter, k8sattributes, filter, batch (memory_limiter first and batch last in each pipeline)
- Routes to Prometheus (remote write), Loki (push), Tempo (future)
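
The deployment requirements above could be sketched as the following manifests. This is a minimal illustration, not a final manifest: the `observability` namespace, the `otel-gateway` name and label, the `otel-gateway-config` ConfigMap, and the config mount path are all assumptions, and the image should be pinned to a specific version.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-gateway            # hypothetical name
  namespace: observability      # hypothetical namespace
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: otel-gateway
  template:
    metadata:
      labels:
        app.kubernetes.io/name: otel-gateway
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: never place both replicas on the same node.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app.kubernetes.io/name: otel-gateway
              topologyKey: kubernetes.io/hostname
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib   # pin a version tag here
          args: ["--config=/conf/config.yaml"]
          ports:
            - name: otlp-grpc
              containerPort: 4317
            - name: otlp-http
              containerPort: 4318
          volumeMounts:
            - name: config
              mountPath: /conf
      volumes:
        - name: config
          configMap:
            name: otel-gateway-config   # hypothetical ConfigMap holding config.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: otel-gateway
  namespace: observability
spec:
  selector:
    app.kubernetes.io/name: otel-gateway
  ports:
    - name: otlp-grpc
      port: 4317
      targetPort: otlp-grpc
    - name: otlp-http
      port: 4318
      targetPort: otlp-http
```

A `required...` anti-affinity rule leaves the second replica unschedulable on a single-node cluster; `preferredDuringSchedulingIgnoredDuringExecution` is the softer alternative if that matters.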
DaemonSet changes (vs startup)
- Instead of exporting directly to backends, export to the Gateway
- Reduces load on backends (Gateway handles batching/buffering)
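
On the DaemonSet side, the change amounts to replacing the backend exporters with a single OTLP exporter that points at the Gateway. A minimal sketch, assuming the Gateway is reachable at the hypothetical Service address `otel-gateway.observability.svc.cluster.local:4317` and that in-cluster traffic is plaintext:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80
    spike_limit_percentage: 20
  batch: {}

exporters:
  otlp:
    # Hypothetical Gateway Service address; adjust to the real name/namespace.
    endpoint: otel-gateway.observability.svc.cluster.local:4317
    tls:
      insecure: true      # assumption: no TLS between DaemonSet and Gateway
    retry_on_failure:
      enabled: true
    sending_queue:
      enabled: true       # buffers on the node while the Gateway is unavailable

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
```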
Gateway config additions (vs startup)
processors:
  filter:
    logs:
      exclude:
        match_type: regexp
        bodies:
          - "health check"
          - "readiness probe"
  transform:
    log_statements:
      - context: log
        statements:
          - set(severity_text, "INFO") where severity_text == ""
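
For these processors to take effect they also have to be listed in a pipeline. The sketch below shows one way the Gateway pipelines might be wired. The backend addresses are hypothetical, and sending logs through `otlphttp` assumes a Loki version that accepts OTLP natively; if it does not, an exporter targeting Loki's push API would be needed instead. Prometheus must have its remote-write receiver enabled.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  # The filter and transform definitions from this section live under
  # this same key; they are not repeated here.
  memory_limiter:
    check_interval: 1s
    limit_percentage: 80
    spike_limit_percentage: 20
  k8sattributes: {}       # defaults; needs RBAC to read pod metadata
  batch: {}

exporters:
  prometheusremotewrite:
    # Hypothetical address of the Prometheus remote-write endpoint.
    endpoint: http://prometheus.observability.svc:9090/api/v1/write
  otlphttp/loki:
    # Hypothetical address; assumes Loki's native OTLP ingestion path.
    endpoint: http://loki.observability.svc:3100/otlp

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, k8sattributes, batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      # Drop noise before enriching severity, batch last.
      processors: [memory_limiter, k8sattributes, filter, transform, batch]
      exporters: [otlphttp/loki]
    # A traces pipeline for Tempo would be added here later.
```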
Observability of the Collector itself
- Export Collector internal metrics to Prometheus
- Dashboard for Collector health (queue size, dropped spans, export errors)
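
One possible way to get the internal metrics into Prometheus is to have the Collector scrape its own metrics endpoint and send the result through the existing remote-write exporter. The sketch assumes the default internal metrics port (8888) and that `memory_limiter`, `batch`, and `prometheusremotewrite` are already defined elsewhere in the same config; the `prometheus/self` and `metrics/self` names are illustrative.

```yaml
receivers:
  prometheus/self:
    config:
      scrape_configs:
        - job_name: otel-collector
          scrape_interval: 30s
          static_configs:
            # Collector's own internal metrics endpoint (default port).
            - targets: ["localhost:8888"]

service:
  pipelines:
    metrics/self:
      receivers: [prometheus/self]
      processors: [memory_limiter, batch]
      exporters: [prometheusremotewrite]
```

Dashboard panels for queue size and export errors would typically be built on series such as `otelcol_exporter_queue_size` and `otelcol_exporter_send_failed_spans`; exact metric names and suffixes vary between Collector versions, so they should be checked against the deployed release.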
Acceptance criteria