Edge telemetry: vector-only (drop otelcol, Loki, Tempo) #20

Merged: samcm merged 4 commits into master from vector-edge-cutover on Jun 4, 2026

@samcm commented Jun 4, 2026

Consolidates edge telemetry onto Vector, removing otelcol-contrib, Loki, and Tempo.

Why: otelcol's filelog can't enrich Docker logs with per-container metadata, and the docker_observer+receiver_creator workaround is port-oriented — it duplicates multi-port containers, misses port-less ones (e.g. validator), and races on metadata (~27% of logs landed with no container.name). This is a known dead end (contrib #44555, closed stale; community consensus is "use a sidecar like Vector"). Vector's docker_logs reads the Docker API directly, so metadata is inline and clean for every container.

Changes:

  • logs: docker_logs → OTLP /v1/logs, full metadata (container.name/container.image.name + network/testnet/instance/ethereum_cl/ethereum_el/forwarder), reduce-batched (~500 records/envelope).
  • traces: opentelemetry source (use_otlp_decoding: true) on 4317/4318 → OTLP /v1/traces; client batching preserved (1 request in = 1 out). The trace-batching issue from the earlier Vector attempt was a pre-use_otlp_decoding artifact and is resolved.
  • otelcol-contrib removed (otelcol_contrib_cleanup=true); clients re-pointed otelcol → vector; Vector 0.55.0 → 0.56.0; Loki + Tempo dropped.
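
To make the "~500 records/envelope" figure concrete, here is a rough Python sketch of count-based batching. It is not Vector's reduce transform and not the config in this PR. The function name `envelopes`, the constant `MAX_RECORDS`, and the dict-shaped records are all hypothetical, chosen only for illustration.

```python
from typing import Iterable, Iterator, List

# Approximate envelope size quoted in the PR description.
MAX_RECORDS = 500


def envelopes(records: Iterable[dict], max_records: int = MAX_RECORDS) -> Iterator[List[dict]]:
    """Group log records into lists of at most max_records (hypothetical helper)."""
    batch: List[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) >= max_records:
            # Envelope is full: emit it and start a new one.
            yield batch
            batch = []
    if batch:
        # Emit the final, partially filled envelope.
        yield batch


if __name__ == "__main__":
    sizes = [len(b) for b in envelopes({"body": str(i)} for i in range(1200))]
    print(sizes)
```

A real pipeline would presumably also flush on a timer so that a quiet container does not hold records back indefinitely. This sketch only covers the count limit.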

Verified live on lighthouse-geth-super-1: logs 0% empty-metadata, 0 dups, every container covered (incl. port-less validator); traces flowing to platform batched; otelcol gone; vector healthchecks pass.

Rollout note: applying fleet-wide re-points + recreates the client containers (telemetry endpoint change) — i.e. it rolls the clients.

Consolidates edge telemetry onto Vector and removes otelcol-contrib:
- vector docker_logs -> OTLP /v1/logs with clean per-container metadata
  (container.name/image + network/testnet/instance/ethereum_cl/ethereum_el/
  forwarder), reduce-batched (~500 records/envelope). docker_logs reads the
  Docker API, so every container is covered with no duplication, no missed
  port-less containers, and no empty-metadata race (the problems otelcol's
  filelog + docker_observer hit).
- vector opentelemetry source (use_otlp_decoding) on :4317/:4318 -> OTLP
  /v1/traces; the client's OTLP batching is preserved (1 request in = 1 out).
- otelcol_contrib_cleanup=true removes the otelcol container.
- clients re-pointed: --telemetry-collector-url / --rpc.telemetry.endpoint
  otelcol -> vector.
- vector 0.55.0 -> 0.56.0.
- Loki and Tempo sinks dropped (no longer needed).

Verified live on lighthouse-geth-super-1: logs 0% empty-metadata, 0 dups, all
containers incl. validator; traces flowing to platform batched.

samcm added 3 commits June 4, 2026 12:46

The shaper extracts the source log level (logfmt level=xxx, else a level token
near the start of the line) and sets SeverityNumber on the OTel scale
(TRACE=1, DEBUG=5, INFO=9, WARN=13, ERROR=17, CRIT=18, FATAL=21). SeverityText
keeps the source's exact text; only the number is normalised, and unrecognised
lines are left unset rather than guessed (previously every line was hardcoded
INFO).
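
The severity rule described above can be sketched in Python as follows. This is an illustration under assumptions, not the shaper's actual code (which presumably lives in the Vector config). The number mapping is taken from this commit message. The regexes, the 40-character "near the start" window, the function names, and the output keys are hypothetical.

```python
import re
from typing import Optional

# SeverityNumber mapping as listed in the commit message (OTel scale).
SEVERITY_NUMBER = {
    "TRACE": 1, "DEBUG": 5, "INFO": 9, "WARN": 13,
    "ERROR": 17, "CRIT": 18, "FATAL": 21,
}

# Hypothetical matchers: logfmt `level=xxx` (optionally quoted) first,
# then a bare level token within the first PREFIX_CHARS characters.
LOGFMT_LEVEL = re.compile(r'\blevel=("?)([A-Za-z]+)\1')
LEADING_LEVEL = re.compile(r"\b(TRACE|DEBUG|INFO|WARN|ERROR|CRIT|FATAL)\b", re.IGNORECASE)
PREFIX_CHARS = 40  # assumption: how far "near the start of the line" reaches


def extract_level(line: str) -> Optional[str]:
    """Return the source's level text exactly as written, or None."""
    match = LOGFMT_LEVEL.search(line)
    if match:
        return match.group(2)
    match = LEADING_LEVEL.search(line[:PREFIX_CHARS])
    if match:
        return match.group(1)
    return None


def shape_severity(line: str) -> dict:
    """Derive severity fields; return an empty dict when the level is unrecognised."""
    text = extract_level(line)
    if text is None:
        return {}
    number = SEVERITY_NUMBER.get(text.upper())
    if number is None:
        # Unrecognised level: leave severity unset rather than guess.
        return {}
    # SeverityText keeps the source's exact text; only the number is normalised.
    return {"severity_text": text, "severity_number": number}
```

For example, `shape_severity('level=warn msg="peer dropped"')` yields `{"severity_text": "warn", "severity_number": 13}`, while a line with no recognisable level yields `{}`.
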
JSON loggers (e.g. xatu-sentry: {"level":"info",...}) weren't caught by the
logfmt/text level matchers, so they shipped with severity unset. Parse the line
as JSON when it starts with "{" and read .level / .severity before falling back
to the text matchers. The raw line is still the log body; only severity is
derived.
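
A minimal Python sketch of the JSON step, assuming the behaviour described above: parse only when the line starts with "{", read `level` and then `severity`, and return nothing so the caller can fall back to the text matchers. The function name `json_level` is hypothetical, and the real logic presumably lives in the Vector config rather than in Python.

```python
import json
from typing import Optional


def json_level(line: str) -> Optional[str]:
    """Return the level text from a JSON log line, or None to fall back to text matchers."""
    if not line.startswith("{"):
        return None
    try:
        obj = json.loads(line)
    except json.JSONDecodeError:
        # Looked like JSON but did not parse: let the text matchers try.
        return None
    if not isinstance(obj, dict):
        return None
    # Check .level first, then .severity, as described in the commit message.
    for key in ("level", "severity"):
        value = obj.get(key)
        if isinstance(value, str) and value:
            return value
    return None
```

The raw line is never modified here. Only the level text is pulled out, which matches the note that the log body stays as the original line.
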
…ol config

Ingress identity (OTLP auth user + ingress_user tag) now derives from
ethereum_network_name instead of secret_loki.username, so log/trace
attribution stays correct when the sops username isn't bumped between
devnet iterations. Remove the dead otelcol_contrib_config block (the
container is already removed via otelcol_contrib_cleanup); keep only the
Vector-based config.
@samcm merged commit 82d300f into master Jun 4, 2026
1 check passed
@samcm deleted the vector-edge-cutover branch June 4, 2026 04:43