
otel-collector in CrashLoopBackOff #153

@avdhoot

Description

Hi,

I deployed the Helm chart with default values, and otel-collector is stuck in CrashLoopBackOff. Logs are attached below.
chart version: v1.0.0

Starting log rotation for: /etc/otel/supervisor-data/agent.log
Settings: Max size=16MB, Max archives=1, Check interval=60s
CUSTOM_OTELCOL_CONFIG_FILE not set, using default configuration
{"level":"info","ts":1764098479.7409554,"logger":"supervisor","caller":"supervisor/supervisor.go:319","msg":"Supervisor starting","id":"019abc76-95cd-7c02-8a86-2406b24ec2ca"}
{"level":"info","ts":1764098479.7415736,"logger":"supervisor","caller":"supervisor/supervisor.go:1096","msg":"No last received remote config found"}
{"level":"info","ts":1764098479.7458155,"logger":"supervisor","caller":"supervisor/supervisor.go:637","msg":"Connected to the server."}
{"level":"info","ts":1764098479.752698,"logger":"supervisor","caller":"supervisor/supervisor.go:637","msg":"Connected to the server."}
...
 ❯ kubectl describe pod clickstack-otel-collector-c5b54dc5f-vp2v2 -n clickstack
Name:             clickstack-otel-collector-c5b54dc5f-vp2v2
Namespace:        clickstack
Priority:         0
Service Account:  default
Node:             i-06b1e61bbf1bc7ad0/172.30.1.89
Start Time:       Tue, 25 Nov 2025 14:47:54 -0500
Labels:           app=otel-collector
                  app.kubernetes.io/instance=clickstack
                  app.kubernetes.io/name=clickstack
                  pod-template-hash=c5b54dc5f
Annotations:      <none>
Status:           Running
IP:               100.96.12.117
IPs:
  IP:           100.96.12.117
Controlled By:  ReplicaSet/clickstack-otel-collector-c5b54dc5f
Containers:
  otel-collector:
    Container ID:   containerd://c60361c904131c40cabf98bf0ffcc8a47192b5fbb2682b96757b85bb40f62143
    Image:          docker.hyperdx.io/hyperdx/hyperdx-otel-collector:2.7.1
    Image ID:       docker.hyperdx.io/hyperdx/hyperdx-otel-collector@sha256:0de6070d526f5fe91267f52243713d97c3582485f633ad9e9e6627fe4ffbbdf3
    Ports:          13133/TCP, 24225/TCP, 4317/TCP, 4318/TCP, 8888/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Tue, 25 Nov 2025 14:50:58 -0500
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Tue, 25 Nov 2025 14:49:24 -0500
      Finished:     Tue, 25 Nov 2025 14:50:54 -0500
    Ready:          False
    Restart Count:  2
    Liveness:       http-get http://:13133/ delay=10s timeout=5s period=30s #success=1 #failure=3
    Readiness:      http-get http://:13133/ delay=5s timeout=5s period=10s #success=1 #failure=3
    Environment:
      CLICKHOUSE_ENDPOINT:                        tcp://clickstack-clickhouse:9000?dial_timeout=10s
      CLICKHOUSE_SERVER_ENDPOINT:                 clickstack-clickhouse:9000
      CLICKHOUSE_PROMETHEUS_METRICS_ENDPOINT:     clickstack-clickhouse:9363
      OPAMP_SERVER_URL:                           http://clickstack-app:4320
      HYPERDX_LOG_LEVEL:                          DEBUG
      HYPERDX_OTEL_EXPORTER_CLICKHOUSE_DATABASE:  default
      HYPERDX_API_KEY:                            <set to the key 'api-key' in secret 'clickstack-app-secrets'>  Optional: false
      CLICKHOUSE_USER:                            XXXXX
      CLICKHOUSE_PASSWORD:                        XXXXX
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-69j2z (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       False
  ContainersReady             False
  PodScheduled                True
Volumes:
  kube-api-access-69j2z:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    Optional:                false
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  3m55s                  default-scheduler  Successfully assigned clickstack/clickstack-otel-collector-c5b54dc5f-vp2v2 to i-06b1e61bbf1bc7ad0
  Normal   Pulled     2m25s (x2 over 3m55s)  kubelet            Container image "docker.hyperdx.io/hyperdx/hyperdx-otel-collector:2.7.1" already present on machine
  Normal   Created    2m25s (x2 over 3m55s)  kubelet            Created container: otel-collector
  Normal   Started    2m25s (x2 over 3m55s)  kubelet            Started container otel-collector
  Normal   Killing    2m25s                  kubelet            Container otel-collector failed liveness probe, will be restarted
  Warning  Unhealthy  115s (x14 over 3m45s)  kubelet            Readiness probe failed: Get "http://100.96.12.117:13133/": dial tcp 100.96.12.117:13133: connect: connection refused
  Warning  Unhealthy  115s (x4 over 3m25s)   kubelet            Liveness probe failed: Get "http://100.96.12.117:13133/": dial tcp 100.96.12.117:13133: connect: connection refused
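
The events above only show the probes failing against port 13133 (the collector's health-check endpoint), not why the collector never binds it. A few generic kubectl commands that usually surface the underlying error; the pod and namespace names below are taken from this report, and the rest is an illustrative sketch, not output from this cluster:

```shell
# Logs from the previous (crashed) container instance; the error that
# caused exit code 2 is usually only visible there, not in the current logs.
kubectl logs clickstack-otel-collector-c5b54dc5f-vp2v2 -n clickstack --previous

# Check from inside the pod whether the health endpoint on 13133 ever
# comes up (assumes wget is available in the image; curl works the same way).
kubectl exec -n clickstack clickstack-otel-collector-c5b54dc5f-vp2v2 -- \
  wget -qO- http://localhost:13133/ || echo "health endpoint not responding"

# Recent events scoped to this pod, to correlate restarts with probe failures.
kubectl get events -n clickstack \
  --field-selector involvedObject.name=clickstack-otel-collector-c5b54dc5f-vp2v2
```

Since the supervisor log shows it connecting to the OpAMP server successfully, the `--previous` logs are the most likely place to show whether the managed collector process itself is failing to start.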
