Managing Images

Evan Nemerson edited this page Nov 26, 2025 · 2 revisions

Managing Container Images for CloudZero Agent

Introduction

The CloudZero Agent uses five container images to provide its functionality. Some organizations require that all images in their clusters come from private repositories for security, compliance, or network isolation reasons. These organizations need to mirror all five images to their private registry and configure the CloudZero Agent to use the mirrored images.

Image Version Requirements

Each chart release uses specific versions of each image, and the chart has only been tested in that exact configuration. Using other versions is not tested or supported and may result in unexpected behavior or failures.

  • CloudZero Agent image: The version will be the same as the chart version (e.g., chart version 1.2.9 uses image tag 1.2.9)
  • Other images: There is no relation between the chart version and the third-party image versions

Important: Version numbers will change with each chart release. Always check the current values.yaml file for the exact version numbers corresponding to your chart release.
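
As a small illustration of the versioning rule above, the agent image reference can be derived from the chart version. This is only a sketch: the function name agent_image_ref is hypothetical, and it assumes the default repository is in use. Third-party image tags cannot be derived this way and must be read from values.yaml.

```shell
#!/bin/sh
# Hypothetical helper: build the CloudZero Agent image reference for a chart version.
# Assumes the default repository and that the image tag equals the chart version.
agent_image_ref() {
  chart_version="$1"
  echo "ghcr.io/cloudzero/cloudzero-agent/cloudzero-agent:${chart_version}"
}

# Prints: ghcr.io/cloudzero/cloudzero-agent/cloudzero-agent:1.2.9
agent_image_ref 1.2.9
```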

The Five CloudZero Agent Images

1. CloudZero Agent (Main Component)

Purpose: The CloudZero Agent image contains the executables for the CloudZero components. Metrics scraping itself is performed by Prometheus (or by Grafana Alloy in clustered mode), not by this image.

Default Repository: ghcr.io/cloudzero/cloudzero-agent/cloudzero-agent

Configuration Path: components.agent.image

What it contains:

  • cloudzero-agent-validator - Deployment validation and status reporting
  • cloudzero-collector - Prometheus metrics collection
  • cloudzero-webhook - Kubernetes admission controller
  • cloudzero-shipper - S3 upload orchestration
  • cloudzero-certifik8s - Certificate management utilities

The image also includes other miscellaneous tools.

2. Prometheus

Purpose: The Prometheus server that scrapes metrics from various sources in your cluster.

Default Image: quay.io/prometheus/prometheus:v3.7.3

Configuration Path: components.prometheus.image

What it does:

  • Scrapes metrics from Kubernetes API server, kubelet, and other sources
  • Stores metrics temporarily before they're processed by the CloudZero Agent
  • Provides the metrics collection infrastructure

Note: Prometheus is not used when components.agent.mode is set to "clustered". In clustered mode, Grafana Alloy is used instead.

3. Prometheus Config Reloader

Purpose: A sidecar container that watches for Prometheus configuration changes and reloads Prometheus when needed.

Default Image: quay.io/prometheus-operator/prometheus-config-reloader:v0.87.0

Configuration Path: components.prometheusReloader.image

What it does:

  • Monitors Prometheus configuration files for changes
  • Automatically reloads Prometheus when configuration is updated
  • Ensures Prometheus stays in sync with configuration changes

Note: The Prometheus Config Reloader is only used when Prometheus is deployed (i.e., when components.agent.mode is not set to "clustered"). In clustered mode with Alloy, configuration reloading is handled differently.

4. Kube State Metrics

Purpose: Exposes Kubernetes object state as Prometheus metrics.

Default Image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.17.0

Configuration Path: kubeStateMetrics.image

What it does:

  • Converts Kubernetes object state into Prometheus metrics
  • Provides metrics about deployments, pods, services, and other Kubernetes resources
  • Essential for cost attribution and resource monitoring

Note: Kube State Metrics uses a different configuration structure because it's implemented as a Helm subchart. This means we don't have the same level of control to integrate it seamlessly into our configuration system as we do with the other components.

5. Grafana Alloy

Purpose: A modern observability data collector that can replace Prometheus in clustered mode deployments. Alloy provides better performance and native horizontal scalability compared to Prometheus.

Default Image: docker.io/grafana/alloy:v1.11.3

Configuration Path: components.agent.clusteredNode.image

What it does:

  • Collects metrics from various sources in your cluster using the Alloy configuration syntax (formerly known as River)
  • Provides native horizontal scalability through clustering support
  • Offers better performance characteristics than Prometheus for high-scale deployments
  • Forwards metrics to the CloudZero aggregator for processing

When it's used:

  • Alloy is only used when components.agent.mode is set to "clustered"
  • In other deployment modes (agent, server, federated), Prometheus is used instead
  • The clustered mode is EXPERIMENTAL and provides an alternative to Prometheus for metrics collection

Mirroring Images to Private Registry

The process of mirroring images to your private registry will vary depending on your organization's infrastructure and policies. Common approaches include:

  • Using container registry mirroring tools
  • Automated CI/CD pipelines that pull and push images
  • Manual docker pull and docker push commands
  • Registry proxy configurations

Note: The exact version numbers for each image can be found in the values.yaml file for your chart release. These versions will change with each chart release, so always reference the current values file.
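
The manual docker pull and docker push approach could be scripted roughly as follows. This is a sketch under stated assumptions, not a supported tool: example.com stands in for your registry host, the tags are the example versions quoted on this page (replace them with the ones in your chart's values.yaml), and the function names are hypothetical. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical private registry host; replace with your own.
REGISTRY="${REGISTRY:-example.com}"
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# mirror_image SOURCE_IMAGE TARGET_PATH
# Pulls SOURCE_IMAGE, retags it under $REGISTRY/TARGET_PATH, and pushes it.
mirror_image() {
  src="$1"
  dest="$REGISTRY/$2"
  run docker pull "$src"
  run docker tag "$src" "$dest"
  run docker push "$dest"
}

# Example tags only; take the real ones from your chart's values.yaml.
mirror_all() {
  mirror_image ghcr.io/cloudzero/cloudzero-agent/cloudzero-agent:1.2.9 cloudzero-agent/cloudzero-agent:1.2.9
  mirror_image quay.io/prometheus/prometheus:v3.7.3 prometheus/prometheus:v3.7.3
  mirror_image quay.io/prometheus-operator/prometheus-config-reloader:v0.87.0 prometheus-operator/prometheus-config-reloader:v0.87.0
  mirror_image registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.17.0 kube-state-metrics/kube-state-metrics:v2.17.0
  mirror_image docker.io/grafana/alloy:v1.11.3 grafana/alloy:v1.11.3
}

mirror_all
```

The target paths match the repositories used in the configuration example on this page. Note that docker pull and docker push typically copy only the image for the local platform; tools that copy whole manifests, such as skopeo or crane, may be a better fit for multi-architecture images.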

Image Pull Secrets

If your private registry requires authentication, you'll need to configure image pull secrets. These can be configured globally for all components or individually per component:

  • Global pull secrets: Applied to all components via defaults.image.pullSecrets
  • Component-specific pull secrets: Override global settings for specific components
  • Kube State Metrics pull secrets: Configured separately via kubeStateMetrics.pullSecrets
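
A pull secret of the kind referenced above can be created with kubectl create secret docker-registry. The sketch below wraps that command in a hypothetical helper; the secret name, the cloudzero namespace, the example.com host, the user name, and the REGISTRY_PASSWORD variable are all assumptions to adapt. By default it prints the command instead of running it.

```shell
#!/bin/sh
# DRY_RUN=1 (default) prints the command; DRY_RUN=0 runs it against the current cluster.
DRY_RUN="${DRY_RUN:-1}"

# create_pull_secret NAME NAMESPACE REGISTRY_HOST USERNAME
# The password is read from the REGISTRY_PASSWORD environment variable.
create_pull_secret() {
  name="$1"; namespace="$2"; server="$3"; user="$4"
  if [ "$DRY_RUN" = "1" ]; then
    # Show the command without revealing the password.
    echo "kubectl create secret docker-registry $name --namespace $namespace --docker-server=$server --docker-username=$user --docker-password=<hidden>"
  else
    kubectl create secret docker-registry "$name" \
      --namespace "$namespace" \
      --docker-server="$server" \
      --docker-username="$user" \
      --docker-password="$REGISTRY_PASSWORD"
  fi
}

create_pull_secret my-private-registry-secret cloudzero example.com my-user
```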

Configuration Example

Here's a complete example of how to configure the CloudZero Agent to use private registry images. Note: Any version numbers shown on this page are examples; always check your chart's values.yaml file for the correct versions.

# Basic CloudZero configuration
apiKey: "your-cloudzero-api-key"
clusterName: "your-cluster-name"

# Component-specific image configurations
components:
  agent:
    image:
      repository: example.com/cloudzero-agent/cloudzero-agent
    clusteredNode:
      image:
        repository: example.com/grafana/alloy
  prometheus:
    image:
      repository: example.com/prometheus/prometheus
  prometheusReloader:
    image:
      repository: example.com/prometheus-operator/prometheus-config-reloader

# Kube State Metrics configuration (note: different structure due to subchart)
kubeStateMetrics:
  image:
    registry: example.com
    repository: kube-state-metrics/kube-state-metrics

Additionally, if your private registry requires an image pull secret, create the secret in the namespace where the CloudZero Agent is installed, then reference it in your overrides:

defaults:
  image:
    pullSecrets:
      - name: my-private-registry-secret
kubeStateMetrics:
  pullSecrets:
    - name: my-private-registry-secret
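
Before installing, it can help to render the chart with your overrides and list the images it would deploy. The helper below is a minimal sketch that extracts image references from manifests on stdin; the name list_images is hypothetical, and the chart reference and overrides file name in the usage comment are assumptions.

```shell
#!/bin/sh
# Print the unique image references found in Kubernetes manifests read from stdin.
# Handles both "image: ref" and "- image: ref" lines and strips surrounding quotes.
list_images() {
  awk '$1 == "image:" && NF >= 2 { print $2 }
       $1 == "-" && $2 == "image:" && NF >= 3 { print $3 }' | tr -d "\"'" | sort -u
}

# Usage (chart reference and file name are assumptions):
#   helm template cloudzero-agent cloudzero/cloudzero-agent -n cloudzero -f overrides.yaml | list_images
```

Every line of output should start with your private registry host; anything else points at an override that was missed.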

Troubleshooting

Common Issues

1. Image Pull Errors

If you see "ImagePullBackOff" or "ErrImagePull" errors:

# Check pod status
kubectl get pods -n cloudzero

# Check pod events
kubectl describe pod <pod-name> -n cloudzero

# Check that the pull secret exists in the agent's namespace
kubectl get secret my-private-registry-secret -n cloudzero -o yaml
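
In a namespace with many pods, the failing ones can be filtered out of the kubectl get pods table. This is a sketch that assumes the default column layout (NAME, READY, STATUS, RESTARTS, AGE); the function name pull_failures is hypothetical.

```shell
#!/bin/sh
# Read `kubectl get pods` table output on stdin and print "pod status" for every
# pod whose STATUS column reports an image pull failure (including init containers).
pull_failures() {
  awk 'NR > 1 && $3 ~ /^(Init:)?(ImagePullBackOff|ErrImagePull)$/ { print $1, $3 }'
}

# Usage:
#   kubectl get pods -n cloudzero | pull_failures
```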

2. Wrong Image Being Used

Verify your values file is being applied correctly:

# Check the user-supplied values for the release (add --all to include chart defaults)
helm get values cloudzero-agent -n cloudzero

# Verify image repositories in running pods
kubectl get pod <pod-name> -n cloudzero -o jsonpath='{.spec.containers[*].image}'

3. Authentication Issues

If images fail to pull due to authentication:

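# Test pulling the image in a throwaway pod (the override attaches the pull secret)
kubectl run test-pod -n cloudzero --image=example.com/cloudzero-agent/cloudzero-agent:1.2.9 \
  --rm -it --restart=Never --image-pull-policy=Always \
  --overrides='{"apiVersion":"v1","spec":{"imagePullSecrets":[{"name":"my-private-registry-secret"}]}}'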

# Check if pull secret is correctly configured
kubectl get secret my-private-registry-secret -n cloudzero -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d
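
The decoded secret can also be checked for an entry that mentions your registry host. The helper below is a deliberately simple sketch: it does a substring match rather than parsing the JSON, and the name secret_covers_registry is hypothetical.

```shell
#!/bin/sh
# Succeed if the decoded .dockerconfigjson read from stdin mentions the given registry host.
# This is a plain substring match; it does not parse the JSON or validate the credentials.
secret_covers_registry() {
  grep -F -q "$1"
}

# Usage:
#   kubectl get secret my-private-registry-secret -n cloudzero \
#     -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d \
#     | secret_covers_registry example.com && echo "secret mentions example.com"
```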

Verification Steps

1. Verify All Images Are Correct

# Check all running containers and their images
kubectl get pods -n cloudzero -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
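
The output of the command above can be filtered to show only containers that are still pulling from a public registry. This is a sketch; the function name not_mirrored is hypothetical and example.com/ stands in for your registry prefix.

```shell
#!/bin/sh
# Read "pod<TAB>image [image...]" lines on stdin and print "pod image" for every
# image that does not start with the given registry prefix.
not_mirrored() {
  awk -v prefix="$1" '{
    for (i = 2; i <= NF; i++)
      if (index($i, prefix) != 1) print $1, $i
  }'
}

# Usage: pipe the output of the kubectl command above into the filter, for example
#   kubectl get pods -n cloudzero -o jsonpath='...' | not_mirrored example.com/
```

No output means every listed container image comes from the private registry.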

2. Verify Pull Secrets Are Applied

# Check if pull secrets are present in pod specs
kubectl get pod <pod-name> -n cloudzero -o yaml | grep -A 5 imagePullSecrets

3. Test Registry Connectivity

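# Test from within the cluster (the override attaches the pull secret)
kubectl run registry-test -n cloudzero --image=example.com/cloudzero-agent/cloudzero-agent:1.2.9 \
  --rm -it --restart=Never --image-pull-policy=Always \
  --overrides='{"apiVersion":"v1","spec":{"imagePullSecrets":[{"name":"my-private-registry-secret"}]}}'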
