# Managing Container Images for CloudZero Agent

## Introduction

The CloudZero Agent uses **5 different container images** to provide its functionality. Some organizations require private repositories for all images in their cluster for security, compliance, or network isolation reasons. These organizations will need to mirror all five images to their private registry and configure the CloudZero Agent to use those mirrored images.

## Image Version Requirements

Each chart release uses specific versions of each image, and the chart has only been tested in that exact configuration. Using other versions is **not tested or supported** and may result in unexpected behavior or failures.

- **CloudZero Agent image**: The version will be the same as the chart version (e.g., chart version 1.2.9 uses image tag 1.2.9)
- **Other images**: There is no relation between the chart version and the third-party image versions

**Important**: Version numbers will change with each chart release. Always check the current `values.yaml` file for the exact version numbers corresponding to your chart release.

## The Five CloudZero Agent Images

### 1. CloudZero Agent (Main Component)

**Purpose**: The CloudZero Agent image that contains executables for CloudZero components. The CloudZero agent functionality itself is provided by Prometheus.

**Default Repository**: `ghcr.io/cloudzero/cloudzero-agent/cloudzero-agent`

**Configuration Path**: `components.agent.image`

**What it contains**:

- `cloudzero-agent-validator` - Deployment validation and status reporting
- `cloudzero-collector` - Prometheus metrics collection
- `cloudzero-webhook` - Kubernetes admission controller
- `cloudzero-shipper` - S3 upload orchestration
- `cloudzero-certifik8s` - Certificate management utilities

As well as other miscellaneous tools.

### 2. Prometheus

**Purpose**: The Prometheus server that scrapes metrics from various sources in your cluster.

**Default Image**: `quay.io/prometheus/prometheus:v3.7.3`

**Configuration Path**: `components.prometheus.image`

**What it does**:

- Scrapes metrics from Kubernetes API server, kubelet, and other sources
- Stores metrics temporarily before they're processed by the CloudZero Agent
- Provides the metrics collection infrastructure

**Note**: Prometheus is not used when `components.agent.mode` is set to `"clustered"`. In clustered mode, Grafana Alloy is used instead.

### 3. Prometheus Config Reloader

**Purpose**: A sidecar container that watches for Prometheus configuration changes and reloads Prometheus when needed.

**Default Image**: `quay.io/prometheus-operator/prometheus-config-reloader:v0.87.0`

**Configuration Path**: `components.prometheusReloader.image`

**What it does**:

- Monitors Prometheus configuration files for changes
- Automatically reloads Prometheus when configuration is updated
- Ensures Prometheus stays in sync with configuration changes

**Note**: The Prometheus Config Reloader is only used when Prometheus is deployed (i.e., when `components.agent.mode` is not set to `"clustered"`). In clustered mode with Alloy, configuration reloading is handled differently.

### 4. Kube State Metrics

**Purpose**: Exposes Kubernetes object state as Prometheus metrics.

**Default Image**: `registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.17.0`

**Configuration Path**: `kubeStateMetrics.image`

**What it does**:

- Converts Kubernetes object state into Prometheus metrics
- Provides metrics about deployments, pods, services, and other Kubernetes resources
- Essential for cost attribution and resource monitoring

**Note**: Kube State Metrics uses a different configuration structure because it's implemented as a Helm subchart. This means we don't have the same level of control to integrate it seamlessly into our configuration system as we do with the other components.

### 5. Grafana Alloy

**Purpose**: A modern observability data collector that can replace Prometheus in clustered mode deployments. Alloy provides better performance and native horizontal scalability compared to Prometheus.

**Default Image**: `docker.io/grafana/alloy:v1.11.3`

**Configuration Path**: `components.agent.clusteredNode.image`

**What it does**:

- Collects metrics from various sources in your cluster using the River configuration language
- Provides native horizontal scalability through clustering support
- Offers better performance characteristics than Prometheus for high-scale deployments
- Forwards metrics to the CloudZero aggregator for processing

**When it's used**:

- Alloy is only used when `components.agent.mode` is set to `"clustered"`
- In other deployment modes (agent, server, federated), Prometheus is used instead
- The clustered mode is EXPERIMENTAL and provides an alternative to Prometheus for metrics collection

## Mirroring Images to Private Registry

The process of mirroring images to your private registry will vary depending on your organization's infrastructure and policies. Common approaches include:

- Using container registry mirroring tools
- Automated CI/CD pipelines that pull and push images
- Manual `docker pull` and `docker push` commands
- Registry proxy configurations

**Note**: The exact version numbers for each image can be found in the `values.yaml` file for your chart release. These versions will change with each chart release, so always reference the current values file.

## Image Pull Secrets

If your private registry requires authentication, you'll need to configure image pull secrets.
These can be configured globally for all components or individually per component:

- **Global pull secrets**: Applied to all components via `defaults.image.pullSecrets`
- **Component-specific pull secrets**: Override global settings for specific components
- **Kube State Metrics pull secrets**: Configured separately via `kubeStateMetrics.pullSecrets`

## Configuration Example

Here's a complete example of how to configure CloudZero Agent to use private registry images.

**Note**: The version numbers shown are examples - always check your chart's `values.yaml` file for the correct versions.

```yaml
# Basic CloudZero configuration
apiKey: "your-cloudzero-api-key"
clusterName: "your-cluster-name"

# Component-specific image configurations
components:
  agent:
    image:
      repository: example.com/cloudzero-agent/cloudzero-agent
    clusteredNode:
      image:
        repository: example.com/grafana/alloy
  prometheus:
    image:
      repository: example.com/prometheus/prometheus
  prometheusReloader:
    image:
      repository: example.com/prometheus-operator/prometheus-config-reloader

# Kube State Metrics configuration (note: different structure due to subchart)
kubeStateMetrics:
  image:
    registry: example.com
    repository: kube-state-metrics/kube-state-metrics
```

Additionally, if your private registry requires an image pull secret, create the secret in the namespace where the CloudZero Agent is being installed, then reference it in your overrides:

```yaml
defaults:
  image:
    pullSecrets:
      - name: my-private-registry-secret

kubeStateMetrics:
  pullSecrets:
    - name: my-private-registry-secret
```

## Troubleshooting

### Common Issues

#### 1. Image Pull Errors

If you see "ImagePullBackOff" or "ErrImagePull" errors:

```bash
# Check pod status
kubectl get pods -n cloudzero-agent

# Check pod events (replace <pod-name> with the failing pod)
kubectl describe pod <pod-name> -n cloudzero-agent

# Check if pull secrets are correct
kubectl get secret my-private-registry-secret -n cloudzero-agent -o yaml
```

#### 2. Wrong Image Being Used

Verify your values file is being applied correctly:

```bash
# Check the actual values being used
helm get values cloudzero-agent -n cloudzero-agent

# Verify image repositories in a running pod (replace <pod-name>)
kubectl get pod <pod-name> -n cloudzero-agent -o jsonpath='{.spec.containers[*].image}'
```

#### 3. Authentication Issues

If images fail to pull due to authentication:

```bash
# Test registry access from a pod
kubectl run test-pod --image=example.com/cloudzero-agent/cloudzero-agent:1.2.9 \
  --rm -it --restart=Never --image-pull-policy=Always

# Check if pull secret is correctly configured
kubectl get secret my-private-registry-secret -n cloudzero-agent \
  -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d
```

### Verification Steps

#### 1. Verify All Images Are Correct

```bash
# Check all running containers and their images
kubectl get pods -n cloudzero-agent -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
```

#### 2. Verify Pull Secrets Are Applied

```bash
# Check if pull secrets are present in pod specs
kubectl get pod -n cloudzero-agent -o yaml | grep -A 5 imagePullSecrets
```

#### 3. Test Registry Connectivity

```bash
# Test from within the cluster
kubectl run registry-test --image=example.com/cloudzero-agent/cloudzero-agent:1.2.9 \
  --rm -it --restart=Never --image-pull-policy=Always
```
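
The image check under "Verify All Images Are Correct" can be automated rather than read by eye. The following is a minimal sketch, not part of the chart: the function name `check_images` is hypothetical, and `example.com` stands in for your registry host. It reads the tab-separated `pod<TAB>images` listing on stdin and reports every image that does not come from the expected registry. Note that the listing only covers `.spec.containers`, so init containers are not checked.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the CloudZero chart): flags images
# that are not served from the expected private registry.
# Input (stdin): one line per pod, "<pod-name><TAB><space-separated images>".
# Exit status: 0 if every image matches the registry prefix, 1 otherwise.
check_images() {
  local registry="$1"
  local status=0
  local pod images image
  while IFS=$'\t' read -r pod images; do
    # Unquoted on purpose: split the image list on whitespace.
    for image in $images; do
      case "$image" in
        "$registry"/*) ;;  # image is pulled from the private registry
        *)
          echo "MISMATCH: $pod uses $image"
          status=1
          ;;
      esac
    done
  done
  return "$status"
}

# Example usage (assumes the cloudzero-agent namespace used in this guide):
# kubectl get pods -n cloudzero-agent \
#   -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}' \
#   | check_images example.com
```

A non-zero exit status makes the helper usable as a gate in a CI/CD pipeline after `helm upgrade`, which is one way to catch an override that was not applied.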