From ce24415da860895f9a67a9943b201aa126c759fe Mon Sep 17 00:00:00 2001
From: Rene Dekker
Date: Thu, 9 Apr 2026 10:48:33 -0700
Subject: [PATCH 1/5] Add operator metrics page documenting TLS, component status, and license metrics

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 .../monitor/metrics/operator-metrics.mdx | 125 ++++++++++++++++++
 sidebars-calico-enterprise.js            |   1 +
 2 files changed, 126 insertions(+)
 create mode 100644 calico-enterprise/operations/monitor/metrics/operator-metrics.mdx

diff --git a/calico-enterprise/operations/monitor/metrics/operator-metrics.mdx b/calico-enterprise/operations/monitor/metrics/operator-metrics.mdx
new file mode 100644
index 0000000000..3b02af8b9e
--- /dev/null
+++ b/calico-enterprise/operations/monitor/metrics/operator-metrics.mdx
@@ -0,0 +1,125 @@
+---
+description: Monitor the Calico Enterprise operator with Prometheus metrics for component health, TLS certificate expiry, and license status.
+---
+
+# Operator metrics
+
+## Big picture
+
+Use Prometheus to monitor $[prodname] operator health, TLS certificate expiry, and license status.
+
+## Value
+
+The $[prodname] operator exposes Prometheus metrics that give you visibility into the overall health of your $[prodname] installation. These metrics let you set up alerts for degraded components, expiring TLS certificates, and license issues before they impact operations.
+
+## Before you begin
+
+Operator metrics require the following environment variables on the `tigera-operator` deployment:
+
+- `METRICS_ENABLED=true`
+- `METRICS_SCHEME=https`
+
+Metrics are served on port **9484** over HTTPS and scraped by the built-in Prometheus instance via the `tigera-operator-metrics` ServiceMonitor.
+
+## Metrics reference
+
+### Component status
+
+The `tigera_operator_component_status` metric reports the status of each $[prodname] component as managed by the operator. This mirrors the information available through `kubectl get tigerastatus`.
+
+| Metric | Labels | Description |
+|---|---|---|
+| `tigera_operator_component_status` | `component`, `condition` | Status of a component. Value is `1` (true) or `0` (false). |
+
+**Labels:**
+
+- `component` — The $[prodname] component (e.g. `calico`, `apiserver`, `monitor`, `log-storage`, `manager`)
+- `condition` — One of:
+  - `available` — The component is running and healthy
+  - `degraded` — The component is in an error state
+  - `progressing` — The component is being updated or is starting up
+
+**Example queries:**
+
+- Find all degraded components:
+
+  ```
+  tigera_operator_component_status{condition="degraded"} == 1
+  ```
+
+- Check if a specific component is available:
+
+  ```
+  tigera_operator_component_status{component="calico", condition="available"}
+  ```
+
+### TLS certificate expiry
+
+The `tigera_operator_tls_certificate_expiry_timestamp_seconds` metric reports the expiry time of each TLS certificate managed by the operator.
+
+| Metric | Labels | Description |
+|---|---|---|
+| `tigera_operator_tls_certificate_expiry_timestamp_seconds` | `name`, `namespace`, `issuer` | Unix timestamp when the certificate expires. |
+
+**Labels:**
+
+- `name` — The Secret name containing the certificate
+- `namespace` — The namespace of the Secret
+- `issuer` — The certificate issuer (e.g. `tigera-operator-signer`)
+
+**Example queries:**
+
+- Certificates expiring within 30 days:
+
+  ```
+  tigera_operator_tls_certificate_expiry_timestamp_seconds - time() < 30 * 24 * 3600
+  ```
+
+- Certificates expiring within 7 days:
+
+  ```
+  tigera_operator_tls_certificate_expiry_timestamp_seconds - time() < 7 * 24 * 3600
+  ```
+
+### License
+
+| Metric | Labels | Description |
+|---|---|---|
+| `tigera_operator_license_valid` | — | Whether the license is valid (`1`) or not (`0`). |
+| `tigera_operator_license_expiry_timestamp_seconds` | — | Unix timestamp when the license expires. |
+
+**Example queries:**
+
+- License expires within 30 days:
+
+  ```
+  tigera_operator_license_expiry_timestamp_seconds - time() < 30 * 24 * 3600
+  ```
+
+- License is invalid:
+
+  ```
+  tigera_operator_license_valid == 0
+  ```
+
+## Built-in alerts
+
+$[prodname] installs PrometheusRule resources with alerting rules that use these metrics. The built-in rules include:
+
+| Alert | Condition | Severity |
+|---|---|---|
+| `TLSCertExpiringWarning` | Certificate expires in < 29 days | warning |
+| `TLSCertExpiringCritical` | Certificate expires in < 7 days | critical |
+| `LicenseExpiringWarning` | License expires in < 30 days | warning |
+| `LicenseExpiringCritical` | License expires in < 7 days or is invalid | critical |
+| `DeniedPacketsRate` | Denied packets rate > 50/s | info |
+| `ComponentDegradedWarning` | Component degraded for > 10s | warning |
+| `ComponentDegradedCritical` | Component degraded for > 1m | critical |
+
+To route these alerts, see [Configure Alertmanager](../prometheus/alertmanager.mdx).
+
+## Additional resources
+
+- [License expiration and renewal](../../license-options.mdx)
+- [Configure Prometheus](../prometheus/configure-prometheus.mdx)
+- [BYO Prometheus](../prometheus/byo-prometheus.mdx)
diff --git a/sidebars-calico-enterprise.js b/sidebars-calico-enterprise.js
index 3ddc5a675a..187af41307 100644
--- a/sidebars-calico-enterprise.js
+++ b/sidebars-calico-enterprise.js
@@ -588,6 +588,7 @@ module.exports = {
       label: 'Metrics',
       link: { type: 'doc', id: 'operations/monitor/metrics/index' },
       items: [
+        'operations/monitor/metrics/operator-metrics',
         'operations/monitor/metrics/recommended-metrics',
         'operations/monitor/metrics/bgp-metrics',
         'operations/monitor/metrics/policy-metrics',

From 8b86046a84b2321ac81834c60c9dc6532dd90ff7 Mon Sep 17 00:00:00 2001
From: Rene Dekker
Date: Thu, 9 Apr 2026 10:50:18 -0700
Subject: [PATCH 2/5] Rename calico-prometheus-dp-rate to calico-prometheus-rules in master docs

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 .../operations/monitor/prometheus/configure-prometheus.mdx | 4 ++--
 calico-enterprise/operations/license-options.mdx           | 2 +-
 .../operations/monitor/prometheus/configure-prometheus.mdx | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/calico-cloud/operations/monitor/prometheus/configure-prometheus.mdx b/calico-cloud/operations/monitor/prometheus/configure-prometheus.mdx
index de68da887c..14e95c9e76 100644
--- a/calico-cloud/operations/monitor/prometheus/configure-prometheus.mdx
+++ b/calico-cloud/operations/monitor/prometheus/configure-prometheus.mdx
@@ -31,7 +31,7 @@ As an example, the range query in this Manifest is 10 seconds.
 apiVersion: monitoring.coreos.com/v1
 kind: PrometheusRule
 metadata:
-  name: calico-prometheus-dp-rate
+  name: calico-prometheus-rules
   namespace: tigera-prometheus
   labels:
     role: tigera-prometheus-rules
@@ -56,7 +56,7 @@ To update this alerting rule, to say, execute the query with a range of
 apiVersion: monitoring.coreos.com/v1
 kind: PrometheusRule
 metadata:
-  name: calico-prometheus-dp-rate
+  name: calico-prometheus-rules
   namespace: tigera-prometheus
   labels:
     role: tigera-prometheus-rules
diff --git a/calico-enterprise/operations/license-options.mdx b/calico-enterprise/operations/license-options.mdx
index 8482c1a2d5..efa7be896d 100644
--- a/calico-enterprise/operations/license-options.mdx
+++ b/calico-enterprise/operations/license-options.mdx
@@ -30,7 +30,7 @@ These metrics are scraped by the built-in Prometheus instance via the `tigera-op
 $[prodname] installs PrometheusRule resources with alerting rules for license expiration. You can view them with:
 
 ```bash
-kubectl -n tigera-prometheus get prometheusrule calico-prometheus-dp-rate -o yaml
+kubectl -n tigera-prometheus get prometheusrule calico-prometheus-rules -o yaml
 ```
 
 The built-in rules include:
diff --git a/calico-enterprise/operations/monitor/prometheus/configure-prometheus.mdx b/calico-enterprise/operations/monitor/prometheus/configure-prometheus.mdx
index 4cf8aadd5c..233b2188f2 100644
--- a/calico-enterprise/operations/monitor/prometheus/configure-prometheus.mdx
+++ b/calico-enterprise/operations/monitor/prometheus/configure-prometheus.mdx
@@ -31,7 +31,7 @@ As an example, the range query in this Manifest is 10 seconds.
 apiVersion: monitoring.coreos.com/v1
 kind: PrometheusRule
 metadata:
-  name: calico-prometheus-dp-rate
+  name: calico-prometheus-rules
   namespace: tigera-prometheus
   labels:
     role: tigera-prometheus-rules
@@ -56,7 +56,7 @@ To update this alerting rule, to say, execute the query with a range of
 apiVersion: monitoring.coreos.com/v1
 kind: PrometheusRule
 metadata:
-  name: calico-prometheus-dp-rate
+  name: calico-prometheus-rules
   namespace: tigera-prometheus
   labels:
     role: tigera-prometheus-rules

From 3aab75ef66d3e5ea96b3c264153e500e59204e68 Mon Sep 17 00:00:00 2001
From: Rene Dekker
Date: Thu, 9 Apr 2026 10:54:25 -0700
Subject: [PATCH 3/5] Align PrometheusRule name to 'calico' and update alerts per operator#4663

- Rename calico-prometheus-rules to calico (matching operator constant)
- Add ComponentProgressingWarning/Critical alerts
- Update ComponentDegraded for durations to 15m/30m
- Update TLSCertExpiringWarning threshold note (30d with 8h buffer)

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 .../prometheus/configure-prometheus.mdx  |  4 ++--
 .../operations/license-options.mdx       |  2 +-
 .../monitor/metrics/operator-metrics.mdx | 18 +++++++++++++-----
 .../prometheus/configure-prometheus.mdx  |  4 ++--
 4 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/calico-cloud/operations/monitor/prometheus/configure-prometheus.mdx b/calico-cloud/operations/monitor/prometheus/configure-prometheus.mdx
index 14e95c9e76..29c4d7a155 100644
--- a/calico-cloud/operations/monitor/prometheus/configure-prometheus.mdx
+++ b/calico-cloud/operations/monitor/prometheus/configure-prometheus.mdx
@@ -31,7 +31,7 @@ As an example, the range query in this Manifest is 10 seconds.
 apiVersion: monitoring.coreos.com/v1
 kind: PrometheusRule
 metadata:
-  name: calico-prometheus-rules
+  name: calico
   namespace: tigera-prometheus
   labels:
     role: tigera-prometheus-rules
@@ -56,7 +56,7 @@ To update this alerting rule, to say, execute the query with a range of
 apiVersion: monitoring.coreos.com/v1
 kind: PrometheusRule
 metadata:
-  name: calico-prometheus-rules
+  name: calico
   namespace: tigera-prometheus
   labels:
     role: tigera-prometheus-rules
diff --git a/calico-enterprise/operations/license-options.mdx b/calico-enterprise/operations/license-options.mdx
index efa7be896d..ec2a638846 100644
--- a/calico-enterprise/operations/license-options.mdx
+++ b/calico-enterprise/operations/license-options.mdx
@@ -30,7 +30,7 @@ These metrics are scraped by the built-in Prometheus instance via the `tigera-op
 $[prodname] installs PrometheusRule resources with alerting rules for license expiration. You can view them with:
 
 ```bash
-kubectl -n tigera-prometheus get prometheusrule calico-prometheus-rules -o yaml
+kubectl -n tigera-prometheus get prometheusrule calico -o yaml
 ```
 
 The built-in rules include:
diff --git a/calico-enterprise/operations/monitor/metrics/operator-metrics.mdx b/calico-enterprise/operations/monitor/metrics/operator-metrics.mdx
index 3b02af8b9e..0a8be9d855 100644
--- a/calico-enterprise/operations/monitor/metrics/operator-metrics.mdx
+++ b/calico-enterprise/operations/monitor/metrics/operator-metrics.mdx
@@ -104,17 +104,25 @@ The `tigera_operator_tls_certificate_expiry_timestamp_seconds` metric reports th
 
 ## Built-in alerts
 
-$[prodname] installs PrometheusRule resources with alerting rules that use these metrics. The built-in rules include:
+$[prodname] installs a PrometheusRule resource named `calico` with alerting rules that use these metrics. You can view it with:
+
+```bash
+kubectl -n tigera-prometheus get prometheusrule calico -o yaml
+```
+
+The built-in rules include:
 
 | Alert | Condition | Severity |
 |---|---|---|
-| `TLSCertExpiringWarning` | Certificate expires in < 29 days | warning |
+| `DeniedPacketsRate` | Denied packets rate > 50/s | info |
+| `TLSCertExpiringWarning` | Certificate expires in < 30 days | warning |
 | `TLSCertExpiringCritical` | Certificate expires in < 7 days | critical |
 | `LicenseExpiringWarning` | License expires in < 30 days | warning |
 | `LicenseExpiringCritical` | License expires in < 7 days or is invalid | critical |
-| `DeniedPacketsRate` | Denied packets rate > 50/s | info |
-| `ComponentDegradedWarning` | Component degraded for > 10s | warning |
-| `ComponentDegradedCritical` | Component degraded for > 1m | critical |
+| `ComponentDegradedWarning` | Component degraded for > 15m | warning |
+| `ComponentDegradedCritical` | Component degraded for > 30m | critical |
+| `ComponentProgressingWarning` | Component progressing for > 15m | warning |
+| `ComponentProgressingCritical` | Component progressing for > 30m | critical |
 
 To route these alerts, see [Configure Alertmanager](../prometheus/alertmanager.mdx).
 
diff --git a/calico-enterprise/operations/monitor/prometheus/configure-prometheus.mdx b/calico-enterprise/operations/monitor/prometheus/configure-prometheus.mdx
index 233b2188f2..8f8c09a65f 100644
--- a/calico-enterprise/operations/monitor/prometheus/configure-prometheus.mdx
+++ b/calico-enterprise/operations/monitor/prometheus/configure-prometheus.mdx
@@ -31,7 +31,7 @@ As an example, the range query in this Manifest is 10 seconds.
 apiVersion: monitoring.coreos.com/v1
 kind: PrometheusRule
 metadata:
-  name: calico-prometheus-rules
+  name: calico
   namespace: tigera-prometheus
   labels:
     role: tigera-prometheus-rules
@@ -56,7 +56,7 @@ To update this alerting rule, to say, execute the query with a range of
 apiVersion: monitoring.coreos.com/v1
 kind: PrometheusRule
 metadata:
-  name: calico-prometheus-rules
+  name: calico
   namespace: tigera-prometheus
   labels:
     role: tigera-prometheus-rules

From ddb6c840d9273e3402ca086c24b84f2acee64b70 Mon Sep 17 00:00:00 2001
From: Rene Dekker
Date: Thu, 9 Apr 2026 10:59:40 -0700
Subject: [PATCH 4/5] Broaden operator metrics big picture description

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 .../operations/monitor/metrics/operator-metrics.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/calico-enterprise/operations/monitor/metrics/operator-metrics.mdx b/calico-enterprise/operations/monitor/metrics/operator-metrics.mdx
index 0a8be9d855..547e982bef 100644
--- a/calico-enterprise/operations/monitor/metrics/operator-metrics.mdx
+++ b/calico-enterprise/operations/monitor/metrics/operator-metrics.mdx
@@ -6,7 +6,7 @@ description: Monitor the Calico Enterprise operator with Prometheus metrics for
 
 ## Big picture
 
-Use Prometheus to monitor $[prodname] operator health, TLS certificate expiry, and license status.
+Use Prometheus to monitor the $[prodname] operator.
 
 ## Value
 

From c4f61f614a32119f6788c087b5720d95c7115126 Mon Sep 17 00:00:00 2001
From: Rene Dekker
Date: Thu, 9 Apr 2026 11:00:45 -0700
Subject: [PATCH 5/5] Clarify operator metrics are enabled by default

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 .../operations/monitor/metrics/operator-metrics.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/calico-enterprise/operations/monitor/metrics/operator-metrics.mdx b/calico-enterprise/operations/monitor/metrics/operator-metrics.mdx
index 547e982bef..645a2952b7 100644
--- a/calico-enterprise/operations/monitor/metrics/operator-metrics.mdx
+++ b/calico-enterprise/operations/monitor/metrics/operator-metrics.mdx
@@ -14,7 +14,7 @@ The $[prodname] operator exposes Prometheus metrics that give you visibility int
 
 ## Before you begin
 
-Operator metrics are enabled by default... 
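
Review note (not part of the patch series): the new page documents the metrics and the built-in thresholds, but does not show how a reader would add a threshold of their own. A possible follow-up example is sketched below. It reuses the `tigera-prometheus` namespace, the `role: tigera-prometheus-rules` label, and the certificate expiry metric from the patches above. The resource name, group name, alert name, the 14-day threshold, and the `for` duration are hypothetical, and the sketch assumes the built-in Prometheus selects rules by that label.

```yaml
# Sketch only: a custom alert sitting between the built-in 30-day and 7-day
# certificate alerts. All names below are illustrative, not operator defaults.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: custom-operator-alerts          # hypothetical name
  namespace: tigera-prometheus
  labels:
    role: tigera-prometheus-rules       # assumed to be the rule selector label
spec:
  groups:
    - name: custom-operator.rules       # hypothetical group name
      rules:
        - alert: TLSCertExpiring14Days  # hypothetical alert name
          # Same shape as the documented example queries, with a 14-day window.
          expr: tigera_operator_tls_certificate_expiry_timestamp_seconds - time() < 14 * 24 * 3600
          for: 1h                       # assumed hold time before firing
          labels:
            severity: warning
          annotations:
            summary: "TLS certificate {{ $labels.name }} in {{ $labels.namespace }} expires within 14 days"
```

Keeping custom rules in a separate PrometheusRule, rather than editing the `calico` resource, is suggested here on the assumption that the operator reconciles the resource it installs.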