@@ -31,7 +31,7 @@ As an example, the range query in this Manifest is 10 seconds.
 apiVersion: monitoring.coreos.com/v1
 kind: PrometheusRule
 metadata:
-  name: calico-prometheus-dp-rate
+  name: calico
   namespace: tigera-prometheus
   labels:
     role: tigera-prometheus-rules
@@ -56,7 +56,7 @@ To update this alerting rule, to say, execute the query with a range of
 apiVersion: monitoring.coreos.com/v1
 kind: PrometheusRule
 metadata:
-  name: calico-prometheus-dp-rate
+  name: calico
   namespace: tigera-prometheus
   labels:
     role: tigera-prometheus-rules
2 changes: 1 addition & 1 deletion calico-enterprise/operations/license-options.mdx
@@ -30,7 +30,7 @@ These metrics are scraped by the built-in Prometheus instance via the `tigera-op
 $[prodname] installs PrometheusRule resources with alerting rules for license expiration. You can view them with:

 ```bash
-kubectl -n tigera-prometheus get prometheusrule calico-prometheus-dp-rate -o yaml
+kubectl -n tigera-prometheus get prometheusrule calico -o yaml
 ```

 The built-in rules include:
133 changes: 133 additions & 0 deletions calico-enterprise/operations/monitor/metrics/operator-metrics.mdx
@@ -0,0 +1,133 @@
---
description: Monitor the Calico Enterprise operator with Prometheus metrics for component health, TLS certificate expiry, and license status.
---

# Operator metrics

## Big picture

Use Prometheus to monitor the $[prodname] operator.

## Value

The $[prodname] operator exposes Prometheus metrics that give you visibility into the overall health of your $[prodname] installation. These metrics let you set up alerts for degraded components, expiring TLS certificates, and license issues before they impact operations.

## Before you begin

Operator metrics are enabled by default. The `tigera-operator` deployment ships with the following environment variables already set:

- `METRICS_ENABLED=true`
- `METRICS_SCHEME=https`

Metrics are served on port **9484** over HTTPS and scraped by the built-in Prometheus instance via the `tigera-operator-metrics` ServiceMonitor.

## Metrics reference

### Component status

The `tigera_operator_component_status` metric reports the status of each $[prodname] component that the operator manages. This mirrors the information available through `kubectl get tigerastatus`.

| Metric | Labels | Description |
|---|---|---|
| `tigera_operator_component_status` | `component`, `condition` | Status of a component. Value is `1` (true) or `0` (false). |

**Labels:**

- `component` — The $[prodname] component (e.g. `calico`, `apiserver`, `monitor`, `log-storage`, `manager`)
- `condition` — One of:
  - `available` — The component is running and healthy
  - `degraded` — The component is in an error state
  - `progressing` — The component is being updated or is starting up

**Example queries:**

- Find all degraded components:

```
tigera_operator_component_status{condition="degraded"} == 1
```

- Check if a specific component is available:

```
tigera_operator_component_status{component="calico", condition="available"}
```
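The built-in rules listed under Built-in alerts key on the `degraded` and `progressing` conditions. If you also want to be alerted when a component reports `available` as `0`, one option is a PrometheusRule of your own. The following is a minimal sketch, not a rule that $[prodname] ships: the resource name, alert name, `10m` duration, and severity are hypothetical, and the `role: tigera-prometheus-rules` label is copied from the built-in rule on the assumption that the built-in Prometheus selects rules by that label.

```yaml
# Hypothetical custom rule: alert when a component's "available" condition is false.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: operator-component-availability   # hypothetical name
  namespace: tigera-prometheus
  labels:
    # Copied from the built-in rule; assumes the built-in Prometheus
    # selects PrometheusRule resources by this label.
    role: tigera-prometheus-rules
spec:
  groups:
    - name: operator-component-availability
      rules:
        - alert: ComponentUnavailable       # hypothetical alert name
          # One series per component; matches when available == 0.
          expr: tigera_operator_component_status{condition="available"} == 0
          for: 10m                          # assumed grace period
          labels:
            severity: warning
          annotations:
            summary: "{{ $labels.component }} has reported available=0 for 10 minutes"
```

Apply it with `kubectl apply -f <file>` and check that it is listed with `kubectl -n tigera-prometheus get prometheusrule`.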

### TLS certificate expiry

The `tigera_operator_tls_certificate_expiry_timestamp_seconds` metric reports the expiry time of each TLS certificate managed by the operator.

| Metric | Labels | Description |
|---|---|---|
| `tigera_operator_tls_certificate_expiry_timestamp_seconds` | `name`, `namespace`, `issuer` | Unix timestamp when the certificate expires. |

**Labels:**

- `name` — The Secret name containing the certificate
- `namespace` — The namespace of the Secret
- `issuer` — The certificate issuer (e.g. `tigera-operator-signer`)

**Example queries:**

- Certificates expiring within 30 days:

```
tigera_operator_tls_certificate_expiry_timestamp_seconds - time() < 30 * 24 * 3600
```

- Certificates expiring within 7 days:

```
tigera_operator_tls_certificate_expiry_timestamp_seconds - time() < 7 * 24 * 3600
```
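To alert at a threshold other than the built-in 30 and 7 days, you can wrap the same expression in a rule. The sketch below assumes a hypothetical 14-day threshold and hypothetical resource and alert names; the `role` label is copied from the built-in rule on the assumption that the built-in Prometheus selects rules by it. Because a PromQL comparison without `bool` keeps the left-hand value, `$value` in the annotation is the number of seconds remaining, which the standard `humanizeDuration` template function formats. The expression also matches certificates that have already expired, since their remaining time is negative.

```yaml
# Hypothetical custom rule: warn 14 days before an operator-managed certificate expires.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: operator-tls-expiry-14d          # hypothetical name
  namespace: tigera-prometheus
  labels:
    role: tigera-prometheus-rules        # assumed selector label, copied from the built-in rule
spec:
  groups:
    - name: operator-tls-expiry
      rules:
        - alert: TLSCertExpiringIn14Days # hypothetical alert name
          # The comparison keeps the left-hand value, so $value is the
          # number of seconds until the certificate expires.
          expr: tigera_operator_tls_certificate_expiry_timestamp_seconds - time() < 14 * 24 * 3600
          labels:
            severity: warning
          annotations:
            summary: "Certificate {{ $labels.namespace }}/{{ $labels.name }} expires in {{ $value | humanizeDuration }}"
```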

### License

| Metric | Labels | Description |
|---|---|---|
| `tigera_operator_license_valid` | — | Whether the license is valid (`1`) or not (`0`). |
| `tigera_operator_license_expiry_timestamp_seconds` | — | Unix timestamp when the license expires. |

**Example queries:**

- License expires within 30 days:

```
tigera_operator_license_expiry_timestamp_seconds - time() < 30 * 24 * 3600
```

- License is invalid:

```
tigera_operator_license_valid == 0
```
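For dashboards, days remaining can be easier to read than a raw Unix timestamp. A minimal sketch using a Prometheus recording rule follows; the rule name `tigera_operator:license_days_remaining` and the resource name are hypothetical, and the `role: tigera-prometheus-rules` label is copied from the built-in rule on the assumption that the built-in Prometheus selects rules by it.

```yaml
# Hypothetical recording rule: days until the license expires.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: operator-license-recording       # hypothetical name
  namespace: tigera-prometheus
  labels:
    role: tigera-prometheus-rules        # assumed selector label, copied from the built-in rule
spec:
  groups:
    - name: operator-license
      rules:
        # Seconds remaining divided by 86400 seconds per day.
        - record: tigera_operator:license_days_remaining
          expr: (tigera_operator_license_expiry_timestamp_seconds - time()) / 86400
```

Once recorded, `tigera_operator:license_days_remaining < 30` selects the same condition as the 30-day query above, with the value expressed in days.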

## Built-in alerts

$[prodname] installs a PrometheusRule resource named `calico` that contains its built-in alerting rules, including rules based on these metrics. You can view it with:

```bash
kubectl -n tigera-prometheus get prometheusrule calico -o yaml
```

The built-in rules include:

| Alert | Condition | Severity |
|---|---|---|
| `DeniedPacketsRate` | Denied packets rate > 50/s | info |
| `TLSCertExpiringWarning` | Certificate expires in < 30 days | warning |
| `TLSCertExpiringCritical` | Certificate expires in < 7 days | critical |
| `LicenseExpiringWarning` | License expires in < 30 days | warning |
| `LicenseExpiringCritical` | License expires in < 7 days or is invalid | critical |
| `ComponentDegradedWarning` | Component degraded for > 15m | warning |
| `ComponentDegradedCritical` | Component degraded for > 30m | critical |
| `ComponentProgressingWarning` | Component progressing for > 15m | warning |
| `ComponentProgressingCritical` | Component progressing for > 30m | critical |

To route these alerts, see [Configure Alertmanager](../prometheus/alertmanager.mdx).
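The Severity column lends itself to routing. As a sketch only, an Alertmanager configuration fragment like the following sends `critical` alerts to a separate receiver. It assumes the built-in rules expose severity as a `severity` label and that your Alertmanager is v0.22 or later (which added the `matchers` syntax). The receiver names are hypothetical placeholders with no notification integrations, and where this configuration lives in a $[prodname] installation is covered on the linked Alertmanager page, not here.

```yaml
# Alertmanager configuration fragment (alertmanager.yaml), not a Kubernetes resource.
route:
  receiver: default            # hypothetical catch-all receiver
  group_by: ['alertname']
  routes:
    # Route alerts labeled severity="critical" (for example TLSCertExpiringCritical).
    - matchers:
        - severity="critical"
      receiver: oncall         # hypothetical receiver
receivers:
  # No integrations configured yet; add email, Slack, webhook, or similar.
  - name: default
  - name: oncall
```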

## Additional resources

- [License expiration and renewal](../../license-options.mdx)
- [Configure Prometheus](../prometheus/configure-prometheus.mdx)
- [BYO Prometheus](../prometheus/byo-prometheus.mdx)
1 change: 1 addition & 0 deletions sidebars-calico-enterprise.js
@@ -588,6 +588,7 @@ module.exports = {
 label: 'Metrics',
 link: { type: 'doc', id: 'operations/monitor/metrics/index' },
 items: [
+  'operations/monitor/metrics/operator-metrics',
   'operations/monitor/metrics/recommended-metrics',
   'operations/monitor/metrics/bgp-metrics',
   'operations/monitor/metrics/policy-metrics',