Expand Up @@ -7,7 +7,7 @@ include::_attributes/common-attributes.adoc[]
toc::[]

[role="_abstract"]
You can configure a local or external Alertmanager instance to route alerts from Prometheus to endpoint receivers. You can also attach custom labels to all time series and alerts to add useful metadata information.
You can configure a local or external Alertmanager instance to route alerts from Prometheus to endpoint receivers so that you receive timely notifications about the state of your cluster. You can also attach custom labels to all time series and alerts to add useful metadata information.

//Configuring external Alertmanager instances
include::modules/monitoring-configuring-external-alertmanagers.adoc[leveloffset=+1,tags=**;CPM;!UWM]
3 changes: 2 additions & 1 deletion getting-started/core-platform-monitoring-first-steps.adoc
Expand Up @@ -7,10 +7,11 @@ include::_attributes/common-attributes.adoc[]
toc::[]

[role="_abstract"]
{ocp} provides cluster monitoring capabilities out of the box. As a cluster administrator, you can further configure monitoring components to suit the needs of different users in various scenarios.

After {ocp} is installed, core platform monitoring components start collecting metrics, which you can query and view.

The default in-cluster monitoring stack includes the core platform Prometheus instance that collects metrics from your cluster and the core Alertmanager instance that routes alerts, among other components.
Depending on who will use the monitoring stack and for what purposes, as a cluster administrator, you can further configure these monitoring components to suit the needs of different users in various scenarios.

[id="configuring-core-platform-monitoring-postinstallation-steps_{context}"]
== Configuring core platform monitoring: Postinstallation steps
2 changes: 1 addition & 1 deletion modules/monitoring-4-20-release-notes.adoc
Expand Up @@ -7,7 +7,7 @@
= {ocp} {product-version} monitoring release notes

[role="_abstract"]
Changes for {ocp} {product-version} monitoring stack, including new features and enhancements, Technology Previews, deprecated and removed features, known issues, and fixed issues.
Review changes for the {ocp} {product-version} monitoring stack, including new features and enhancements, Technology Previews, deprecated and removed features, known issues, and fixed issues.

The changes are included in the following errata advisory:

Expand Up @@ -7,12 +7,7 @@
= About accessing monitoring web service APIs

[role="_abstract"]
You can directly access web service API endpoints from the command line for the following monitoring stack components:

* Prometheus
* Alertmanager
* Thanos Ruler
* Thanos Querier
You can directly access web service API endpoints from the command line for Prometheus, Alertmanager, Thanos Ruler, and Thanos Querier to query metrics, manage alerts, and integrate with automation tools.

[IMPORTANT]
====
Expand Up @@ -7,7 +7,7 @@
= Creating alerting rules for user-defined projects

[role="_abstract"]
In {ocp}, you can create alerting rules for user-defined projects. Those alerting rules will trigger alerts based on the values of the chosen metrics.
Create alerting rules for user-defined projects to monitor your custom applications and services with project-specific alerts. Those alerting rules trigger alerts based on the values of the chosen metrics.

If you create alerting rules for a user-defined project, consider the following key behaviors and important limitations when you define the new rules:

2 changes: 1 addition & 1 deletion modules/monitoring-about-managing-alerts.adoc
Expand Up @@ -7,7 +7,7 @@
= Managing alerts

[role="_abstract"]
In the {ocp}, the Alerting UI enables you to manage alerts, silences, and alerting rules.
Learn about managing alerts, silences, and alerting rules through the Alerting UI to maintain cluster health and respond effectively to issues.

* *Alerting rules*. Alerting rules contain a set of conditions that outline a particular state within a cluster. Alerts are triggered when those conditions are true. An alerting rule can be assigned a severity that defines how the alerts are routed.
* *Alerts*. An alert is fired when the conditions defined in an alerting rule are true. Alerts provide a notification that a set of circumstances are apparent within an {ocp} cluster.
2 changes: 1 addition & 1 deletion modules/monitoring-about-monitoring-dashboards.adoc
Expand Up @@ -7,7 +7,7 @@
= About monitoring dashboards

[role="_abstract"]
{ocp} provides a set of monitoring dashboards that help you understand the state of cluster components and user-defined workloads.
{ocp} provides a set of monitoring dashboards that help you track cluster health, identify performance issues, and troubleshoot problems across core components and user workloads.

include::snippets/unified-perspective-web-console.adoc[]

3 changes: 2 additions & 1 deletion modules/monitoring-about-performance-and-scalability.adoc
Expand Up @@ -7,7 +7,8 @@
= About performance and scalability

[role="_abstract"]
You can optimize the performance and scale of your clusters.
Optimize the performance and scale of your clusters to handle larger workloads, reduce resource consumption, and improve cluster efficiency.

You can configure the monitoring stack by performing any of the following actions:

* Control the placement and distribution of monitoring components:
Expand Up @@ -7,6 +7,8 @@
= About specifying limits and requests for monitoring components

[role="_abstract"]
Specify resource limits and requests for core platform monitoring and user workload monitoring components to ensure proper resource allocation and prevent resource exhaustion.

ifndef::openshift-dedicated,openshift-rosa,openshift-rosa-hcp[]
You can configure resource limits and requests for the following core platform monitoring components:

3 changes: 2 additions & 1 deletion modules/monitoring-about-storing-and-recording-data.adoc
Expand Up @@ -7,7 +7,8 @@
= About storing and recording data

[role="_abstract"]
You can store and record data to help you protect the data and use them for troubleshooting.
Store and record data to protect it from loss and to use it for troubleshooting.

You can configure the monitoring stack by performing any of the following actions:

* Configure persistent storage:
Expand Up @@ -7,7 +7,7 @@
= Accessing alerting rules for user-defined projects

[role="_abstract"]
To list alerting rules for a user-defined project, you must have been assigned the `monitoring-rules-view` cluster role for the project.
Access alerting rules for user-defined projects to review current alert configurations and troubleshoot alerting issues. To list alerting rules for a user-defined project, you must be assigned the `monitoring-rules-view` cluster role for the project.

.Prerequisites

2 changes: 1 addition & 1 deletion modules/monitoring-accessing-the-alerting-ui.adoc
Expand Up @@ -8,7 +8,7 @@
= Accessing the Alerting UI

[role="_abstract"]
The Alerting UI is accessible in the {ocp} web console.
Access the Alerting UI to manage alerts, silences, and alerting rules through the web console.

.Procedure

Expand Up @@ -20,7 +20,7 @@
// end::UWM[]

[role="_abstract"]
You can add secrets to the Alertmanager configuration by editing the `{configmap-name}` config map in the `{namespace-name}` project.
Add secrets to the Alertmanager configuration by editing the `{configmap-name}` config map in the `{namespace-name}` project. Secrets enable secure authentication with external alert receivers.

After you add a secret to the config map, the secret is mounted as a volume at `/etc/alertmanager/secrets/<secret_name>` within the `alertmanager` container for the Alertmanager pods.
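
As a minimal sketch only, assuming the core platform variant of this module (where `{configmap-name}` would resolve to `cluster-monitoring-config` and the Alertmanager component key is `alertmanagerMain`), the relevant part of the config map might look like the following. The secret names are placeholders, not values from this change:

[source,yaml]
----
data:
  config.yaml: |
    alertmanagerMain:
      secrets: # each listed secret is mounted under /etc/alertmanager/secrets/
      - <secret_name_1> # placeholder
      - <secret_name_2> # placeholder
----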

Expand Up @@ -3,7 +3,7 @@
= Alert reference for the {cmo-full}

[role="_abstract"]
Learn about alerting rules that are managed by the {cmo-first} and are included in your cluster by default.
Review alerting rules that are managed by the {cmo-first} and are included in your cluster by default. Understanding these rules helps you identify when cluster components might fail and determine actions for faster troubleshooting and incident response.

[IMPORTANT]
====
Expand Up @@ -20,7 +20,7 @@
// end::UWM[]

[role="_abstract"]
You can attach custom labels to all time series and alerts leaving Prometheus by using the external labels feature of Prometheus.
You can attach custom labels to all time series and alerts leaving Prometheus by using Prometheus external labels, to organize and identify metrics by environment, region, or other categories.
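
For illustration, and assuming the core platform variant (the `prometheusK8s` key in the `cluster-monitoring-config` config map), external labels might be declared roughly as follows. The label names and values are hypothetical:

[source,yaml]
----
data:
  config.yaml: |
    prometheusK8s:
      externalLabels:
        region: eu        # hypothetical label
        environment: prod # hypothetical label
----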

.Prerequisites

Expand Up @@ -7,7 +7,7 @@
= Choosing a metrics collection profile

[role="_abstract"]
To choose a metrics collection profile for core {ocp} monitoring components, edit the `cluster-monitoring-config` `ConfigMap` object.
Choose a metrics collection profile for core {ocp} monitoring components to balance monitoring coverage with resource consumption by editing the `cluster-monitoring-config` config map in the `openshift-monitoring` project.
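
A hedged sketch of what this edit could look like, assuming the profile is set through a `collectionProfile` key under `prometheusK8s` and that `minimal` is one of the accepted values:

[source,yaml]
----
data:
  config.yaml: |
    prometheusK8s:
      collectionProfile: minimal # assumed value; the default profile collects all metrics
----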

.Prerequisites

Expand Up @@ -3,7 +3,7 @@
= {cmo-full} configuration reference

[role="_abstract"]
To customize how {ocp} monitors both the platform and user-defined projects, edit the relevant `ConfigMap` objects that define the cluster and user workload monitoring configurations:
To customize how {ocp} monitors both the platform and user-defined projects, edit the relevant `ConfigMap` objects that define the cluster and user workload monitoring configurations.

ifndef::openshift-dedicated,openshift-rosa,openshift-rosa-hcp[]
* To configure default monitoring components, edit the `ConfigMap` object named `cluster-monitoring-config` in the `openshift-monitoring` namespace.
2 changes: 1 addition & 1 deletion modules/monitoring-common-terms.adoc
Expand Up @@ -7,7 +7,7 @@
= Glossary of common terms for {ocp} monitoring

[role="_abstract"]
This glossary defines common terms that are used in {ocp} architecture.
Review definitions of common terms when learning the monitoring stack concepts or searching for unfamiliar terminology in documentation.

Alertmanager::
Alertmanager handles alerts received from Prometheus. Alertmanager is also responsible for sending the alerts to external notification systems.
Expand Up @@ -7,11 +7,7 @@
= Components for monitoring user-defined projects

[role="_abstract"]
{ocp}
ifndef::openshift-dedicated,openshift-rosa[]
{product-version}
endif::openshift-dedicated,openshift-rosa[]
includes an optional enhancement to the monitoring stack that helps you monitor services and pods in user-defined projects. This feature includes the following components:
{ocp} includes an optional enhancement to the monitoring stack that helps you monitor services and pods in user-defined projects. This feature includes the following components:

.Components for monitoring user-defined projects
[options="header"]
2 changes: 1 addition & 1 deletion modules/monitoring-configurable-monitoring-components.adoc
Expand Up @@ -23,7 +23,7 @@
// end::UWM[]

[role="_abstract"]
The following table shows the monitoring components you can configure and the keys used to specify the components in the `{configmap-name}` config map.
Review configurable monitoring components and their corresponding config map keys used to specify the components in the `{configmap-name}` config map.

// tag::UWM[]
ifdef::openshift-dedicated,openshift-rosa[]
2 changes: 2 additions & 0 deletions modules/monitoring-configuring-alert-notifications-uwm.adoc
Expand Up @@ -7,6 +7,8 @@
= Configuring alert notifications

[role="_abstract"]
Configure alert notifications for user-defined projects to ensure developers and teams receive timely notifications when their applications or services experience issues.

ifndef::openshift-dedicated,openshift-rosa,openshift-rosa-hcp[]
In {ocp}, an administrator can enable alert routing for user-defined projects with one of the following methods:

Expand Up @@ -7,7 +7,7 @@
= Configuring alert routing for user-defined projects

[role="_abstract"]
If you are a non-administrator user who has been given the `alert-routing-edit` cluster role, you can create or edit alert routing for user-defined projects.
If you are a non-administrator user with the `alert-routing-edit` cluster role, you can create or edit alert routing for user-defined projects to ensure alerts from your applications reach the appropriate notification systems and team members.

.Prerequisites

Expand Up @@ -7,7 +7,9 @@
= Configuring different alert receivers for default platform alerts and user-defined alerts

[role="_abstract"]
You can configure different alert receivers for default platform alerts and user-defined alerts to ensure the following results:
Configure different alert receivers for platform and user-defined alerts to route notifications to the appropriate teams and reduce notification fatigue.

This configuration ensures the following results:

* All default platform alerts are sent to a receiver owned by the team in charge of these alerts.
* All user-defined alerts are sent to another receiver so that the team can focus only on platform alerts.
9 changes: 1 addition & 8 deletions modules/monitoring-configuring-external-alertmanagers.adoc
Expand Up @@ -22,14 +22,7 @@
// end::UWM[]

[role="_abstract"]
The {ocp} monitoring stack includes a local Alertmanager instance that routes alerts from Prometheus.

// tag::CPM[]
You can add external Alertmanager instances to route alerts for core {ocp} projects.
// end::CPM[]
// tag::UWM[]
You can add external Alertmanager instances to route alerts for user-defined projects.
// end::UWM[]
The {ocp} monitoring stack includes a local Alertmanager instance that routes alerts from Prometheus. You can add external Alertmanager instances to integrate with existing alerting infrastructure or centralize alert management across multiple clusters.

If you add the same external Alertmanager configuration for multiple clusters and disable the local instance for each cluster, you can then manage alert routing for multiple clusters by using a single external Alertmanager instance.

Expand Up @@ -7,6 +7,8 @@
= About metrics collection profiles

[role="_abstract"]
Optimize resource consumption with metrics collection profiles by choosing between comprehensive monitoring and essential-only metrics collection based on your cluster size and monitoring needs.

By default, Prometheus collects metrics exposed by all default metrics targets in {ocp} components.
However, you might want Prometheus to collect fewer metrics from a cluster in certain scenarios:

4 changes: 3 additions & 1 deletion modules/monitoring-configuring-persistent-storage.adoc
Expand Up @@ -7,12 +7,14 @@
= Configuring persistent storage

[role="_abstract"]
Learn about persistent storage configuration for monitoring components to properly plan and deploy production-ready monitoring infrastructure.

Run cluster monitoring with persistent storage to gain the following benefits:

* Protect your metrics and alerting data from data loss by storing them in a persistent volume (PV). As a result, they can survive pods being restarted or recreated.
* Avoid getting duplicate notifications and losing silences for alerts when the Alertmanager pods are restarted.
For production environments, it is highly recommended to configure persistent storage.
For production environments, it is highly recommended to configure persistent storage.
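
As a rough sketch, assuming the core platform Prometheus component (`prometheusK8s`) and a `volumeClaimTemplate` setting in the `cluster-monitoring-config` config map, a persistent volume claim might be requested as follows. The storage class and size are placeholders:

[source,yaml]
----
data:
  config.yaml: |
    prometheusK8s:
      volumeClaimTemplate:
        spec:
          storageClassName: <storage_class> # placeholder
          resources:
            requests:
              storage: 40Gi # hypothetical size
----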

// tag::CPM[]
[IMPORTANT]
2 changes: 1 addition & 1 deletion modules/monitoring-configuring-remote-write-storage.adoc
Expand Up @@ -19,7 +19,7 @@
// end::UWM[]

[role="_abstract"]
You can configure remote write storage to enable Prometheus to send ingested metrics to remote systems for long-term storage. Doing so has no impact on how or for how long Prometheus stores metrics.
Extend metrics retention and centralize monitoring data by sending metrics to external systems, supporting compliance requirements and long-term analytics. Doing so has no impact on how or for how long Prometheus stores metrics.

.Prerequisites

4 changes: 2 additions & 2 deletions modules/monitoring-configuring-secrets-for-alertmanager.adoc
Expand Up @@ -7,9 +7,9 @@
= Configuring secrets for Alertmanager

[role="_abstract"]
The {ocp} monitoring stack includes Alertmanager, which routes alerts from Prometheus to endpoint receivers.
Securely send alerts to authenticated endpoints by configuring Alertmanager secrets, protecting sensitive credentials while maintaining reliable alert delivery to external systems.

If you need to authenticate with a receiver so that Alertmanager can send alerts to it, you can configure Alertmanager to use a secret that contains authentication credentials for the receiver.
The monitoring stack includes Alertmanager, which routes alerts from Prometheus to endpoint receivers. If you need to authenticate with a receiver so that Alertmanager can send alerts to it, you can configure Alertmanager to use a secret that contains authentication credentials for the receiver.

For example, you can configure Alertmanager to use a secret to authenticate with an endpoint receiver that requires a certificate issued by a private Certificate Authority (CA).
You can also configure Alertmanager to use a secret to authenticate with a receiver that requires a password file for Basic HTTP authentication.
Expand Up @@ -8,11 +8,9 @@
= Controlling the placement and distribution of monitoring components

[role="_abstract"]
You can move the monitoring stack components to specific nodes:
Control the placement and distribution of monitoring components across cluster nodes to optimize system resource use, improve performance, and separate workloads based on specific requirements or policies.

You can move the monitoring stack components to specific nodes with the following methods:

* Use the `nodeSelector` constraint with labeled nodes to move any of the monitoring stack components to specific nodes.
* Assign tolerations to enable moving components to tainted nodes.
By doing so, you control the placement and distribution of the monitoring components across a cluster.

By controlling placement and distribution of monitoring components, you can optimize system resource use, improve performance, and separate workloads based on specific requirements or policies.
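
The two methods can be sketched together as follows, assuming the core platform Prometheus component (`prometheusK8s`) in the `cluster-monitoring-config` config map. The node label, taint key, and taint value are hypothetical:

[source,yaml]
----
data:
  config.yaml: |
    prometheusK8s:
      nodeSelector:
        nodetype: infra # hypothetical node label
      tolerations:
      - key: "key1" # hypothetical taint key
        operator: "Equal"
        value: "value1" # hypothetical taint value
        effect: "NoSchedule"
----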
Expand Up @@ -7,7 +7,9 @@
= Controlling the impact of unbound metrics attributes in user-defined projects

[role="_abstract"]
Developers can create labels to define attributes for metrics in the form of key-value pairs. The number of potential key-value pairs corresponds to the number of possible values for an attribute.
Prevent monitoring performance degradation and excessive resource consumption by controlling the impact of unbound metrics attributes.

Developers can create labels to define attributes for metrics in the form of key-value pairs. The number of potential key-value pairs corresponds to the number of possible values for an attribute.

An attribute that has an unlimited number of potential values is called an unbound attribute. For example, a `customer_id` attribute is unbound because it has an infinite number of possible values.

Expand Up @@ -7,7 +7,7 @@
= Creating alerting rules for user-defined projects

[role="_abstract"]
You can create alerting rules for user-defined projects. Those alerting rules will trigger alerts based on the values of the chosen metrics.
Create alerting rules for user-defined projects to monitor your custom applications and receive notifications when specific conditions or thresholds are met to improve incident response times.

[NOTE]
====
Expand Up @@ -19,9 +19,7 @@
// end::UWM[]

[role="_abstract"]
You can create cluster ID labels for metrics by adding the `write_relabel` settings for remote write storage in the `{configmap-name}` config map in the `{namespace-name}` namespace.

By adding a cluster ID label, you can uniquely identify metrics and track them consistently across clusters and workloads.
You can create cluster ID labels for metrics to uniquely identify and track metrics across clusters and workloads by adding the `write_relabel` settings for remote write storage in the `{configmap-name}` config map in the `{namespace-name}` namespace.
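
A minimal sketch of such a setting, assuming the core platform variant (`prometheusK8s` in `cluster-monitoring-config`), that the relabel settings are expressed through a `writeRelabelConfigs` list, and that the cluster ID is exposed through a temporary `__tmp_openshift_cluster_id__` source label. The endpoint URL and the target label name are hypothetical:

[source,yaml]
----
data:
  config.yaml: |
    prometheusK8s:
      remoteWrite:
      - url: "https://remote-write-endpoint.example.com" # hypothetical endpoint
        writeRelabelConfigs:
        - sourceLabels:
          - __tmp_openshift_cluster_id__ # assumed temporary source label
          targetLabel: cluster_id # hypothetical label name
          action: replace
----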

ifndef::openshift-dedicated,openshift-rosa[]
// tag::UWM[]
Expand Up @@ -7,7 +7,7 @@
= Creating a cluster monitoring config map

[role="_abstract"]
You can configure the core {ocp} monitoring components by creating and updating the `cluster-monitoring-config` config map in the `openshift-monitoring` project. The {cmo-first} then configures the core components of the monitoring stack.
Customize the default monitoring stack to match your infrastructure and performance requirements by creating and updating the `cluster-monitoring-config` config map in the `openshift-monitoring` project.
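
For orientation, an empty config map of this kind might look like the following sketch. The object name and project come from the abstract above; the `config.yaml` key is an assumption about where component settings are placed:

[source,yaml]
----
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    # component settings go here
----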

.Prerequisites
