Merge pull request #1602 from StackVista/stac-22541

aacevedoosorio · web-flow · commit 8097005c7e57 · 2025-04-16T14:15:55.000+02:00
STAC-22541: Derived state monitors
diff --git a/SUMMARY.md b/SUMMARY.md
@@ -33,6 +33,7 @@
   * [Troubleshooting](use/alerting/notifications/troubleshooting.md)
 * [Customize](dynamic/customize-alerting.md)
   * [Add a monitor using the CLI](use/alerting/k8s-add-monitors-cli.md)
+  * [Derived State monitors](use/alerting/k8s-derived-state-monitors.md)
   * [Override monitor arguments](use/alerting/k8s-override-monitor-arguments.md)
   * [Write a remediation guide](use/alerting/k8s-write-remediation-guide.md)
 
diff --git a/use/alerting/k8s-derived-state-monitors.md b/use/alerting/k8s-derived-state-monitors.md
@@ -0,0 +1,39 @@
+---
+description: SUSE Observability
+---
+
+# Derived State Monitors
+
+## Overview
+
+In observability scenarios where logical (business) components have no direct monitors but are affected by issues in their technical dependencies, you can use the `derived-state-monitor` function to derive a health state for the logical component from its connected technical components.
+This monitor traverses component dependencies and selects the most critical health state based on direct observations (e.g., from metrics), ignoring any already-derived states. It will apply the derived state to all components selected through the `componentTypes` parameter.
+During traversal, only components with observed (non-derived) health states are considered for health derivation. Components with derived states are skipped in evaluation but still traversed to reach deeper dependencies—for example, logical components depending on other logical components.
+
+## Derived Health State Monitor example
+
+A Monitor implemented using the `derived-state-monitor` function looks like:
+
+```
+  - _type: "Monitor"
+    name: "Aggregated health state of a Deployment, StatefulSet, ReplicaSet and DaemonSet"
+    tags:
+      - deployments
+      - replicasets
+      - statefulsets
+      - daemonsets
+      - derived
+      - propagated
+    identifier: "urn:custom:monitor:..."
+    status: "DISABLED"
+    description: "Description"
+    function: {{ get "urn:stackpack:common:monitor-function:derived-state-monitor" }}
+    arguments:
+      componentTypes: "deployment, replicaset, statefulset, daemonset"
+    intervalSeconds: 30
+    remediationHint: "Investigate component [{{ causeName }}](/#/components/{{ causeComponentUrnForUrl }}) as it is causing the workload to be unhealthy."
+```
+* The function takes a single argument, `componentTypes`, which lists the component types to apply the derived state to as a single string of comma-separated values.
+* The function provides two values for use in the remediation guide: `causeComponentName`, the name of the component the state is propagated from, and `causeComponentUrnForUrl`, which can be used to build a link to that component.
+
+To implement the monitor, follow the guide at [Add a threshold monitor to components using the CLI](/use/alerting/k8s-add-monitors-cli.md).
diff --git a/use/alerting/kubernetes-monitors.md b/use/alerting/kubernetes-monitors.md
@@ -144,22 +144,18 @@ Cluster doesn't have any health itself. But a cluster is build from few componen
 - all nodes
 and then takes the most critical health state.
 
-### Aggregated health state of a DaemonSet
+### Derived Workloads health state (Deployment, DaemonSet, ReplicaSet, StatefulSet)
 
-The monitor aggregates states of all children Pods and then returns the most critical health state.
+The monitor aggregates states of all top-most dependencies and then returns the most critical health state based on direct observations (e.g., from metrics).
+This approach ensures that health signals propagate from low-level technical components (like Pods) to higher-level logical components, but only when the component itself lacks an observed health state.
+To use this monitor effectively, make sure that some or all of the following health checks are disabled:
+* Deployment desired replicas
+* DaemonSet desired replicas
+* ReplicaSet desired replicas
+* StatefulSet desired replicas
 
-### Aggregated health state of a Deployment
+If you have a use case where logical components have no direct monitors, you can use the [Derived State Monitor](/use/alerting/k8s-derived-state-monitors.md) function to infer their health from the technical components they depend on.
 
-The monitor aggregates states of all children ReplicaSets and then returns the most critical health state. ReplicaSets have
-the similar Monitor, so eventually this one aggregates health states of all children ReplicaSets and Pods.
-
-### Aggregated health state of a ReplicaSet
-
-The monitor aggregates states of all children Pods and then returns the most critical health state.
-
-### Aggregated health state of a StatefulSet
-
-The monitor aggregates states of all children Pods and then returns the most critical health state.
 
 ## See also