STAC-22541: Derived state monitors

aacevedoosorio · aacevedoosorio · commit 0cb7fb14ccdd · 2025-04-11T13:49:52.000+02:00
diff --git a/SUMMARY.md b/SUMMARY.md
@@ -32,6 +32,7 @@
   * [Troubleshooting](use/alerting/notifications/troubleshooting.md)
 * [Customize](dynamic/customize-alerting.md)
   * [Add a monitor using the CLI](use/alerting/k8s-add-monitors-cli.md)
+  * [Derived State monitor](use/alerting/k8s-derived-state-monitors.md)
   * [Override monitor arguments](use/alerting/k8s-override-monitor-arguments.md)
   * [Write a remediation guide](use/alerting/k8s-write-remediation-guide.md)
 
diff --git a/use/alerting/k8s-derived-state-monitors.md b/use/alerting/k8s-derived-state-monitors.md
@@ -0,0 +1,38 @@
+---
+description: SUSE Observability
+---
+
+# Derived State Monitors
+
+## Overview
+
+Your observability use case may include logical components (business components) that have no monitors of their own but are affected when any of their technical dependencies has issues.
+In that case you can propagate the health state of those dependencies to the logical components using the `derived-state-monitor` function.
+Starting from a group of components selected by the `componentTypes` argument, the function traverses their dependencies and applies the most critical health state it finds to the starting components at the topmost layer.
+The discovery only considers components whose health state comes from observations (for example, based on metrics). Components whose health state is itself derived, such as a logical component that depends on another logical component, are excluded, although the traversal continues through them.
+
+## Derived Health State Monitor example
+
+A monitor implemented with the `derived-state-monitor` function looks like the following example. You can add it by following the guide at [Add a threshold monitor to components using the CLI](/use/alerting/k8s-add-monitors-cli.md):
+
+```
+  - _type: "Monitor"
+    name: "Aggregated health state of a Deployment, StatefulSet, ReplicaSet and DaemonSet"
+    tags:
+      - deployments
+      - replicasets
+      - statefulsets
+      - daemonsets
+      - derived
+      - propagated
+    identifier: "urn:custom:monitor:..."
+    status: "DISABLED"
+    description: "Description"
+    function: {{ get "urn:stackpack:common:monitor-function:derived-state-monitor" }}
+    arguments:
+      componentTypes: "deployment, replicaset, statefulset, daemonset"
+    intervalSeconds: 30
+    remediationHint: "Investigate component [{{ causeName }}](/#/components/{{ causeComponentUrnForUrl }}) as it is causing the workload to be unhealthy."
+```
+* The function takes a single argument, `componentTypes`, which lists the component types as a single string of comma-separated values.
+* The function provides two values for use in the remediation guide: `causeName`, the name of the component the state is propagated from, and `causeComponentUrnForUrl`, which can be used to build a link to that component.
diff --git a/use/alerting/kubernetes-monitors.md b/use/alerting/kubernetes-monitors.md
@@ -144,22 +144,19 @@ Cluster doesn't have any health itself. But a cluster is build from few componen
 - all nodes
 and then takes the most critical health state.
 
-### Aggregated health state of a DaemonSet
+### Derived Workloads health state (Deployment, DaemonSet, ReplicaSet, StatefulSet)
 
-The monitor aggregates states of all children Pods and then returns the most critical health state.
+The monitor aggregates the states of all child Pods and returns the most critical health state.
+It uses a monitor function that traverses component dependencies to find the most critical health state that is based on observations rather than on other derived states.
+For example, the health state of a Deployment is derived from its Pods, because their health states are based on actual data; it does not take into account the derived state that
+a ReplicaSet it depends on might have. This monitor function helps propagate health state from technical components to logical components, and it only applies a derived state if the component does not already have a health state based on observations.
+Therefore, if you want to activate this monitor, the `Deployment desired replicas`, `Daemonset desired replicas`, `Replicaset desired replicas` and `Statefulset desired replicas` monitors should be disabled.
+If your use case requires health states on logical components that do not have any monitors of their own, you can implement a monitor such as:
+```
+    
+```
 
-### Aggregated health state of a Deployment
 
-The monitor aggregates states of all children ReplicaSets and then returns the most critical health state. ReplicaSets have
-the similar Monitor, so eventually this one aggregates health states of all children ReplicaSets and Pods.
-
-### Aggregated health state of a ReplicaSet
-
-The monitor aggregates states of all children Pods and then returns the most critical health state.
-
-### Aggregated health state of a StatefulSet
-
-The monitor aggregates states of all children Pods and then returns the most critical health state.
 
 ## See also