From 4a72eee218a3f4a3b91d1e8a7217a0965c56742c Mon Sep 17 00:00:00 2001 From: dfitzmau Date: Tue, 3 Mar 2026 12:54:38 +0000 Subject: [PATCH] OSDOCS-16874-5: CQA for SCALE-4 Low-Latency/CNF Debugging and Related --- ...er-node-tuning-operator-specification.adoc | 17 +-- ...-tuning-operator-default-profiles-set.adoc | 9 +- ...-node-tuning-operator-verify-profiles.adoc | 35 +++--- modules/custom-tuning-specification.adoc | 106 ++++++++++++------ ...-observability-create-custom-resource.adoc | 32 +++--- ...ode-observability-high-level-workflow.adoc | 3 + modules/node-observability-install-cli.adoc | 9 +- ...ode-observability-install-web-console.adoc | 12 ++ modules/node-observability-installation.adoc | 3 +- ...ode-observability-run-profiling-query.adoc | 9 +- modules/node-observability-scripting-cr.adoc | 31 +++-- modules/node-observability-scripting.adoc | 6 +- modules/node-tuning-operator.adoc | 14 +-- ...od-interactions-with-topology-manager.adoc | 10 +- modules/setting-up-cpu-manager.adoc | 23 ++-- modules/setting-up-topology-manager.adoc | 16 +-- modules/topology-manager-policies.adoc | 5 +- .../node-observability-operator.adoc | 12 +- .../using-cpu-manager.adoc | 3 + .../using-node-tuning-operator.adoc | 9 +- 20 files changed, 213 insertions(+), 151 deletions(-) diff --git a/modules/accessing-an-example-cluster-node-tuning-operator-specification.adoc b/modules/accessing-an-example-cluster-node-tuning-operator-specification.adoc index 6fd8c6eea2ca..91ac035f4b45 100644 --- a/modules/accessing-an-example-cluster-node-tuning-operator-specification.adoc +++ b/modules/accessing-an-example-cluster-node-tuning-operator-specification.adoc @@ -8,20 +8,23 @@ [id="accessing-an-example-node-tuning-operator-specification_{context}"] = Accessing an example Node Tuning Operator specification -Use this process to access an example Node Tuning Operator specification. 
+[role="_abstract"]
+To understand how to correctly format your tuning parameters, access an example Node Tuning Operator specification. By reviewing this template, you can properly configure node-level settings for your workloads.
+
+For the example Node Tuning Operator specification provided in the procedure, the default CR is meant for delivering standard node-level tuning for the {product-title} platform, and you can modify it only to set the Operator Management state. Any other custom changes to the default CR will be overwritten by the Operator. For custom tuning, create your own Tuned CRs. Newly created CRs will be combined with the default CR and custom tuning applied to {product-title} nodes based on node or pod labels and profile priorities.
+
+[WARNING]
+====
+While in certain situations the support for pod labels can be a convenient way of automatically delivering required tuning, this practice is strongly discouraged, especially in large-scale clusters. The default Tuned CR ships without pod label matching. If a custom profile is created with pod label matching, then the functionality will be enabled at that time. The pod label functionality will be deprecated in future versions of the Node Tuning Operator.
+====

 .Procedure
- * Run the following command to access an example Node Tuning Operator specification:
+* Run the following command to access an example Node Tuning Operator specification:
+
 [source,terminal]
 ----
 oc get tuned.tuned.openshift.io/default -o yaml -n openshift-cluster-node-tuning-operator
 ----
-The default CR is meant for delivering standard node-level tuning for the {product-title} platform and it can only be modified to set the Operator Management state. Any other custom changes to the default CR will be overwritten by the Operator. For custom tuning, create your own Tuned CRs.
Newly created CRs will be combined with the default CR and custom tuning applied to {product-title} nodes based on node or pod labels and profile priorities. -[WARNING] -==== -While in certain situations the support for pod labels can be a convenient way of automatically delivering required tuning, this practice is discouraged and strongly advised against, especially in large-scale clusters. The default Tuned CR ships without pod label matching. If a custom profile is created with pod label matching, then the functionality will be enabled at that time. The pod label functionality will be deprecated in future versions of the Node Tuning Operator. -==== diff --git a/modules/cluster-node-tuning-operator-default-profiles-set.adoc b/modules/cluster-node-tuning-operator-default-profiles-set.adoc index f1428b7ada82..620b6921ee64 100644 --- a/modules/cluster-node-tuning-operator-default-profiles-set.adoc +++ b/modules/cluster-node-tuning-operator-default-profiles-set.adoc @@ -8,7 +8,10 @@ [id="custom-tuning-default-profiles-set_{context}"] = Default profiles set on a cluster -The following are the default profiles set on a cluster. +[role="_abstract"] +To understand the baseline configurations automatically applied to your environment, review the default profiles set on a cluster. By analyzing these built-in settings, you can determine if additional node tuning is necessary for your specific workloads. + +The following configuration example shows default profiles set on a cluster: [source,yaml] ---- @@ -32,10 +35,10 @@ spec: - label: node-role.kubernetes.io/infra - profile: openshift-node priority: 40 +# ... ---- -Starting with {product-title} 4.9, all OpenShift TuneD profiles are shipped with -the TuneD package. You can use the `oc exec` command to view the contents of these profiles: +Starting with {product-title} 4.9, all {product-title} TuneD profiles are shipped with the TuneD package. 
You can use the following `oc exec` command to view the contents of these profiles: [source,terminal] ---- diff --git a/modules/cluster-node-tuning-operator-verify-profiles.adoc b/modules/cluster-node-tuning-operator-verify-profiles.adoc index 0339373127a5..95459e1702dc 100644 --- a/modules/cluster-node-tuning-operator-verify-profiles.adoc +++ b/modules/cluster-node-tuning-operator-verify-profiles.adoc @@ -4,15 +4,20 @@ :_mod-docs-content-type: PROCEDURE [id="verifying-tuned-profiles-are-applied_{context}"] -= Verifying that the TuneD profiles are applied += Verifying that the TuneD profiles are applied -Verify the TuneD profiles that are applied to your cluster node. +[role="_abstract"] +To confirm that your node-level tuning configurations are active, verify the TuneD profiles applied to your cluster node. Checking these settings ensures that your system is correctly optimized for your specific workloads. +.Procedure + +. Verify the TuneD profiles that are applied to your cluster node by entering the following command: ++ [source,terminal] ---- $ oc get profile.tuned.openshift.io -n openshift-cluster-node-tuning-operator ---- - ++ .Example output [source,terminal] ---- @@ -23,27 +28,25 @@ master-2 openshift-control-plane True False 6h33m worker-a openshift-node True False 6h28m worker-b openshift-node True False 6h28m ---- ++ +** `NAME`: Specifies the name of the Profile object. There is one Profile object per node and their names match. +** `TUNED`: Specifies the name of the desired TuneD profile to apply. +** `APPLIED`: Set as `True` if the TuneD daemon applied the desired profile. Supported values include `True`, `False`, and `Unknown`. +** `DEGRADED`: Set as `True` if any errors were reported during application of the TuneD profile. Supported values include `True`, `False`, and `Unknown`. +** `AGE`: Specifies the time elapsed since the creation of Profile object. -* `NAME`: Name of the Profile object. 
There is one Profile object per node and their names match. -* `TUNED`: Name of the desired TuneD profile to apply. -* `APPLIED`: `True` if the TuneD daemon applied the desired profile. (`True/False/Unknown`). -* `DEGRADED`: `True` if any errors were reported during application of the TuneD profile (`True/False/Unknown`). -* `AGE`: Time elapsed since the creation of Profile object. - -The `ClusterOperator/node-tuning` object also contains useful information about the Operator and its node agents' health. For example, Operator misconfiguration is reported by `ClusterOperator/node-tuning` status messages. - -To get status information about the `ClusterOperator/node-tuning` object, run the following command: - +. To get status information about the `ClusterOperator/node-tuning` object, run the following command. The `ClusterOperator/node-tuning` object also contains useful information about the Operator and the health status of node agents. For example, Operator misconfiguration is reported by `ClusterOperator/node-tuning` status messages. ++ [source,terminal] ---- $ oc get co/node-tuning -n openshift-cluster-node-tuning-operator ---- - ++ .Example output [source,terminal,subs="attributes+"] ---- NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE node-tuning {product-version}.1 True False True 60m 1/5 Profiles with bootcmdline conflict ---- - -If either the `ClusterOperator/node-tuning` or a profile object's status is `DEGRADED`, additional information is provided in the Operator or operand logs. ++ +If either the `ClusterOperator/node-tuning` or the status of a profile object is `DEGRADED`, additional information is provided in the Operator or operand logs. 
diff --git a/modules/custom-tuning-specification.adoc b/modules/custom-tuning-specification.adoc index 337c03803219..b0da59d41d47 100644 --- a/modules/custom-tuning-specification.adoc +++ b/modules/custom-tuning-specification.adoc @@ -12,9 +12,12 @@ endif::[] [id="custom-tuning-specification_{context}"] = Custom tuning specification +[role="_abstract"] +To define custom node-level configurations for your workloads, review the custom tuning specification. By understanding the structure of the custom resource (CR) for the Operator, you can correctly format your TuneD profiles and selection logic. + The custom resource (CR) for the Operator has two major sections. The first section, `profile:`, is a list of TuneD profiles and their names. The second, `recommend:`, defines the profile selection logic. -Multiple custom tuning specifications can co-exist as multiple CRs in the Operator's namespace. The existence of new CRs or the deletion of old CRs is detected by the Operator. All existing custom tuning specifications are merged and appropriate objects for the containerized TuneD daemons are updated. +Multiple custom tuning specifications can co-exist as multiple CRs in the namespace of the Operator. The existence of new CRs or the deletion of old CRs is detected by the Operator. All existing custom tuning specifications are merged and appropriate objects for the containerized TuneD daemons are updated. *Management state* @@ -102,26 +105,39 @@ The individual items of the list: ifndef::rosa-hcp-tuning[] [source,yaml] ---- -- machineConfigLabels: <1> - <2> - match: <3> - <4> - priority: <5> - profile: <6> - operand: <7> - debug: <8> +- machineConfigLabels: + + match: + + priority: + profile: + operand: + debug: tunedConfig: - reapply_sysctl: <9> + reapply_sysctl: ---- -<1> Optional. -<2> A dictionary of key/value `MachineConfig` labels. The keys must be unique. 
-<3> If omitted, profile match is assumed unless a profile with a higher priority matches first or `machineConfigLabels` is set. -<4> An optional list. -<5> Profile ordering priority. Lower numbers mean higher priority (`0` is the highest priority). -<6> A TuneD profile to apply on a match. For example `tuned_profile_1`. -<7> Optional operand configuration. -<8> Turn debugging on or off for the TuneD daemon. Options are `true` for on or `false` for off. The default is `false`. -<9> Turn `reapply_sysctl` functionality on or off for the TuneD daemon. Options are `true` for on and `false` for off. ++ +where: ++ +-- +`machineConfigLabels`:: Optional parameter. + +``:: Specifies a dictionary of key/value `MachineConfig` labels. The keys must be unique. + +`match`:: If omitted, profile match is assumed unless a profile with a higher priority matches first or `machineConfigLabels` is set. + +``:: An optional list. + +`priority`:: Specifies profile ordering priority. Lower numbers mean higher priority (`0` is the highest priority). + +``:: Specifies a TuneD profile to apply on a match. For example `tuned_profile_1`. + +`operand`:: Optional operand configuration. + +`debug`:: Turn debugging on or off for the TuneD daemon. Options are `true` for on or `false` for off. The default is `false`. + +`tunedConfig.reapply_sysctl`:: Turn `reapply_sysctl` functionality on or off for the TuneD daemon. Options are `true` for on and `false` for off. +-- endif::rosa-hcp-tuning[] ifdef::rosa-hcp-tuning[] [source,json] @@ -134,22 +150,30 @@ ifdef::rosa-hcp-tuning[] ], "recommend": [ { - "profile": , <1> - "priority":{ , <2> + "profile": , + "priority":{ , }, - "match": [ <3> + "match": [ { - "label": <4> + "label": }, ] }, ] } ---- -<1> A TuneD profile to apply on a match. For example `tuned_profile_1`. -<2> Profile ordering priority. Lower numbers mean higher priority (`0` is the highest priority). 
-<3> If omitted, profile match is assumed unless a profile with a higher priority matches first. -<4> The label for the profile matched items. ++ +where: ++ +-- +`profile`:: Specifies a TuneD profile to apply on a match. For example `tuned_profile_1`. + +`priority`:: Specifies profile ordering priority. Lower numbers mean higher priority (`0` is the highest priority). + +`match`:: If omitted, profile match is assumed unless a profile with a higher priority matches first. + +`label`:: Specifies the label for the profile matched items. +-- endif::[] `` is an optional list recursively defined as follows: @@ -157,26 +181,34 @@ endif::[] ifndef::rosa-hcp-tuning[] [source,yaml] ---- -- label: <1> - value: <2> - type: <3> - <4> +- label: + value: + type: + ---- -<1> Node or pod label name. -<2> Optional node or pod label value. If omitted, the presence of `` is enough to match. -<3> Optional object type (`node` or `pod`). If omitted, `node` is assumed. -<4> An optional `` list. ++ +where: ++ +-- +`label`:: Specifies the node or pod label name. + +`value`:: Specifies an optional node or pod label value. If omitted, the presence of `` is enough to match. + +`type`:: Specifies an optional object type, such as `node` or `pod`. If omitted, `node` is assumed. + +`` An optional `` list. +-- endif::rosa-hcp-tuning[] ifdef::rosa-hcp-tuning[] [source,yaml] ---- "match": [ { - "label": <1> + "label": }, ] ---- -<1> Node or pod label name. +** `label` Node or pod label name. endif::[] If `` is not omitted, all nested `` sections must also evaluate to `true`. Otherwise, `false` is assumed and the profile with the respective `` section will not be applied or recommended. Therefore, the nesting (child `` sections) works as logical AND operator. Conversely, if any item of the `` list matches, the entire `` list evaluates to `true`. Therefore, the list acts as logical OR operator. 
diff --git a/modules/node-observability-create-custom-resource.adoc b/modules/node-observability-create-custom-resource.adoc
index 1f4e81587d4a..b05bc52f596a 100644
--- a/modules/node-observability-create-custom-resource.adoc
+++ b/modules/node-observability-create-custom-resource.adoc
@@ -6,11 +6,12 @@
 [id="creating-node-observability-custom-resource_{context}"]
 = Creating the Node Observability custom resource

-You must create and run the `NodeObservability` custom resource (CR) before you run the profiling query. When you run the `NodeObservability` CR, it creates the necessary machine config and machine config pool CRs to enable the CRI-O profiling on the worker nodes matching the `nodeSelector`.
+[role="_abstract"]
+You must create and run the `NodeObservability` custom resource (CR) before you run the profiling query. When you run the `NodeObservability` CR, the CR creates the necessary machine config and machine config pool CRs to enable CRI-O profiling on the compute nodes matching the `nodeSelector`.

 [IMPORTANT]
 ====
-If CRI-O profiling is not enabled on the worker nodes, the `NodeObservabilityMachineConfig` resource gets created. Worker nodes matching the `nodeSelector` specified in `NodeObservability` CR restarts. This might take 10 or more minutes to complete.
+If CRI-O profiling is not enabled on the compute nodes, the `NodeObservabilityMachineConfig` resource gets created. Compute nodes matching the `nodeSelector` specified in the `NodeObservability` CR restart. This might take 10 or more minutes to complete.
 ====

 [NOTE]
 ====
 Kubelet profiling is enabled by default.
 ====

 The CRI-O unix socket of the node is mounted on the agent pod, which allows the agent to communicate with CRI-O to run the pprof request. Similarly, the `kubelet-serving-ca` certificate chain is mounted on the agent pod, which allows secure communication between the agent and node's kubelet endpoint.

 .Prerequisites
+
 * You have installed the Node Observability Operator.
-* You have installed the OpenShift CLI (oc). +* You have installed the {oc-first}. * You have access to the cluster with `cluster-admin` privileges. .Procedure @@ -45,17 +47,18 @@ $ oc project node-observability-operator + [source,yaml] ---- - apiVersion: nodeobservability.olm.openshift.io/v1alpha2 - kind: NodeObservability - metadata: - name: cluster <1> - spec: - nodeSelector: - kubernetes.io/hostname: <2> - type: crio-kubelet +apiVersion: nodeobservability.olm.openshift.io/v1alpha2 +kind: NodeObservability +metadata: + name: cluster +spec: + nodeSelector: + kubernetes.io/hostname: + type: crio-kubelet ---- -<1> You must specify the name as `cluster` because there should be only one `NodeObservability` CR per cluster. -<2> Specify the nodes on which the Node Observability agent must be deployed. ++ +** `metadata.name`: Specifies the name as `cluster` because there should be only one `NodeObservability` CR per cluster. +** `spec.nodeSelector.kubernetes.io/hostname`: Specifies the nodes on which the Node Observability agent must be deployed. . Run the `NodeObservability` CR: + @@ -63,7 +66,6 @@ $ oc project node-observability-operator ---- oc apply -f nodeobservability.yaml ---- - + .Example output [source,terminal] @@ -77,7 +79,6 @@ nodeobservability.olm.openshift.io/cluster created ---- $ oc get nob/cluster -o yaml | yq '.status.conditions' ---- - + .Example output [source,terminal] @@ -91,6 +92,5 @@ conditions: status: "True" type: Ready ---- - + `NodeObservability` CR run is completed when the reason is `Ready` and the status is `True`. 
diff --git a/modules/node-observability-high-level-workflow.adoc b/modules/node-observability-high-level-workflow.adoc index 9821ebfc296b..97322516a56c 100644 --- a/modules/node-observability-high-level-workflow.adoc +++ b/modules/node-observability-high-level-workflow.adoc @@ -6,6 +6,9 @@ [id="workflow-node-observability-operator_{context}"] = Workflow of the Node Observability Operator +[role="_abstract"] +To systematically query and analyze profiling data, follow the workflow for the Node Observability Operator. By understanding this process, you can collect metrics and troubleshoot performance issues on your compute nodes. + The following workflow outlines on how to query the profiling data using the Node Observability Operator: . Install the Node Observability Operator in the {product-title} cluster. diff --git a/modules/node-observability-install-cli.adoc b/modules/node-observability-install-cli.adoc index e937544f218e..d84cff979a24 100644 --- a/modules/node-observability-install-cli.adoc +++ b/modules/node-observability-install-cli.adoc @@ -6,11 +6,12 @@ [id="install-node-observability-using-cli_{context}"] = Installing the Node Observability Operator using the CLI -You can install the Node Observability Operator by using the OpenShift CLI (oc). +[role="_abstract"] +You can install the Node Observability Operator by using the {oc-first}. .Prerequisites -* You have installed the OpenShift CLI (oc). +* You have installed the {oc-first}. * You have access to the cluster with `cluster-admin` privileges. .Procedure @@ -21,7 +22,6 @@ You can install the Node Observability Operator by using the OpenShift CLI (oc). 
---- $ oc get packagemanifests -n openshift-marketplace node-observability-operator ---- - + .Example output [source,terminal] @@ -78,7 +78,6 @@ EOF ---- $ oc -n node-observability-operator get sub node-observability-operator -o yaml | yq '.status.installplan.name' ---- - + .Example output [source,terminal] @@ -94,7 +93,6 @@ $ oc -n node-observability-operator get ip -o yaml | yq '.st ---- + `` is the install plan name that you obtained from the output of the previous command. - + .Example output [source,terminal] @@ -108,7 +106,6 @@ COMPLETE ---- $ oc get deploy -n node-observability-operator ---- - + .Example output [source,terminal] diff --git a/modules/node-observability-install-web-console.adoc b/modules/node-observability-install-web-console.adoc index e5c74848793a..48497ade2d05 100644 --- a/modules/node-observability-install-web-console.adoc +++ b/modules/node-observability-install-web-console.adoc @@ -6,6 +6,7 @@ [id="install-node-observability-using-web-console_{context}"] = Installing the Node Observability Operator using the web console +[role="_abstract"] You can install the Node Observability Operator from the {product-title} web console. .Prerequisites @@ -16,16 +17,27 @@ You can install the Node Observability Operator from the {product-title} web con .Procedure . Log in to the {product-title} web console. + . In the Administrator's navigation panel, select *Ecosystem* -> *Software Catalog*. + . In the *All items* field, enter *Node Observability Operator* and select the *Node Observability Operator* tile. + . Click *Install*. + . On the *Install Operator* page, configure the following settings: ++ .. In the *Update channel* area, click *alpha*. ++ .. In the *Installation mode* area, click *A specific namespace on the cluster*. ++ .. From the *Installed Namespace* list, select *node-observability-operator* from the list. ++ .. In the *Update approval* area, select *Automatic*. ++ .. Click *Install*. .Verification + . 
In the Administrator's navigation panel, expand *Ecosystem* -> *Installed Operators*. + . Verify that the Node Observability Operator is listed in the Operators list. diff --git a/modules/node-observability-installation.adoc b/modules/node-observability-installation.adoc index 147568ef5a79..d6d338607881 100644 --- a/modules/node-observability-installation.adoc +++ b/modules/node-observability-installation.adoc @@ -4,6 +4,7 @@ :_mod-docs-content-type: CONCEPT [id="install-node-observability-operator_{context}"] -= Installing the Node Observability Operator += Node Observability Operator installation methods +[role="_abstract"] The Node Observability Operator is not installed in {product-title} by default. You can install the Node Observability Operator by using the {product-title} CLI or the web console. diff --git a/modules/node-observability-run-profiling-query.adoc b/modules/node-observability-run-profiling-query.adoc index f0760c4ee6fa..8114829c9c81 100644 --- a/modules/node-observability-run-profiling-query.adoc +++ b/modules/node-observability-run-profiling-query.adoc @@ -6,7 +6,10 @@ [id="running-profiling-query_{context}"] = Running the profiling query -To run the profiling query, you must create a `NodeObservabilityRun` resource. The profiling query is a blocking operation that fetches CRI-O and Kubelet profiling data for a duration of 30 seconds. After the profiling query is complete, you must retrieve the profiling data inside the container file system `/run/node-observability` directory. The lifetime of data is bound to the agent pod through the `emptyDir` volume, so you can access the profiling data while the agent pod is in the `running` status. +[role="_abstract"] +To run the profiling query, you must create a `NodeObservabilityRun` resource. The profiling query is a blocking operation that fetches CRI-O and Kubelet profiling data for a duration of 30 seconds. 
+ +After the profiling query is complete, you must retrieve the profiling data inside the container file system `/run/node-observability` directory. The lifetime of data is bound to the agent pod through the `emptyDir` volume, so you can access the profiling data while the agent pod is in the `running` status. [IMPORTANT] ==== @@ -14,6 +17,7 @@ You can request only one profiling query at any point of time. ==== .Prerequisites + * You have installed the Node Observability Operator. * You have created the `NodeObservability` custom resource (CR). * You have access to the cluster with `cluster-admin` privileges. @@ -31,6 +35,7 @@ metadata: spec: nodeObservabilityRef: name: cluster +# ... ---- . Trigger the profiling query by running the `NodeObservabilityRun` resource: @@ -46,7 +51,6 @@ $ oc apply -f nodeobservabilityrun.yaml ---- $ oc get nodeobservabilityrun nodeobservabilityrun -o yaml | yq '.status.conditions' ---- - + .Example output [source,terminal] @@ -63,7 +67,6 @@ conditions: status: "True" type: Finished ---- - + The profiling query is complete once the status is `True` and type is `Finished`. diff --git a/modules/node-observability-scripting-cr.adoc b/modules/node-observability-scripting-cr.adoc index 55b7ee154c28..fc7dc5ab3ffa 100644 --- a/modules/node-observability-scripting-cr.adoc +++ b/modules/node-observability-scripting-cr.adoc @@ -6,9 +6,11 @@ [id="node-observability-scripting-cr_{context}"] = Creating the Node Observability custom resource for scripting -You must create and run the `NodeObservability` custom resource (CR) before you run the scripting. When you run the `NodeObservability` CR, it enables the agent in scripting mode on the compute nodes matching the `nodeSelector` label. +[role="_abstract"] +You must create and run the `NodeObservability` custom resource (CR) before you run the scripting. When you run the `NodeObservability` CR, the CR enables the agent in scripting mode on the compute nodes matching the `nodeSelector` label. 
.Prerequisites
+
 * You have installed the Node Observability Operator.
 * You have installed the {oc-first}.
 * You have access to the cluster with `cluster-admin` privileges.
@@ -33,19 +35,19 @@ $ oc project node-observability-operator
+
 [source,yaml]
 ----
-  apiVersion: nodeobservability.olm.openshift.io/v1alpha2
-  kind: NodeObservability
-  metadata:
-    name: cluster <1>
-  spec:
-    nodeSelector:
-      kubernetes.io/hostname: <2>
-    type: scripting <3>
+apiVersion: nodeobservability.olm.openshift.io/v1alpha2
+kind: NodeObservability
+metadata:
+  name: cluster
+spec:
+  nodeSelector:
+    kubernetes.io/hostname: 
+  type: scripting
----
-<1> You must specify the name as `cluster` because there should be only one `NodeObservability` CR per cluster.
-<2> Specify the nodes on which the Node Observability agent must be deployed.
-<3> To deploy the agent in scripting mode, you must set the type to `scripting`.
-
++
+** `metadata.name`: Specifies the name as `cluster` because there should be only one `NodeObservability` CR per cluster.
+** `spec.nodeSelector.kubernetes.io/hostname`: Specifies the nodes on which the Node Observability agent must be deployed.
+** `spec.type`: Set to `scripting` to deploy the agent in scripting mode.

. Create the `NodeObservability` CR by running the following command:
+
[source,terminal]
----
$ oc apply -f nodeobservability.yaml
----
-
+
.Example output
[source,terminal]
----
nodeobservability.olm.openshift.io/cluster created
----
@@ -67,7 +68,6 @@ nodeobservability.olm.openshift.io/cluster created
----
$ oc get nob/cluster -o yaml | yq '.status.conditions'
----
-
+
.Example output
[source,terminal]
----
conditions: conditions: status: "True" type: Ready
----
-
+
The `NodeObservability` CR run is completed when the `reason` is `Ready` and `status` is `"True"`.
\ No newline at end of file
diff --git a/modules/node-observability-scripting.adoc b/modules/node-observability-scripting.adoc
index f667147e5371..75d189af3a26 100644
--- a/modules/node-observability-scripting.adoc
+++ b/modules/node-observability-scripting.adoc
@@ -6,6 +6,9 @@
 [id="node-observability-scripting_{context}"]
 = Configuring Node Observability Operator scripting

+[role="_abstract"]
+To execute embedded scripts for network analysis, configure Node Observability Operator scripting. By setting up these custom scripts, you can debug performance-related issues on your compute nodes.
+
 .Prerequisites

 * You have installed the Node Observability Operator.
@@ -27,6 +30,7 @@ spec:
   nodeObservabilityRef:
     name: cluster
   type: scripting
+# ...
 ----
 +
 [IMPORTANT]
@@ -50,7 +54,6 @@ $ oc apply -f nodeobservabilityrun-script.yaml
 ----
 $ oc get nodeobservabilityrun nodeobservabilityrun-script -o yaml | yq '.status.conditions'
 ----
-
 +
 .Example output
 [source,terminal]
@@ -78,7 +81,6 @@ Status: Finished Timestamp: 2023-12-19T15:11:01Z Start Timestamp: 2023-12-19T15:10:51Z
 ----
-
 +
 The scripting is complete once `Status` is `True` and `Type` is `Finished`.
diff --git a/modules/node-tuning-operator.adoc b/modules/node-tuning-operator.adoc
index b1c6ba71c25a..1f46abfde637 100644
--- a/modules/node-tuning-operator.adoc
+++ b/modules/node-tuning-operator.adoc
@@ -19,19 +19,18 @@ endif::[]
 [id="about-node-tuning-operator_{context}"]
 ifdef::operators[]
 = Node Tuning Operator
-
 endif::operators[]
 ifdef::perf[]
 = About the Node Tuning Operator
-
 endif::perf[]
 ifdef::cluster-caps[= Node Tuning capability]
-
+[role="_abstract"]
 ifdef::cluster-caps[]
 The Node Tuning Operator provides features for the `NodeTuning` capability.
 endif::cluster-caps[]
+By using the Node Tuning Operator, you can manage node-level tuning by orchestrating the TuneD daemon, and you can achieve low-latency performance by using the Performance Profile controller.
-The Node Tuning Operator helps you manage node-level tuning by orchestrating the TuneD daemon and achieves low latency performance by using the Performance Profile controller. The majority of high-performance applications require some level of kernel tuning. The Node Tuning Operator provides a unified management interface to users of node-level sysctls and more flexibility to add custom tuning specified by user needs. +The majority of high-performance applications require some level of kernel tuning. The Node Tuning Operator provides a unified management interface to users of node-level sysctls and more flexibility to add custom tuning specified by user needs. ifdef::cluster-caps[] If you disable the NodeTuning capability, some default tuning settings will not be applied to the control-plane nodes. This might limit the scalability and performance of large clusters with over 900 nodes or 900 routes. @@ -57,10 +56,3 @@ The Node Tuning Operator is part of a standard {product-title} installation in v In earlier versions of {product-title}, the Performance Addon Operator was used to implement automatic tuning to achieve low latency performance for OpenShift applications. In {product-title} 4.11 and later, this functionality is part of the Node Tuning Operator. ==== endif::cluster-caps[] - -ifdef::operators[] - -== Project - -link:https://github.com/openshift/cluster-node-tuning-operator[cluster-node-tuning-operator] -endif::operators[] diff --git a/modules/pod-interactions-with-topology-manager.adoc b/modules/pod-interactions-with-topology-manager.adoc index a2796cf069e4..f8c18a59bf4c 100644 --- a/modules/pod-interactions-with-topology-manager.adoc +++ b/modules/pod-interactions-with-topology-manager.adoc @@ -6,9 +6,10 @@ [id="pod-interactions-with-topology-manager_{context}"] = Pod interactions with Topology Manager policies -The example `Pod` specs illustrate pod interactions with Topology Manager. 
+[role="_abstract"]
+To understand how the Topology Manager allocates hardware resources, review the example Pod specifications. By analyzing these interactions, you can properly configure your workloads for optimal alignment and performance.

-The following pod runs in the `BestEffort` QoS class because no resource requests or limits are specified.
+The following pod configuration example runs in the `BestEffort` QoS class because no resource requests or limits are specified:

 [source,yaml]
 ----
@@ -18,7 +19,7 @@ spec:
     image: nginx
 ----

-The next pod runs in the `Burstable` QoS class because requests are less than limits.
+The following pod configuration example runs in the `Burstable` QoS class because requests are less than limits:

 [source,yaml]
 ----
@@ -34,10 +35,11 @@ spec:
 ----

 If the selected policy is anything other than `none`, Topology Manager would process all the pods and it enforces resource alignment only for the `Guaranteed` Qos `Pod` specification.
+
 When the Topology Manager policy is set to `none`, the relevant containers are pinned to any available CPU without considering NUMA affinity. This is the default behavior and it does not optimize for performance-sensitive workloads. Other values enable the use of topology awareness information from device plugins core resources, such as CPU and memory. The Topology Manager attempts to align the CPU, memory, and device allocations according to the topology of the node when the policy is set to other values than `none`. For more information about the available values, see _Topology Manager policies_.

-The following example pod runs in the `Guaranteed` QoS class because requests are equal to limits.
+The following example pod configuration runs in the `Guaranteed` QoS class because requests are equal to limits:
 
 [source,yaml]
 ----
diff --git a/modules/setting-up-cpu-manager.adoc b/modules/setting-up-cpu-manager.adoc
index 6acd47f53b99..2eab6ef6afdf 100644
--- a/modules/setting-up-cpu-manager.adoc
+++ b/modules/setting-up-cpu-manager.adoc
@@ -7,7 +7,8 @@
 [id="setting_up_cpu_manager_{context}"]
 = Setting up CPU Manager
 
-To configure CPU manager, create a KubeletConfig custom resource (CR) and apply it to the desired set of nodes.
+[role="_abstract"]
+To allocate dedicated processing resources to your workloads, configure the CPU Manager. Applying a KubeletConfig custom resource (CR) to your nodes ensures optimal performance for latency-sensitive applications.
 
 .Procedure
 
@@ -49,13 +50,12 @@ spec:
     matchLabels:
       custom-kubelet: cpumanager-enabled
   kubeletConfig:
-    cpuManagerPolicy: static <1>
-    cpuManagerReconcilePeriod: 5s <2>
+    cpuManagerPolicy: static
+    cpuManagerReconcilePeriod: 5s
 ----
-<1> Specify a policy:
-* `none`. This policy explicitly enables the existing default CPU affinity scheme, providing no affinity beyond what the scheduler does automatically. This is the default policy.
-* `static`. This policy allows containers in guaranteed pods with integer CPU requests. It also limits access to exclusive CPUs on the node. If `static`, you must use a lowercase `s`.
-<2> Optional. Specify the CPU Manager reconcile frequency. The default is `5s`.
++
+** `spec.kubeletConfig.cpuManagerPolicy`: Specifies a policy. A value of `none` explicitly enables the existing default CPU affinity scheme, providing no affinity beyond what the scheduler does automatically. This is the default policy. A value of `static` allows containers in guaranteed pods with integer CPU requests. It also limits access to exclusive CPUs on the node. If `static`, you must use a lowercase `s`. 
+** `spec.kubeletConfig.cpuManagerReconcilePeriod`: Optional. Specifies the CPU Manager reconcile frequency. The default is `5s`.
 
 . Create the dynamic kubelet config by running the following command:
 +
@@ -97,11 +97,12 @@ sh-4.2# cat /host/etc/kubernetes/kubelet.conf | grep cpuManager
 .Example output
 [source,terminal]
 ----
-cpuManagerPolicy: static <1>
-cpuManagerReconcilePeriod: 5s <2>
+cpuManagerPolicy: static
+cpuManagerReconcilePeriod: 5s
 ----
-<1> `cpuManagerPolicy` is defined when you create the `KubeletConfig` CR.
-<2> `cpuManagerReconcilePeriod` is defined when you create the `KubeletConfig` CR.
++
+** `cpuManagerPolicy`: Defined when you create the `KubeletConfig` CR.
+** `cpuManagerReconcilePeriod`: Defined when you create the `KubeletConfig` CR.
 
 . Create a project by running the following command:
 +
diff --git a/modules/setting-up-topology-manager.adoc b/modules/setting-up-topology-manager.adoc
index 601de8881336..3c7304f7a6e3 100644
--- a/modules/setting-up-topology-manager.adoc
+++ b/modules/setting-up-topology-manager.adoc
@@ -7,6 +7,7 @@
 [id="setting_up_topology_manager_{context}"]
 = Setting up Topology Manager
 
+[role="_abstract"]
 To use Topology Manager, you must configure an allocation policy in the `KubeletConfig` custom resource (CR) named `cpumanager-enabled`. This file might exist if you have set up CPU Manager. If the file does not exist, you can create the file.
 
 .Prerequisites
@@ -15,9 +16,7 @@ To use Topology Manager, you must configure an allocation policy in the `Kubelet
 
 .Procedure
 
-To activate Topology Manager:
-
-. Configure the Topology Manager allocation policy in the custom resource. 
+* To activate Topology Manager, configure the Topology Manager allocation policy in the custom resource:
 +
 [source,terminal]
 ----
@@ -35,10 +34,11 @@ spec:
     matchLabels:
       custom-kubelet: cpumanager-enabled
   kubeletConfig:
-    cpuManagerPolicy: static <1>
+    cpuManagerPolicy: static
     cpuManagerReconcilePeriod: 5s
-    topologyManagerPolicy: single-numa-node <2>
+    topologyManagerPolicy: single-numa-node
+# ...
 ----
-<1> This parameter must be `static` with a lowercase `s`.
-<2> Specify your selected Topology Manager allocation policy. Here, the policy is `single-numa-node`.
-Acceptable values are: `default`, `best-effort`, `restricted`, `single-numa-node`.
++
+** `spec.kubeletConfig.cpuManagerPolicy`: You must set this parameter to `static` with a lowercase `s`.
+** `spec.kubeletConfig.topologyManagerPolicy`: Specifies your selected Topology Manager allocation policy. In this example, the policy is `single-numa-node`. Acceptable values are: `default`, `best-effort`, `restricted`, and `single-numa-node`.
diff --git a/modules/topology-manager-policies.adoc b/modules/topology-manager-policies.adoc
index 3e021c40194b..10ef7a551192 100644
--- a/modules/topology-manager-policies.adoc
+++ b/modules/topology-manager-policies.adoc
@@ -7,7 +7,10 @@
 [id="topology-manager-policies_{context}"]
 = Topology Manager policies
 
-Topology Manager aligns `Pod` resources of all Quality of Service (QoS) classes by collecting topology hints from Hint Providers, such as CPU Manager and Device Manager, and using the collected hints to align the `Pod` resources.
+[role="_abstract"]
+To align Pod resources across all Quality of Service (QoS) classes, configure the Topology Manager policies. Assigning an allocation policy in the KubeletConfig custom resource (CR) ensures that your workloads receive optimally coordinated hardware resources.
+
+Topology Manager can collect topology hints from Hint Providers, such as CPU Manager and Device Manager, and use the collected hints to align the `Pod` resources. 
 Topology Manager supports four allocation policies, which you assign in the `KubeletConfig` custom resource (CR) named `cpumanager-enabled`:
 
diff --git a/scalability_and_performance/node-observability-operator.adoc b/scalability_and_performance/node-observability-operator.adoc
index 6c22c58047af..6911c7323b4e 100644
--- a/scalability_and_performance/node-observability-operator.adoc
+++ b/scalability_and_performance/node-observability-operator.adoc
@@ -6,10 +6,11 @@ include::_attributes/common-attributes.adoc[]
 toc::[]
 
-The Node Observability Operator collects and stores CRI-O and Kubelet profiling or metrics from scripts of compute nodes.
-
-With the Node Observability Operator, you can query the profiling data, enabling analysis of performance trends in CRI-O and Kubelet. It supports debugging performance-related issues and executing embedded scripts for network metrics by using the `run` field in the custom resource definition. To enable CRI-O and Kubelet profiling or scripting, you can configure the `type` field in the custom resource definition.
+[role="_abstract"]
+To analyze performance trends and debug issues on your compute nodes, use the Node Observability Operator to collect and query CRI-O and Kubelet metrics. By reviewing this profiling data, you can optimize system performance and execute embedded scripts for network analysis.
+
+The Operator supports debugging performance-related issues and executing embedded scripts for network metrics by using the `run` field in the custom resource definition. To enable CRI-O and Kubelet profiling or scripting, you can configure the `type` field in the custom resource definition. 
 :FeatureName: The Node Observability Operator
 include::snippets/technology-preview.adoc[leveloffset=+0]
@@ -22,7 +22,6 @@ include::modules/node-observability-install-cli.adoc[leveloffset=+2]
 
 include::modules/node-observability-install-web-console.adoc[leveloffset=+2]
 
-
 [id="requesting-crio-kubelet-profiling-using-noo_{context}"]
 == Requesting CRI-O and Kubelet profiling data using the Node Observability Operator
@@ -32,13 +31,12 @@ include::modules/node-observability-create-custom-resource.adoc[leveloffset=+2]
 
 include::modules/node-observability-run-profiling-query.adoc[leveloffset=+2]
 
-
 [id="node-observability-operator-scripting_{context}"]
 == Node Observability Operator scripting
 
-Scripting allows you to run pre-configured bash scripts, using the current Node Observability Operator and Node Observability Agent.
+With scripting, you can run preconfigured bash scripts by using the current Node Observability Operator and Node Observability Agent.
 
-These scripts monitor key metrics like CPU load, memory pressure, and worker node issues. They also collect sar reports and custom performance metrics.
+These scripts monitor key metrics like CPU load, memory pressure, and compute node issues. They also collect `sar` reports and custom performance metrics.
 
 include::modules/node-observability-scripting-cr.adoc[leveloffset=+2]
diff --git a/scalability_and_performance/using-cpu-manager.adoc b/scalability_and_performance/using-cpu-manager.adoc
index 3b85635a752e..35af40a16953 100644
--- a/scalability_and_performance/using-cpu-manager.adoc
+++ b/scalability_and_performance/using-cpu-manager.adoc
@@ -6,6 +6,9 @@ include::_attributes/common-attributes.adoc[]
 toc::[]
 
+[role="_abstract"]
+To optimize hardware resource allocation for latency-sensitive workloads, configure the CPU Manager and Topology Manager. Coordinating these components ensures that your pods receive dedicated resources aligned on the same NUMA node, which maximizes system performance. 
+
 CPU Manager manages groups of CPUs and constrains workloads to specific CPUs.
 
 CPU Manager is useful for workloads that have some of these attributes:
diff --git a/scalability_and_performance/using-node-tuning-operator.adoc b/scalability_and_performance/using-node-tuning-operator.adoc
index 9fa2d534b865..4f3694a64595 100644
--- a/scalability_and_performance/using-node-tuning-operator.adoc
+++ b/scalability_and_performance/using-node-tuning-operator.adoc
@@ -6,11 +6,16 @@ include::_attributes/common-attributes.adoc[]
 toc::[]
 
-Learn about the Node Tuning Operator and how you can use it to manage node-level
-tuning by orchestrating the tuned daemon.
+[role="_abstract"]
+To manage node-level tuning by orchestrating the TuneD daemon, use the Node Tuning Operator. Applying these configurations ensures that your nodes are automatically optimized for specific high-performance workloads.
 
 include::modules/node-tuning-operator.adoc[leveloffset=+1]
 
+[role="_additional-resources"]
+.Additional resources
+
+* link:https://github.com/openshift/cluster-node-tuning-operator[cluster-node-tuning-operator (GitHub)]
+
 include::modules/accessing-an-example-cluster-node-tuning-operator-specification.adoc[leveloffset=+1]
 
 include::modules/cluster-node-tuning-operator-default-profiles-set.adoc[leveloffset=+1]
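---
Note for reviewers: the CPU Manager and Topology Manager modules in this patch both hinge on pods that run in the `Guaranteed` QoS class with integer CPU requests. A minimal pod specification that satisfies those conditions is sketched below; the pod and container names are illustrative only and do not come from the modules above:

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: cpu-pinned-example
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "2"
        memory: "1Gi"
      limits:
        cpu: "2"
        memory: "1Gi"
# ...
----

Because the CPU request is an integer and the limits equal the requests, the pod runs in the `Guaranteed` QoS class. With `cpuManagerPolicy: static` and `topologyManagerPolicy: single-numa-node` set in the `KubeletConfig` CR, a pod like this receives exclusive CPUs aligned to a single NUMA node.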