Commit 6d23574

jhlodin, mikeCRL, and NishanthNalluri authored
Add docs for decommissioning nodes with the operator (#20520)
* Add docs for decommissioning nodes with the operator
* Peach comments
* Address comments
* Apply suggestions from code review

Co-authored-by: NishanthNalluri <nishanth.nalluri@cockroachlabs.com>

---------

Co-authored-by: Mike Lewis <76072290+mikeCRL@users.noreply.github.com>
Co-authored-by: NishanthNalluri <nishanth.nalluri@cockroachlabs.com>
1 parent 968b6f0 commit 6d23574

3 files changed: +147 -0 lines changed

src/current/v25.2/scale-cockroachdb-operator.md

Lines changed: 49 additions & 0 deletions
@@ -104,3 +104,52 @@ Do not scale down to fewer than 3 nodes. This is considered an anti-pattern on C
 ~~~ shell
 kubectl get pods
 ~~~
+
+## Decommission nodes
+
+When a Kubernetes node is scheduled for removal or maintenance, the {{ site.data.products.cockroachdb-operator }} can be instructed to decommission the CockroachDB nodes scheduled on this Kubernetes node. Decommissioning safely moves data and workloads away before the node goes offline.
+
+{{site.data.alerts.callout_info}}
+Once annotated, the Kubernetes node is cordoned so that no further pods are scheduled on it, and the decommissioning process for the CockroachDB pods already scheduled on the node begins immediately.
+
+If cluster resources are constrained, replacement pods may remain in the `Pending` state until the Kubernetes scheduler identifies suitable nodes.
+{{site.data.alerts.end}}
+
+The following prerequisites must be met for the {{ site.data.products.cockroachdb-operator }} to decommission a CockroachDB node:
+
+- The `--enable-k8s-node-controller=true` flag must be set in the operator's `.yaml` values file, for example:
+    {% include_cached copy-clipboard.html %}
+    ~~~ yaml
+    containers:
+    - name: cockroach-operator
+      image: {{ .Values.image.registry }}/{{ .Values.image.repository }}:{{ .Values.image.tag }}
+      args:
+      - "-enable-k8s-node-controller=true"
+    ~~~
+- At least one replica of the operator must be running on a node other than the target node.
+- There must be no under-replicated ranges on the CockroachDB cluster.
+
+To mark a node for decommissioning, follow these steps:
+
+1. Identify the name of the Kubernetes node that is to be removed.
+
+1. Annotate the Kubernetes node with `crdb.cockroachlabs.com/decommission="true"`. The decommissioning process begins immediately after this annotation is applied. For example, using `kubectl`:
+
+    {% include_cached copy-clipboard.html %}
+    ~~~ shell
+    kubectl annotate node {example-node-name} crdb.cockroachlabs.com/decommission="true"
+    ~~~
+
+1. Monitor the cluster:
+    - Confirm the decommissioned node's cordoned status:
+        {% include_cached copy-clipboard.html %}
+        ~~~ shell
+        kubectl describe node {example-node-name}
+        ~~~
+    - Monitor operator events and logs for decommission start and completion messages:
+        {% include_cached copy-clipboard.html %}
+        ~~~ shell
+        kubectl logs {operator-pod-name}
+        ~~~
+
+If the replacement pods remain in a `Pending` state, this typically means there is not enough available capacity in the cluster for these pods to be scheduled.
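
The annotate and cordon-check steps added in this file can be combined into one helper. The following is a minimal sketch and not part of the commit: the function name `decommission_node` is hypothetical, and it assumes `kubectl` is already configured for the target cluster. Instead of parsing `kubectl describe` output, it reads `.spec.unschedulable`, the standard Kubernetes node field that is set when a node is cordoned.

```shell
# Hypothetical helper (not from the commit): annotate a Kubernetes node for
# decommissioning, then report whether the node is currently cordoned.
decommission_node() {
  local node="$1"
  if [ -z "$node" ]; then
    echo "usage: decommission_node <node-name>" >&2
    return 2
  fi

  # Annotation key and value as given in the documentation being added.
  kubectl annotate node "$node" crdb.cockroachlabs.com/decommission="true" || return 1

  # .spec.unschedulable is "true" on a cordoned node and empty otherwise.
  local cordoned
  cordoned="$(kubectl get node "$node" -o jsonpath='{.spec.unschedulable}')"
  if [ "$cordoned" = "true" ]; then
    echo "cordoned"
  else
    echo "not-cordoned"
  fi
}
```

Calling `decommission_node {example-node-name}` prints `cordoned` or `not-cordoned`. The check runs once, right after the annotation, so a `not-cordoned` result may only mean the operator has not yet acted; that timing is an assumption, not something the commit states.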

src/current/v25.3/scale-cockroachdb-operator.md

Lines changed: 49 additions & 0 deletions
@@ -104,3 +104,52 @@ Do not scale down to fewer than 3 nodes. This is considered an anti-pattern on C
 ~~~ shell
 kubectl get pods
 ~~~
+
+## Decommission nodes
+
+When a Kubernetes node is scheduled for removal or maintenance, the {{ site.data.products.cockroachdb-operator }} can be instructed to decommission the CockroachDB nodes scheduled on this Kubernetes node. Decommissioning safely moves data and workloads away before the node goes offline.
+
+{{site.data.alerts.callout_info}}
+Once annotated, the Kubernetes node is cordoned so that no further pods are scheduled on it. The annotation does not mark the node for future removal: CockroachDB is decommissioned on the node immediately.
+
+If cluster capacity is limited, replacement pods may remain in the `Pending` state until new nodes are available.
+{{site.data.alerts.end}}
+
+The following prerequisites must be met for the {{ site.data.products.cockroachdb-operator }} to decommission a CockroachDB node:
+
+- The `--enable-k8s-node-controller=true` flag must be set in the operator's `.yaml` values file, for example:
+    {% include_cached copy-clipboard.html %}
+    ~~~ yaml
+    containers:
+    - name: cockroach-operator
+      image: {{ .Values.image.registry }}/{{ .Values.image.repository }}:{{ .Values.image.tag }}
+      args:
+      - "-enable-k8s-node-controller=true"
+    ~~~
+- At least one replica of the operator must be running on a node other than the target node.
+- There must be no under-replicated ranges on the CockroachDB cluster.
+
+To mark a node for decommissioning, follow these steps:
+
+1. Identify the name of the Kubernetes node that is to be removed.
+
+1. Annotate the Kubernetes node with `crdb.cockroachlabs.com/decommission="true"`. The decommissioning process begins immediately after this annotation is applied. For example, using `kubectl`:
+
+    {% include_cached copy-clipboard.html %}
+    ~~~ shell
+    kubectl annotate node {example-node-name} crdb.cockroachlabs.com/decommission="true"
+    ~~~
+
+1. Monitor the cluster:
+    - Confirm the decommissioned node's cordoned status:
+        {% include_cached copy-clipboard.html %}
+        ~~~ shell
+        kubectl describe node {example-node-name}
+        ~~~
+    - Monitor operator events and logs for decommission start and completion messages:
+        {% include_cached copy-clipboard.html %}
+        ~~~ shell
+        kubectl logs {operator-pod-name}
+        ~~~
+
+If the replacement pods remain in a `Pending` state, this typically means there is not enough available capacity in the cluster for these pods to be scheduled.
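
One prerequisite added in this file is that an operator replica runs somewhere other than the target node, and the monitoring step asks the reader to watch operator logs. The following sketch covers both. It is illustrative only: the function names `pods_on_node` and `decommission_log_lines` are hypothetical, the `default` namespace fallback is an assumption, and the `decommission` search string is a guess, because the commit does not say how the operator words its log messages. The `spec.nodeName` field selector is a standard Kubernetes pod selector.

```shell
# Hypothetical helper: list every pod scheduled on a given Kubernetes node,
# so you can see whether the only operator replica lives on the target node.
pods_on_node() {
  local node="$1"
  kubectl get pods --all-namespaces --field-selector "spec.nodeName=$node" -o wide
}

# Hypothetical helper: print operator log lines that mention decommissioning.
# The case-insensitive "decommission" pattern is an assumption about the
# log wording, which the documentation does not specify.
decommission_log_lines() {
  local pod="$1"
  local namespace="${2:-default}"  # assumed default; pass the operator's namespace
  kubectl logs "$pod" -n "$namespace" | grep -i "decommission"
}
```

For example, run `pods_on_node {example-node-name}` before annotating the node, then `decommission_log_lines {operator-pod-name}` afterwards, passing the operator's namespace as a second argument if it is not `default`.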

src/current/v25.4/scale-cockroachdb-operator.md

Lines changed: 49 additions & 0 deletions
@@ -104,3 +104,52 @@ Do not scale down to fewer than 3 nodes. This is considered an anti-pattern on C
 ~~~ shell
 kubectl get pods
 ~~~
+
+## Decommission nodes
+
+When a Kubernetes node is scheduled for removal or maintenance, the {{ site.data.products.cockroachdb-operator }} can be instructed to decommission the CockroachDB nodes scheduled on this Kubernetes node. Decommissioning safely moves data and workloads away before the node goes offline.
+
+{{site.data.alerts.callout_info}}
+Once annotated, the Kubernetes node is cordoned so that no further pods are scheduled on it. The annotation does not mark the node for future removal: CockroachDB is decommissioned on the node immediately.
+
+If cluster capacity is limited, replacement pods may remain in the `Pending` state until new nodes are available.
+{{site.data.alerts.end}}
+
+The following prerequisites must be met for the {{ site.data.products.cockroachdb-operator }} to decommission a CockroachDB node:
+
+- The `--enable-k8s-node-controller=true` flag must be set in the operator's `.yaml` values file, for example:
+    {% include_cached copy-clipboard.html %}
+    ~~~ yaml
+    containers:
+    - name: cockroach-operator
+      image: {{ .Values.image.registry }}/{{ .Values.image.repository }}:{{ .Values.image.tag }}
+      args:
+      - "-enable-k8s-node-controller=true"
+    ~~~
+- At least one replica of the operator must be running on a node other than the target node.
+- There must be no under-replicated ranges on the CockroachDB cluster.
+
+To mark a node for decommissioning, follow these steps:
+
+1. Identify the name of the Kubernetes node that is to be removed.
+
+1. Annotate the Kubernetes node with `crdb.cockroachlabs.com/decommission="true"`. The decommissioning process begins immediately after this annotation is applied. For example, using `kubectl`:
+
+    {% include_cached copy-clipboard.html %}
+    ~~~ shell
+    kubectl annotate node {example-node-name} crdb.cockroachlabs.com/decommission="true"
+    ~~~
+
+1. Monitor the cluster:
+    - Confirm the decommissioned node's cordoned status:
+        {% include_cached copy-clipboard.html %}
+        ~~~ shell
+        kubectl describe node {example-node-name}
+        ~~~
+    - Monitor operator events and logs for decommission start and completion messages:
+        {% include_cached copy-clipboard.html %}
+        ~~~ shell
+        kubectl logs {operator-pod-name}
+        ~~~
+
+If the replacement pods remain in a `Pending` state, this typically means there is not enough available capacity in the cluster for these pods to be scheduled.
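
The closing note about replacement pods stuck in `Pending` can be checked from the command line. The following is a hedged sketch rather than anything from the commit: the function names `pending_pods` and `explain_pending` are hypothetical, and the `default` namespace fallback is an assumption. It relies on the standard `status.phase` field selector, and on `kubectl describe`, whose output includes the pod's recent events, where the scheduler normally reports why a pod could not be placed.

```shell
# Hypothetical helper: list pods in the Pending phase, one "pod/<name>" per line.
pending_pods() {
  local namespace="${1:-default}"  # assumed default; pass the CockroachDB namespace
  kubectl get pods -n "$namespace" --field-selector status.phase=Pending -o name
}

# Hypothetical helper: describe each Pending pod so its scheduling events
# can be inspected.
explain_pending() {
  local namespace="${1:-default}"
  local pod
  for pod in $(pending_pods "$namespace"); do
    kubectl describe "$pod" -n "$namespace"
  done
}
```

If `pending_pods` prints nothing, no pods in that namespace are waiting to be scheduled.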
