diff --git a/content/en/docs/concepts/staged-update.md b/content/en/docs/concepts/staged-update.md index 8ba1e8a1..205f9b72 100644 --- a/content/en/docs/concepts/staged-update.md +++ b/content/en/docs/concepts/staged-update.md @@ -123,6 +123,7 @@ spec: labelSelector: matchLabels: environment: staging + maxConcurrency: 2 # Update 2 clusters concurrently afterStageTasks: - type: TimedWait waitTime: 1h @@ -130,6 +131,9 @@ spec: labelSelector: matchLabels: environment: canary + maxConcurrency: 1 # Sequential updates (default) + beforeStageTasks: + - type: Approval # Require approval before starting canary stage afterStageTasks: - type: Approval - name: production @@ -137,6 +141,7 @@ spec: matchLabels: environment: production sortingLabelKey: order + maxConcurrency: 50% # Update 50% of production clusters at once afterStageTasks: - type: Approval - type: TimedWait @@ -155,18 +160,20 @@ metadata: namespace: my-app-namespace spec: stages: - - name: dev-clusters + - name: dev labelSelector: matchLabels: environment: development + maxConcurrency: 3 # Update 3 dev clusters at once afterStageTasks: - type: TimedWait waitTime: 30m - - name: production-clusters + - name: prod labelSelector: matchLabels: environment: production sortingLabelKey: deployment-order + maxConcurrency: 1 # Sequential production updates afterStageTasks: - type: Approval ``` @@ -177,15 +184,33 @@ Each stage includes: - **name**: Unique identifier for the stage - **labelSelector**: Selects target clusters for this stage - **sortingLabelKey** (optional): Label whose integer value determines update sequence within the stage -- **afterStageTasks** (optional): Tasks that must complete before proceeding to the next stage +- **maxConcurrency** (optional): Maximum number of clusters to update concurrently. Can be an absolute number (e.g., `5`) or percentage (e.g., `50%`). Defaults to `1` (sequential). 
Fractional results are rounded down with a minimum of 1 +- **beforeStageTasks** (optional): Tasks that must complete before starting the stage (max 1 task, Approval type only) +- **afterStageTasks** (optional): Tasks that must complete before proceeding to the next stage (max 2 tasks) -### After-Stage Tasks +### Stage Tasks -Two task types to control stage progression: -- **TimedWait**: Waits for a specified duration before proceeding -- **Approval**: Requires manual approval via an approval request object +Stage tasks provide control gates at different points in the rollout lifecycle: -For approval tasks, the system automatically creates an approval request object named `<updateRun-name>-<stage-name>`. The approval request type depends on the scope: +#### Before-Stage Tasks + +Execute before a stage begins. Only one task allowed per stage: +- **Approval**: Requires manual approval before starting the stage +- **TimedWait**: Not supported for before-stage tasks + +For before-stage approval tasks, the system creates an approval request named `<updateRun-name>-before-<stage-name>`. + +#### After-Stage Tasks + +Execute after all clusters in a stage complete. Up to two tasks allowed (one of each type): +- **TimedWait**: Waits for a specified duration before proceeding to the next stage +- **Approval**: Requires manual approval before proceeding to the next stage + +For after-stage approval tasks, the system creates an approval request named `<updateRun-name>-after-<stage-name>`. + +#### Approval Request Details + +For all approval tasks, the approval request type depends on the scope: - **Cluster-scoped**: Creates `ClusterApprovalRequest` (short name: `careq`) - a cluster-scoped resource containing a spec with `parentStageRollout` (the UpdateRun name) and `targetStage` (the stage name). The spec is immutable after creation. - **Namespace-scoped**: Creates `ApprovalRequest` (short name: `areq`) within the same namespace - a namespace-scoped resource with the same spec structure as `ClusterApprovalRequest`. 
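The approval request names used in this document's examples (`example-run-before-canary`, `web-app-rollout-v1-21-after-prod`) combine the UpdateRun name, the task position, and the stage name. A minimal sketch of that pattern follows; `approval_request_name` is a hypothetical helper written for illustration and is not part of Fleet or kubectl.

```bash
#!/usr/bin/env bash
# Hypothetical helper (an assumption for illustration, not a Fleet command):
# joins the UpdateRun name, the task position, and the stage name the same way
# the example request names in this document are formed.
approval_request_name() {
  local run="$1" phase="$2" stage="$3"
  case "$phase" in
    before|after)
      printf '%s-%s-%s\n' "$run" "$phase" "$stage"
      ;;
    *)
      echo "phase must be 'before' or 'after'" >&2
      return 1
      ;;
  esac
}

approval_request_name example-run before canary          # example-run-before-canary
approval_request_name web-app-rollout-v1-21 after prod   # web-app-rollout-v1-21-after-prod
```

Deriving the name this way can help when scripting approvals, since the same UpdateRun may have both a before-stage and an after-stage request for one stage.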
@@ -196,21 +221,28 @@ Both approval request types use status conditions to track approval state: Approve manually by setting the `Approved` condition to `True` using kubectl patch: +> Note: Observed generation in the Approved condition should match the generation of the updateRun object. + ```bash -# For cluster-scoped approvals -kubectl patch clusterapprovalrequests example-run-canary --type='merge' \ - -p '{"status":{"conditions":[{"type":"Approved","status":"True","reason":"approved","message":"approved"}]}}' \ +# For cluster-scoped before-stage approvals +kubectl patch clusterapprovalrequests example-run-before-canary --type='merge' \ + -p '{"status":{"conditions":[{"type":"Approved","status":"True","reason":"approved","message":"approved","lastTransitionTime":"'$(date -u +"%Y-%m-%dT%H:%M:%SZ")'","observedGeneration":1}]}}' \ + --subresource=status + +# For cluster-scoped after-stage approvals +kubectl patch clusterapprovalrequests example-run-after-canary --type='merge' \ + -p '{"status":{"conditions":[{"type":"Approved","status":"True","reason":"approved","message":"approved","lastTransitionTime":"'$(date -u +"%Y-%m-%dT%H:%M:%SZ")'","observedGeneration":1}]}}' \ --subresource=status # For namespace-scoped approvals -kubectl patch approvalrequests app-run-canary -n my-app-namespace --type='merge' \ - -p '{"status":{"conditions":[{"type":"Approved","status":"True","reason":"approved","message":"approved"}]}}' \ +kubectl patch approvalrequests example-run-before-canary -n test-namespace --type='merge' \ + -p '{"status":{"conditions":[{"type":"Approved","status":"True","reason":"approved","message":"approved","lastTransitionTime":"'$(date -u +"%Y-%m-%dT%H:%M:%SZ")'","observedGeneration":1}]}}' \ --subresource=status ``` ## Trigger Staged Rollouts -UpdateRun resources execute strategies for specific rollouts. Both scopes follow the same pattern with three required parameters: +UpdateRun resources execute strategies for specific rollouts. 
Both scopes follow the same pattern: **Cluster-scoped example:** ```yaml @@ -219,9 +251,10 @@ kind: ClusterStagedUpdateRun metadata: name: example-run spec: - placementName: example-placement # Target ClusterResourcePlacement - resourceSnapshotIndex: "0" # Resource version to deploy - stagedRolloutStrategyName: example-strategy # Strategy to execute + placementName: example-placement # Required: Target ClusterResourcePlacement + resourceSnapshotIndex: "0" # Optional: Resource version (omit for latest) + stagedRolloutStrategyName: example-strategy # Required: Strategy to execute + state: Run # Optional: Initialize (default), Run, or Stop ``` **Namespace-scoped example:** @@ -232,34 +265,92 @@ metadata: name: app-rollout-v1-2-3 namespace: my-app-namespace spec: - placementName: example-namespace-placement # Target ResourcePlacement - resourceSnapshotIndex: "5" # Resource version to deploy - stagedRolloutStrategyName: app-rollout-strategy # Strategy to execute + placementName: example-namespace-placement # Required: Target ResourcePlacement + resourceSnapshotIndex: "5" # Optional: Resource version (omit for latest) + stagedRolloutStrategyName: app-rollout-strategy # Required: Strategy to execute + state: Initialize # Optional: Initialize (default), Run, or Stop +``` + +**Using Latest Resource Snapshot:** +```yaml +apiVersion: placement.kubernetes-fleet.io/v1beta1 +kind: ClusterStagedUpdateRun +metadata: + name: example-run-latest +spec: + placementName: example-placement + # resourceSnapshotIndex omitted - system uses latest snapshot automatically + stagedRolloutStrategyName: example-strategy + state: Run +``` + +### UpdateRun State Management + +UpdateRuns support three states to control execution lifecycle: + +| State | Behavior | Use Case | +|-------|----------|----------| +| **Initialize** | Prepares the updateRun without executing (default) | Review computed stages before starting rollout | +| **Run** | Executes the rollout or resumes from stopped state | Start 
or resume the staged rollout | +| **Stop** | Pauses execution at current cluster/stage | Temporarily halt rollout for investigation | + +**Valid State Transitions:** +- `Initialize` → `Run`: Start the rollout +- `Run` → `Stop`: Pause the rollout +- `Stop` → `Run`: Resume the rollout + +**Invalid State Transitions:** +- `Initialize` → `Stop`: Cannot stop before starting +- `Run` → `Initialize`: Cannot reinitialize after starting +- `Stop` → `Initialize`: Cannot reinitialize after stopping + +The `state` field is the **only mutable field** in the UpdateRunSpec. You can update it to control rollout execution: + +```bash +# Start a rollout +kubectl patch csur example-run --type='merge' -p '{"spec":{"state":"Run"}}' + +# Pause a rollout +kubectl patch csur example-run --type='merge' -p '{"spec":{"state":"Stop"}}' + +# Resume a paused rollout +kubectl patch csur example-run --type='merge' -p '{"spec":{"state":"Run"}}' ``` ### UpdateRun Execution UpdateRuns execute in two phases: -1. **Initialization**: Captures strategy snapshot, collects target bindings, generates cluster update sequence -2. **Execution**: Processes stages sequentially, updates clusters within each stage, enforces after-stage tasks +1. **Initialization**: Captures strategy snapshot, collects target bindings, generates cluster update sequence. Occurs when state is `Initialize` or `Run` +2. **Execution**: Processes stages sequentially, updates clusters within each stage (respecting maxConcurrency), enforces before-stage and after-stage tasks. 
Only occurs when state is `Run`. + +When state is `Stop`, the updateRun pauses execution at the current cluster/stage and can be resumed by changing state back to `Run`. ### Important Constraints and Validation **Immutable Fields**: Once created, the following UpdateRun spec fields cannot be modified: - `placementName`: Target placement resource name -- `resourceSnapshotIndex`: Resource version to deploy +- `resourceSnapshotIndex`: Resource version to deploy (empty if omitted; resolved to the latest snapshot at initialization) - `stagedRolloutStrategyName`: Strategy to execute +**Mutable Field**: The `state` field can be modified after creation to control execution (Initialize, Run, Stop). + **Strategy Limits**: Each strategy can define a maximum of 31 stages to ensure reasonable execution times. +**MaxConcurrency Validation**: +- Must be >= 1 for absolute numbers +- Must be 1-100% for percentages +- Fractional results are rounded down with a minimum of 1 + ## Monitor UpdateRun Status UpdateRun status provides detailed information about rollout progress across stages and clusters. 
The status includes: - **Overall conditions**: Initialization, progression, and completion status - **Stage status**: Progress and timing for each stage -- **Cluster status**: Individual cluster update results -- **After-stage task status**: Approval and wait task progress +- **Cluster status**: Individual cluster update results with maxConcurrency respected +- **Before-stage task status**: Pre-stage approval progress +- **After-stage task status**: Post-stage approval and wait task progress +- **Resource snapshot used**: The actual resource snapshot index used (from spec or latest) Use `kubectl describe` to view detailed status: ```bash diff --git a/content/en/docs/how-tos/staged-update.md b/content/en/docs/how-tos/staged-update.md index c0f23e15..b761b429 100644 --- a/content/en/docs/how-tos/staged-update.md +++ b/content/en/docs/how-tos/staged-update.md @@ -149,6 +149,7 @@ spec: labelSelector: matchLabels: environment: staging + maxConcurrency: 1 # Update clusters sequentially in staging afterStageTasks: - type: TimedWait waitTime: 1m @@ -157,8 +158,11 @@ spec: matchLabels: environment: canary sortingLabelKey: order + maxConcurrency: 2 # Update 2 canary clusters concurrently + beforeStageTasks: + - type: Approval # Require approval before starting canary afterStageTasks: - - type: Approval + - type: Approval # Require approval after canary completes EOF ``` @@ -175,14 +179,25 @@ spec: placementName: example-placement resourceSnapshotIndex: "1" stagedRolloutStrategyName: example-strategy + state: Initialize # Initialize but don't start execution yet EOF ``` +The UpdateRun starts in `Initialize` state, which computes the stages without executing. 
This allows you to review the computed stages before starting: +```bash +kubectl get csur example-run -o yaml # Review computed stages in status +``` + +Once satisfied with the plan, start the rollout by changing the state to `Run`: +```bash +kubectl patch csur example-run --type='merge' -p '{"spec":{"state":"Run"}}' +``` + The staged update run is initialized and running: ```bash kubectl get csur example-run -NAME PLACEMENT RESOURCE-SNAPSHOT POLICY-SNAPSHOT INITIALIZED SUCCEEDED AGE -example-run example-placement 1 0 True 44s +NAME PLACEMENT RESOURCE-SNAPSHOT-INDEX POLICY-SNAPSHOT-INDEX INITIALIZED PROGRESSING SUCCEEDED AGE +example-run example-placement 1 0 True True 62s ``` A more detailed look at the status: @@ -192,30 +207,38 @@ kind: ClusterStagedUpdateRun metadata: ... name: example-run + generation: 2 # state changed from Initialize -> Run ... spec: placementName: example-placement resourceSnapshotIndex: "1" stagedRolloutStrategyName: example-strategy + state: Run status: + appliedStrategy: + comparisonOption: PartialComparison + type: ClientSideApply + whenToApply: Always + whenToTakeOver: Always conditions: - lastTransitionTime: ... - message: ClusterStagedUpdateRun initialized successfully - observedGeneration: 1 + message: "" + observedGeneration: 2 reason: UpdateRunInitializedSuccessfully status: "True" # the updateRun is initialized successfully type: Initialized - lastTransitionTime: ... 
message: "" - observedGeneration: 1 - reason: UpdateRunStarted - status: "True" - type: Progressing # the updateRun is still running + observedGeneration: 2 + reason: UpdateRunWaiting + status: "False" # the updateRun is waiting + type: Progressing deletionStageStatus: clusters: [] # no clusters need to be cleaned up stageName: kubernetes-fleet.io/deleteStage policyObservedClusterCount: 3 # number of clusters to be updated policySnapshotIndexUsed: "0" + resourceSnapshotIndexUsed: "1" stagedUpdateStrategySnapshot: # snapshot of the strategy stages: - afterStageTasks: @@ -224,12 +247,16 @@ status: labelSelector: matchLabels: environment: staging + maxConcurrency: 1 name: staging - afterStageTasks: + - type: Approval + beforeStageTasks: - type: Approval labelSelector: matchLabels: environment: canary + maxConcurrency: 2 name: canary sortingLabelKey: order stagesStatus: # detailed status for each stage @@ -237,7 +264,7 @@ status: - conditions: - lastTransitionTime: ... message: "" - observedGeneration: 1 + observedGeneration: 2 reason: AfterStageTaskWaitTimeElapsed status: "True" # the wait after-stage task has completed type: WaitTimeElapsed @@ -247,26 +274,26 @@ status: conditions: - lastTransitionTime: ... message: "" - observedGeneration: 1 + observedGeneration: 2 reason: ClusterUpdatingStarted status: "True" type: Started - lastTransitionTime: ... message: "" - observedGeneration: 1 + observedGeneration: 2 reason: ClusterUpdatingSucceeded status: "True" # member2 is updated successfully type: Succeeded conditions: - lastTransitionTime: ... message: "" - observedGeneration: 1 - reason: StageUpdatingWaiting + observedGeneration: 2 + reason: StageUpdatingSucceeded status: "False" type: Progressing - lastTransitionTime: ... message: "" - observedGeneration: 1 + observedGeneration: 2 reason: StageUpdatingSucceeded status: "True" # stage staging has completed successfully type: Succeeded @@ -274,89 +301,99 @@ status: stageName: staging startTime: ... 
- afterStageTaskStatus: - - approvalRequestName: example-run-canary # ClusterApprovalRequest name for this stage + - approvalRequestName: example-run-after-canary type: Approval - clusters: - - clusterName: member3 # according the labelSelector and sortingLabelKey, member3 is selected first in this stage - conditions: - - lastTransitionTime: ... - message: "" - observedGeneration: 1 - reason: ClusterUpdatingStarted - status: "True" - type: Started - - lastTransitionTime: ... - message: "" - observedGeneration: 1 - reason: ClusterUpdatingSucceeded - status: "True" # member3 update is completed - type: Succeeded - - clusterName: member1 # member1 is selected after member3 because of order=2 label + beforeStageTaskStatus: + - approvalRequestName: example-run-before-canary conditions: - lastTransitionTime: ... message: "" - observedGeneration: 1 - reason: ClusterUpdatingStarted - status: "True" # member1 update has not finished yet - type: Started + observedGeneration: 2 + reason: StageTaskApprovalRequestCreated + status: "True" # before stage cluster approval task has been created + type: ApprovalRequestCreated + type: Approval + clusters: + - clusterName: member3 + - clusterName: member1 conditions: - lastTransitionTime: ... message: "" - observedGeneration: 1 - reason: StageUpdatingStarted - status: "True" # stage canary is still executing + observedGeneration: 2 + reason: StageUpdatingWaiting + status: "False" type: Progressing stageName: canary - startTime: ... ``` -Wait a little bit more, and we can see stage `canary` finishes cluster update and is waiting for the Approval task. -We can check the `ClusterApprovalRequest` generated and not approved yet: +After stage `staging` completes, the canary stage requires approval **before** it starts (due to beforeStageTasks). 
Check for the before-stage approval request: +```bash +kubectl get clusterapprovalrequest -A +NAME UPDATE-RUN STAGE APPROVED AGE +example-run-before-canary example-run canary 6m55s +``` + +Approve the before-stage task to allow the canary stage to start: ```bash -kubectl get clusterapprovalrequest -NAME UPDATE-RUN STAGE APPROVED APPROVALACCEPTED AGE -example-run-canary example-run canary 2m2s +kubectl patch clusterapprovalrequests example-run-before-canary --type='merge' \ + -p '{"status":{"conditions":[{"type":"Approved","status":"True","reason":"approved","message":"approved","lastTransitionTime":"'$(date -u +"%Y-%m-%dT%H:%M:%SZ")'","observedGeneration":2}]}}' \ + --subresource=status ``` -We can approve the `ClusterApprovalRequest` by patching its status: + +Once approved, the canary stage begins updating clusters. With `maxConcurrency: 2`, it updates up to 2 clusters concurrently. + +Wait for the canary stage to finish cluster updates. It will then wait for the after-stage Approval task: +```bash +kubectl get clusterapprovalrequest -A +NAME UPDATE-RUN STAGE APPROVED AGE +example-run-after-canary example-run canary 3s +example-run-before-canary example-run canary True 15m +``` + +> Note: Observed generation in the Approved condition should match the generation of the updateRun object. 
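Because the note above ties `observedGeneration` to the updateRun's generation, hardcoding the value is easy to get wrong once `spec.state` has been patched. A minimal sketch of building the patch body from a generation value follows; `build_approval_patch` is a hypothetical helper, and the commented usage assumes kubectl access to the hub cluster.

```bash
#!/usr/bin/env bash
# Hypothetical helper (an assumption for illustration, not a Fleet or kubectl command):
# prints the merge-patch body used in this walkthrough, with observedGeneration
# filled in from the first argument.
build_approval_patch() {
  local generation="$1" now
  if ! [[ "$generation" =~ ^[0-9]+$ ]]; then
    echo "generation must be a non-negative integer" >&2
    return 1
  fi
  now="$(date -u +"%Y-%m-%dT%H:%M:%SZ")"
  printf '{"status":{"conditions":[{"type":"Approved","status":"True","reason":"approved","message":"approved","lastTransitionTime":"%s","observedGeneration":%s}]}}\n' \
    "$now" "$generation"
}

# Usage against a live hub cluster (not executed here):
#   gen="$(kubectl get csur example-run -o jsonpath='{.metadata.generation}')"
#   kubectl patch clusterapprovalrequests example-run-after-canary --type='merge' \
#     -p "$(build_approval_patch "$gen")" --subresource=status

build_approval_patch 2
```

Reading `.metadata.generation` at approval time keeps the patch aligned with the updateRun even after a state change has bumped the generation.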
+ +Approve the after-stage task to complete the rollout: ```bash -kubectl patch clusterapprovalrequests example-run-canary --type=merge -p {"status":{"conditions":[{"type":"Approved","status":"True","reason":"lgtm","message":"lgtm","lastTransitionTime":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","observedGeneration":1}]}} --subresource=status -clusterapprovalrequest.placement.kubernetes-fleet.io/example-run-canary patched +kubectl patch clusterapprovalrequests example-run-after-canary --type='merge' \ + -p '{"status":{"conditions":[{"type":"Approved","status":"True","reason":"approved","message":"approved","lastTransitionTime":"'$(date -u +"%Y-%m-%dT%H:%M:%SZ")'","observedGeneration":2}]}}' \ + --subresource=status ``` -This can be done equivalently by creating a json patch file and applying it: +Alternatively, you can approve using a JSON patch file: ```bash cat << EOF > approval.json "status": { "conditions": [ { "lastTransitionTime": "$(date -u +%Y-%m-%dT%H:%M:%SZ)", - "message": "lgtm", - "observedGeneration": 1, - "reason": "lgtm", + "message": "approved", + "observedGeneration": 2, + "reason": "approved", "status": "True", "type": "Approved" } ] } EOF -kubectl patch clusterapprovalrequests example-run-canary --type='merge' --subresource=status --patch-file approval.json +kubectl patch clusterapprovalrequests example-run-after-canary --type='merge' --subresource=status --patch-file approval.json ``` -Then verify it's approved: +Verify both approvals are accepted: ```bash -kubectl get clusterapprovalrequest -NAME UPDATE-RUN STAGE APPROVED APPROVALACCEPTED AGE -example-run-canary example-run canary True True 2m30s +kubectl get clusterapprovalrequest -A +NAME UPDATE-RUN STAGE APPROVED AGE +example-run-after-canary example-run canary True 2m12s +example-run-before-canary example-run canary True 17m ``` The updateRun now is able to proceed and complete: ```bash kubectl get csur example-run -NAME PLACEMENT RESOURCE-SNAPSHOT POLICY-SNAPSHOT INITIALIZED SUCCEEDED AGE 
-example-run example-placement 1 0 True True 4m22s +NAME PLACEMENT RESOURCE-SNAPSHOT-INDEX POLICY-SNAPSHOT-INDEX INITIALIZED PROGRESSING SUCCEEDED AGE +example-run example-placement 1 0 True False True 20m ``` The CRP also shows rollout has completed and resources are available on all member clusters: ```bash kubectl get crp example-placement NAME GEN SCHEDULED SCHEDULED-GEN AVAILABLE AVAILABLE-GEN AGE -example-placement 1 True 1 True 1 134m +example-placement 1 True 1 True 1 36m ``` The configmap `test-cm` should be deployed on all 3 member clusters, with latest data: ```yaml @@ -364,6 +401,53 @@ data: key: value2 ``` +### Using Latest Snapshot Automatically + +Instead of specifying a resource snapshot index, you can omit `resourceSnapshotIndex` to automatically use the latest snapshot. This is useful for continuous delivery workflows: + +```bash +kubectl apply -f - << EOF +apiVersion: placement.kubernetes-fleet.io/v1beta1 +kind: ClusterStagedUpdateRun +metadata: + name: example-run-latest +spec: + placementName: example-placement + # resourceSnapshotIndex omitted - uses latest automatically + stagedRolloutStrategyName: example-strategy + state: Run # Start immediately +EOF +``` + +The system will determine the latest snapshot at initialization time. Check which snapshot was used: +```bash +kubectl get csur example-run-latest -o jsonpath='{.status.resourceSnapshotIndexUsed}' +``` + +### Pausing and Resuming a Rollout + +You can pause an in-progress rollout to investigate issues or wait for off-peak hours: + +```bash +# Pause the rollout +kubectl patch csur example-run --type='merge' -p '{"spec":{"state":"Stop"}}' +``` + +Verify the rollout is stopped: +```bash +kubectl get csur example-run +NAME PLACEMENT RESOURCE-SNAPSHOT-INDEX POLICY-SNAPSHOT-INDEX INITIALIZED PROGRESSING SUCCEEDED AGE +example-run example-placement 1 0 True False 8m +``` + +The rollout pauses at its current position (current cluster/stage). 
Resume when ready: +```bash +# Resume the rollout +kubectl patch csur example-run --type='merge' -p '{"spec":{"state":"Run"}}' +``` + +The rollout continues from where it was paused. + ### Deploy a second ClusterStagedUpdateRun to rollback to a previous version Now suppose the workload admin wants to rollback the configmap change, reverting the value `value2` back to `value1`. @@ -376,8 +460,9 @@ metadata: name: example-run-2 spec: placementName: example-placement - resourceSnapshotIndex: "0" + resourceSnapshotIndex: "0" # Rollback to previous version stagedRolloutStrategyName: example-strategy + state: Run # Start rollback immediately EOF ``` @@ -388,15 +473,22 @@ kind: ClusterStagedUpdateRun metadata: ... name: example-run-2 + generation: 1 ... spec: placementName: example-placement resourceSnapshotIndex: "0" stagedRolloutStrategyName: example-strategy + state: Run status: + appliedStrategy: + comparisonOption: PartialComparison + type: ClientSideApply + whenToApply: Always + whenToTakeOver: Always conditions: - lastTransitionTime: ... - message: ClusterStagedUpdateRun initialized successfully + message: "" observedGeneration: 1 reason: UpdateRunInitializedSuccessfully status: "True" @@ -404,8 +496,8 @@ status: - lastTransitionTime: ... message: "" observedGeneration: 1 - reason: UpdateRunStarted - status: "True" + reason: UpdateRunSucceeded + status: "False" type: Progressing - lastTransitionTime: ... message: "" @@ -419,8 +511,8 @@ status: - lastTransitionTime: ... message: "" observedGeneration: 1 - reason: StageUpdatingStarted - status: "True" + reason: StageUpdatingSucceeded + status: "False" type: Progressing - lastTransitionTime: ... message: "" @@ -433,6 +525,7 @@ status: startTime: ... 
policyObservedClusterCount: 3 policySnapshotIndexUsed: "0" + resourceSnapshotIndexUsed: "0" stagedUpdateStrategySnapshot: stages: - afterStageTasks: @@ -441,12 +534,16 @@ status: labelSelector: matchLabels: environment: staging + maxConcurrency: 1 name: staging - afterStageTasks: + - type: Approval + beforeStageTasks: - type: Approval labelSelector: matchLabels: environment: canary + maxConcurrency: 2 name: canary sortingLabelKey: order stagesStatus: @@ -478,7 +575,7 @@ status: - lastTransitionTime: ... message: "" observedGeneration: 1 - reason: StageUpdatingWaiting + reason: StageUpdatingSucceeded status: "False" type: Progressing - lastTransitionTime: ... @@ -491,18 +588,34 @@ status: stageName: staging startTime: ... - afterStageTaskStatus: - - approvalRequestName: example-run-2-canary + - approvalRequestName: example-run-2-after-canary + conditions: + - lastTransitionTime: ... + message: "" + observedGeneration: 1 + reason: StageTaskApprovalRequestCreated + status: "True" + type: ApprovalRequestCreated + - lastTransitionTime: ... + message: "" + observedGeneration: 1 + reason: StageTaskApprovalRequestApproved + status: "True" + type: ApprovalRequestApproved + type: Approval + beforeStageTaskStatus: + - approvalRequestName: example-run-2-before-canary conditions: - lastTransitionTime: ... message: "" observedGeneration: 1 - reason: AfterStageTaskApprovalRequestCreated + reason: StageTaskApprovalRequestCreated status: "True" type: ApprovalRequestCreated - lastTransitionTime: ... message: "" observedGeneration: 1 - reason: AfterStageTaskApprovalRequestApproved + reason: StageTaskApprovalRequestApproved status: "True" type: ApprovalRequestApproved type: Approval @@ -539,7 +652,7 @@ status: - lastTransitionTime: ... message: "" observedGeneration: 1 - reason: StageUpdatingWaiting + reason: StageUpdatingSucceeded status: "False" type: Progressing - lastTransitionTime: ... 
@@ -564,10 +677,39 @@ Namespace-scoped staged updates allow application teams to manage rollouts indep ### Setup for Namespace-Scoped Updates -Let's demonstrate namespace-scoped staged updates by deploying an application within a specific namespace. Create a namespace and an application rollout: +Let's demonstrate namespace-scoped staged updates by deploying an application within a specific namespace. + +Create a namespace: ```bash kubectl create ns my-app-namespace +``` + +Create a CRP that propagates only the namespace to all clusters (with `selectionScope` set to `NamespaceOnly`, the namespace resource is propagated without any resources within the namespace): + +```bash +kubectl apply -f - << EOF +apiVersion: placement.kubernetes-fleet.io/v1beta1 +kind: ClusterResourcePlacement +metadata: + name: ns-only-crp +spec: + resourceSelectors: + - group: "" + kind: Namespace + name: my-app-namespace + version: v1 + selectionScope: NamespaceOnly + policy: + placementType: PickAll + strategy: + type: RollingUpdate +EOF +``` + +Create the application to roll out: + +```bash kubectl create deployment web-app --image=nginx:1.20 --port=80 -n my-app-namespace kubectl expose deployment web-app --port=80 --target-port=80 -n my-app-namespace ``` @@ -602,21 +744,21 @@ EOF Check the resource snapshots for the namespace-scoped placement: ```bash kubectl get resourcesnapshots -n my-app-namespace -NAME GEN AGE LABELS -web-app-placement-0-snapshot 1 63s kubernetes-fleet.io/is-latest-snapshot=true,kubernetes-fleet.io/parent-CRP=web-app-placement,kubernetes-fleet.io/resource-index=0 +NAME GEN AGE +web-app-placement-0-snapshot 1 30s ``` Update the deployment to a new version: ```bash -kubectl set image deployment/web-app web-app=nginx:1.21 -n my-app-namespace +kubectl set image deployment/web-app nginx=nginx:1.21 -n my-app-namespace ``` Verify the new snapshot is created: ```bash kubectl get resourcesnapshots -n my-app-namespace --show-labels -NAME GEN AGE LABELS 
-web-app-placement-0-snapshot 1 263s kubernetes-fleet.io/is-latest-snapshot=false,kubernetes-fleet.io/parent-CRP=web-app-placement,kubernetes-fleet.io/resource-index=0 -web-app-placement-1-snapshot 1 23s kubernetes-fleet.io/is-latest-snapshot=true,kubernetes-fleet.io/parent-CRP=web-app-placement,kubernetes-fleet.io/resource-index=1 +NAME GEN AGE LABELS +web-app-placement-0-snapshot 1 5m24s kubernetes-fleet.io/is-latest-snapshot=false,kubernetes-fleet.io/parent-CRP=web-app-placement,kubernetes-fleet.io/resource-index=0 +web-app-placement-1-snapshot 1 16s kubernetes-fleet.io/is-latest-snapshot=true,kubernetes-fleet.io/parent-CRP=web-app-placement,kubernetes-fleet.io/resource-index=1 ``` ### Deploy a StagedUpdateStrategy @@ -631,20 +773,24 @@ metadata: namespace: my-app-namespace spec: stages: - - name: dev-clusters + - name: dev labelSelector: matchLabels: environment: staging + maxConcurrency: 2 # Update 2 dev clusters concurrently afterStageTasks: - type: TimedWait waitTime: 30s - - name: prod-clusters + - name: prod labelSelector: matchLabels: environment: canary sortingLabelKey: order + maxConcurrency: 1 # Sequential production updates + beforeStageTasks: + - type: Approval # Require approval before production afterStageTasks: - - type: Approval + - type: Approval # Require approval after production EOF ``` @@ -662,6 +808,7 @@ spec: placementName: web-app-placement resourceSnapshotIndex: "1" # Latest snapshot with nginx:1.21 stagedRolloutStrategyName: app-rollout-strategy + state: Run # Start rollout immediately EOF ``` @@ -669,18 +816,32 @@ EOF Check the status of the staged update run: ```bash -kubectl describe sur web-app-rollout-v1-21 -n my-app-namespace +kubectl get sur web-app-rollout-v1-21 -n my-app-namespace ``` -Wait for the first stage to complete, then check for approval requests: +Wait for the first stage to complete. 
The `prod` stage requires approval before it starts: ```bash kubectl get approvalrequests -n my-app-namespace +NAME UPDATE-RUN STAGE APPROVED AGE +web-app-rollout-v1-21-before-prod web-app-rollout-v1-21 prod 2s +``` + +Approve the before-stage task to start the production rollout: +```bash +kubectl patch approvalrequests web-app-rollout-v1-21-before-prod -n my-app-namespace --type='merge' \ + -p '{"status":{"conditions":[{"type":"Approved","status":"True","reason":"approved","message":"approved","lastTransitionTime":"'$(date -u +"%Y-%m-%dT%H:%M:%SZ")'","observedGeneration":1}]}}' \ + --subresource=status ``` -Approve the staging gate to proceed to production clusters: +After the production clusters finish updating, approve the after-stage task: ```bash -kubectl patch approvalrequests web-app-rollout-v1-21-prod-clusters -n my-app-namespace --type='merge' \ - -p '{"status":{"conditions":[{"type":"Approved","status":"True","reason":"approved","message":"approved"}]}}' \ +kubectl get approvalrequests -n my-app-namespace +NAME UPDATE-RUN STAGE APPROVED AGE +web-app-rollout-v1-21-after-prod web-app-rollout-v1-21 prod 18s +web-app-rollout-v1-21-before-prod web-app-rollout-v1-21 prod True 2m22s + +kubectl patch approvalrequests web-app-rollout-v1-21-after-prod -n my-app-namespace --type='merge' \ + -p '{"status":{"conditions":[{"type":"Approved","status":"True","reason":"approved","message":"approved","lastTransitionTime":"'$(date -u +"%Y-%m-%dT%H:%M:%SZ")'","observedGeneration":1}]}}' \ + --subresource=status ``` @@ -704,11 +865,33 @@ spec: placementName: web-app-placement resourceSnapshotIndex: "0" # Previous snapshot with nginx:1.20 stagedRolloutStrategyName: app-rollout-strategy + state: Run # Start rollback immediately EOF ``` Follow the same monitoring and approval process as above to complete the rollback. 
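The strategies in this walkthrough set `maxConcurrency` either as an absolute number or as a percentage, with fractional results rounded down to a minimum of 1. The arithmetic can be sketched as below. This is an illustration under stated assumptions, not Fleet's implementation: `resolve_max_concurrency` is a hypothetical helper, and it assumes the percentage is taken of the number of clusters selected by the stage.

```bash
#!/usr/bin/env bash
# Hypothetical helper (an assumption for illustration, not Fleet's code):
# resolves a maxConcurrency value (absolute number or percentage) to the
# number of clusters that may be updated at the same time.
resolve_max_concurrency() {
  local value="$1" cluster_count="$2" result pct
  if [[ "$value" =~ ^([0-9]+)%$ ]]; then
    pct=$((10#${BASH_REMATCH[1]}))
    if (( pct < 1 || pct > 100 )); then
      echo "percentage must be between 1% and 100%" >&2
      return 1
    fi
    result=$(( cluster_count * pct / 100 ))   # integer division rounds down
    if (( result < 1 )); then result=1; fi    # never resolve below 1
  elif [[ "$value" =~ ^[0-9]+$ ]]; then
    result=$((10#$value))
    if (( result < 1 )); then
      echo "absolute value must be >= 1" >&2
      return 1
    fi
  else
    echo "unrecognized maxConcurrency value: $value" >&2
    return 1
  fi
  echo "$result"
}

resolve_max_concurrency 50% 5   # 2 (2.5 rounded down)
resolve_max_concurrency 10% 3   # 1 (0.3 rounded down, then raised to the minimum)
resolve_max_concurrency 3 10    # 3
```

The second call shows why percentages are safe on small stages: a result that would round down to zero still allows one cluster to update at a time.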
+## Best Practices and Tips + +### MaxConcurrency Guidelines + +- **Development/Staging**: Use higher values (e.g., `maxConcurrency: 3` or `50%`) to speed up rollouts +- **Production**: Use `maxConcurrency: 1` for sequential updates to minimize risk and allow early detection of issues +- **Large fleets**: Use percentages (e.g., `10%`, `25%`) to scale with cluster growth automatically +- **Small fleets**: Use absolute numbers for predictable behavior + +### State Management + +- **Initialize state**: Use to review computed stages before execution. Useful for validating strategy configuration +- **Run state**: Start execution or resume from stopped state +- **Stop state**: Pause rollout to investigate issues, wait for maintenance windows, or coordinate with other activities + +### Approval Strategies + +- **Before-stage approvals**: Use when stage selection requires validation (e.g., ensure all production prerequisites are met) +- **After-stage approvals**: Use to validate rollout success before proceeding (e.g., check metrics, run tests) +- **Both**: Combine for critical stages requiring validation at both entry and exit points + ## Key Differences Summary | Aspect | Cluster-Scoped | Namespace-Scoped |