Version 1.6.3 (2026-04-07)
Automated, idempotent script for switching over Red Hat Advanced Cluster Management (ACM) from a primary hub to a secondary hub cluster.
- ✅ Idempotent execution - Resume from last successful step
- ✅ Comprehensive validation - Pre-flight and post-flight checks for safety
- ✅ RBAC enforcement - Least privilege access control with validation
- ✅ ArgoCD support (production-ready) - Pause/resume ACM-touching Applications with automatic CRD detection
- ✅ Data protection - Verifies preserveOnDelete on ClusterDeployments
- ✅ Auto-detection - Automatically detects ACM Observability and version
- ✅ Dry-run mode - Preview actions without making changes
- ✅ Validate-only mode - Run all validations without execution
- ✅ State tracking - JSON state file for resume capability
- ✅ Two methods supported - Continuous passive restore (Method 1) or one-time full restore (Method 2)
- ✅ Multi-deployment support - RBAC via Kustomize, Helm, or ACM Policies
ArgoCD integration is fully available and stable in the switchover workflow.
- Automatic read-only ArgoCD discovery runs when the Applications CRD is present
- Optional managed pause/resume is available through --argocd-manage
- Optional automatic resume at the end of switchover is available through --argocd-resume-after-switchover
- Resume-only mode is available through --argocd-resume-only
For full ArgoCD behavior, constraints, and examples, see Detailed Usage Guide and Scripts README.
The ACM Switchover tool uses a least-privilege RBAC model with two distinct roles:
| Role | Purpose | Access Level |
|---|---|---|
| Operator | Execute switchovers | Full read/write to ACM resources |
| Validator | Pre-flight validation, dry-runs | Read-only access |
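The split between the two roles can be illustrated with a small Python sketch that turns a role into a list of `kubectl auth can-i` invocations. This is not how check_rbac.py is implemented; the `ROLE_CHECKS` table and both helper functions are hypothetical, and the verb/resource pairs are placeholders rather than the project's real permission lists.

```python
from typing import Dict, List, Tuple

# Hypothetical permission sets for illustration only. The real lists live in
# the project's RBAC manifests and are not reproduced here.
ROLE_CHECKS: Dict[str, List[Tuple[str, str]]] = {
    "validator": [
        ("get", "managedclusters"),
        ("list", "backupschedules"),
    ],
    "operator": [
        ("get", "managedclusters"),
        ("patch", "managedclusters"),
        ("patch", "backupschedules"),
        ("create", "restores"),
    ],
}

READ_ONLY_VERBS = {"get", "list", "watch"}


def can_i_commands(context: str, role: str) -> List[List[str]]:
    """Build one `kubectl auth can-i` argv list per (verb, resource) pair."""
    if role not in ROLE_CHECKS:
        raise ValueError(f"unknown role: {role!r}")
    return [
        ["kubectl", "--context", context, "auth", "can-i", verb, resource]
        for verb, resource in ROLE_CHECKS[role]
    ]


def is_read_only(role: str) -> bool:
    """True when every check for the role uses a non-mutating verb."""
    return all(verb in READ_ONLY_VERBS for verb, _ in ROLE_CHECKS[role])
```

Each returned list could be passed to `subprocess.run` to ask the API server whether the current identity holds that permission; the sketch only builds the commands so it can be inspected without cluster access.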
# Deploy RBAC to both hubs (requires cluster-admin)
kubectl --context primary-hub apply -f deploy/rbac/
kubectl --context secondary-hub apply -f deploy/rbac/
# Deploy managed cluster RBAC via ACM Policy (optional, for klusterlet operations)
kubectl --context primary-hub apply -f deploy/acm-policies/policy-managed-cluster-rbac.yaml
# Validate RBAC permissions
python check_rbac.py --context primary-hub --role operator
python check_rbac.py --context secondary-hub --role operator
# Validate managed cluster RBAC
python check_rbac.py --context prod1 --managed-cluster --role operator
📖 Full Guide: RBAC Deployment Guide | RBAC Requirements
Standalone validation scripts ensure safe and successful switchovers:
# Auto-discover ACM hubs from kubeconfig
./scripts/discover-hub.sh --auto
# Run comprehensive pre-flight validation
./scripts/preflight-check.sh --primary-context primary-hub --secondary-context secondary-hub
Validates: ACM versions match, backups complete, OADP healthy, passive sync ready, ClusterDeployments protected, ManagedClusters in backup
# Verify switchover completed successfully
./scripts/postflight-check.sh --old-hub-context primary-hub --new-hub-context secondary-hub
Verifies: ManagedClusters connected, backups running on new hub, Observability healthy, old hub properly configured
📖 Full Guide: Scripts README | Quick Reference
- Quick Reference - Command cheat sheet and common tasks
- Detailed Usage Guide - Complete examples and scenarios
- RBAC Requirements - Complete RBAC permissions guide
- RBAC Deployment - Step-by-step RBAC deployment instructions
- ACM Switchover Runbook - Detailed operational procedures
- Installation Guide - Detailed installation instructions
- Architecture - Design and implementation details
- Testing Guide - How to run tests and CI/CD
- Contributing - Development guidelines
See docs/README.md for complete documentation index.
- Python 3.10+
- kubectl or oc CLI configured for both primary and secondary hubs
- ACM Backup configured on both hubs
- OADP operator installed on both hubs
- Network access to both Kubernetes clusters
- RBAC permissions: Required permissions for switchover operations (see RBAC Requirements)
# Clone the repository
git clone https://github.com/tomazb/rh-acm-switchover.git
cd rh-acm-switchover
# Install dependencies
pip install -r requirements.txt
# Validate everything first
python acm_switchover.py --validate-only \
--primary-context primary-hub \
--secondary-context secondary-hub
# Dry-run to see what would happen
python acm_switchover.py --dry-run \
--primary-context primary-hub \
--secondary-context secondary-hub \
--old-hub-action secondary \
--method passive
# Actual execution
python acm_switchover.py \
--primary-context primary-hub \
--secondary-context secondary-hub \
--old-hub-action secondary \
--method passive
# Method 2: one-time full restore, decommissioning the old hub
python acm_switchover.py \
--primary-context primary-hub \
--secondary-context secondary-hub \
--old-hub-action decommission \
--method full
When ArgoCD manages ACM resources and you want the tool to coordinate pause/resume safely, add --argocd-manage:
python acm_switchover.py \
--primary-context primary-hub \
--secondary-context secondary-hub \
--old-hub-action secondary \
--method passive \
--argocd-manage \
--argocd-resume-after-switchover
For post-cutover-only resumption, use --argocd-resume-only with --secondary-context.
# Script automatically resumes from last successful step
python acm_switchover.py \
--primary-context primary-hub \
--secondary-context secondary-hub \
--old-hub-action secondary \
--method passive \
--state-file .state/switchover-<primary>__<secondary>.json
To return to the original hub, perform a reverse switchover by swapping contexts:
# Swap --primary-context and --secondary-context values
python acm_switchover.py \
--primary-context secondary-hub \
--secondary-context primary-hub \
--method passive \
--old-hub-action secondary
If you later use --argocd-resume-only after a reverse switchover, the CLI will reuse the original state file automatically when the swapped-context match is unambiguous. If both context orderings have state files, pass --state-file explicitly.
Note: A reverse switchover requires that the original switchover used --old-hub-action secondary to enable passive sync.
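The state-file lookup described for reverse switchovers can be sketched in Python as follows. This is an illustration of the stated rule, not the tool's actual code: `default_state_path` and `resolve_state_file` are hypothetical names, and the sketch assumes the default naming pattern .state/switchover-<primary>__<secondary>.json.

```python
from pathlib import Path
from typing import Optional


def default_state_path(primary: str, secondary: str, state_dir: Path) -> Path:
    """Default file name pattern: switchover-<primary>__<secondary>.json."""
    return state_dir / f"switchover-{primary}__{secondary}.json"


def resolve_state_file(
    primary: str,
    secondary: str,
    state_dir: Path,
    explicit: Optional[Path] = None,
) -> Path:
    """Pick the state file, reusing a swapped-context file when unambiguous."""
    if explicit is not None:
        # An explicit --state-file always wins.
        return explicit
    forward = default_state_path(primary, secondary, state_dir)
    swapped = default_state_path(secondary, primary, state_dir)
    if forward.exists() and swapped.exists():
        # Both orderings exist: refuse to guess.
        raise ValueError(
            "state files exist for both context orderings; pass --state-file"
        )
    if swapped.exists():
        return swapped
    return forward
```

Raising on the ambiguous case rather than picking the newer file keeps the choice with the operator, which matches the instruction to pass --state-file explicitly.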
python acm_switchover.py --decommission \
--primary-context primary-hub
| Option | Description |
|---|---|
| --primary-context | Kubernetes context for primary hub (required) |
| --secondary-context | Kubernetes context for secondary hub (required for switchover) |
| --method | Switchover method: passive or full (required) |
| --activation-method | Activation option for passive method: patch (default) or restore |
| --min-managed-clusters | Minimum restored non-local ManagedCluster count to enforce after activation; must be non-negative (0 = informational only) |
| --old-hub-action | Action for old hub: secondary (recommended - enables reverse switchover), decommission, or none (required) |
| --validate-only | Run validation checks only, no changes |
| --dry-run | Show planned actions without executing |
| --state-file | Path to state file (default: .state/switchover-<primary>__<secondary>.json) |
| --decommission | Decommission old hub (interactive) |
| --manage-auto-import-strategy | Temporarily set ImportAndSync on destination hub (ACM 2.14+) |
| --skip-observability-checks | Skip Observability-related steps even if detected |
| --disable-observability-on-secondary | Delete MCO on old hub when keeping it as secondary |
| --skip-rbac-validation | Skip RBAC permission validation during pre-flight checks |
| --argocd-manage | Pause/resume ACM-touching ArgoCD Applications as part of switchover |
| --argocd-resume-after-switchover | Resume ArgoCD Applications after successful switchover completion |
| --argocd-resume-only | Only resume previously paused ArgoCD Applications (no switchover execution) |
| --verbose | Enable verbose logging |
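A subset of the options above could be declared with Python's argparse roughly as shown below. This is a sketch, not the tool's real parser: `build_parser` and `non_negative_int` are hypothetical names, only a few flags are covered, and the conditional "required" rules (for example, --method is not needed with --decommission) are left out on purpose.

```python
import argparse


def non_negative_int(value: str) -> int:
    """argparse type for counts that must be zero or greater."""
    number = int(value)  # a ValueError here is reported by argparse as a usage error
    if number < 0:
        raise argparse.ArgumentTypeError("must be non-negative")
    return number


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="acm_switchover.py")
    parser.add_argument("--primary-context", required=True)
    parser.add_argument("--secondary-context")
    parser.add_argument("--method", choices=["passive", "full"])
    parser.add_argument(
        "--activation-method", choices=["patch", "restore"], default="patch"
    )
    parser.add_argument(
        "--old-hub-action", choices=["secondary", "decommission", "none"]
    )
    # No default is set here because the table does not state one.
    parser.add_argument("--min-managed-clusters", type=non_negative_int)
    parser.add_argument("--dry-run", action="store_true")
    return parser
```

Using `choices` and a custom `type` lets argparse reject invalid values before any cluster is contacted.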
- Pre-flight Validation
  - Verify backup completion and status
  - Check ACM version matching between hubs
  - Validate OADP operator and DataProtectionApplication
  - Verify all ClusterDeployments have spec.preserveOnDelete=true
  - Verify all ManagedClusters are included in latest backup
  - Check passive sync status (Method 1 only)
- Primary Hub Preparation
  - Pause BackupSchedule (version-aware: ACM 2.11 vs 2.12+)
  - Add disable-auto-import annotations to ManagedClusters
  - Scale down Thanos compactor (if Observability detected)
- Secondary Hub Activation
  - Verify latest passive restore (Method 1) or create full restore (Method 2)
  - Activate managed clusters on secondary hub (patch or create restore-acm-activate)
  - Apply immediate-import annotations when autoImportStrategy=ImportOnly (ACM 2.14+)
  - Poll until restore completes
- Post-Activation Verification
  - Monitor ManagedCluster connection status (5-10 minutes)
  - Restart observatorium-api deployment (if Observability detected)
  - Verify Observability pod health
  - Check metrics collection
- Finalization
  - Enable BackupSchedule on new hub
  - Fix BackupSchedule collision if detected
  - Verify new backups are created
  - Verify backup integrity (status, age, and Velero logs)
  - Handle old hub based on --old-hub-action:
    - secondary: Set up passive sync restore (recommended - enables reverse switchover)
    - decommission: Remove ACM components automatically
    - none: Leave unchanged for manual handling
  - Optional: delete MultiClusterObservability on old hub when keeping it as secondary
  - Generate completion report
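Several steps in this workflow wait on cluster state, such as polling until the restore completes or monitoring ManagedCluster connections. A generic polling helper for that pattern might look like the sketch below; `wait_until` and its parameters are hypothetical and say nothing about the intervals or timeouts the tool actually uses.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout_s: float,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call check() until it returns True or timeout_s elapses.

    Returns True on success and False on timeout. check() is always
    called at least once, even when timeout_s is zero.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A caller would pass a function that reads the restore phase or counts connected ManagedClusters. Injecting `sleep` and `clock` keeps the helper testable without real waiting.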
The script maintains a JSON state file tracking:
- Completed steps
- Current phase
- Timestamp of each operation
- Detected configuration (ACM version, Observability presence)
- Errors encountered
Optimized State Persistence:
The state manager uses intelligent write batching to optimize performance:
- Non-critical updates (step completion, configuration) are batched and written only when needed
- Critical checkpoints (phase transitions, errors, resets) are immediately persisted
- Automatic protection: State is automatically flushed on program termination (SIGTERM/SIGINT/atexit) to prevent data loss
This enables:
- Resume from failure point
- Audit trail of operations
- Context awareness across sessions
- Reduced disk I/O for better performance
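The batching behaviour described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's state manager: the `StateManager` class, its method names, and the JSON layout are hypothetical, and only the atexit hook is shown (SIGTERM/SIGINT handlers are omitted).

```python
import atexit
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


class StateManager:
    """Tracks completed steps; batches routine writes, flushes critical ones."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)
        self._dirty = False
        if self.path.exists():
            self.state = json.loads(self.path.read_text())
        else:
            self.state = {"phase": None, "completed_steps": {}, "errors": []}
        # Flush pending changes on normal interpreter exit.
        atexit.register(self.flush)

    def is_done(self, step: str) -> bool:
        return step in self.state["completed_steps"]

    def mark_done(self, step: str) -> None:
        """Non-critical update: recorded in memory, written on next flush."""
        timestamp = datetime.now(timezone.utc).isoformat()
        self.state["completed_steps"][step] = timestamp
        self._dirty = True

    def set_phase(self, phase: str) -> None:
        """Critical checkpoint: persisted immediately."""
        self.state["phase"] = phase
        self._dirty = True
        self.flush()

    def flush(self) -> None:
        if not self._dirty:
            return
        self.path.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temp file and rename so a crash cannot truncate the state.
        fd, tmp_name = tempfile.mkstemp(dir=self.path.parent, suffix=".tmp")
        with os.fdopen(fd, "w") as handle:
            json.dump(self.state, handle, indent=2)
        os.replace(tmp_name, self.path)
        self._dirty = False
```

On resume, a runner would call `is_done(step)` before each step and skip the ones already recorded, which is what makes re-running the same command safe.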
- ClusterDeployment Protection: Mandatory check for preserveOnDelete=true prevents accidental cluster destruction
- Backup State Verification: Ensures no backups in progress during switchover
- Progressive Validation: Validates at each step before proceeding
- Dry-run Mode: Preview all actions before execution
- Reverse Switchover: Return to original hub by swapping contexts (when using --old-hub-action secondary)
- Auto-detection: No manual configuration of optional components
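The ClusterDeployment protection check amounts to inspecting `spec.preserveOnDelete` on every ClusterDeployment. The Python sketch below shows that logic over already-fetched objects; `unprotected_cluster_deployments` is a hypothetical helper, and the input is assumed to be the `items` list from something like `oc get clusterdeployments -A -o json`.

```python
from typing import Any, Dict, List


def unprotected_cluster_deployments(items: List[Dict[str, Any]]) -> List[str]:
    """Return namespace/name for each ClusterDeployment lacking preserveOnDelete=true."""
    unprotected = []
    for item in items:
        metadata = item.get("metadata", {})
        # A missing field counts as unprotected, same as an explicit false.
        if item.get("spec", {}).get("preserveOnDelete") is not True:
            unprotected.append(f"{metadata.get('namespace')}/{metadata.get('name')}")
    return unprotected
```

A pre-flight check built on this would fail when the returned list is non-empty and print the offending names so they can be patched before the switchover.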
Wait 10-15 minutes for auto-import. Check import secrets in managed cluster namespaces.
Ensure observatorium-api pods were restarted. Wait 10 minutes for metrics collection to resume.
Check Velero restore logs: oc logs -n open-cluster-management-backup deployment/velero
Edit state file manually or use --reset-state to start fresh (use with caution).
# Run all tests with coverage
./run_tests.sh
# Or manually
python -m pytest tests/ -v --cov=. --cov-report=html
End-to-end tests validate complete switchover cycles on real clusters:
# Dry-run (no cluster changes)
pytest -m e2e tests/e2e/ --e2e-dry-run
# Real switchover with soak testing controls
pytest -m e2e tests/e2e/ \
--primary-context mgmt1 \
--secondary-context mgmt2 \
--e2e-cycles 5 \
--e2e-run-hours 2 \
--e2e-max-failures 2
Note: Legacy bash E2E scripts (quick_start_e2e.sh, e2e_test_orchestrator.sh, phase_monitor.sh) are deprecated and will be removed in v2.0. See tests/e2e/MIGRATION.md for the migration guide.
- Unit tests for core utilities and validation modules
- E2E tests with Python orchestrator and monitoring
- Code quality checks (flake8, pylint, black, isort)
- Security scanning (bandit, safety)
- Type checking (mypy)
- CI/CD integration with GitHub Actions
See docs/development/testing.md for detailed testing guide.
Main Pipeline (.github/workflows/ci-cd.yml):
- Runs on every push and pull request
- Tests across Python 3.10-3.12
- Code quality and security checks
- Syntax validation
- Documentation verification
Security Pipeline (.github/workflows/security.yml):
- Runs daily and on security-related changes
- Dependency vulnerability scanning
- Static security analysis
- Secrets detection
- Container image scanning
- SBOM generation
- Quick Reference - Command cheat sheet and examples
- Usage Guide - Detailed usage guide with scenarios
- Installation Guide - Installation and deployment
- Architecture - Design and implementation details
- Testing Guide - Testing strategy and CI/CD
- Contributing - Development guidelines
- Changelog - Version history and changes
- PRD - Product Requirements Document
- Project Summary - Comprehensive overview
- Deliverables - Complete project inventory
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
MIT License - See LICENSE file for details