Add extra worker to cluster after dev-scripts install is complete #1807
Open: MahnoorAsghar wants to merge 1 commit into openshift-metal3:master from MahnoorAsghar:add-vm-bmc-post-install
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,234 @@ | ||
| # Implementation Summary: Post-Installation Worker Node Addition | ||
|
|
||
| ## What Was Created | ||
|
|
||
| ### Core Scripts | ||
|
|
||
| 1. **`add_worker_node.sh`** - Main script to add worker nodes | ||
| - Creates libvirt VM with customizable resources | ||
| - Configures virtual BMC (vbmc or sushy-tools) | ||
| - Generates BareMetalHost and Secret manifests | ||
| - Provides step-by-step instructions for completion | ||
| - Automatically finds available BMC ports and generates unique MAC addresses | ||
|
|
||
| 2. **`remove_worker_node.sh`** - Cleanup script to remove worker nodes | ||
| - Drains and deletes node from cluster | ||
| - Removes BareMetalHost and Machine resources | ||
| - Destroys VM and cleans up disk/NVRAM | ||
| - Removes BMC configuration | ||
|
|
||
| 3. **`auto_approve_csrs.sh`** - Helper script for CSR approval | ||
| - Auto-approves pending CSRs for specified duration | ||
| - Useful for development/testing scenarios | ||
| - Default 30-minute duration | ||
|
|
||
| ### Documentation | ||
|
|
||
| 1. **`docs/add-worker-post-install.md`** - Complete documentation | ||
| - Detailed usage instructions | ||
| - Configuration options | ||
| - Complete workflow examples | ||
| - Troubleshooting guide | ||
| - Architecture notes | ||
|
|
||
| 2. **`WORKER_QUICK_START.md`** - Quick reference guide | ||
| - TL;DR commands | ||
| - Common use cases | ||
| - Quick examples | ||
|
|
||
| 3. **`README.md`** - Updated main README | ||
| - Added new "Option 1: Add Workers Post-Installation" | ||
| - Kept existing pre-configuration method as "Option 2" | ||
| - Links to new documentation | ||
|
|
||
| ### Makefile Integration | ||
|
|
||
| Added two new targets to `Makefile`: | ||
| - `make add_worker WORKER_NAME=<name>` - Add a worker | ||
| - `make remove_worker WORKER_NAME=<name>` - Remove a worker | ||
|
|
||
| ## Key Features | ||
|
|
||
| ### 1. No Pre-Planning Required | ||
| Unlike the existing methods, this solution allows adding workers **after** deployment without requiring `NUM_EXTRA_WORKERS` to be set beforehand. | ||
|
|
||
| ### 2. Flexible Configuration | ||
| Users can customize worker resources via environment variables: | ||
| ```bash | ||
| export EXTRA_WORKER_MEMORY=32768 # 32GB | ||
| export EXTRA_WORKER_DISK=100 # 100GB | ||
| export EXTRA_WORKER_VCPU=16 # 16 vCPUs | |
| ``` | ||
|
|
||
| ### 3. Smart Automation | ||
| - Automatically finds available BMC ports | ||
| - Generates unique MAC addresses | ||
| - Detects and starts BMC containers if needed | ||
| - Supports both IPMI and Redfish BMC protocols | ||
| - Handles UEFI/BIOS firmware automatically | ||
|
|
||
| ### 4. Complete Lifecycle Management | ||
| - Add workers: `add_worker_node.sh` | ||
| - Remove workers: `remove_worker_node.sh` | ||
| - Auto-approve CSRs: `auto_approve_csrs.sh` | ||
|
|
||
| ### 5. Safety Checks | ||
| - Validates cluster connectivity | ||
| - Checks for VM name conflicts | ||
| - Checks for BareMetalHost conflicts | ||
| - Validates worker name format | ||
|
|
||
| ## Usage Comparison | ||
|
|
||
| ### Old Method (Pre-Configuration Required) | ||
| ```bash | ||
| # BEFORE initial deployment | ||
| export NUM_EXTRA_WORKERS=2 | ||
| export EXTRA_WORKERS_ONLINE_STATUS=false | ||
| make | ||
|
|
||
| # AFTER deployment | ||
| oc apply -f ocp/ostest/extra_host_manifests.yaml | ||
| oc scale machineset ostest-worker-0 --replicas=3 -n openshift-machine-api | ||
| ``` | ||
|
|
||
| ### New Method (Post-Installation) | ||
| ```bash | ||
| # Deploy cluster normally | ||
| make | ||
|
|
||
| # LATER: Add worker on demand | ||
| ./add_worker_node.sh my-worker-1 | ||
| oc apply -f ocp/ostest/my-worker-1_bmh.yaml | ||
| ./auto_approve_csrs.sh 30 & | ||
| oc scale machineset ostest-worker-0 --replicas=3 -n openshift-machine-api | ||
| ``` | ||
|
|
||
| ## Architecture | ||
|
|
||
| ### VM Creation | ||
| - Uses libvirt XML to define VM | ||
| - Supports x86_64 and aarch64 architectures | ||
| - Configures UEFI or BIOS boot | ||
| - Connects to baremetal network | ||
| - Allocates disk storage in `/var/lib/libvirt/images/` | ||
|
|
||
| ### BMC Configuration | ||
| Two protocols are supported: | |
|
|
||
| 1. **IPMI** (via vbmc) | ||
| - Port range: 6230-6250 | ||
| - Protocol: `ipmi://<host>:<port>` | ||
| - Container: vbmc | ||
|
|
||
| 2. **Redfish** (via sushy-tools) | ||
| - Port: 8000 (HTTP) | ||
| - Protocol: `redfish-virtualmedia+http://<host>:8000/<vm-name>` | ||
| - Container: sushy-tools | ||
| - Default choice | ||
|
|
||
| ### BareMetalHost Integration | ||
| Generated manifest includes: | ||
| - Secret with BMC credentials (admin/password) | ||
| - BareMetalHost spec with: | ||
| - BMC address and credentials | ||
| - Boot MAC address | ||
| - Online status (true) | ||
| - Automated cleaning mode (disabled) | ||
|
|
||
| ## Testing | ||
|
|
||
| All scripts pass bash syntax validation (`bash -n` only parses the scripts; it does not run them): | |
| ```bash | ||
| bash -n add_worker_node.sh # ✓ PASS | ||
| bash -n remove_worker_node.sh # ✓ PASS | ||
| bash -n auto_approve_csrs.sh # ✓ PASS | ||
| ``` | ||
|
|
||
| ## Files Modified/Created | ||
|
|
||
| ### New Files | ||
| - `add_worker_node.sh` (executable) | ||
| - `remove_worker_node.sh` (executable) | ||
| - `auto_approve_csrs.sh` (executable) | ||
| - `docs/add-worker-post-install.md` | ||
| - `WORKER_QUICK_START.md` | ||
| - `.add_worker_implementation_summary.md` (this file) | ||
|
|
||
| ### Modified Files | ||
| - `Makefile` - Added `add_worker` and `remove_worker` targets | ||
| - `README.md` - Updated "Testing with extra workers" section | ||
|
|
||
| ## Dependencies | ||
|
|
||
| The scripts leverage existing dev-scripts infrastructure: | ||
| - `common.sh` - Environment variables and common functions | ||
| - `network.sh` - Network configuration | ||
| - `utils.sh` - Utility functions | ||
| - `ocp_install_env.sh` - OCP environment setup | ||
| - `logging.sh` - Logging functions | ||
|
|
||
| ## Compatibility | ||
|
|
||
| - Works with standard installer flow (`make` or `make all`) | ||
| - Compatible with existing `NUM_EXTRA_WORKERS` workflow | ||
| - Supports both IPMI and Redfish BMC protocols | ||
| - Works with UEFI and BIOS boot modes | ||
| - Supports x86_64 and aarch64 architectures | ||
|
|
||
| ## Next Steps for Users | ||
|
|
||
| 1. **Basic Usage** | ||
| ```bash | ||
| ./add_worker_node.sh worker-1 | ||
| oc apply -f ocp/ostest/worker-1_bmh.yaml | ||
| oc scale machineset <name> --replicas=<N+1> -n openshift-machine-api | ||
| ``` | ||
|
|
||
| 2. **With Custom Resources** | ||
| ```bash | ||
| export EXTRA_WORKER_MEMORY=32768 EXTRA_WORKER_DISK=100 EXTRA_WORKER_VCPU=16 | ||
| ./add_worker_node.sh large-worker | ||
| ``` | ||
|
|
||
| 3. **Quick Start with Make** | ||
| ```bash | ||
| make add_worker WORKER_NAME=worker-1 | ||
| ``` | ||
|
|
||
| 4. **Auto-approve CSRs** | ||
| ```bash | ||
| ./auto_approve_csrs.sh 30 & | ||
| ``` | ||
|
|
||
| ## Limitations | ||
|
|
||
| 1. Only works with libvirt-based deployments | ||
| 2. Requires BMC containers (vbmc/sushy-tools) to be available | ||
| 3. IPMI (vbmc) BMC ports are limited to free ports in the 6230-6250 range | |
| 4. Worker must be on same network as other cluster nodes | ||
| 5. Resources (memory, disk, CPU) are set at VM creation time | ||
|
|
||
| ## Future Enhancements | ||
|
|
||
| Possible improvements for future versions: | ||
| - Support for multiple workers in one command | ||
| - Interactive mode with prompts | ||
| - Integration with Ansible playbooks | ||
| - Support for additional disk attachment | ||
| - Network configuration customization | ||
| - BMC port range expansion | ||
| - Support for non-libvirt platforms | ||
|
|
||
| ## Support | ||
|
|
||
| For issues or questions: | ||
| 1. Check the documentation: `docs/add-worker-post-install.md` | ||
| 2. Review quick start: `WORKER_QUICK_START.md` | ||
| 3. Check script output for detailed instructions | ||
| 4. Verify cluster connectivity and resources | ||
|
|
||
| ## Conclusion | ||
|
|
||
| This implementation adds worker nodes to dev-scripts clusters after installation, without setting `NUM_EXTRA_WORKERS` beforehand, which makes it easier to scale test and development clusters on demand. | |
|
|
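The summary above says the script "automatically finds available BMC ports and generates unique MAC addresses" but the diff shown here does not include that logic. The following is a minimal sketch of how such helpers might look, not the code from `add_worker_node.sh`. The function names `find_free_bmc_port` and `generate_unique_mac` are hypothetical, the lists of used ports and existing MACs are passed in as arguments rather than discovered, and the `52:54:00` prefix is assumed because it is the prefix conventionally used for QEMU/KVM guests.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only; not taken from add_worker_node.sh.

# Print the first port in 6230-6250 that is not among the given used ports.
# Usage: find_free_bmc_port [used-port ...]
find_free_bmc_port() {
    local port used in_use
    for (( port = 6230; port <= 6250; port++ )); do
        in_use=0
        for used in "$@"; do
            if [[ "$used" == "$port" ]]; then
                in_use=1
                break
            fi
        done
        if (( in_use == 0 )); then
            echo "$port"
            return 0
        fi
    done
    echo "no free BMC port in 6230-6250" >&2
    return 1
}

# Print a random MAC with the 52:54:00 prefix (assumed QEMU/KVM convention),
# retrying until the result is not among the given existing MACs.
# Usage: generate_unique_mac [existing-mac ...]
generate_unique_mac() {
    local mac existing clash
    while true; do
        mac=$(printf '52:54:00:%02x:%02x:%02x' \
            $(( RANDOM % 256 )) $(( RANDOM % 256 )) $(( RANDOM % 256 )))
        clash=0
        for existing in "$@"; do
            # Compare case-insensitively by lowercasing the existing MAC.
            if [[ "${existing,,}" == "$mac" ]]; then
                clash=1
                break
            fi
        done
        if (( clash == 0 )); then
            echo "$mac"
            return 0
        fi
    done
}
```

For example, `find_free_bmc_port 6230 6231` prints `6232`. How the real script collects the ports and MACs already in use is not shown in this PR excerpt, so that step is left to the caller here.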
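The "BareMetalHost Integration" section of the implementation summary lists what the generated manifest contains: a Secret with the BMC credentials and a BareMetalHost with the BMC address, boot MAC address, `online: true`, and automated cleaning disabled. A rough sketch of a generator for such a manifest follows. The function name `render_bmh_manifest`, the `<name>-bmc-secret` naming, and the default namespace are assumptions made for illustration; the actual output of `add_worker_node.sh` may differ.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the real script's manifest layout is not shown in this PR excerpt.

# Print a Secret plus BareMetalHost manifest to stdout.
# Usage: render_bmh_manifest <name> <bmc-address> <boot-mac> [namespace]
render_bmh_manifest() {
    local name=$1 bmc_address=$2 boot_mac=$3
    local namespace=${4:-openshift-machine-api}   # assumed default namespace
    local user_b64 pass_b64
    # The summary states the credentials are admin/password; Secret data is base64.
    user_b64=$(printf '%s' "admin" | base64)
    pass_b64=$(printf '%s' "password" | base64)

    cat <<EOF
---
apiVersion: v1
kind: Secret
metadata:
  name: ${name}-bmc-secret
  namespace: ${namespace}
type: Opaque
data:
  username: ${user_b64}
  password: ${pass_b64}
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  online: true
  bootMACAddress: ${boot_mac}
  automatedCleaningMode: disabled
  bmc:
    address: ${bmc_address}
    credentialsName: ${name}-bmc-secret
EOF
}
```

A call such as `render_bmh_manifest worker-1 "redfish-virtualmedia+http://<host>:8000/worker-1" 52:54:00:aa:bb:cc > worker-1_bmh.yaml` would write a file that can be passed to `oc apply -f`, with `<host>` replaced by the address of the sushy-tools endpoint.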
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,66 @@ | ||
| # Quick Start: Adding Workers Post-Installation | ||
|
|
||
| ## TL;DR | ||
|
|
||
| Add a worker node after your cluster is already deployed: | ||
|
|
||
| ```bash | ||
| # Add a worker | ||
| ./add_worker_node.sh my-worker-1 | ||
|
|
||
| # Apply the generated manifest | ||
| oc apply -f ocp/ostest/my-worker-1_bmh.yaml | ||
|
|
||
| # Auto-approve CSRs in background | ||
| ./auto_approve_csrs.sh 30 & | ||
|
|
||
| # Scale up your machineset | ||
| oc get machineset -n openshift-machine-api | ||
| oc scale machineset <your-cluster>-worker-0 --replicas=<N+1> -n openshift-machine-api | ||
|
|
||
| # Watch it join | ||
| oc get nodes -w | ||
| ``` | ||
|
|
||
| ## Customizing Resources | ||
|
|
||
| ```bash | ||
| export EXTRA_WORKER_MEMORY=32768 # 32GB RAM | ||
| export EXTRA_WORKER_DISK=100 # 100GB disk | ||
| export EXTRA_WORKER_VCPU=16 # 16 vCPUs | ||
|
|
||
| ./add_worker_node.sh my-large-worker | ||
| ``` | ||
|
|
||
| ## Using Make | ||
|
|
||
| ```bash | ||
| # Add worker | ||
| make add_worker WORKER_NAME=worker-1 | ||
|
|
||
| # Remove worker | ||
| make remove_worker WORKER_NAME=worker-1 | ||
| ``` | ||
|
|
||
| ## What Gets Created | ||
|
|
||
| - ✅ Libvirt VM with specified resources | ||
| - ✅ Virtual BMC (IPMI or Redfish) | ||
| - ✅ BareMetalHost manifest | ||
| - ✅ Secret for BMC credentials | ||
| - ✅ Complete setup instructions | ||
|
|
||
| ## Removing a Worker | ||
|
|
||
| ```bash | ||
| # Remove from cluster and delete VM | ||
| ./remove_worker_node.sh my-worker-1 | ||
|
|
||
| # Don't forget to scale down the machineset | ||
| oc scale machineset <your-cluster>-worker-0 --replicas=<N-1> -n openshift-machine-api | ||
| ``` | ||
|
|
||
| ## Full Documentation | ||
|
|
||
| See [docs/add-worker-post-install.md](docs/add-worker-post-install.md) for detailed documentation. | ||
|
|
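The quick start runs `./auto_approve_csrs.sh 30 &` in the background, but the script body is not part of the diff shown here. As a sketch of what a time-limited approval loop could look like, the function below polls for CSRs that have no status yet and approves each one. The function name `approve_pending_csrs` and its arguments are hypothetical, and it takes a duration in seconds, whereas the PR describes the script's argument as a duration with a 30-minute default.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a time-limited CSR approval loop; not the PR's script.

# Approve pending CSRs repeatedly until the duration elapses.
# Usage: approve_pending_csrs <duration-seconds> [poll-interval-seconds]
approve_pending_csrs() {
    local duration=$1 interval=${2:-10}
    local deadline=$(( SECONDS + duration ))
    local csr
    while (( SECONDS < deadline )); do
        # List CSRs whose status is empty, i.e. neither approved nor denied yet.
        oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' |
        while read -r csr; do
            if [[ -n "$csr" ]]; then
                oc adm certificate approve "$csr"
            fi
        done
        sleep "$interval"
    done
}
```

Calling `approve_pending_csrs 1800 &` would approximate the 30-minute background run from the quick start. Approving every pending CSR without inspection is only reasonable on throwaway development clusters, which matches the PR's description of the helper as a development and testing aid.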
You should not commit this file