feat(recipes): add nodewright to bcm with reapply-on-reboot #1105
No actionable comments were generated in the recent review.
Walkthrough
This PR adds BCM recipe support for nodewright-based node customization. A new Helm template (`recipes/components/nodewright-customizations/manifests/bcm-setup.yaml`) defines a Skyhook custom resource rendered as a post-install/post-upgrade hook with templated tolerations, node selectors, workload labels, and nvidia-setup package configuration. The BCM overlay (`recipes/overlays/bcm.yaml`) is updated to include nodewright-operator (reapplyOnReboot enabled) and nodewright-customizations (service: bcm, accelerator: h100) with an explicit dependency on the operator. Documentation (`docs/user/container-images.md`) is updated to reflect the new nvidia-setup image version.
Estimated code review effort: 2 (Simple), ~12 minutes
Pre-merge checks: 4 passed
Coverage Report
No Go source files changed in this PR.
mchmarny left a comment
Focused, real motivation — BCM rewrites the OS on reboot, so reapplyOnReboot: "true" is the right knob. CI green across 118 checks.
Three inline comments, two of which are addressable in a fixup:
- `bcm-setup.yaml`: toleration default regresses the #1101 consistency fix (one-line change to match the post-#1101 pattern on the sibling manifests).
- `bcm-setup.yaml`: BOM regen needed for `nvidia-setup:0.3.0` (existing BOM has `0.2.2`).
- `bcm.yaml`: `accelerator: h100` is hardcoded in the override even though the BCM criteria block has no accelerator filter; worth a comment or scoping the override.
Two non-blocking nits for the next push:
- Title: `feat: add nodewright to bcm…` → `feat(recipes): add nodewright to bcm with reapply-on-reboot` to match the conventional-commits + `feat(area):` convention in AGENTS.md "Pull Request Requirements".
- Rebase: branch is `mergeable_state: behind` after #1100/#1101/#1102 landed.
Not blocking the substance — the additive overlay + reapply-on-reboot wiring is good.
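For reference, the tolerate-all default discussed in the first inline comment can be sketched as below. This is a minimal illustration, not the contents of `bcm-setup.yaml`: the top-level `additionalTolerations` key is taken from the follow-up commit message, while its placement as a plain values fragment is an assumption. The toleration entry itself is standard Kubernetes: `operator: Exists` with no `key` matches every taint.

```yaml
# Hypothetical rendered fragment; placement of this key is an assumption.
additionalTolerations:
  # No key, no value, no effect: with operator Exists this tolerates all taints.
  - operator: Exists
```

A narrower default (for example, a single keyed toleration) would leave nodes with other taints untouched by the customization, which is presumably why the sibling manifests converged on the tolerate-all form.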
Wire nodewright-operator (reapplyOnReboot enabled, since BCM rewrites the OS on reboot) and nodewright-customizations into the BCM overlay, with a dependency edge from customizations to the operator.
- bcm-setup.yaml additionalTolerations default uses tolerate-all (operator Exists, no key) to match the sibling tuning manifests and ParseTolerations()/DefaultTolerations().
- accelerator: h100 is hardcoded with a comment noting the nvidia-setup package is accelerator-agnostic today; BCM criteria filters on service only.
- Regenerated docs/user/container-images.md BOM for nvidia-setup:0.3.0.
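The wiring described in this commit might look roughly like the overlay fragment below. It is a sketch under stated assumptions: the `componentRefs`, `type`, `overrides`, and `manifestFiles` field names are guesses at the overlay schema and are not confirmed by this PR page. Only the component names, the `controllerManager.manager.env.reapplyOnReboot: "true"` path, the `service`/`accelerator` values, the manifest path, and `dependencyRefs` come from the conversation.

```yaml
# Sketch of recipes/overlays/bcm.yaml additions.
# ASSUMED field names: componentRefs, type, overrides, manifestFiles.
componentRefs:
  - name: nodewright-operator
    type: Helm
    overrides:
      controllerManager:
        manager:
          env:
            # BCM rewrites the OS on reboot, so customizations must be re-applied.
            reapplyOnReboot: "true"
  - name: nodewright-customizations
    type: Helm
    manifestFiles:
      - components/nodewright-customizations/manifests/bcm-setup.yaml
    overrides:
      service: bcm
      # nvidia-setup is accelerator-agnostic today; BCM criteria filter on service only.
      accelerator: h100
    dependencyRefs:
      - nodewright-operator   # customizations depend on the operator
```

The dependency edge matters because the customization manifest defines a custom resource that the operator reconciles, so the operator should be installed first.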
663aabc to 452baec
Summary
Add the `nodewright-operator` and `nodewright-customizations` components to the BCM overlay, with `reapplyOnReboot` enabled so nodewright customizations are re-applied after BCM rewrites the OS on boot.
Motivation / Context
BCM re-writes the node OS on every reboot, which wipes any nodewright customizations applied to the host. Enabling `reapplyOnReboot: "true"` ensures the operator re-applies the BCM setup customizations after each reboot, keeping nodes in the expected state.
Fixes: N/A
Related: N/A
Type of Change
Component(s) Affected
pkg/recipe)
Implementation Notes
- Added `nodewright-operator` (Helm) to `recipes/overlays/bcm.yaml` with `controllerManager.manager.env.reapplyOnReboot: "true"`.
- Added `nodewright-customizations` (Helm) referencing `components/nodewright-customizations/manifests/bcm-setup.yaml`, with a `dependencyRefs` edge to `nodewright-operator`.
Testing
Risk Assessment
Rollout notes: N/A — additive recipe overlay change.
Checklist
git commit -S)