Skip to content

feat(recipes): add nodewright to bcm with reapply-on-reboot#1105

Merged
mchmarny merged 2 commits into
mainfrom
feat/bcm-nodewright-reapply-on-reboot
May 29, 2026
Merged

feat(recipes): add nodewright to bcm with reapply-on-reboot#1105
mchmarny merged 2 commits into
mainfrom
feat/bcm-nodewright-reapply-on-reboot

Conversation

@ayuskauskas

Copy link
Copy Markdown
Contributor

Summary

Add the nodewright-operator and nodewright-customizations components to the BCM overlay, with reapplyOnReboot enabled so nodewright customizations are re-applied after BCM rewrites the OS on boot.

Motivation / Context

BCM re-writes the node OS on every reboot, which wipes any nodewright customizations applied to the host. Enabling reapplyOnReboot: "true" ensures the operator re-applies the BCM setup customizations after each reboot, keeping nodes in the expected state.

Fixes: N/A
Related: N/A

Type of Change

  • New feature (non-breaking change that adds functionality)

Component(s) Affected

  • Recipe engine / data (pkg/recipe)

Implementation Notes

  • Adds nodewright-operator (Helm) to recipes/overlays/bcm.yaml with controllerManager.manager.env.reapplyOnReboot: "true".
  • Adds nodewright-customizations (Helm) referencing components/nodewright-customizations/manifests/bcm-setup.yaml, with a dependencyRefs edge to nodewright-operator.
  • Node scheduling (selectors/tolerations) for the operator is intentionally left to be supplied at recipe time rather than hardcoded in the overlay.

Testing

make qualify

Risk Assessment

  • Low — Isolated overlay/data change, easy to revert

Rollout notes: N/A — additive recipe overlay change.

Checklist

  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S)

@coderabbitai

coderabbitai Bot commented May 29, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 4dc730b2-cf72-4e24-859e-ca972f6c03ba

📥 Commits

Reviewing files that changed from the base of the PR and between 663aabc and 452baec.

📒 Files selected for processing (3)
  • docs/user/container-images.md
  • recipes/components/nodewright-customizations/manifests/bcm-setup.yaml
  • recipes/overlays/bcm.yaml

📝 Walkthrough

Walkthrough

This PR adds BCM recipe support for nodewright-based node customization. A new Helm template (recipes/components/nodewright-customizations/manifests/bcm-setup.yaml) defines a Skyhook custom resource rendered as a post-install/post-upgrade hook with templated tolerations, node selectors, workload labels, and nvidia-setup package configuration. The BCM overlay (recipes/overlays/bcm.yaml) is updated to include nodewright-operator (reapplyOnReboot enabled) and nodewright-customizations (service: bcm, accelerator: h100) with an explicit dependency on the operator. Documentation (docs/user/container-images.md) is updated to reflect the new nvidia-setup image version.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • NVIDIA/aicr#1101: Both PRs add/wire the nodewright-customizations Helm component that renders a Skyhook CR and share templating for spec.additionalTolerations and optional selector injections.
  • NVIDIA/aicr#1102: Both PRs wire nodewright-customizations into recipe overlays with nodewright-operator as a dependency and select accelerator: h100 tuning paths.

Suggested reviewers

  • lockwobr
  • mchmarny
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately reflects the main change: adding nodewright-operator and nodewright-customizations components to the BCM overlay with reapplyOnReboot enabled.
Description check ✅ Passed The description provides clear context, motivation, implementation details, and testing guidance all directly related to the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/bcm-nodewright-reapply-on-reboot

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented May 29, 2026

Copy link
Copy Markdown
Contributor

Coverage Report ✅

Metric Value
Coverage 77.1%
Threshold 75%
Status Pass
Coverage Badge
![Coverage](https://img.shields.io/badge/coverage-77.1%25-green)

No Go source files changed in this PR.

@mchmarny mchmarny left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Focused, real motivation — BCM rewrites the OS on reboot, so reapplyOnReboot: "true" is the right knob. CI green across 118 checks.

Three inline comments — two are addressable in a fixup:

  • bcm-setup.yaml: toleration default regresses the #1101 consistency fix (one-line change to match the post-#1101 pattern on the sibling manifests).
  • bcm-setup.yaml: BOM regen needed for nvidia-setup:0.3.0 (existing BOM has 0.2.2).
  • bcm.yaml: accelerator: h100 hardcoded in the override even though the BCM criteria block has no accelerator filter — worth a comment or scoping the override.

Two non-blocking nits for the next push:

  • Title: feat: add nodewright to bcm…feat(recipes): add nodewright to bcm with reapply-on-reboot to match the conventional-commits + feat(area): convention in AGENTS.md "Pull Request Requirements".
  • Rebase: branch is mergeable_state: behind after #1100/#1101/#1102 landed.

Not blocking the substance — the additive overlay + reapply-on-reboot wiring is good.

Comment thread recipes/overlays/bcm.yaml
Wire nodewright-operator (reapplyOnReboot enabled, since BCM rewrites the OS
on reboot) and nodewright-customizations into the BCM overlay, with a
dependency edge from customizations to the operator.

- bcm-setup.yaml additionalTolerations default uses tolerate-all
  (operator Exists, no key) to match the sibling tuning manifests and
  ParseTolerations()/DefaultTolerations().
- accelerator: h100 is hardcoded with a comment noting the nvidia-setup
  package is accelerator-agnostic today; BCM criteria filters on service only.
- Regenerated docs/user/container-images.md BOM for nvidia-setup:0.3.0.
@ayuskauskas ayuskauskas force-pushed the feat/bcm-nodewright-reapply-on-reboot branch from 663aabc to 452baec Compare May 29, 2026 20:32
@ayuskauskas ayuskauskas requested a review from a team as a code owner May 29, 2026 20:32
@ayuskauskas ayuskauskas changed the title feat: add nodewright to bcm with re-apply on reboot feat(recipes): add nodewright to bcm with reapply-on-reboot May 29, 2026
@github-actions

Copy link
Copy Markdown
Contributor

@mchmarny mchmarny enabled auto-merge (squash) May 29, 2026 22:16
@mchmarny mchmarny merged commit 178bbe4 into main May 29, 2026
120 checks passed
@mchmarny mchmarny deleted the feat/bcm-nodewright-reapply-on-reboot branch May 29, 2026 22:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants