nvswitch: Add fabric manager modes#134

Merged
zvonkok merged 5 commits into NVIDIA:main from zvonkok:fm_modes
Feb 9, 2026

@zvonkok
Collaborator

@zvonkok zvonkok commented Jan 29, 2026

Replace boolean nvrc.fabricmanager with nvrc.fm.mode (0=bare metal, 1=servicevm, 2=vgpu) to properly configure FABRIC_MODE and FABRIC_MODE_RESTART in fabricmanager.cfg.

Add nvrc.fm.rail.policy (greedy|symmetric) for PARTITION_RAIL_POLICY. Symmetric policy required for Confidential Computing on Blackwell to ensure memory isolation boundaries during attestation.
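
As a rough illustration of the two parameters described above, the following sketch parses them from a kernel command line string. The type and function names (`FabricMode`, `RailPolicy`, `FmParams`, `parse_fm_params`) are hypothetical and are not taken from src/kernel_params.rs. Mode 2 (vgpu) follows the commit message for this change, and rejecting unknown values with an error is an assumption.

```rust
/// Fabric Manager mode selected via `nvrc.fm.mode` (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FabricMode {
    BareMetal = 0,
    ServiceVm = 1,
    Vgpu = 2,
}

/// Partition rail policy selected via `nvrc.fm.rail.policy` (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RailPolicy {
    Greedy,
    Symmetric,
}

/// Parsed Fabric Manager settings; `None` means the parameter was absent.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct FmParams {
    pub mode: Option<FabricMode>,
    pub rail_policy: Option<RailPolicy>,
}

/// Scan a kernel command line for the two `nvrc.fm.*` parameters.
/// Unrelated tokens are ignored; invalid values produce an error (an assumption).
pub fn parse_fm_params(cmdline: &str) -> Result<FmParams, String> {
    let mut out = FmParams::default();
    for token in cmdline.split_whitespace() {
        // Tokens without '=' (for example "quiet") carry no value and are skipped.
        let Some((key, value)) = token.split_once('=') else {
            continue;
        };
        match key {
            "nvrc.fm.mode" => {
                out.mode = Some(match value {
                    "0" => FabricMode::BareMetal,
                    "1" => FabricMode::ServiceVm,
                    "2" => FabricMode::Vgpu,
                    other => return Err(format!("invalid nvrc.fm.mode: {other}")),
                });
            }
            "nvrc.fm.rail.policy" => {
                out.rail_policy = Some(match value {
                    "greedy" => RailPolicy::Greedy,
                    "symmetric" => RailPolicy::Symmetric,
                    other => return Err(format!("invalid nvrc.fm.rail.policy: {other}")),
                });
            }
            _ => {}
        }
    }
    Ok(out)
}
```

Using enums rather than raw integers and strings keeps invalid combinations out of the rest of the program; whether the actual change does this is not stated here.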

Introduce generic update_config_file() helper in src/config.rs to eliminate repetitive KEY=VALUE file manipulation and enable easy addition of future configuration parameters.
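
To make the idea of a generic KEY=VALUE helper concrete, here is a minimal sketch of what such a function could look like. Only the name `update_config_file` comes from the description above; the signature, the `apply_updates` helper, and the behaviour (rewrite matching keys in place, append missing ones, leave commented lines untouched) are assumptions and may differ from the code in src/config.rs.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Return `content` with every (key, value) pair in `updates` applied.
/// Existing uncommented `KEY=...` lines are rewritten in place and keys
/// that do not occur are appended at the end. (Hypothetical helper.)
pub fn apply_updates(content: &str, updates: &[(&str, &str)]) -> String {
    let mut seen = vec![false; updates.len()];
    let mut lines: Vec<String> = Vec::new();

    for line in content.lines() {
        // Text before the first '=' is the key; "# KEY=1" yields "# KEY"
        // and therefore never matches a real key.
        let key = line.split_once('=').map(|(k, _)| k.trim());
        let hit = key.and_then(|k| updates.iter().position(|(uk, _)| *uk == k));
        match hit {
            Some(i) => {
                seen[i] = true;
                lines.push(format!("{}={}", updates[i].0, updates[i].1));
            }
            None => lines.push(line.to_string()),
        }
    }

    // Append keys that were not found in the existing content.
    for (i, (k, v)) in updates.iter().enumerate() {
        if !seen[i] {
            lines.push(format!("{k}={v}"));
        }
    }

    if lines.is_empty() {
        return String::new();
    }
    let mut out = lines.join("\n");
    out.push('\n');
    out
}

/// Read `path`, apply `updates`, and write the result back.
/// The signature is an assumption made for this sketch.
pub fn update_config_file(path: &Path, updates: &[(&str, &str)]) -> io::Result<()> {
    let content = fs::read_to_string(path)?;
    fs::write(path, apply_updates(&content, updates))
}
```

Splitting the pure string transformation from the file I/O keeps the logic testable without touching the real fabricmanager.cfg. A call such as `update_config_file(path, &[("FABRIC_MODE", "1"), ("PARTITION_RAIL_POLICY", "symmetric")])` would then cover several settings in one pass.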

Copilot AI review requested due to automatic review settings January 29, 2026 00:06
@zvonkok zvonkok changed the title from "nvswtich: Add fm modes" to "nvswitch: Add fabric manager modes" Jan 29, 2026
Contributor

Copilot AI left a comment

Pull request overview

Updates NVSwitch/Fabric Manager configuration to support explicit Fabric Manager modes and partition rail policy, and introduces a helper intended to simplify KEY=VALUE config file updates.

Changes:

  • Replace nvrc.fabricmanager boolean with nvrc.fm.mode (0/1/2) and add nvrc.fm.rail.policy (greedy/symmetric) kernel params.
  • Configure fabricmanager.cfg with FABRIC_MODE, FABRIC_MODE_RESTART, and PARTITION_RAIL_POLICY based on parsed settings.
  • Add config module wiring (binary + library) to host the new update_config_file() helper (currently missing from the PR).

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/nvrc.rs | Replaces Fabric Manager enable boolean with fabric_mode + rail_policy fields in NVRC state. |
| src/main.rs | Sets fabric_mode=1 in NVSwitch modes and adds mod config; module declaration. |
| src/lib.rs | Exposes config module from the library crate. |
| src/kernel_params.rs | Parses new nvrc.fm.mode and nvrc.fm.rail.policy parameters and updates tests accordingly. |
| src/daemon.rs | Writes Fabric Manager settings into fabricmanager.cfg and uses update_config_file() helper. |

@zvonkok zvonkok force-pushed the fm_modes branch 2 times, most recently from fe5e4f8 to 5fdfc25 Compare January 29, 2026 00:50
Copilot AI review requested due to automatic review settings January 29, 2026 00:50
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


Copilot AI review requested due to automatic review settings January 30, 2026 20:32
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


Copilot AI review requested due to automatic review settings February 5, 2026 23:12
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.


Replace boolean nvrc.fabricmanager with nvrc.fm.mode (0=bare metal,
1=servicevm, 2=vgpu) to properly configure FABRIC_MODE and
FABRIC_MODE_RESTART in fabricmanager.cfg.

Add nvrc.fm.rail.policy (greedy|symmetric) for PARTITION_RAIL_POLICY.
Symmetric policy required for Confidential Computing on Blackwell to
ensure memory isolation boundaries during attestation.

Introduce generic update_config_file() helper in src/config.rs to
eliminate repetitive KEY=VALUE file manipulation and enable easy
addition of future configuration parameters.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Copilot AI review requested due to automatic review settings February 6, 2026 00:01
Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@zvonkok zvonkok force-pushed the fm_modes branch 2 times, most recently from 9dc7917 to 76d71f6 Compare February 6, 2026 01:05
Copilot AI review requested due to automatic review settings February 6, 2026 01:05
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.


manuelh-dev previously approved these changes Feb 6, 2026
HGX Bx00 systems use CX7 bridges for NVLink management instead of direct GPU access.
GPUs are passed to tenant VMs; only the CX7 IB devices are visible here.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Additionally introduce a check between NVLSM and FM

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Improve require_root() to prevent running the entire test suite when
a single root-requiring test needs to be re-executed with sudo.

Add --exact flag when a test filter is present to prevent fuzzy
matching (e.g., "test_log" no longer matches "test_log_debug",
"test_log_info", etc.)

Add --test-threads=1 to force serial execution under sudo,
preventing race conditions between root-requiring tests.

Before: cargo test test_foo → reruns all test_foo* tests with sudo
After:  cargo test test_foo → reruns only test_foo with sudo

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
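
The argument handling described in this commit message could look roughly like the sketch below. The function name `sudo_rerun_args` and its parameters are hypothetical and do not come from the repository; only the `--exact` and `--test-threads=1` flags of the Rust test harness are taken from the message. How the real require_root() discovers the filter and launches sudo is not shown here.

```rust
/// Build the argument list for re-running a test binary under sudo.
/// `test_binary` is the path of the current test executable and `filter`
/// is the test-name filter the user passed, if any. (Hypothetical helper.)
pub fn sudo_rerun_args(test_binary: &str, filter: Option<&str>) -> Vec<String> {
    let mut args = vec![test_binary.to_string()];
    if let Some(name) = filter {
        args.push(name.to_string());
        // Without --exact the harness treats the filter as a substring,
        // so "test_log" would also select "test_log_debug".
        args.push("--exact".to_string());
    }
    // Force serial execution so root-requiring tests cannot race each other.
    args.push("--test-threads=1".to_string());
    args
}
```

The returned list would be passed as the arguments of a `sudo` invocation, for example through `std::process::Command::new("sudo").args(...)`. Note that `--exact` compares against the full test path, so the filter has to include any module prefix the test lives under.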
@zvonkok zvonkok merged commit 067750a into NVIDIA:main Feb 9, 2026
21 checks passed
3 participants