docs: add DGX local inference walkthrough (Fixes #3231) by deepujain · Pull Request #4337 · NVIDIA/NemoClaw

deepujain · 2026-05-27T16:43:19Z

Summary

Adds a single DGX Spark and DGX Station local-inference walkthrough so users do not have to stitch together host prep, provider selection, vLLM/Ollama setup, verification, and Spark-specific troubleshooting from several pages.

Fixes #3231.

Changes

Added docs/inference/dgx-spark-station-local-inference.mdx with GPU/CDI checks, provider choice guidance, managed vLLM commands, verification steps, and common DGX fixes.
Linked the walkthrough from prerequisites, local inference, troubleshooting, and docs navigation.
Regenerated the relevant generated user-skill references for get-started, inference, and troubleshooting.
Added a docs copy-paste test for the new page.

Testing

python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx
npm run build:cli
npm run typecheck:cli
npm run docs
npx vitest run test/dgx-local-inference-doc-copy.test.ts

Evidence it works

Fern docs validation completed with 0 errors.
The generated nemoclaw-user-configure-inference skill now points DGX Spark and DGX Station questions to the new walkthrough.
The docs copy-paste test confirms the walkthrough's shell examples are copyable bash blocks without prompt prefixes.

Summary by CodeRabbit

Documentation
- Added a comprehensive DGX Spark/DGX Station local inference guide, updated platform/prerequisite entries and navigation links, and revised related troubleshooting and onboarding docs.
- Expanded docs for model router configuration, shields/seal behavior, CLI commands (onboard/status/channels/rebuild/uninstall), and messaging/Telegram guidance.
- Removed Hermes "Experimental" warning from quickstart.
Tests
- Added a test ensuring the DGX local inference doc's fenced code blocks contain only bash and no interactive prompts.

Signed-off-by: Deepak Jain <deepujain@gmail.com>

copy-pr-bot · 2026-05-27T16:43:23Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-05-27T16:43:32Z

Caution

Review failed

Failed to post review comments

📝 Walkthrough

Adds a new end-to-end DGX Spark/DGX Station local inference guide, integrates it into site navigation and related skill docs, expands several CLI/troubleshooting/security docs, and adds a Vitest check validating code-block formatting in the new guide.

Changes

DGX Spark and DGX Station Local Inference Documentation

| Layer / File(s) | Summary |
|---|---|
| Core local inference setup documentation: `docs/inference/dgx-spark-station-local-inference.mdx`, `.agents/skills/nemoclaw-user-configure-inference/references/dgx-spark-station-local-inference.md`, `.agents/skills/nemoclaw-user-configure-inference/SKILL.md` | New comprehensive guide covering host prerequisites (Docker, Node/npm, NVIDIA driver/toolkit, optional CDI spec generation), choice of managed vLLM vs Ollama, interactive and non-interactive onboarding (`nemoclaw onboard` / `NEMOCLAW_PROVIDER` + `NEMOCLAW_VLLM_MODEL`), verification (`status`, `doctor`, TUI), tool-calling notes, and DGX-specific troubleshooting (CoreDNS CrashLoop, k3s readiness timeouts, CDI GPU errors, port 3000 conflicts). |
| Navigation and cross-reference integration: `docs/index.yml`, `docs/get-started/prerequisites.mdx`, `docs/inference/use-local-inference.mdx`, `docs/reference/troubleshooting.mdx`, `.agents/skills/nemoclaw-user-get-started/references/prerequisites.md`, `.agents/skills/nemoclaw-user-configure-inference/SKILL.md` | Adds "DGX Local Inference" nav entry and updates prerequisites/troubleshooting/use-local-inference pages and skill references to point to the new walkthrough instead of the prior external Ollama playbook. |
| CLI commands and lifecycle docs: `.agents/skills/nemoclaw-user-reference/references/commands.md` | Expanded command reference: GPU detection and passthrough docs, `status --json` fields/exit semantics, `channels status` diagnostics, `rebuild` GPU-mode preservation, `uninstall` user-data preservation, and lifecycle env flags. |
| Security: shields & uninstall: `.agents/skills/nemoclaw-user-configure-security/references/best-practices.md`, `.agents/skills/nemoclaw-user-manage-sandboxes/SKILL.md` | Documents `shields up` SHA-256 sealing behavior, legacy-baseline opt-in (`NEMOCLAW_SHIELDS_ACCEPT_LEGACY_BASELINE`), and clarifies uninstall-preserved data and non-interactive destroy opt-in. |
| Messaging and troubleshooting updates: `.agents/skills/nemoclaw-user-manage-sandboxes/references/messaging-channels.md`, `.agents/skills/nemoclaw-user-reference/references/troubleshooting.md` | Clarifies `TELEGRAM_ALLOWED_IDS` and aliases, adds repeatable e2e DM testing steps, and updates DGX troubleshooting to point to the new DGX local-inference guide and replace gateway-destroy flow with gateway-remove + pkill + `--resume`. |
| Overview & skill wording tweaks: `.agents/skills/nemoclaw-user-overview/`, `.agents/skills/nemoclaw-user-get-started/`, `.agents/skills/nemoclaw-user-manage-sandboxes/*` | Minor front-matter and reference wording changes (OpenClaw → NemoClaw, quickstart bullets, default sandbox name note for DGX express-install), and removal of an Hermes "Experimental" banner. |
| Documentation code block validation test: `test/dgx-local-inference-doc-copy.test.ts` | Vitest test ensures fenced code blocks in the new DGX local-inference doc contain no interactive shell prompts and are only labeled `bash`. |

Estimated code review effort:
🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

NVIDIA/NemoClaw#4460: Updates and tests for shields up SHA-256 sealing behavior and legacy-baseline acceptance that overlap with the shields/uninstall docs added here.
NVIDIA/NemoClaw#3151: Prior inference-provider and router documentation changes that touch inference-options.md and relate to router pool configuration described in this PR.

Suggested labels

Getting Started, enhancement: skill

Suggested reviewers

miyoungc
cv
ericksoa

🐰 A DGX guide at last, no longer adrift,
vLLM or Ollama — pick the right lift.
CDI, timeouts, ports, and CoreDNS fight,
One walkthrough to follow, from preflight to night.
Pasteable commands, and tests keep it tight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'docs: add DGX local inference walkthrough (Fixes `#3231`)' directly and clearly describes the primary change: adding comprehensive DGX local inference documentation that addresses the linked issue. |
| Linked Issues check | ✅ Passed | The PR comprehensively addresses all coding/documentation objectives from `#3231`: adds end-to-end DGX walkthrough with pre-flight checks, provider selection, onboarding steps, verification commands, inline failure modes, and self-contained documentation instead of external redirects. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to the DGX local inference documentation objective. Modifications to prerequisite references, troubleshooting links, skill documentation, and supporting files are all aligned with consolidating and improving local inference setup guidance. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.agents/skills/nemoclaw-user-reference/references/troubleshooting.md:
- Around line 1138-1139: This change edits an autogenerated skill reference and
must be reverted; do not modify generated markdown directly. Revert the manual
edit in the autogenerated file and instead update the canonical docs/source that
generate the skill (the source used to produce the
nemoclaw-user-configure-inference skill reference), then re-run the
documentation/skill generation pipeline so the corrected text is emitted into
the generated nemoclaw-user-*/*.md outputs; ensure CI/linting for autogenerated
skills passes before merging.

In `@docs/inference/dgx-spark-station-local-inference.mdx`:
- Around line 1-12: The frontmatter in the new page is missing required fields
and the SPDX header is incorrectly placed inside the frontmatter; update the
frontmatter to include title, description, keywords, topics, tags, content.type,
difficulty, audience, and status (use the existing title/description/keywords
and add appropriate topics/tags/difficulty/audience/status values), move the
SPDX lines so they appear immediately after the frontmatter block (not inside
it), and ensure the document body contains an H1 that exactly matches the
frontmatter title (i.e., add or replace the top-level heading to match "Set Up
DGX Spark or DGX Station Local Inference").
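
The frontmatter requirement in the comment above could be checked mechanically before review. The following is a minimal sketch, not the repository's actual linter: `missingFrontmatterFields` and `REQUIRED_FIELDS` are hypothetical names, the field list is copied from the comment, and it assumes `content.type` is written either as a dotted key or as a `type` key nested under `content`.

```typescript
// Hypothetical helper (not part of NemoClaw): report which required
// frontmatter fields are absent from an MDX page.
const REQUIRED_FIELDS: readonly string[] = [
  "title", "description", "keywords", "topics", "tags",
  "content.type", "difficulty", "audience", "status",
];

function missingFrontmatterFields(source: string): string[] {
  // The frontmatter is the block between the first pair of `---` lines.
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(source);
  if (match === null) return [...REQUIRED_FIELDS];

  const found = new Set<string>();
  let parent = "";
  for (const line of match[1].split(/\r?\n/)) {
    // Top-level key: starts in column 0, e.g. `title:` or `content.type:`.
    const top = /^([A-Za-z_][\w.-]*):/.exec(line);
    if (top !== null) {
      parent = top[1];
      found.add(parent);
      continue;
    }
    // Indented key: recorded as `parent.child` (assumed nesting convention).
    const nested = /^\s+([A-Za-z_][\w.-]*):/.exec(line);
    if (nested !== null && parent !== "") {
      found.add(parent + "." + nested[1]);
    }
  }
  return REQUIRED_FIELDS.filter((field) => !found.has(field));
}
```

Comment lines such as an SPDX header do not match either pattern, so a header left inside the frontmatter is ignored rather than counted as a field. Checking its placement would need a separate rule.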

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1a4e20af-4761-45f1-b0b4-5fdee947e6d3

📥 Commits

Reviewing files that changed from the base of the PR and between e139dbc and c2a2ebf.

📒 Files selected for processing (10)

.agents/skills/nemoclaw-user-configure-inference/SKILL.md
.agents/skills/nemoclaw-user-configure-inference/references/dgx-spark-station-local-inference.md
.agents/skills/nemoclaw-user-get-started/references/prerequisites.md
.agents/skills/nemoclaw-user-reference/references/troubleshooting.md
docs/get-started/prerequisites.mdx
docs/index.yml
docs/inference/dgx-spark-station-local-inference.mdx
docs/inference/use-local-inference.mdx
docs/reference/troubleshooting.mdx
test/dgx-local-inference-doc-copy.test.ts

prekshivyas

@deepujain Thanks for putting this together — it's a genuinely useful walkthrough and most of it checks out. A couple of housekeeping items first, then the content fixes.

Before review can pass

Merge conflicts: the PR is currently CONFLICTING/DIRTY. Please rebase onto main and resolve.
dco-check is failing: the commit has a Signed-off-by line, but the workflow also requires one in the PR description. Please add Signed-off-by: Deepak Jain <deepujain@gmail.com> to the PR body. (commit-lint is green — an earlier red entry was a superseded run.)

Required changes before merge

1. Non-interactive examples will exit immediately for a first-time user.
Both --non-interactive examples omit the third-party consent flag, so a new user (no prior acceptance) hits:

ensureUsageNoticeConsent returns false (src/lib/onboard/usage-notice.ts:153) → onboard.ts:6500 process.exit(1).
Plain --yes does not satisfy this — only --yes-i-accept-third-party-software / NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE=1 does (legacy-command.ts:240-241 vs :246).
The model/image download separately requires --yes (onboard.ts:3907-3915).

So both examples need both flags, e.g.:

NEMOCLAW_PROVIDER=install-vllm nemoclaw onboard --non-interactive --yes --yes-i-accept-third-party-software

2. The "express setup" paragraph describes behavior that isn't in the code.
Lines 66–67 say the installer offers an "express setup" after the third-party notice that selects the local-inference path and policy defaults on DGX Spark/Station. I couldn't find any such flow — express in src/ only refers to the vLLM/Ollama install model-picker path, not a notice-driven onboarding mode. Please remove or rewrite this so it matches the actual wizard.

Non-blocking

Provider label: the doc shows **Local vLLM [experimental]**, but providers.ts:130-131 returns exactly "Local vLLM" (no suffix). Suggest dropping [experimental].
Test rigidity: test/dgx-local-inference-doc-copy.test.ts asserts every fenced block is bash (toEqual(new Set(["bash"]))). Fine today, but any future non-bash block (e.g. an output sample) will break it.
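
One way to relax the test-rigidity concern is to check prompt prefixes only in `bash` blocks and to allow a small set of other labels for output samples. The sketch below is illustrative and is not the contents of `test/dgx-local-inference-doc-copy.test.ts`: the helper names and the default allow-list (`text`, `json`) are assumptions, and fence parsing is deliberately simple (a line that starts with three backticks opens or closes a block).

```typescript
// Hypothetical helpers (not the PR's actual test code).
const FENCE = "`".repeat(3); // built at runtime to keep this block fence-safe

interface FencedBlock {
  lang: string;
  body: string[];
}

function extractFencedBlocks(markdown: string): FencedBlock[] {
  const blocks: FencedBlock[] = [];
  let current: FencedBlock | null = null;
  for (const line of markdown.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed.startsWith(FENCE)) {
      if (current === null) {
        // Opening fence: the text after the backticks is the language label.
        current = { lang: trimmed.slice(FENCE.length).trim(), body: [] };
      } else {
        // Closing fence: the block is complete.
        blocks.push(current);
        current = null;
      }
      continue;
    }
    if (current !== null) current.body.push(line);
  }
  return blocks;
}

function findCopyPasteProblems(
  markdown: string,
  allowedNonShell: ReadonlySet<string> = new Set(["text", "json"]),
): string[] {
  const problems: string[] = [];
  for (const block of extractFencedBlocks(markdown)) {
    if (block.lang === "bash") {
      // A leading "$ " would be pasted into the shell along with the command.
      for (const line of block.body) {
        if (/^\s*\$\s/.test(line)) problems.push("prompt prefix: " + line.trim());
      }
    } else if (!allowedNonShell.has(block.lang)) {
      problems.push("unexpected block language: " + (block.lang || "(none)"));
    }
  }
  return problems;
}
```

In a Vitest file this would reduce to a single `expect(findCopyPasteProblems(doc)).toEqual([])`, so a future output sample labeled `text` would not fail the suite while prompt-prefixed commands still would.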

Verified correct

vLLM model slug + default (qwen3.6-27b, vllm-models.ts:41-43, vllm.ts:464), all env vars, install-vllm provider key, every cross-link/anchor, and the regenerated skill reference for the new page are all accurate.

Once the two required items and the conflicts/DCO are sorted, this is a clear approve.

deepujain · 2026-05-30T02:02:52Z

Rebased on current main, cleaned up the DGX walkthrough review items, regenerated the user skills, and added the PR-body sign-off. Focused docs copy test passes.

coderabbitai Bot reviewed May 27, 2026

Comment thread .agents/skills/nemoclaw-user-reference/references/troubleshooting.md

Comment thread docs/inference/dgx-spark-station-local-inference.mdx

wscurran added the documentation (Improvements or additions to documentation), fix, Local Models (Running NemoClaw with local models), Platform: DGX Spark (Support for DGX Spark), and priority: high (Important issue that should be resolved in the next release) labels May 27, 2026

wscurran requested a review from miyoungc May 27, 2026 22:48

wscurran added the v0.0.55 Release target label May 27, 2026

jyaunches added R3 v0.0.56 Release target and removed v0.0.55 Release target R3 labels May 29, 2026

prekshivyas requested changes May 29, 2026

docs: add DGX local inference walkthrough

f44331f

Fixes NVIDIA#3231 Signed-off-by: Deepak Jain <deepujain@gmail.com>

deepujain force-pushed the docs/3231-dgx-local-inference branch from c2a2ebf to f44331f May 30, 2026 02:01

cv added v0.0.57 Release target and removed v0.0.56 Release target labels Jun 1, 2026

deepujain wants to merge 1 commit into NVIDIA:main from deepujain:docs/3231-dgx-local-inference

5 participants
