4 changes: 3 additions & 1 deletion .agents/skills/nemoclaw-user-configure-inference/SKILL.md
@@ -1,6 +1,6 @@
---
name: "nemoclaw-user-configure-inference"
description: "Connects NemoClaw to a local inference server. Use when setting up Ollama, vLLM, TensorRT-LLM, NIM, or any OpenAI-compatible local model server with NemoClaw. Trigger keywords - nemoclaw local inference, ollama nemoclaw, vllm nemoclaw, local model server, openai compatible endpoint, switch nemoclaw inference model, change inference runtime, nemoclaw additional model, nemoclaw sub-agent model, openclaw sub-agent, agents.list, sessions_spawn, vlm-demo, nemoclaw inference options, nemoclaw onboarding providers, nemoclaw inference routing, nemoclaw tool calling, ollama tool calls, vllm tool-call-parser, raw json in tui."
description: "Connects NemoClaw to a local inference server. Use when setting up Ollama, vLLM, TensorRT-LLM, NIM, or any OpenAI-compatible local model server with NemoClaw. Trigger keywords - nemoclaw local inference, ollama nemoclaw, vllm nemoclaw, local model server, openai compatible endpoint, switch nemoclaw inference model, change inference runtime, nemoclaw additional model, nemoclaw sub-agent model, openclaw sub-agent, agents.list, sessions_spawn, vlm-demo, nemoclaw dgx spark local inference, nemoclaw dgx station vllm, nemoclaw spark ollama, nemoclaw cdi gpu setup, nemoclaw inference options, nemoclaw onboarding providers, nemoclaw inference routing, nemoclaw tool calling, ollama tool calls, vllm tool-call-parser, raw json in tui."
license: "Apache-2.0"
---

@@ -453,11 +453,13 @@ If the provider itself needs to change (for example, switching from vLLM to a cl

- **Load [references/switch-inference-providers.md](references/switch-inference-providers.md)** when switching inference providers, changing the model runtime, or reconfiguring inference routing. Changes the active inference model without restarting the sandbox.
- **Load [references/set-up-sub-agent.md](references/set-up-sub-agent.md)** when users ask how to add a second model, configure a sub-agent model, use Omni for vision tasks, configure agents.list, or use sessions_spawn in NemoClaw. Shows the NemoClaw-specific file paths and update flow for adding an auxiliary OpenClaw sub-agent model.
- **Load [references/dgx-spark-station-local-inference.md](references/dgx-spark-station-local-inference.md)** when preparing DGX hardware, choosing Ollama or managed vLLM, checking GPU/CDI prerequisites, verifying the OpenShell gateway and local inference route, or troubleshooting CoreDNS, k3s image pull, CDI, or port 3000 conflicts. Guides DGX Spark and DGX Station users through end-to-end local inference setup with NemoClaw.
- **Load [references/inference-options.md](references/inference-options.md)** when explaining which providers are available, what the onboard wizard presents, or how inference routing works. Lists all inference providers offered during NemoClaw onboarding.
- **[references/tool-calling-reliability.md](references/tool-calling-reliability.md)** — Explains Ollama tool-call leak symptoms, when vLLM with a tool-call parser is recommended, and how to repoint NemoClaw to a parser-aware local endpoint.

## Related Skills

- [Set Up DGX Spark or DGX Station Local Inference](references/dgx-spark-station-local-inference.md) for an end-to-end DGX hardware walkthrough.
- [Inference Options](references/inference-options.md) for the full list of providers available during onboarding.
- [Tool-Calling Reliability](references/tool-calling-reliability.md) for diagnosing raw JSON tool-call output with local models.
- [Switch Inference Models](references/switch-inference-providers.md) for runtime model switching.
@@ -0,0 +1,159 @@
<!-- SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -->
<!-- SPDX-License-Identifier: Apache-2.0 -->

# Set Up DGX Spark or DGX Station Local Inference

Use this guide when you want NemoClaw to run with local inference on DGX Spark or DGX Station.
It pulls together the host checks, provider choice, onboarding flow, and the common Spark-specific failure modes that are otherwise spread across the quickstart, local inference, and troubleshooting pages.

## Prerequisites

Before onboarding, verify the host basics:

- Docker is installed and running.
- Node.js 22.16 or later and npm 10 or later are available.
- The NVIDIA driver and container toolkit are installed.
- `nvidia-smi` works on the host.
- Port `3000` is free, or you are ready to choose a different dashboard port.

Run:

```bash
docker info
nvidia-smi
node --version
npm --version
```
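The version requirements above can also be checked mechanically. The following is a minimal sketch, not part of NemoClaw: `version_ge` and `check_min_version` are hypothetical helper names, and the comparison assumes a `sort` that supports `-V` (GNU coreutils does).

```bash
# Succeeds when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: prints a one-line verdict for a tool and its minimum version.
# A leading "v" (as printed by `node --version`) is stripped before comparing.
check_min_version() {
  local name="$1" found="$2" minimum="$3"
  if version_ge "${found#v}" "$minimum"; then
    echo "ok: $name $found"
  else
    echo "too old: $name $found (need >= $minimum)"
    return 1
  fi
}

# Minimums taken from the prerequisites list in this guide.
if command -v node >/dev/null 2>&1; then
  check_min_version node "$(node --version)" 22.16 || true
fi
if command -v npm >/dev/null 2>&1; then
  check_min_version npm "$(npm --version)" 10 || true
fi
```

The helpers only compare version strings; they do not replace the `docker info` and `nvidia-smi` checks, which still need to succeed on the host.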

DGX Spark and recent Docker installations can require NVIDIA Container Device Interface (CDI) specs for GPU passthrough.
NemoClaw checks and repairs the common missing-CDI case during install, but you can pre-generate the spec when needed:

```bash
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```

If this command is unavailable, install or repair the NVIDIA Container Toolkit before onboarding.

## Choose a Local Inference Path

DGX Spark and DGX Station have two common local-inference paths.

| Path | Best for | Notes |
|---|---|---|
| Managed vLLM | Tool-heavy agents, stronger tool-call reliability, larger GPU-backed models | Offered by default on DGX Spark and DGX Station. Uses `Qwen/Qwen3.6-27B-FP8` unless you override the registry slug. |
| Ollama | Simpler local chat, existing Ollama model libraries, quick experiments | Convenient, but some model/template combinations can emit tool calls as plain text. Use vLLM when tool-call reliability matters. |

For managed vLLM, the first run pulls the container image and model weights into local caches.
Plan for a long first run on fresh systems.

For Ollama, make sure only one daemon owns port `11434`.
If another runtime is already using that port, stop it or move one service before onboarding.

## Run Onboarding

Start the standard onboard wizard:

```bash
nemoclaw onboard
```

On DGX Spark and DGX Station, the interactive wizard prompts for the provider and policy choices after the third-party software notice.
Choose the local-inference path and review the suggested policy defaults before NemoClaw creates the sandbox.

If you prefer to choose manually:

1. Select the local provider you want: **Local vLLM** or **Local Ollama**.
2. For managed vLLM, accept the default model or set `NEMOCLAW_VLLM_MODEL` before running onboarding.
3. For Ollama, choose an installed model or a starter model that fits available memory.
4. Let NemoClaw validate the local endpoint before it creates the sandbox.

For non-interactive managed vLLM setup on DGX Spark or DGX Station:

```bash
NEMOCLAW_PROVIDER=install-vllm nemoclaw onboard --non-interactive --yes --yes-i-accept-third-party-software
```

To choose a supported managed-vLLM model:

```bash
NEMOCLAW_PROVIDER=install-vllm \
NEMOCLAW_VLLM_MODEL=qwen3.6-27b \
nemoclaw onboard --non-interactive --yes --yes-i-accept-third-party-software
```

Supported managed-vLLM slugs are listed in [Use a Local Inference Server](../SKILL.md#override-the-managed-vllm-model).

## Verify the Setup

After onboarding completes, check the sandbox and local inference route:

```bash
nemoclaw <sandbox-name> status
nemoclaw <sandbox-name> doctor
```

Healthy output should show:

- The sandbox is running.
- The dashboard is reachable.
- The selected inference provider is healthy.
- For Ollama, the authenticated proxy health line reports healthy when the proxy token is available.

Open the TUI:

```bash
nemoclaw <sandbox-name> connect
openclaw tui
```

Ask the agent to perform a small task that requires a tool call.
If you see raw JSON tool calls printed as chat text, switch to vLLM with a parser-aware model path and review [Tool-Calling Reliability](tool-calling-reliability.md).

## Common DGX Spark and Station Fixes

### CoreDNS CrashLoop

If CoreDNS in the embedded k3s cluster crashes shortly after setup, run the CoreDNS fix script referenced by the troubleshooting guide, then recreate the sandbox.
The usual cause is a resolver path that points at `127.0.0.11`, Docker's embedded DNS address, which does not route inside the gateway container.

### k3s Image Pull or Upload Takes Too Long

Fresh systems may spend several minutes pulling images, uploading layers to the OpenShell gateway, or loading model weights.
If readiness times out while the host is still pulling, uploading, or loading, raise both the local-inference and sandbox readiness timeouts:

```bash
export NEMOCLAW_LOCAL_INFERENCE_TIMEOUT=300
export NEMOCLAW_SANDBOX_READY_TIMEOUT=600
nemoclaw onboard
```

### CDI GPU Errors

If gateway startup reports `unresolvable CDI devices nvidia.com/gpu=all`, regenerate CDI specs and rerun onboarding:

```bash
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
nemoclaw onboard
```
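One way to confirm that the regenerated spec exposes the device named in the error is to look for it in the toolkit's device listing. This is a sketch under assumptions: it relies on a toolkit version that provides `nvidia-ctk cdi list`, and `cdi_device_listed` is a hypothetical helper name.

```bash
# Hypothetical helper: succeeds when the CDI device name in $1 appears in
# `nvidia-ctk cdi list` style output read from stdin.
cdi_device_listed() {
  grep -Fq -- "$1"
}

# Only runs the live check when the toolkit CLI is installed.
if command -v nvidia-ctk >/dev/null 2>&1; then
  if nvidia-ctk cdi list 2>/dev/null | cdi_device_listed "nvidia.com/gpu=all"; then
    echo "CDI spec exposes nvidia.com/gpu=all"
  else
    echo "nvidia.com/gpu=all not found; regenerate the CDI spec"
  fi
fi
```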

If the error persists, repair the NVIDIA Container Toolkit installation and verify that `docker info` reports the expected CDI spec directories.

### Port 3000 Conflict

Some Spark systems already run services on port `3000`.
Set a different dashboard port before onboarding:

```bash
export NEMOCLAW_DASHBOARD_PORT=18789
nemoclaw onboard
```
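Before picking a port, it can help to see what is already listening. The sketch below assumes the `ss` tool from iproute2 is installed; `port_listed` is a hypothetical helper that scans `ss -ltn` output for a local port. Ports `3000` and `11434` are the dashboard and Ollama ports named in this guide.

```bash
# Hypothetical helper: reads `ss -ltn` style lines on stdin and succeeds when
# any local address (4th column) ends in the port given as $1.
port_listed() {
  awk -v port="$1" '
    { n = split($4, parts, ":"); if (parts[n] == port) found = 1 }
    END { exit !found }
  '
}

# Only runs the live check when `ss` is installed.
if command -v ss >/dev/null 2>&1; then
  for port in 3000 11434; do
    if ss -ltn | port_listed "$port"; then
      echo "port $port is already in use"
    else
      echo "port $port looks free"
    fi
  done
fi
```

The same helper can be pointed at whichever candidate value you plan to export as `NEMOCLAW_DASHBOARD_PORT`.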

Use a free port that does not overlap the configured gateway, vLLM, Ollama, or Ollama proxy ports.

## Next Steps

- [Use a Local Inference Server](../SKILL.md) for full Ollama, vLLM, NIM, and compatible-endpoint details.
- [Tool-Calling Reliability](tool-calling-reliability.md) for choosing between Ollama and parser-aware vLLM.
- Troubleshooting (use the `nemoclaw-user-reference` skill) for deeper DGX Spark failure-mode guidance.
@@ -74,7 +74,46 @@ When you select it, NemoClaw starts the router proxy on the host, waits for its
The sandbox does not call the router port directly.

The router model pool lives in `nemoclaw-blueprint/router/pool-config.yaml`.
Edit that file to define which models the router can choose from.
The default pool routes between NVIDIA-hosted Nemotron models and uses the `tolerance` value to choose the lowest-cost model whose predicted quality stays within the configured threshold.

```yaml
routing:
  method: prefill
  checkpoint: llm-router/checkpoints/prefill_router_qwen08b.pt
  tolerance: 0.20
  encoder: Qwen/Qwen3.5-0.8B

models:
  - name: nano
    litellm_model: "openai/nvidia/nvidia/Nemotron-3-Nano-30B-A3B"
    cost_per_m_input_tokens: 0.05
    api_base: "https://inference-api.nvidia.com"

  - name: super
    litellm_model: "openai/nvidia/nvidia/nemotron-3-super-v3"
    cost_per_m_input_tokens: 0.10
    api_base: "https://inference-api.nvidia.com"
```

The `tolerance` parameter controls the accuracy-cost tradeoff.

| Value | Behavior |
|-------|----------|
| `0.0` | Always pick the most accurate model. |
| `0.20` | Accept a cheaper model whose predicted quality is up to 20 percentage points below the best model's (default). |
| `1.0` | Always pick the cheapest model. |

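To make the table concrete, the selection rule can be sketched as a small filter. This is an illustration only, not the PrefillRouter implementation: `pick_model` is a hypothetical helper, the costs are copied from the default pool, and the predicted-quality scores are invented for the example.

```bash
# Hypothetical sketch of tolerance-based selection; not the PrefillRouter code.
# Reads "name cost predicted_quality" lines on stdin and prints the cheapest
# model whose quality is within the tolerance ($1) of the best quality.
pick_model() {
  awk -v tol="$1" '
    {
      name[NR] = $1; cost[NR] = $2 + 0; quality[NR] = $3 + 0
      if (NR == 1 || quality[NR] > best) best = quality[NR]
    }
    END {
      chosen = ""
      for (i = 1; i <= NR; i++) {
        if (quality[i] >= best - tol && (chosen == "" || cost[i] < chosen_cost)) {
          chosen = name[i]; chosen_cost = cost[i]
        }
      }
      print chosen
    }'
}

# Costs come from the default pool; the quality scores are invented.
pool='nano 0.05 0.78
super 0.10 0.90'

printf '%s\n' "$pool" | pick_model 0.20   # prints nano
printf '%s\n' "$pool" | pick_model 0.0    # prints super
printf '%s\n' "$pool" | pick_model 1.0    # prints nano
```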
The router runs on the host, not inside the sandbox.

```text
Sandbox (agent) ──> OpenShell Gateway (L7 proxy) ──> Model Router (:4000) ──> NVIDIA API
                                                       └── PrefillRouter selects model
```

Credentials flow through the OpenShell provider system.
The sandbox never sees raw API keys.

To use the router in scripted setup, set:

@@ -184,6 +184,15 @@ For sensitive workloads, use a reviewed host-side immutability workflow after in

- **DAC permissions (default).** The sandbox user owns `/sandbox/.openclaw` with mode `2770` (setgid `sandbox:sandbox`) and `openclaw.json` with mode `660`, so the agent and its group can read and write config directly. A reviewed host-side immutability workflow should compare the intended ownership and mode with the live sandbox filesystem before treating the config tree as locked.
- **Config integrity hash.** The image includes a SHA256 hash of `openclaw.json`. In the default mutable state, `.config-hash` is sandbox-owned and is not a tamper-proof trust anchor, so startup does not fail closed on that hash. When the hash is root-owned and read-only, startup enforces it and refuses to start if the hash does not match.
- **Content seal under shields up.**
When `nemoclaw <name> shields up` runs against a clean lock, it captures a SHA-256 seal of `openclaw.json` and any other locked files into the host-side shields state file.
On sealed sandboxes, every `shields status` call recomputes the hash inside the sandbox and reports drift on any mismatch, so tampering by host root that rewrites the file and then flips permissions back to `444 root:root` is still flagged.
Sandboxes locked before this seal landed have no recorded hash; permission-only verification cannot prove their bytes match the image original, so the seal is **not** a retroactive proof of integrity for legacy state.
The same gap applies to partial seals, where the locked file set grew after the existing seal was captured (some entries sealed, some missing).
By default, `shields up` refuses to seal in either case and asks you to rebuild the sandbox first for a known-good baseline.
`shields status` on a legacy lockdown surfaces `UP (UNSEALED — content integrity unknown for legacy lockdown)` and exits with status 2 so scripts treat it as a failure until the operator seals an explicit baseline.
If you explicitly trust the current bytes, opt in via `NEMOCLAW_SHIELDS_ACCEPT_LEGACY_BASELINE=1`, which captures a seal over the current files and is acknowledged in the log line.
Once a sandbox is sealed, `shields up` refuses to re-seal a tampered baseline; restore the original file or rebuild the sandbox before re-running.
- **Gateway token environment.** The gateway exports `OPENCLAW_GATEWAY_TOKEN` and writes it to `/tmp/nemoclaw-proxy-env.sh` for interactive sandbox sessions. Keep this in mind when deciding whether a workload should run with mutable config or an immutable config posture.

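Because `shields status` exits with status 2 on an unsealed legacy lockdown, a wrapper script can branch on the exit code. The sketch below is an assumption-laden illustration: `classify_shields_status` is a hypothetical helper, only status 2 is described in the text above, and treating 0 as healthy and every other value as a generic failure is a guess.

```bash
# Hypothetical helper: maps an exit status from `nemoclaw <name> shields status`
# to a short label. Only status 2 is documented above; the meaning assigned to
# 0 and to other non-zero values is an assumption.
classify_shields_status() {
  case "$1" in
    0) echo "ok" ;;
    2) echo "failed: possibly an unsealed legacy lockdown, check the status text" ;;
    *) echo "failed: unexpected status $1" ;;
  esac
}

# Example wiring (the sandbox name is a placeholder):
#   nemoclaw my-sandbox shields status; classify_shields_status "$?"
```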
| Aspect | Detail |
1 change: 1 addition & 0 deletions .agents/skills/nemoclaw-user-get-started/SKILL.md
@@ -74,6 +74,7 @@ $ curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash
On DGX Spark, DGX Station, and Windows WSL, an interactive installer offers express install after you accept the third-party software notice.
Express install switches onboarding to non-interactive mode, allows `sudo` password prompts for required host changes, and selects the managed local inference path for that platform.
Unless `NEMOCLAW_POLICY_TIER` is set, it applies sandbox policy in `suggested` mode with the `balanced` tier by default, using the base sandbox policy plus supported package, model, web-search, and local-inference presets.
On DGX Spark, express install uses `my-spark-assistant` as the sandbox name unless `NEMOCLAW_SANDBOX_NAME` is already set.
On WSL, express install selects the Windows-host Ollama setup path.
Set `NEMOCLAW_NO_EXPRESS=1` to skip the express prompt, or set `NEMOCLAW_PROVIDER` before launching the installer when you want to choose a provider yourself.

@@ -60,7 +60,7 @@ The table is generated from [`ci/platform-matrix.json`](https://github.com/NVIDI
|----|-------------------|--------|-------|
| Linux | Docker | Tested | Primary tested path. |
| macOS (Apple Silicon) | Colima, Docker Desktop | Tested with limitations | Install Xcode Command Line Tools (`xcode-select --install`) and start the runtime before running the installer. |
| DGX Spark | Docker | Tested | Use the standard installer and `nemoclaw onboard`. For an end-to-end walkthrough with local Ollama inference, see the [NVIDIA Spark playbook](https://build.nvidia.com/spark/nemoclaw). |
| DGX Spark | Docker | Tested | Use the standard installer and `nemoclaw onboard`. For local inference, see Set Up DGX Spark or DGX Station Local Inference (use the `nemoclaw-user-configure-inference` skill). |
| Windows WSL2 | Docker Desktop (WSL backend) | Tested with limitations | Requires WSL2 with Docker Desktop backend. |

## Next Steps
@@ -5,11 +5,6 @@
Use NemoHermes when you want NemoClaw to create an OpenShell sandbox that runs Hermes instead of the default OpenClaw agent.
The `nemohermes` command is an alias for `nemoclaw` with the Hermes agent pre-selected.

**Experimental Feature:**

The Hermes agent option is experimental.
Interfaces, defaults, and supported features may change without notice, and it is not recommended for production use.

Review the [Prerequisites](prerequisites.md) before starting.
Docker must be installed, running, and reachable from the current shell before Hermes onboarding can build the sandbox image.
On Linux, the installer can install Docker, start the service, and add your user to the `docker` group.
10 changes: 9 additions & 1 deletion .agents/skills/nemoclaw-user-manage-sandboxes/SKILL.md
@@ -187,7 +187,7 @@ Re-run the installer.
Before it onboards anything, the installer calls `nemoclaw backup-all` (use the `nemoclaw-user-reference` skill) automatically, storing a snapshot of each running sandbox in `~/.nemoclaw/rebuild-backups/` as a safety net.
If your existing gateway is from OpenShell earlier than `0.0.37`, the installer prompts before it runs the new automatic gateway upgrade path.
The automatic path is offered only when the existing `nemoclaw` CLI supports `backup-all`; older installs must preserve sandbox state manually before retiring the gateway.
For unattended installs, set `NEMOCLAW_ACCEPT_EXPERIMENTAL_OPENSHELL_UPGRADE=1`, or manually run `nemoclaw backup-all` and `openshell gateway destroy -g nemoclaw || openshell gateway destroy` before rerunning the installer as `curl -fsSL https://www.nvidia.com/nemoclaw.sh | NEMOCLAW_OPENSHELL_UPGRADE_PREPARED=1 bash`.
For unattended installs, set `NEMOCLAW_ACCEPT_EXPERIMENTAL_OPENSHELL_UPGRADE=1`.
Alternatively, prepare the upgrade manually: run `nemoclaw backup-all`, then `openshell gateway remove nemoclaw || openshell gateway destroy -g nemoclaw || openshell gateway destroy` (the chain tries both verbs so the right one runs on either OpenShell release), and, if a privileged host gateway is still running, `sudo pkill -f openshell-gateway`.
Then rerun the installer as `curl -fsSL https://www.nvidia.com/nemoclaw.sh | NEMOCLAW_OPENSHELL_UPGRADE_PREPARED=1 bash`.

```console
$ curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash
```
@@ -255,6 +255,14 @@ nemoclaw uninstall
| `--keep-openshell` | Leave OpenShell binaries installed. |
| `--delete-models` | Also remove NemoClaw-pulled Ollama models. |

**Note:**

`nemoclaw uninstall` preserves `~/.nemoclaw/rebuild-backups/` (host-side snapshots that `nemoclaw <name> snapshot create` and `nemoclaw backup-all` write), `~/.nemoclaw/backups/` (workspace backups that `scripts/backup-workspace.sh` writes), and `~/.nemoclaw/sandboxes.json` (the sandbox registry) by default.
Uninstall removes every other entry under `~/.nemoclaw/`.
Interactive runs prompt before they remove the preserved entries; the default answer keeps them.
For non-interactive runs (`--yes`, `NEMOCLAW_NON_INTERACTIVE=1`, or a non-TTY shell), set `NEMOCLAW_UNINSTALL_DESTROY_USER_DATA=1` to acknowledge data loss and remove the preserved entries as well.
See `nemoclaw uninstall` (use the `nemoclaw-user-reference` skill) for the full preservation contract.

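If you plan a non-interactive uninstall with `NEMOCLAW_UNINSTALL_DESTROY_USER_DATA=1`, it may be worth copying the preserved entries elsewhere first. The following is a minimal sketch, not a NemoClaw command: `archive_preserved_entries` is a hypothetical helper, and the entry names are the three paths listed in the note above.

```bash
# Hypothetical helper: archives the entries that uninstall preserves by default.
# $1 = NemoClaw state directory (for example "$HOME/.nemoclaw")
# $2 = path of the tarball to write
archive_preserved_entries() {
  local state_dir="$1" archive="$2" entry
  set --   # reuse the positional parameters as the list of entries found
  for entry in rebuild-backups backups sandboxes.json; do
    if [ -e "$state_dir/$entry" ]; then
      set -- "$@" "$entry"
    fi
  done
  if [ "$#" -eq 0 ]; then
    echo "nothing to archive in $state_dir" >&2
    return 1
  fi
  tar -czf "$archive" -C "$state_dir" "$@"
}

# Example call (the output path is a placeholder):
#   archive_preserved_entries "$HOME/.nemoclaw" "$HOME/nemoclaw-preserved.tgz"
```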
`nemoclaw uninstall` runs the version-pinned `uninstall.sh` that shipped with your installed CLI, so it does not fetch anything over the network at uninstall time.

If the `nemoclaw` CLI is missing or broken, fall back to the hosted script: