fix(shields): exec direct sandbox container on VM driver (#4245) #4290

Closed
yimoj wants to merge 2 commits into NVIDIA:main from yimoj:fix/4245-vm-driver-shields-exec

Conversation

@yimoj
Contributor

@yimoj yimoj commented May 27, 2026

Summary

nemoclaw <sandbox> shields up/down on macOS Docker Desktop with the OpenShell VM driver was routing privileged execs through docker exec openshell-cluster-nemoclaw kubectl exec .... VM-driver gateways have no k3s cluster container, so shields could neither lock nor unlock config (and host-initiated config writes hit the same path).

Extract a shared resolver/dispatcher so shields/index.ts and sandbox/config.ts both treat openshellDriver === "vm" as a direct-container driver and only fall back to the legacy openshell-cluster-nemoclaw kubectl path for gateways that actually use it.
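
The routing decision described above could look roughly like the following. This is a minimal sketch, not the PR's code: `isDirectContainerDriver` and the gateway container name come from this PR's text, while the `ExecRoute` type and the `chooseExecRoute` helper are hypothetical names introduced only for illustration.

```typescript
// Sketch only: the bodies below are assumptions, not the module's actual source.
const LEGACY_GATEWAY_CONTAINER = "openshell-cluster-nemoclaw";

// Hypothetical result type: where the privileged exec should be sent.
type ExecRoute =
  | { kind: "direct"; container: string }
  | { kind: "legacy-gateway"; container: string };

// Per the PR description, docker and vm are the direct-container drivers.
function isDirectContainerDriver(driver: string | null | undefined): boolean {
  return driver === "docker" || driver === "vm";
}

// Hypothetical helper: choose a route from the driver and the resolved
// sandbox container (null when no unambiguous container was found).
function chooseExecRoute(
  driver: string | null | undefined,
  resolvedContainer: string | null,
): ExecRoute {
  if (isDirectContainerDriver(driver) && resolvedContainer !== null) {
    return { kind: "direct", container: resolvedContainer };
  }
  // Everything else keeps the legacy kubectl-through-gateway path.
  return { kind: "legacy-gateway", container: LEGACY_GATEWAY_CONTAINER };
}
```

Keeping the driver check in one place is what lets both call sites share the same gating, which is the drift the PR sets out to prevent.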

Related Issue

Closes #4245

Changes

  • New src/lib/sandbox/privileged-container.ts exports selectPrivilegedSandboxContainer, resolvePrivilegedSandboxContainer, buildDirectContainerExecArgv, buildKubectlExecArgv, and buildPrivilegedExecArgv. docker and vm are the direct-container drivers; everything else falls back to kubectl through the legacy gateway container.
  • src/lib/shields/index.ts and src/lib/sandbox/config.ts delegate privileged-exec resolution to the shared module so they cannot drift on driver gating or argv shape.
  • Longest-known-name disambiguation is preserved so a sandbox name that is a prefix of another sandbox (my vs my-assistant) cannot steal the longer sandbox's container; the exact openshell-<name> always beats a suffixed sibling.
  • Added unit tests:
    • test/privileged-container.test.ts (20 tests) — pure selector, argv builders, driver-class detection, prefix-collision and exact-name precedence.
    • test/shields-vm-driver-argv.test.ts (6 tests) — integration coverage proving shields.privilegedSandboxExecArgv and sandbox/config.privilegedSandboxExecArgv route to docker exec --user root <openshell-sandbox-container> ... on vm/docker drivers, fall back to kubectl for kubernetes, and refuse to steal a prefix-sibling's container.
    • test/config-set.test.ts — two additional VM-driver cases on the existing selectDockerDriverSandboxContainer helper.
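
The longest-known-name rule from the list above can be sketched as follows. It assumes containers are named `openshell-<name>` or `openshell-<name>-<suffix>`. The parameter list, the `ownerOf` helper, and the choice to return `null` when more than one candidate remains are assumptions for illustration, not the signatures used in `privileged-container.ts`.

```typescript
// Hypothetical helper: the known sandbox whose name is the longest match for
// this container, or null if no known sandbox claims it.
function ownerOf(container: string, knownSandboxNames: string[]): string | null {
  let best: string | null = null;
  for (const name of knownSandboxNames) {
    const base = `openshell-${name}`;
    const claims = container === base || container.startsWith(`${base}-`);
    if (claims && (best === null || name.length > best.length)) {
      best = name;
    }
  }
  return best;
}

// Sketch of the selector: exact name first, then suffixed containers that no
// longer sandbox name also claims.
function selectPrivilegedSandboxContainer(
  sandboxName: string,
  containerNames: string[],
  knownSandboxNames: string[],
): string | null {
  const exact = `openshell-${sandboxName}`;
  // The exact openshell-<name> always beats a suffixed sibling.
  if (containerNames.includes(exact)) return exact;

  const owned = containerNames
    .filter((c) => c.startsWith(`${exact}-`))
    .filter((c) => ownerOf(c, knownSandboxNames) === sandboxName);

  // Assumption: anything other than exactly one candidate is ambiguous.
  return owned.length === 1 ? (owned[0] ?? null) : null;
}
```

With sandboxes `my` and `my-assistant` both registered, a lookup for `my` does not return `openshell-my-assistant`, because `my-assistant` is the longer known name that claims that container.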

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npm run typecheck:cli clean.
  • Targeted tests pass (96 tests across test/privileged-container.test.ts, test/shields-vm-driver-argv.test.ts, test/config-set.test.ts).
  • Broader shields+sandbox+config subset green (173 tests across src/lib/shields/, src/lib/sandbox/, test/rebuild-shields-*, test/config-*, test/privileged-container, test/shields-vm-driver-argv).
  • Tests added for new and changed behavior.
  • No secrets, API keys, or credentials committed.
  • Full npx prek run --all-files not run in this worktree (hooks-managed; happens at commit).
  • Live macOS Docker Desktop VM-driver E2E not exercised — that hardware/OpenShell combo was not available from this triage workspace. Coverage is via the targeted unit + integration tests above; please run shields up/down on a VM-driver sandbox before merge if possible.
  • No user-facing docs changes; the bug is an internal runtime path. Skipping docs/skill regeneration.

Follow-up out of scope

src/lib/actions/sandbox/host-aliases.ts hardcodes the same K3S_CONTAINER = "openshell-cluster-nemoclaw" and unconditionally routes kubectl through it. That command is going to fail the same way on a VM-driver gateway. Filing a separate follow-up rather than expanding the scope of this fix.


🤖 Generated with Claude Code

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>

Summary by CodeRabbit

  • Refactor
    • Centralized sandbox container resolution and privileged exec argument construction for Docker/VM drivers, improving container selection and exec path handling.
  • Bug Fixes
    • Avoid ambiguous container matches to prevent executing against the wrong sandbox.
  • Tests
    • Added unit tests covering driver detection, container selection, exec-argv construction, and disambiguation scenarios.

`nemoclaw <sandbox> shields up/down` on macOS Docker Desktop with the
OpenShell VM driver routed privileged execs through
`docker exec openshell-cluster-nemoclaw kubectl exec ...`. VM-driver
gateways have no k3s cluster container, so shields could neither lock
nor unlock config. The host-side config writer in
`sandbox/config.ts` had the same `openshellDriver === "docker"` gate
and was broken on the same path.

Extract a shared `privileged-container` resolver and argv dispatcher
that treats both `docker` and `vm` as direct-container drivers and
keeps the legacy `openshell-cluster-nemoclaw` kubectl path only for
gateways that actually use it. Preserve longest-known-name
disambiguation so sandbox names that are prefixes of other sandbox
names (`my` vs `my-assistant`) cannot steal each other's containers.

Add unit coverage:
- shields and config `privilegedSandboxExecArgv` on `vm` and `docker`
  drivers target `docker exec --user root <container> ...`
- legacy `kubernetes` driver still threads `-i` to both docker exec
  and kubectl exec
- prefix-collision sandbox names fall back to kubectl when no exact
  container exists

Closes NVIDIA#4245

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
@coderabbitai
Contributor

coderabbitai Bot commented May 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e659fc0f-783f-43dd-bb09-721077e0e1c6

📥 Commits

Reviewing files that changed from the base of the PR and between 76010b4 and 11776b5.

📒 Files selected for processing (1)
  • test/shields-vm-driver-argv.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/shields-vm-driver-argv.test.ts

📝 Walkthrough

Walkthrough

Extracts privileged sandbox container resolution and exec-argv construction into a shared helper that detects direct-container drivers (docker, vm), resolves target containers, builds docker exec or legacy kubectl exec argv, and updates shields and config to delegate to the new utilities.

Changes

Privileged Container Execution Refactoring

| Layer | File(s) | Summary |
|---|---|---|
| Privileged container resolver and argv builders | src/lib/sandbox/privileged-container.ts | New shared module detecting direct-container OpenShell drivers (docker, vm), resolving target sandbox containers via registry and Docker queries with prefix disambiguation, and building exec arguments for direct docker exec --user root and legacy docker exec + kubectl exec paths. Exports driver detection, container selection, argv builders, and legacy gateway container constant. |
| Config module privileged exec delegation | src/lib/sandbox/config.ts | Imports shared helpers, removes local k3s container constants and registry lookups, updates selectDockerDriverSandboxContainer signature to accept knownSandboxNames, and delegates exec-argv construction to buildPrivilegedExecArgv. |
| Shields module privileged exec delegation | src/lib/shields/index.ts | Imports shared helpers and replaces in-file container discovery with privilegedSandboxExecArgv wrapper that resolves the privileged container and delegates argv construction; exports the wrapper for tests/external consumers. |
| Unit tests for privileged container module | test/privileged-container.test.ts | Vitest suite validating driver detection (docker/vm), container selection (exact match, suffix fallback, prefix collision), direct-container and kubectl exec argv generation, stdin (-i) threading, and legacy gateway constant value. |
| Config module test coverage | test/config-set.test.ts | Adds VM-driver sandbox selection and prefix-disambiguation tests to ensure ambiguous sandbox name prefixes return null and avoid mis-targeting. |
| Integration tests for shields and config argv | test/shields-vm-driver-argv.test.ts | Regression tests confirming shields and config target direct vm/docker containers when available, fall back for legacy drivers, and thread -i consistently through outer and inner exec paths. |
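
To make the stdin (`-i`) threading concrete, here is a hedged sketch of the two argv builders named in the walkthrough. The builder names appear in this PR, but the `ExecOptions` shape, the parameter order, and the exact `kubectl exec` arguments (including using the sandbox name as the pod name, with no namespace flag) are assumptions rather than the module's real output.

```typescript
// Sketch only: argv shapes are illustrative assumptions.
const LEGACY_GATEWAY_CONTAINER = "openshell-cluster-nemoclaw";

// Hypothetical options bag; the real module may pass stdin differently.
interface ExecOptions {
  interactive?: boolean;
}

// Direct path: one docker exec as root into the sandbox container.
function buildDirectContainerExecArgv(
  container: string,
  cmd: string[],
  opts: ExecOptions = {},
): string[] {
  const stdin = opts.interactive ? ["-i"] : [];
  return ["docker", "exec", ...stdin, "--user", "root", container, ...cmd];
}

// Legacy path: docker exec into the gateway, then kubectl exec into the pod.
// The -i flag has to appear on both hops or stdin is dropped at the outer one.
function buildKubectlExecArgv(
  sandboxName: string,
  cmd: string[],
  opts: ExecOptions = {},
): string[] {
  const stdin = opts.interactive ? ["-i"] : [];
  return [
    "docker", "exec", ...stdin, LEGACY_GATEWAY_CONTAINER,
    "kubectl", "exec", ...stdin, sandboxName, "--", ...cmd,
  ];
}
```

Returning a plain `string[]` keeps the builders pure, so tests can compare argv arrays without spawning Docker.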

Sequence Diagram

sequenceDiagram
  participant Shields as shields.privilegedSandboxExecArgv
  participant Shared as privileged-container
  participant Registry as Registry
  participant Docker as Docker
  participant DirectExec as docker exec<br/>--user root
  participant GatewayExec as docker exec + kubectl<br/>legacy gateway

  Shields->>Shared: buildPrivilegedExecArgv(sandbox, cmd, directContainer)
  Shared->>Shared: isDirectContainerDriver(driver)?
  alt Direct Driver (docker/vm)
    Shared->>Registry: Query sandbox driver
    Shared->>Docker: docker ps (running containers)
    Shared->>Shared: selectPrivilegedSandboxContainer<br/>(resolve target container)
    Shared->>DirectExec: buildDirectContainerExecArgv<br/>(docker exec --user root)
  else Legacy Driver (k3s or unknown)
    Shared->>GatewayExec: buildKubectlExecArgv<br/>(docker exec + kubectl)
  end
  Shared-->>Shields: argv: string[]

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

fix, NemoClaw CLI, Sandbox, OpenShell, Docker

Suggested reviewers

  • ericksoa
  • cv
  • jyaunches

"I'm a rabbit in the codebase burrow,
I hopped to fix the VM's sorrow,
Found containers direct and true,
Shields now rise on macOS too,
Hooray — no more gateway woe!" 🐇✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(shields): exec direct sandbox container on VM driver (#4245)' accurately and concisely describes the main change: fixing shields to directly target sandbox containers on VM drivers instead of routing through a non-existent gateway container. |
| Linked Issues check | ✅ Passed | The PR fully addresses the coding requirements from Issue #4245: it implements direct container targeting for docker and vm drivers, preserves prefix disambiguation logic, and removes dependency on the non-existent k3s gateway container on VM-driver systems. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to the issue: refactoring privileged-exec resolution, delegating logic to a shared helper module, updating related call-sites, and adding comprehensive tests; no unrelated modifications detected. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread test/shields-vm-driver-argv.test.ts Fixed
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/shields-vm-driver-argv.test.ts`:
- Around line 47-65: stubRegistry currently mocks registry.getSandbox to only
return a sandbox when the exact name equals "my-assistant", which makes prefix
lookups (e.g., "my") fail and produces false-positive fallback behavior; update
the mock implementation in stubRegistry so registry.getSandbox(name) returns the
sandbox when the name equals the sandboxName OR when the provided name is a
prefix of sandboxName (also ensure listSandboxes returns the provided
opts.sandboxNames mapped to objects with openshellDriver), and keep the
dockerRun.dockerCapture mock returning opts.containerNames; update references:
stubRegistry, registry.getSandbox, registry.listSandboxes,
dockerRun.dockerCapture.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8668daec-f148-4d3c-9ab7-f2dff174222a

📥 Commits

Reviewing files that changed from the base of the PR and between a784f47 and 76010b4.

📒 Files selected for processing (6)
  • src/lib/sandbox/config.ts
  • src/lib/sandbox/privileged-container.ts
  • src/lib/shields/index.ts
  • test/config-set.test.ts
  • test/privileged-container.test.ts
  • test/shields-vm-driver-argv.test.ts

Comment thread test/shields-vm-driver-argv.test.ts
…test

CodeRabbit on NVIDIA#4290 pointed out that `stubRegistry` only registered a
sandbox entry for `"my-assistant"`. In the prefix-collision case the
test looks up `"my"`, so `registry.getSandbox("my")` returned null and
`resolvePrivilegedSandboxContainer` short-circuited before
`selectPrivilegedSandboxContainer`'s longest-known-name heuristic ever
ran. The test was passing for the wrong reason.

Mock `getSandbox` for every name in `sandboxNames` so the prefix test
actually hits the disambiguation logic and would catch a regression in
the longest-known-name selector. Also drop the unused `beforeEach`
import flagged by CodeQL.

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
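
The stub fix described in the commit message above can be illustrated without a test framework. This is a sketch under assumptions: `makeStubRegistry` and the `SandboxEntry` shape are hypothetical, and the real test uses Vitest mocks on the project's registry module rather than a hand-rolled object.

```typescript
// Hypothetical entry shape; only the fields this PR mentions are modelled.
interface SandboxEntry {
  name: string;
  openshellDriver: string;
}

// Hypothetical stub factory: registers EVERY name in sandboxNames, so a
// lookup for a prefix name such as "my" returns an entry instead of null.
function makeStubRegistry(sandboxNames: string[], openshellDriver: string) {
  const entries = new Map<string, SandboxEntry>();
  for (const name of sandboxNames) {
    entries.set(name, { name, openshellDriver });
  }
  return {
    getSandbox: (name: string): SandboxEntry | null => entries.get(name) ?? null,
    listSandboxes: (): SandboxEntry[] => [...entries.values()],
  };
}

// Usage: both the short and the long name resolve, so a resolver under test
// gets past the registry lookup and reaches its disambiguation logic.
const registry = makeStubRegistry(["my", "my-assistant"], "vm");
const shortEntry = registry.getSandbox("my");
```

Had the stub registered only `my-assistant`, `getSandbox("my")` would return `null` and the prefix-collision test would pass without ever exercising the selector.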
@yimoj
Contributor Author

yimoj commented May 27, 2026

Closing — #4287 just landed on main (commit 984b2f8) and closed the same issue #4245. The two PRs converged on the same fix shape (extract a shared resolver, treat the VM driver as a direct-container driver, disambiguate prefix-colliding sandbox names), and #4287 went further by removing the legacy k3s/kubectl fallback entirely and adding test/e2e/test-vm-driver-privileged-exec-routing.sh, so this PR is strictly superseded.

For the maintainers' awareness, the follow-up I called out in the PR description still applies after #4287: src/lib/actions/sandbox/host-aliases.ts still hardcodes K3S_CONTAINER = "openshell-cluster-nemoclaw" and unconditionally routes kubectl through it (lines 9 and 72), so nemoclaw <sandbox> host-aliases will fail the same way on a VM-driver gateway. Happy to file a separate issue for that if it would help.

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>

@yimoj
Contributor Author

yimoj commented May 27, 2026

Superseded by #4287 (merged 2026-05-27 04:27 UTC, commit 984b2f8). See last comment for details.

@yimoj yimoj closed this May 27, 2026

Development

Successfully merging this pull request may close these issues.

[macOS][Security] nemoclaw shields up/down fails with "No such container: openshell-cluster-nemoclaw" on Docker Desktop VM driver

3 participants