fix(shields,state): keep gateway readable and runtime sessions writable under shields-up by laitingsheng · Pull Request #4155 · NVIDIA/NemoClaw

laitingsheng · 2026-05-24T07:58:57Z

Summary

Under shields-up the OpenClaw gateway lost sandbox-group access to plugin/agent state dirs, so the TUI reported "plugin not found" for /nemoclaw and openclaw-weixin and crashed with EACCES: mkdir agents/main/sessions on first launch. Lock the high-risk state dirs as root:sandbox (top-level config dir stays root:root), lock secret-bearing dirs (credentials, identity, pairing) to a stricter root:root 700 with chmod -R go-rwX, restore agents/<id>/sessions to sandbox:sandbox 2770 after the lock, and refuse to lock when the state-dir symlink preflight fails or any state-dir root is itself a symlink.
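
The mode arithmetic behind this summary can be checked with a small sketch. It models only the numeric effect of `chmod go-w` and `chmod go-rwX` on a mode value; the helper names are hypothetical and are not taken from the PR.

```typescript
// Bit masks for the group and other permission classes.
const GROUP_OTHER_WRITE = 0o022;
const GROUP_OTHER_ALL = 0o077;

/** Numeric effect of `chmod go-w`: clear the group and other write bits. */
function stripGroupOtherWrite(mode: number): number {
  return mode & ~GROUP_OTHER_WRITE;
}

/**
 * Numeric effect of `chmod go-rwX`. On removal, `X` only matters where an
 * execute bit is already present, so clearing every group/other bit gives
 * the same result. Setuid, setgid, and sticky bits are left untouched.
 */
function stripGroupOtherAccess(mode: number): number {
  return mode & ~GROUP_OTHER_ALL;
}

/** Render a mode the way `stat -c %a` would print it. */
function toOctal(mode: number): string {
  return mode.toString(8);
}
```

Under these assumptions `2770` becomes `2750` after `go-w` (the setgid bit and group `r-x` survive, which is what keeps the gateway readable), and `750` becomes `700` after `go-rwX`.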

Related Issue

Fixes #4065

Changes

src/lib/shields/state-dir-lock.ts (new) — lock pipeline extracted from the shields monolith: state-dir inventory (HIGH_RISK_STATE_DIRS, CONFIDENTIALITY_STATE_DIRS, WRITABLE_RUNTIME_SUBPATHS), symlink preflight, mutation scripts, post-lock stat verification, and the runtime-subpath restore helper. Preflight, mutation, verification, and restore each emit tab-prefixed marker lines so callers can surface every failure (exec errors, mutation failures, mid-lock symlink swaps, mode/owner drift, restore-time mkdir/chown/chmod failures, restore-time stat mismatches) as lock issues. All embedded scripts end with exit 0 so a trailing [ … ] test cannot leak a non-zero status to the privileged exec layer.
src/lib/shields/index.ts — lockAgentConfig hoists preflightStateDirLock before any chmod/chown so file mutations cannot leak out if a state-dir root is a symlink. Refers to the new state-dir-lock module via a small stateDirLockExec adapter; the shields entrypoint stays focused on shields state transitions.
test/shields-up-runtime-perms.test.ts (new) — regression coverage for state-dir lock ownership, the agents/*/sessions restore loop, the preflight symlink guard, and atomicity within the state-dir fan-out when a symlinked root is detected (file-level mutations are now guarded by the hoisted preflight in lockAgentConfig).
test/repro-2681-group-writable.test.ts — updates the existing lockAgentConfig assertion to expect root:sandbox in the state-dir-lock shell command shape.
docs/security/best-practices.mdx — documents the root:sandbox posture for high-risk dirs, the stricter root:root 700 posture for secret-bearing dirs, the runtime-mutable exemptions, and the symlink preflight contract.
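
A minimal sketch of how the two inventories might map to lock postures. The directory lists are illustrative subsets assembled from names mentioned on this page, and `lockPostureFor` is a hypothetical helper, not the module's API.

```typescript
interface LockPosture {
  owner: string;          // owner:group applied to the directory root
  recursiveChmod: string; // symbolic mode applied with chmod -R
  rootMode?: string;      // explicit root mode, where the PR states one
}

// Illustrative subsets only: the full inventories in state-dir-lock.ts are
// not reproduced on this page.
const HIGH_RISK_STATE_DIRS: readonly string[] = ["extensions", "plugins", "skills", "hooks", "agents"];
const CONFIDENTIALITY_STATE_DIRS: readonly string[] = ["credentials", "identity", "pairing"];

function lockPostureFor(dir: string): LockPosture | undefined {
  // Secret-bearing dirs get the stricter posture, so check them first.
  if (CONFIDENTIALITY_STATE_DIRS.indexOf(dir) !== -1) {
    return { owner: "root:root", recursiveChmod: "go-rwX", rootMode: "700" };
  }
  if (HIGH_RISK_STATE_DIRS.indexOf(dir) !== -1) {
    return { owner: "root:sandbox", recursiveChmod: "go-w" };
  }
  return undefined; // not part of the state-dir lock
}
```

Checking the confidentiality list first means a directory that appeared in both lists would never end up with the weaker `root:sandbox` posture.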

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
make docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

Summary by CodeRabbit

New Features
- Implements a host-side state-directory lockdown workflow: locks high-risk dirs to root:sandbox, secrets to root:root (700), and restores runtime-writable session subtrees to sandbox with restrictive perms.
Bug Fixes
- shields-up performs a symlink preflight and aborts on symlinked state-dir roots with a clear error; shields-down reliably restores group-writable/setgid modes and ownership.
Documentation
- Expanded lockdown guidance and symlink failure behavior.
Tests
- Added/expanded suites for lockdown, symlink defenses, atomicity, and permission restoration.

…te root-owned subdirs in audit

NemoClaw #4065: shields-up locked HIGH_RISK_STATE_DIRS as root:root, which stripped sandbox-group ownership from descendants and broke OpenClaw plugin discovery (`extensions/<plugin>/` became unreachable to the gateway, which is only granted sandbox-group access). It also left `agents/main/` non-writable to the sandbox user, so the OpenClaw TUI's lazy mkdir of `agents/main/sessions/` failed with EACCES on first launch under lockdown. Switch the state-dir lock to `root:sandbox` (top-level configDir is still `root:root`) so the gateway keeps `r-x` via the sandbox group on descendants stripped to 2750 by `chmod -R go-w`, and restore `agents/*/sessions` to `sandbox:sandbox 2770` after the main lock loop so the agent keeps writing session metadata under lockdown.

NemoClaw #4059: pre-backup audit joined per-dir `find` invocations with `&&`, so a single permission-denied subdir made the whole chain exit 1 and the rebuild treated every state dir as failed. Join with `;` and wrap each `find` with `|| true`: the audit's real signal is its stdout (symlink / hardlink / special-file rows); exit codes from perm-denied root-owned subdirs are noise.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
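
The #4059 audit change in the commit message above can be sketched as a command builder. This is an assumption-laden illustration: `buildAuditCommand` and `shellQuote` are hypothetical names, and the `find` expression is reduced to symlink rows only (`-type l`) rather than the full symlink / hardlink / special-file audit.

```typescript
/** Quote a value for safe use as a single POSIX shell word. */
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

/**
 * Build one audit command for several state dirs. Each `find` is followed by
 * `|| true` and the segments are joined with `;`, so a permission-denied
 * subdirectory in one dir cannot short-circuit the audit of the others.
 */
function buildAuditCommand(dirs: string[]): string {
  return dirs
    .map((dir) => `find ${shellQuote(dir)} -type l -print || true`)
    .join("; ");
}
```

With `&&` as the separator, the first non-zero `find` would stop the chain; here every directory is always visited and the caller reads findings from stdout.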

coderabbitai · 2026-05-24T08:01:16Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1fc1ad8c-f995-49bd-9b47-87a8c736ff8b

📥 Commits

Reviewing files that changed from the base of the PR and between 6d8233d and d2d24d7.

📒 Files selected for processing (2)

docs/security/best-practices.mdx
src/lib/shields/index.ts

✅ Files skipped from review due to trivial changes (1)

docs/security/best-practices.mdx

📝 Walkthrough

Extract state-dir locking into a shared module with a PrivilegedExec adapter, add symlink preflight checks, implement applyStateDirLockMode (lock/unlock, verify, restore writable runtime subpaths), wire into shields up/down, and add tests and docs validating atomicity and symlink safety.

Changes

Shields Lockdown Refactoring and Runtime Permission Hardening

Layer / File(s)	Summary
PrivilegedExec interface and directory inventories `src/lib/shields/state-dir-lock.ts`	Add `PrivilegedExec` interface and constants enumerating `HIGH_RISK_STATE_DIRS`, `CONFIDENTIALITY_STATE_DIRS`, and `WRITABLE_RUNTIME_SUBPATHS`.
Preflight symlink checks and exported API `src/lib/shields/state-dir-lock.ts`	Export `preflightStateDirLock` that runs a symlink preflight and returns issue strings; expose the public API surface for state-dir locking.
Lock/unlock mutation scripts, verification, and restoration `src/lib/shields/state-dir-lock.ts`	Implement `applyStateDirLockMode` to run lock/unlock shell passes (high-risk + workspace glob, confidentiality), parse marker output for symlink/mutation errors, verify post-lock ownership/modes, and restore runtime-writable subpaths (`agents/*/sessions` -> `sandbox:sandbox 2770`) on successful locks.
shields index integration and adapters `src/lib/shields/index.ts`	Import `preflightStateDirLock`/`applyStateDirLockMode`; add `stateDirLockExec` adapter exposing privileged `exec`/`capture`; run preflight before mutation in `lockAgentConfig` and call `applyStateDirLockMode` with explicit `highRiskOwner` and `isLocking` from shields-up/down.
Existing test assertion updates for new ownership semantics `test/repro-2681-group-writable.test.ts`	Update shields-down assertions to filter `sh -c` invocations and assert `sandbox:sandbox` ownership, `g+rwX,o-rwx`/`2770` semantics and workspace handling; update shields-up assertion to expect `root:sandbox` owner string.
Runtime lock probe, atomicity, and fixture validations `test/shields-up-runtime-perms.test.ts`	Add subprocess probe that captures docker-exec argv, stubs stat/lsattr/sha256sum, locates generated state-dir lock/restore sh scripts, asserts `root:sandbox` semantics and symlink-guard markers, verifies workspace-* preflight ordering and top-level `chown root:root` calls, tests preflight atomicity (no mutations after symlink detection), and re-executes extracted scripts against fixtures to validate sessions creation and symlink safety.
Security docs update `docs/security/best-practices.mdx`	Expand host-side Agent Config Directory guidance to require locking high-risk subtrees to `root:sandbox` with `chmod -R go-w`, secrets to `root:root 700`, restore `agents/<id>/sessions` to `sandbox:sandbox 2770`, and abort on symlinked state-dir root with `Config not locked: state dir root is a symlink`.
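
The runtime-subpath restore described in the table could look roughly like the script builder below. It is a sketch under assumptions: the marker names (`symlink`, `restore-failed`) and the function names are hypothetical, and the real scripts in `state-dir-lock.ts` are not shown on this page.

```typescript
/** Quote a value for safe use as a single POSIX shell word. */
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

/**
 * Build a shell script that restores agents/<id>/sessions to
 * sandbox:sandbox 2770. Failures are reported as tab-prefixed marker lines
 * on stdout, and the script always ends with `exit 0`.
 */
function buildSessionsRestoreScript(configDir: string): string {
  return [
    `config_dir=${shellQuote(configDir)}`,
    `for agent in "$config_dir"/agents/*; do`,
    // Check -L before -d: `[ -d ]` follows symlinks, `[ -L ]` does not.
    `  if [ -L "$agent" ]; then printf '\\tsymlink\\t%s\\n' "$agent"; continue; fi`,
    `  [ -d "$agent" ] || continue`,
    `  target="$agent/sessions"`,
    `  if [ -L "$target" ]; then printf '\\tsymlink\\t%s\\n' "$target"; continue; fi`,
    `  mkdir -p "$target" || { printf '\\trestore-failed\\tmkdir %s\\n' "$target"; continue; }`,
    `  chown sandbox:sandbox "$target" || { printf '\\trestore-failed\\tchown %s\\n' "$target"; continue; }`,
    `  chmod 2770 "$target" || printf '\\trestore-failed\\tchmod %s\\n' "$target"`,
    `done`,
    `exit 0`,
  ].join("\n");
}
```

Ending with `exit 0` keeps a trailing failed test from surfacing as a non-zero status, so the caller has to treat marker lines, not the exit code, as the failure signal.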

Sequence Diagram

sequenceDiagram
  participant CLI as CLI (shields command)
  participant Index as shields/index
  participant StateLock as state-dir-lock module
  participant PrivExec as PrivilegedExec adapter
  participant Sandbox as docker/exec (container)
  participant FS as Host filesystem / fixtures

  CLI->>Index: run lockAgentConfig
  Index->>PrivExec: create stateDirLockExec (exec + capture)
  Index->>StateLock: preflightStateDirLock(privExec, configDir)
  StateLock->>PrivExec: capture sh -c preflight (workspace-* & roots)
  PrivExec->>Sandbox: docker exec -- sh -c preflight
  Sandbox->>PrivExec: stdout markers (symlinked roots or OK)
  StateLock->>Index: return issues[] or []
  alt issues found
    Index->>CLI: throw "Config not locked: ..."
  else no issues
    Index->>StateLock: applyStateDirLockMode(privExec, configDir, owner, isLocking=true)
    StateLock->>PrivExec: capture lock scripts (high-risk, workspace-*, confidentiality)
    PrivExec->>Sandbox: docker exec -- sh -c lock scripts
    Sandbox->>PrivExec: stdout markers (mutation results)
    StateLock->>PrivExec: capture verify + restore scripts
    PrivExec->>Sandbox: docker exec -- sh -c restore agents/*/sessions
    Sandbox->>FS: sessions dirs created with sandbox:sandbox 2770
    StateLock->>Index: return [] (no issues) or issues
  end
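
The ordering in the diagram (preflight first, mutation only when no issues come back) can be sketched with a recording test double. All names here are hypothetical stand-ins, and the `symlink-root` marker shape and the `preflight`/`lock` script strings are assumptions made for illustration.

```typescript
// Hypothetical stand-in for the privileged executor; the real adapter's
// signature is not shown on this page.
interface PrivilegedExecSketch {
  capture(script: string): string; // stdout of a privileged `sh -c`
}

const SYMLINK_MARKER = "\tsymlink-root\t"; // assumed marker shape

function preflightSketch(px: PrivilegedExecSketch, configDir: string): string[] {
  const stdout = px.capture(`preflight ${configDir}`);
  const issues: string[] = [];
  stdout.split("\n").forEach((line) => {
    if (line.indexOf(SYMLINK_MARKER) === 0) {
      issues.push(`state dir root is a symlink: ${line.slice(SYMLINK_MARKER.length)}`);
    }
  });
  return issues;
}

function lockAgentConfigSketch(px: PrivilegedExecSketch, configDir: string): void {
  const issues = preflightSketch(px, configDir);
  if (issues.length > 0) {
    // Fail closed before any chmod/chown has been issued.
    throw new Error(`Config not locked: ${issues.join("; ")}`);
  }
  px.capture(`lock ${configDir}`);
}

// Test double: records every script and answers the preflight with canned stdout.
function makeRecordingExec(preflightStdout: string): { px: PrivilegedExecSketch; calls: string[] } {
  const calls: string[] = [];
  const px: PrivilegedExecSketch = {
    capture(script: string): string {
      calls.push(script);
      return script.indexOf("preflight ") === 0 ? preflightStdout : "";
    },
  };
  return { px, calls };
}
```

Asserting on the recorded calls is one way to express the atomicity property: after a symlinked root is reported, the call log contains the preflight and nothing else.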

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

NVIDIA/NemoClaw#4287: Related privileged exec integration that standardizes privileged executor usage consumed by this refactor.
NVIDIA/NemoClaw#4130: Modifies shields up/down flow and overlaps on lock/unlock lifecycle integration points.
NVIDIA/NemoClaw#4272: Overlaps with shields permission adjustments and touches the same shields index paths.

Suggested reviewers

ericksoa
cv
cjagwani

Poem

🐰 I hopped the sandbox, checked each root,
Guarded symlinks, kept the boot,
Restored the sessions soft and light,
Permissions snug — the TUI's right,
A careful hop—now everything's tight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately captures the main change: keeping gateway readable and runtime sessions writable under shields-up lockdown, which is the core objective of fixing issue `#4065`.
Linked Issues check	✅ Passed	The PR fully addresses all coding requirements from issue `#4065`: symlink preflight detection (`#4065` requirement), runtime-writable paths restoration (`#4065` requirement), selective lockdown for secrets vs high-risk dirs (`#4065` requirement), and early failure on symlinked state-dir roots (`#4065` requirement).
Out of Scope Changes check	✅ Passed	All changes are in scope: new state-dir-lock.ts module implements lockdown logic, shields/index.ts integrates symlink preflight and runtime-writable restore, tests cover regression and new functionality, and docs explain the security model.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

github-actions · 2026-05-24T08:01:20Z

E2E Advisor Recommendation

Required E2E: shields-config-e2e
Optional E2E: openclaw-onboard-security-posture-e2e, hermes-onboard-security-posture-e2e

Dispatch hint: shields-config-e2e

Auto-dispatched E2E: shields-config-e2e via nightly-e2e.yaml at d2d24d77c3dfa08dfdbca85c9272b5feb295dfe7 — nightly run

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

shields-config-e2e (medium (~30 min timeout)): Direct coverage for the changed shields lifecycle. It runs a live sandbox through shields status, shields up, config immutability checks, content-seal drift detection, shields down, writable-state restoration, audit trail, timer restore, and double shields-up rejection.

Optional E2E

openclaw-onboard-security-posture-e2e (high (~60 min timeout)): Useful adjacent confidence for OpenClaw runtime security posture and config-tree ownership assumptions after changing shields-related permission logic.
hermes-onboard-security-posture-e2e (high (~60 min timeout)): Useful adjacent confidence because the new state-dir inventory includes Hermes-specific directories such as plugins, platforms, pairing, profiles, and skins, even though the existing posture job does not fully exercise shields up/down.

New E2E recommendations

security-boundaries.shields-state-dir-lock (high): Existing shields-config-e2e validates config and workspace locking, but the new behavior adds many state-dir roots, root:sandbox vs root:root split ownership, confidentiality dirs, symlink-root refusal, and agents/*/sessions restoration. These are currently covered mainly by mocked/unit-style tests rather than a live sandbox E2E.
- Suggested test: Add a live shields state-dir permissions E2E that creates representative OpenClaw state dirs, runs shields up/down, verifies high-risk dirs are root:sandbox non-writable, credentials/identity/pairing are root:root 700, agents/<id>/sessions remains sandbox:sandbox 2770, and symlinked state-dir roots cause shields up to fail closed without following the symlink.
security-boundaries.hermes-shields (medium): The new inventory explicitly includes Hermes-specific state dirs, but the existing shields-config-e2e appears to exercise the default OpenClaw path. Hermes shields-up/down behavior can diverge because its config path, sensitive files, sticky root mode, plugins/platforms/pairing dirs, and runtime layout differ.
- Suggested test: Add a Hermes shields lifecycle E2E that onboards Hermes, runs shields up/down, and verifies Hermes config, plugin/runtime, platform, and pairing directory permissions match the new contract.
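
The permission checks suggested above could be expressed as a comparison between expected postures and captured `stat` output. This is a sketch: `verifyPosture` is a hypothetical helper, and it assumes one line per path in the shape `owner:group mode path`, as GNU `stat -c '%U:%G %a %n'` would print.

```typescript
interface ExpectedPosture {
  path: string;
  owner: string; // owner:group
  mode: string;  // octal, as printed by stat %a
}

/** Compare captured stat lines against expected postures and list drift. */
function verifyPosture(statOutput: string, expected: ExpectedPosture[]): string[] {
  const actual: { [path: string]: { owner: string; mode: string } } = {};
  statOutput.split("\n").forEach((line) => {
    const m = /^(\S+) ([0-7]+) (.+)$/.exec(line);
    if (m) actual[m[3]] = { owner: m[1], mode: m[2] };
  });

  const issues: string[] = [];
  expected.forEach((want) => {
    const got = Object.prototype.hasOwnProperty.call(actual, want.path)
      ? actual[want.path]
      : undefined;
    if (!got) {
      issues.push(`${want.path}: missing from stat output`);
    } else if (got.owner !== want.owner || got.mode !== want.mode) {
      issues.push(`${want.path}: expected ${want.owner} ${want.mode}, got ${got.owner} ${got.mode}`);
    }
  });
  return issues;
}
```

A live E2E would feed this the output of a `stat` run inside the sandbox after shields-up; an empty result means every listed path matches its expected owner and mode.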

Dispatch hint

Workflow: .github/workflows/nightly-e2e.yaml
jobs input: shields-config-e2e

github-actions · 2026-05-24T08:01:20Z

E2E Scenario Advisor Recommendation

Required scenario E2E: ubuntu-repo-cloud-openclaw
Optional scenario E2E: None

Dispatch required scenario E2E:

gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

ubuntu-repo-cloud-openclaw: Host-side shields lock/unlock code changed, including the new state-dir lock fan-out used by shields-up/down. The ubuntu-repo-cloud-openclaw scenario is the routed Ubuntu/OpenClaw path that includes the supplemental security-shields coverage for shields status/config consistency.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Optional scenario E2E

None.

Relevant changed files

src/lib/shields/index.ts
src/lib/shields/state-dir-lock.ts

github-actions · 2026-05-24T08:02:14Z

PR Review Advisor

Findings: 4 needs attention, 7 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 10 still apply, 0 new items found

Review findings

🛠️ Needs attention

Reject descendant symlink bridges under locked high-risk state dirs (src/lib/shields/state-dir-lock.ts:414): The new preflight, mutation, and verification paths reject symlinked state-dir roots, but they do not reject symlinks below locked executable/config trees such as extensions, plugins, skills, hooks, or agents. A sandbox user can create a descendant symlink before shields-up that points to a still-mutable target; after the root is locked, plugin or agent discovery may still follow that bridge and load sandbox-controlled code while shields-up reports the tree as locked.
- Recommendation: During shields-up, either reject symlinks anywhere under locked executable/config state dirs before mutating, or freeze/replace them in a way that cannot point at sandbox-writable locations. Add a negative test with a symlink below a high-risk dir pointing to a writable target and verify shields-up refuses or neutralizes it before reporting locked.
- Evidence: preflightSymlinkedRoots() checks only "$config_dir/$dir" and "$config_dir"/workspace-* roots; verifyStateDirGroup() checks only top-level mode/owner. The tests cover symlinked roots and symlinked agents/<id> restore parents, not descendant symlink bridges inside plugin/agent trees.
Fail shields-down when state-dir restore fails (src/lib/shields/index.ts:462): unlockAgentConfig() now captures stateDirUnlockIssues, but appends them to the local errors array that is only logged as a warning. The final throw is based on a separate issues array that verifies only config files and the config root. A state-dir chown/chmod failure can therefore still allow shields-down to report success while high-risk or secret-bearing dirs remain partially locked.
- Recommendation: Append stateDirUnlockIssues to the verification issues that cause Config not unlocked, or otherwise throw when state-dir restore issues are non-empty. Add negative tests for unlock-time exec failure, emitted mutation-failed markers, and a partially restored state dir whose top-level config file/root checks still pass.
- Evidence: stateDirUnlockIssues from applyStateDirLockMode(..., false) are pushed into errors at lines 462-464, errors are only logged at lines 466-469, and the throw at line 522 checks only issues collected by later file/config-dir stat checks.
Validate the linked TUI and plugin behavior in an actual shields-up sandbox (test/shields-up-runtime-perms.test.ts:156): Issue #4065 ("[Brev][Security] nemoclaw shields up blocks openclaw tui with EACCES on agents/sessions directory and disables /nemoclaw slash command") is about runtime behavior inside a shields-up OpenClaw sandbox: the TUI must start, create its session directory, load the NemoClaw and openclaw-weixin plugins, register /nemoclaw, and report the current shields state. The changed tests assert mocked command shapes and replay shell snippets against local fixtures, but they do not exercise the sandbox user, gateway group membership, plugin discovery, slash-command registration, or TUI startup path.
- Recommendation: Add or identify targeted runtime/integration validation that runs shields-up in a sandbox and verifies openclaw tui starts, /sandbox/.openclaw/agents/<id>/sessions is writable by the agent, the nemoclaw and openclaw-weixin plugins are discoverable, /nemoclaw shields outputs Shields: UP, high-risk dirs remain non-writable to sandbox-controlled code, symlink-root refusal works, and shields-down restores mutability.
- Evidence: The linked issue's Expected Result says "TUI starts normally, session created" and that `/nemoclaw shields` outputs "Shields: UP". The new test suite uses Module._load mocks and local spawnSync("sh", ...) fixture replays rather than a running sandbox/TUI.
Split the new state-dir lock monolith (src/lib/shields/state-dir-lock.ts:1): The extraction reduces src/lib/shields/index.ts, but introduces a new 552-line security-sensitive module that combines state-dir inventories, shell construction, privileged mutation parsing, symlink preflight, verification, and runtime carve-out restoration. This concentrates sandbox-escape and policy-bypass logic in one large file, making review and future changes riskier.
- Recommendation: Split the module into focused units, for example inventory/constants, script builders, result parsing, verification, and runtime carve-out restoration, or otherwise offset the monolith growth so each privileged filesystem responsibility is independently auditable.
- Evidence: Trusted drift context flags src/lib/shields/state-dir-lock.ts as a new 552-line monolith; the file includes privileged chown -R/chmod -R scripts, symlink checks, verification, and writable runtime subpath restoration.
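
The shields-down finding above comes down to which list feeds the final throw. A minimal sketch of the recommended behavior, assuming a hypothetical `finishUnlockSketch` helper (the real `unlockAgentConfig` body is not shown on this page):

```typescript
/**
 * Decide whether shields-down succeeded. State-dir restore issues are merged
 * into the verification issues so that either kind fails the unlock, instead
 * of being logged as a warning only.
 */
function finishUnlockSketch(stateDirUnlockIssues: string[], verificationIssues: string[]): void {
  const issues = verificationIssues.concat(stateDirUnlockIssues);
  if (issues.length > 0) {
    throw new Error(`Config not unlocked: ${issues.join("; ")}`);
  }
}
```

With this shape, a failed `chown` on a state dir makes shields-down throw even when the config-file and config-root checks pass.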

🔎 Worth checking

Source-of-truth review needed: Runtime writable subpath restore for agents/*/sessions: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: WRITABLE_RUNTIME_SUBPATHS = ["agents/*/sessions"] and restoreWritableRuntimeSubpaths() restore matching targets to sandbox:sandbox 2770; the linked issue requires real TUI session creation.
Source-of-truth review needed: Mutation-time state-dir symlink preflight and root refusal: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: preflightStateDirLock() checks root entries before file mutations and mutation scripts re-check roots; descendant symlink bridges are not covered.
Source-of-truth review needed: Tolerant shell marker parsing for state-dir mutation and restore: The advisor marked localized patch analysis as missing.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: runStateDirLockScript() and restoreWritableRuntimeSubpaths() parse marker lines and catch exec errors, but added tests cover mostly command shape and happy-path fixture behavior.
Source-of-truth review needed: Secret-bearing state-dir confidentiality split: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: CONFIDENTIALITY_STATE_DIRS implements credentials, identity, and pairing, but the added test suite contains no direct assertions for those names or final permissions.
Source-of-truth review needed: Shields-down state-dir restore issue propagation: The advisor marked localized patch analysis as missing.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: stateDirUnlockIssues are pushed into errors and logged, while the final throw only checks later issues for config files and config root.
Add direct tests for secret-bearing state-dir lockdown (test/shields-up-runtime-perms.test.ts:157): The implementation separates credentials, identity, and pairing into a stricter confidentiality posture, but the tests do not directly assert that these directories lock to root:root 700 and recursively strip group/other access. A later regression could reintroduce sandbox or gateway read access to token-bearing state.
- Recommendation: Add tests that inspect the confidentiality pass command shape and replay it on a fixture to confirm credentials, identity, and pairing roots become 700 root:root and descendants lose group/other read/write/execute access. Include a negative case showing high-risk code dirs remain gateway-readable while secret dirs do not.
- Evidence: CONFIDENTIALITY_STATE_DIRS contains credentials, identity, and pairing, and applyStateDirLockMode() uses root:root, 700, and go-rwX; grep of the added test found no direct assertions for those names, owners, modes, or recursive access.
Cover tolerant shell marker parsing and partial mutation failures (src/lib/shields/state-dir-lock.ts:446): The PR aggregates shell marker output for exec errors, mutation failures, verification drift, and restore failures, but the new tests do not cover several important negative paths. Without regressions for command-level throw/timeout, emitted mutation-failed, emitted restore-failed, and partial recursive chmod/chown failure with clean top-level stat, this localized tolerant parsing can silently regress.
- Recommendation: Add focused tests for privileged.capture() throwing, marker lines from each mutation pass, restore mkdir/chown/chmod/stat failures, and a partial recursive permission failure where the top-level state-dir mode/owner still match. Document the removal condition for this tolerant shell fan-out, or reduce the workaround by moving more guarantees to source-side invariants.
- Evidence: runStateDirLockScript() and restoreWritableRuntimeSubpaths() parse structured marker lines, but the changed tests cover command shape, root symlink handling, and happy-path session creation rather than failure markers or partial recursive failure cases.
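
The negative paths listed above are easier to test against a small, pure parser. The sketch below assumes a marker shape of a leading tab, a kind, and tab-separated detail; that shape and the function names are illustrative, not the module's actual format.

```typescript
/** Turn tab-prefixed marker lines into issue strings; ignore other output. */
function parseMarkerLines(stdout: string): string[] {
  const issues: string[] = [];
  stdout.split("\n").forEach((line) => {
    if (line.charAt(0) !== "\t") return; // ordinary output, not a marker
    const fields = line.slice(1).split("\t");
    const kind = fields[0];
    if (kind === "") return; // a bare tab carries no information
    const detail = fields.slice(1).join("\t");
    issues.push(detail ? `${kind}: ${detail}` : kind);
  });
  return issues;
}

/** Run a capture and report a thrown exec error as an issue, not an exception. */
function collectIssues(capture: () => string): string[] {
  try {
    return parseMarkerLines(capture());
  } catch (err) {
    return [`exec error: ${err instanceof Error ? err.message : String(err)}`];
  }
}
```

Each case named in the finding (a throwing capture, an emitted `mutation-failed`, an emitted `restore-failed`) then becomes a one-line assertion on the returned list.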

🌱 Nice ideas

None.

- Recommendation: Add focused tests for privileged.capture() throwing, marker lines from each mutation pass, restore mkdir/chown/chmod/stat failures, and a partial recursive permission failure where the top-level state-dir mode/owner still match. Document the removal condition for this tolerant shell fan-out, or reduce the workaround by moving more guarantees to source-side invariants.
- Evidence: runStateDirLockScript() and restoreWritableRuntimeSubpaths() parse structured marker lines, but the changed tests cover command shape, root symlink handling, and happy-path session creation rather than failure markers or partial recursive failure cases.
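
As a rough illustration of the negative paths named in the last finding, the sketch below turns both a throwing exec call and emitted failure markers into reportable issues. Everything in it is an assumption made for illustration: the `Capture` type, the `collectLockIssues` name, the `exec-failed` label, and the `<marker>\t<detail>` line shape are hypothetical and are not taken from `state-dir-lock.ts`.

```typescript
// Hypothetical sketch: names and the "<marker>\t<detail>" format are assumptions.
type Capture = (argv: string[]) => string;

const FAILURE_MARKERS: string[] = ["mutation-failed", "restore-failed"];

function collectLockIssues(capture: Capture, argv: string[]): string[] {
  let stdout: string;
  try {
    stdout = capture(argv);
  } catch (err) {
    // A throw or timeout from the exec layer becomes an issue instead of vanishing.
    const reason = err instanceof Error ? err.message : String(err);
    return [`exec-failed: ${reason}`];
  }

  const issues: string[] = [];
  for (const line of stdout.split("\n")) {
    const tab = line.indexOf("\t");
    if (tab === -1) continue; // not a marker line
    const marker = line.slice(0, tab);
    if (FAILURE_MARKERS.indexOf(marker) !== -1) {
      issues.push(`${marker}: ${line.slice(tab + 1)}`);
    }
  }
  return issues;
}
```

A test for the partial-failure case could feed this kind of helper a stdout that contains a failure marker while the mocked top-level stat still reports the expected owner and mode, then assert that the issue list is non-empty.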

This is an automated advisory review. A human maintainer must make the final merge decision.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/lib/state/sandbox.ts (1)
1108-1122: Run the state lifecycle E2E jobs for this audit-semantic change before merge.

This changes backup audit behavior under permission-denied traversal, so it’s worth validating with state-backup-restore-e2e, snapshot-commands-e2e, and rebuild-openclaw-e2e.

As per coding guidelines: "src/lib/state/sandbox.ts: This file manages sandbox state ... E2E test recommendation: state-backup-restore-e2e, snapshot-commands-e2e, rebuild-openclaw-e2e."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/state/sandbox.ts` around lines 1108 - 1122, This change alters the
backup audit behavior in src/lib/state/sandbox.ts (see the auditCmd construction
and the "Pre-backup audit" log) so run the full state lifecycle E2E suites
before merging: execute state-backup-restore-e2e, snapshot-commands-e2e, and
rebuild-openclaw-e2e against a fresh sandbox to validate permission-denied
traversal is tolerated and legitimate rebuilds still succeed; if any test fails,
adjust the auditCmd logic or error handling around the shell-quoted dir mapping
(the code building auditCmd and the surrounding pre-backup audit flow) until the
E2E suites pass.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/snapshot.test.ts`:
- Around line 1056-1171: Update the fake ssh stub used in the two tests ("treats
audit-find exit 1 with empty stdout as a successful audit (NemoClaw `#4059`)" and
"still rejects violations from readable dirs even if a sibling find exits
non-zero") so that when the command contains "find " the script exits non-zero
(e.g., process.exit(1)) unless the command string also contains the tolerant
shape "|| true"; in other words, change the cmd.includes("find ") branch in both
writeExecutable calls to check for cmd.includes("|| true") and only return
non-zero when that tolerant token is absent, preserving existing stdout behavior
for other branches.
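
The stub behaviour requested above can be sketched as a pure decision function. This is only an illustration: the real tests write an executable fake `ssh` script, and the `fakeFindResult` name, the `FakeResult` shape, and the stderr text are hypothetical.

```typescript
// Hypothetical sketch of the fake-ssh decision; not the test suite's actual stub.
interface FakeResult {
  exitCode: number;
  stderr: string;
}

function fakeFindResult(cmd: string): FakeResult {
  const runsFind = cmd.includes("find ");
  const tolerant = cmd.includes("|| true");
  if (runsFind && !tolerant) {
    // Without the tolerance wrapper, a permission-denied traversal fails the command.
    return { exitCode: 1, stderr: "find: Permission denied" };
  }
  return { exitCode: 0, stderr: "" };
}
```

With this shape, a test that passes only because the stub always exits 0 would start failing as soon as the `|| true` wrapper is dropped from the audit command.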


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b97cf8fe-d359-487b-8af3-17cb329e2c93

📥 Commits

Reviewing files that changed from the base of the PR and between 8f5db10 and 1ef364e.

📒 Files selected for processing (5)

src/lib/shields/index.ts
src/lib/state/sandbox.ts
test/repro-2681-group-writable.test.ts
test/repro-4065-shields-up-runtime-perms.test.ts
test/snapshot.test.ts

…file Address review: scrub `#4065` / `#4059` mentions from production code comments and test docstrings, and rename the new shields-up regression test from `repro-4065-…` to `shields-up-runtime-perms.test.ts` so the filename describes behaviour rather than an issue number. Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

github-actions · 2026-05-24T08:11:57Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26355812622
Target ref: 1ef364e6574a9298b239e75e6523d7966b6f3bf9
Workflow ref: main
Requested jobs: shields-config-e2e,snapshot-commands-e2e,rebuild-openclaw-e2e
Summary: 3 passed, 0 failed, 0 skipped

Job	Result
rebuild-openclaw-e2e	✅ success
shields-config-e2e	✅ success
snapshot-commands-e2e	✅ success

github-actions · 2026-05-24T08:32:00Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26356219674
Target ref: 15d095f4845aef820a1b3661855c531758f15fc1
Workflow ref: main
Requested jobs: shields-config-e2e,snapshot-commands-e2e,rebuild-openclaw-e2e
Summary: 3 passed, 0 failed, 0 skipped

Job	Result
rebuild-openclaw-e2e	✅ success
shields-config-e2e	✅ success
snapshot-commands-e2e	✅ success

…d test

Address review:

1. `restoreWritableRuntimeSubpaths` expanded the full pattern `agents/*/sessions` as a single glob. On a fresh sandbox where `sessions` does not exist yet, the glob has no matches and the shell leaves the literal pattern, which the `*"*"*` guard then drops, so `sessions/` was never created and the post-lockdown TUI mkdir still failed with EACCES. Split each pattern into a parent glob (expanded against the existing tree) plus a leaf to create, so the helper always mkdir's the missing leaf inside every existing parent.
2. The two pre-backup audit tests stubbed the SSH fake as always-exit-0 on `find`, so the `|| true` tolerance wrapper was not actually exercised. Make the fake exit non-zero with a Permission-denied stderr unless the audit cmd includes `|| true`, so the tests fail loudly if the wrapper is dropped.
3. New behavioural test runs the actual restore-helper script body against a real filesystem fixture and asserts that `agents/main/sessions` is created when only `agents/main` exists beforehand.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
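
The parent-glob/leaf split described in item 1 might look roughly like the following on the TypeScript side. This is a minimal sketch under assumptions: `splitRuntimeSubpath` and `RuntimeSubpath` are hypothetical names, and the PR may perform the split inside the shell script instead.

```typescript
// Hypothetical sketch: split "agents/*/sessions" into a glob for parents that
// already exist and the leaf directory to create inside each of them.
interface RuntimeSubpath {
  parentGlob: string;
  leaf: string;
}

function splitRuntimeSubpath(pattern: string): RuntimeSubpath {
  const trimmed = pattern.replace(/\/+$/, ""); // tolerate a trailing slash
  const cut = trimmed.lastIndexOf("/");
  if (cut === -1) {
    // No parent component: the leaf sits directly under the config dir.
    return { parentGlob: ".", leaf: trimmed };
  }
  return { parentGlob: trimmed.slice(0, cut), leaf: trimmed.slice(cut + 1) };
}
```

Only the parent glob is expanded against the tree, so a missing `sessions` directory no longer makes the whole pattern expand to nothing.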

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/shields/index.ts`:
- Around line 490-499: The loop in restoreWritableRuntimeSubpaths
(WRITABLE_RUNTIME_SUBPATHS handling) currently treats symlinked directories as
regular dirs and thus creates/mutates sessions/ on the symlink target; update
the parent existence guard to skip symlinks (e.g., require directory AND not a
symlink) or explicitly check [ -L "$parent" ] and continue when true so
mkdir/chown/chmod are not applied to symlink targets; make this change around
the loop that iterates parents (the for parent in "$@"; do ... [ -d "$parent" ]
|| continue portion) to ensure symlinked runtime parents are ignored.

In `@test/shields-up-runtime-perms.test.ts`:
- Around line 129-135: The test extracts patterns with restoreShell.slice(4)
which includes configDir, causing configDir to be passed twice into spawnSync
and altering the argv layout; change the extraction to restoreShell.slice(5) so
that patterns contains only the glob patterns, leaving the explicit "configDir"
argument in the spawnSync call (referencing restoreShell, script, patterns, and
the spawnSync invocation).
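
To make the argv layout in the second comment concrete, here is a small sketch. It assumes the captured privileged call has the shape `["sh", "-c", script, "sh", configDir, ...patterns]`, which is inferred from the comment rather than confirmed against the test file. `buildReplayArgv` is a hypothetical helper name.

```typescript
// Hypothetical sketch: rebuild the replay argv from a captured privileged call.
// Assumed layout: [0]="sh" [1]="-c" [2]=script [3]="sh" ($0) [4]=configDir [5..]=patterns
function buildReplayArgv(capturedArgv: string[]): string[] {
  const script = capturedArgv[2] ?? "";
  const configDir = capturedArgv[4] ?? "";
  // slice(4) would return [configDir, ...patterns] and duplicate configDir below.
  const patterns = capturedArgv.slice(5);
  return ["-c", `${script}\n`, "sh", configDir, ...patterns];
}
```

Under that assumed layout, `slice(5)` keeps `configDir` as the single first positional argument, matching the production call shape.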

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1eb0107a-ac61-4d44-ab10-07c74795f156

📥 Commits

Reviewing files that changed from the base of the PR and between 15d095f and 31ce569.

📒 Files selected for processing (3)

src/lib/shields/index.ts
test/shields-up-runtime-perms.test.ts
test/snapshot.test.ts

github-actions · 2026-05-24T09:00:42Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26356782985
Target ref: 31ce5693e17db64ad95ce41d75623cfc296c3682
Workflow ref: main
Requested jobs: shields-config-e2e,snapshot-commands-e2e,rebuild-openclaw-e2e
Summary: 3 passed, 0 failed, 0 skipped

Job	Result
rebuild-openclaw-e2e	✅ success
shields-config-e2e	✅ success
snapshot-commands-e2e	✅ success

… test argv slice

Address review:

1. The privileged restore helper called `[ -d "$parent" ]` before `mkdir`/`chown`/`chmod`, but `[ -d ]` follows symlinks. A pre-lockdown agent that swapped `agents/<id>` for a symlink to an arbitrary host path could redirect the post-lock `mkdir -p ".../sessions"` and `chown -R sandbox:sandbox` through that link and rewrite ownership on any directory the privileged exec context can reach. Drop the parent (and the target leaf) when either is a symlink, before any mutation.
2. The behavioural test extracted patterns with `slice(4)`, which kept the captured `configDir` in the argv passed to bash, so the helper ran with `configDir` listed twice and the test argv diverged from the real `privilegedSandboxExec` call shape. Use `slice(5)` so only the patterns are forwarded.
3. New behavioural test asserts the symlink guard: when `agents/<id>` is a symlink to a sibling host directory, the helper must not create `sessions/` under either the link target or the link itself.

Also reword one comment to avoid contested terminology.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

test/shields-up-runtime-perms.test.ts (1)
135-139: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use sh (not bash) when replaying the restore helper

Both restore-shell captures match sh -c, but both behavioral replays run spawnSync("bash", ["-c", ...]), which can mask /bin/sh compatibility regressions. Switch the replay shell to sh in both replay blocks (around lines 135-139 and 173-177).
♻️ Proposed fix
-    const result = spawnSync(
-      "bash",
-      ["-c", `${script}\n`, "sh", configDir, ...patterns],
-      { encoding: "utf-8", timeout: 5000 },
-    );
+    const result = spawnSync(
+      "sh",
+      ["-c", `${script}\n`, "sh", configDir, ...patterns],
+      { encoding: "utf-8", timeout: 5000 },
+    );
Apply the same change to the second replay block as well.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/shields-up-runtime-perms.test.ts` around lines 135 - 139, In
test/shields-up-runtime-perms.test.ts the replay uses spawnSync("bash", ["-c",
...]) which hides /bin/sh compatibility issues; change both replay invocations
to use spawnSync("sh", ["-c", ...]) instead (update the call sites where
spawnSync is invoked with "bash" in the two replay blocks around the current
spawnSync(...) usages so they call "sh" and keep the same args and options).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ea69a11f-9462-4b87-aa13-0ece1af0192b

📥 Commits

Reviewing files that changed from the base of the PR and between 31ce569 and 8608886.

📒 Files selected for processing (3)

src/lib/shields/index.ts
test/shields-up-runtime-perms.test.ts
test/snapshot.test.ts

github-actions · 2026-05-24T09:40:10Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26357693833
Target ref: 8608886b21b09df026e49326d798f54742b81072
Workflow ref: main
Requested jobs: shields-config-e2e,snapshot-commands-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job	Result
shields-config-e2e	✅ success
snapshot-commands-e2e	✅ success

…ed comment

Address review:

1. The behavioural test for the runtime-subpath restore was running the captured script body under `bash` rather than `sh`, while the production helper invokes it via `sh -c`. A bash-only construct slipping into the helper would pass the test but break the real call site. Switch the replay to `spawnSync("sh", …)` in both fixtures so the test covers exactly what the privileged exec runs.
2. Drop the NC-2227-05 tag from the updated `lockAgentConfig` comment. The state-directory ownership story is now self-contained in `HIGH_RISK_STATE_DIRS` doc + the helper itself; the issue tag does not add information and conflicts with the no-issue-refs-in-comments guidance applied across the rest of this PR.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/lib/shields/index.ts (1)

399-458: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Reject symlinked state-dir roots before recursive chmod -R/chown -R lock step

GNU coreutils defaults make chmod -R follow symlink command-line arguments, so if a configDir/<stateDir> or configDir/workspace-* entry is a symlink to a directory, the recursive chmod -R can apply outside configDir during the shields-up lock path. (chown -R defaults to non-dereference for symlink command-line args, but the recursive chmod -R is enough for this to be a security/correctness blocker.)

Suggested hardening

   for (const dirName of HIGH_RISK_STATE_DIRS) {
     const dirPath = `${configDir}/${dirName}`;
+    try {
+      privilegedSandboxExec(sandboxName, [
+        "sh",
+        "-c",
+        '[ -d "$1" ] && [ ! -L "$1" ]',
+        "sh",
+        dirPath,
+      ]);
+    } catch {
+      continue;
+    }
     try {
       privilegedSandboxExec(sandboxName, ["chown", "-R", owner, dirPath]);
     } catch {
       // Directory may not exist for this agent — silently skip
     }
@@
 for dir in "$config_dir"/workspace-*; do
+  [ -L "$dir" ] && continue
   [ -d "$dir" ] || continue
   chown -R "$owner" "$dir" 2>/dev/null || true
   chmod "$dir_mode" "$dir" 2>/dev/null || true
   [ "$clear_setgid" = "1" ] && chmod g-s "$dir" 2>/dev/null || true
   chmod -R "$recursive_mode" "$dir" 2>/dev/null || true
 done

Also applies to: 756-760

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/shields/index.ts` around lines 399 - 458, The recursive chown/chmod
operations in the HIGH_RISK_STATE_DIRS loop and the workspace-* shell block can
follow symlinked roots and affect paths outside configDir; before calling
privilegedSandboxExec for any dirPath or iterating workspace-* entries (symbols:
HIGH_RISK_STATE_DIRS, privilegedSandboxExec, configDir, dirPath, workspace-*
pattern, recursiveMode, dirMode, owner, clearSetgid, isLocking), reject or skip
any entry that is a symlink (use a non-following lstat/test -L check) so
recursive -R operations are only applied to real directories; update both the
TypeScript loop (skip when lstat indicates symlink) and the embedded shell
script (skip entries where [ -L "$dir" ] ) to avoid following symlink roots.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: cf8d1c18-3301-4936-baba-55a23d2cf2e7

📥 Commits

Reviewing files that changed from the base of the PR and between 8608886 and 1262ddf.

📒 Files selected for processing (2)

src/lib/shields/index.ts
test/shields-up-runtime-perms.test.ts

github-actions · 2026-05-24T10:27:09Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26358552843
Target ref: 1262ddfb28d5b697630c40beb935a162170ca364
Workflow ref: main
Requested jobs: shields-config-e2e,state-backup-restore-e2e,snapshot-commands-e2e
Summary: 3 passed, 0 failed, 0 skipped

Job	Result
shields-config-e2e	✅ success
snapshot-commands-e2e	✅ success
state-backup-restore-e2e	✅ success

…p lock

Address review:

1. The shields-up lock loop ran `chown -R`/`chmod -R`/`chmod g-s` on each `${configDir}/${dirName}` (and on every `workspace-*` glob hit) without rejecting symlinked roots. A pre-lockdown agent that swapped e.g. `extensions/` or `workspace-main/` for a symlink to a host path could redirect those recursive ownership and mode mutations at an attacker-controlled directory. Consolidate the per-state-dir loop into a single privileged shell exec that skips symlinks (`[ -L "$path" ] && continue`) before any mutation, and add the same guard to the existing `workspace-*` shell loop.
2. Drop the `NC-2227-05` issue tag from the state-directory header comment for consistency with the rest of this PR.

Updates the regression tests:

- `repro-2681`: assert the unlock fan-out via the new `sh -c` script shape (workspace included as an arg, plus the workspace-* glob path still present).
- `shields-up-runtime-perms`: assert the state-dir lock and workspace-* scripts both contain the `[ -L … ] && continue` guard, and add a behavioural fixture that proves a symlinked `extensions/` root is skipped (its host target keeps its original mode and file contents).

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

github-actions · 2026-05-24T11:05:09Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26359318121
Target ref: 4bcb560e9a85f575385a155c4809e7c0d17480a1
Workflow ref: main
Requested jobs: shields-config-e2e,snapshot-commands-e2e,state-backup-restore-e2e
Summary: 3 passed, 0 failed, 0 skipped

Job	Result
shields-config-e2e	✅ success
snapshot-commands-e2e	✅ success
state-backup-restore-e2e	✅ success

Address review:

Previously, when shields-up encountered a symlinked high-risk state-dir root (or a symlinked `workspace-*` dir), the privileged lock script silently skipped it via `[ -L "$path" ] && continue`. That refused to follow the link (good) but left the dir as-is and reported the lock as successful. The sandbox would then sit in "shields up (lockdown active)" status while a state-dir root still pointed at a writable host path, exactly the security regression the symlink guard was meant to prevent.

Have the two consolidated lock shell scripts always exit 0 but emit `symlinked-root\t<path>` on stdout for every symlinked root they refuse to touch. `applyStateDirLockMode` parses those lines and returns them as lock failures when invoked under `isLocking=true`. `lockAgentConfig` now throws "Config not locked: state dir root is a symlink: …" before any further verification, refusing to acknowledge shields-up. Unlock is unchanged: skipping symlinked roots is the correct best-effort behaviour there.

New regression test exercises the end-to-end path: when the captured shell script reports a symlinked root via the mocked exec, the `lockAgentConfig` call throws with the expected diagnostic. Tests for the static script shape are updated to match the new `if [ -L … ]; then printf …; fi` form instead of the previous `[ -L … ] && continue`.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
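
The parsing side of this change can be sketched as follows. The `symlinked-root\t<path>` marker and the error prefix come from the commit message above. The function names and the way multiple paths are joined are assumptions made for illustration.

```typescript
// Hypothetical sketch: extract "symlinked-root\t<path>" markers from script stdout
// and refuse to acknowledge the lock when any are present.
function symlinkedRootsFrom(stdout: string): string[] {
  const prefix = "symlinked-root\t";
  return stdout
    .split("\n")
    .filter((line) => line.startsWith(prefix))
    .map((line) => line.slice(prefix.length));
}

function assertNoSymlinkedRoots(stdout: string, isLocking: boolean): void {
  const roots = symlinkedRootsFrom(stdout);
  // Unlock stays best-effort: symlinked roots are skipped, not reported as failures.
  if (isLocking && roots.length > 0) {
    throw new Error(
      `Config not locked: state dir root is a symlink: ${roots.join(", ")}`,
    );
  }
}
```

Because the script always exits 0, the marker lines on stdout are the only channel through which a refused root reaches the caller, which is why the parse step has to treat them as failures when locking.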

github-actions · 2026-05-24T12:14:04Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26360739914
Target ref: b95913ad0c78d1efd972054c21318a6901c01446
Workflow ref: main
Requested jobs: shields-config-e2e,state-backup-restore-e2e,snapshot-commands-e2e
Summary: 3 passed, 0 failed, 0 skipped

Job	Result
shields-config-e2e	✅ success
snapshot-commands-e2e	✅ success
state-backup-restore-e2e	✅ success

… + doc update

Address review:

1. The inline `[ -L … ] && printf 'symlinked-root\t…'` guard in the mutation script ran per-iteration. A symlinked root later in the list could still leave earlier (non-symlinked) state dirs already reowned to `root:sandbox` by the time the lock helper bailed out. shields-up would then report "config not locked" while the tree was partially mutated. Add a dedicated preflight pass that runs before any `chown`/`chmod`, scans every high-risk state-dir root *and* every `workspace-*` dir for symlinks, and returns the full list. When `isLocking=true` and the preflight finds any symlinked root, `applyStateDirLockMode` short-circuits without touching the mutation pass or the sessions-restore helper, and `lockAgentConfig` throws `Config not locked: state dir root is a symlink: …`. The inline symlink guards in the mutation scripts stay for defence-in-depth in case the preflight and the mutation observe different fs state.
2. New regression test mocks the preflight script to report a symlinked root and asserts (a) `lockAgentConfig` throws with the expected diagnostic and (b) no mutation calls (state-dir lock, workspace-* lock, or sessions-restore) were ever issued.
3. Add a second regression test that asserts a dedicated preflight script (no `chown`/`chmod`, just the `[ -L … ] && printf` checks) is present in the recorded call sequence.
4. Update `docs/security/best-practices.mdx` to document the new `root:sandbox` state-dir ownership, the `agents/<id>/sessions` runtime carve-out, and the hard fail on symlinked state-dir roots.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
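
The ordering guarantee in item 1 (preflight first, no mutation if it fails) can be sketched with injected steps. `LockDeps` and `applyLock` are hypothetical names, and running the restore step only when locking is an assumption rather than something the commit message states.

```typescript
// Hypothetical sketch: run the symlink preflight before any mutation and
// short-circuit the whole lock when it reports a symlinked root.
interface LockDeps {
  preflight: () => string[]; // returns symlinked roots; mutates nothing
  mutate: () => void;        // chown/chmod pass over the state dirs
  restore: () => void;       // runtime-subpath restore (e.g. sessions)
}

function applyLock(deps: LockDeps, isLocking: boolean): string[] {
  const symlinked = deps.preflight();
  if (isLocking && symlinked.length > 0) {
    // Nothing has been mutated yet, so the tree is left exactly as found.
    return symlinked.map((p) => `state dir root is a symlink: ${p}`);
  }
  deps.mutate();
  if (isLocking) deps.restore(); // assumption: restore only applies after a lock
  return [];
}
```

Injecting the steps lets a test record the call sequence and assert that a failing preflight is the only call made, which mirrors the assertion described in item 2.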

github-actions · 2026-05-24T12:36:38Z

🌿 Preview your docs: https://nvidia-preview-pr-4155.docs.buildwithfern.com/nemoclaw

github-actions · 2026-05-24T12:47:21Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26361443080
Target ref: f099b8b9a3cc84464272a3a34a478064a0f661ff
Workflow ref: main
Requested jobs: shields-config-e2e,state-backup-restore-e2e,snapshot-commands-e2e
Summary: 3 passed, 0 failed, 0 skipped

Job	Result
shields-config-e2e	✅ success
snapshot-commands-e2e	✅ success
state-backup-restore-e2e	✅ success

github-actions · 2026-05-25T03:20:47Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26381246289
Target ref: bd702919776d434b9a2f2ac2a752e0ad412b0404
Workflow ref: main
Requested jobs: shields-config-e2e,snapshot-commands-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job	Result
shields-config-e2e	✅ success
snapshot-commands-e2e	✅ success

github-actions · 2026-05-26T15:36:41Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26458071131
Target ref: 41a8bb3fa5bed9740ecff6fa7af41576840dc911
Workflow ref: main
Requested jobs: shields-config-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job	Result
shields-config-e2e	✅ success

Unapproving per maintainer request.

github-actions · 2026-05-30T18:29:27Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26691453416
Target ref: 9574166317f9901ef9d089f525e4b966cf7922ff
Workflow ref: main
Requested jobs: shields-config-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job	Result
shields-config-e2e	✅ success

github-actions · 2026-05-30T18:45:49Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26691795412
Target ref: ad984e90b36a9cb09ab20c305d180b48fc8f669f
Workflow ref: main
Requested jobs: shields-config-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job	Result
shields-config-e2e	✅ success

…leged-exec in tests Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

…ip group read on credentials Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

…wn-runtime-perms

coderabbitai

🧹 Nitpick comments (2)

docs/security/best-practices.mdx (2)

204-204: ⚡ Quick win

Rewrite in active voice.

Line 204 contains multiple passive constructions that should be rewritten in active voice. As per coding guidelines, active voice is required in NemoClaw documentation.

Examples from this line:

"are owned by root:sandbox" → "The lock helper owns these directories as root:sandbox"
"the sandbox user is denied write" → "the lock denies write access to the sandbox user"
"dirs that aren't present... are silently skipped" → "the lock helper silently skips dirs that aren't present"

As per coding guidelines: "Active voice required. Flag passive constructions. Bad: 'A gateway is created by the CLI.' Good: 'The CLI creates a gateway.'"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/security/best-practices.mdx` at line 204, Rewrite the sentence in active
voice throughout: have the lock helper own the high-risk state directories as
root:sandbox (instead of "are owned by root:sandbox"), state that the lock
helper lets the OpenClaw gateway (a member of the sandbox group) keep read
access while the lock denies write access to the sandbox user via chmod -R go-w,
and say the lock helper applies root:root 700 with chmod -R go-rwX to secret
directories to prevent gateway and sandbox reads; explicitly state that the lock
helper restores sandbox:sandbox 2770 on shields-down, that the list is the union
of state directories declared by every shipped agent manifest and that the lock
helper silently skips dirs not present in an agent's config tree, and finally
mark runtime-mutable subtrees (sessions/, memories/, logs/, cache/, plans/,
openclaw-weixin/) as intentionally exempt so agents can continue operating.

204-206: ⚡ Quick win

Split into one sentence per line.

Lines 204-206 contain multiple sentences on single lines, which makes diffs harder to review. As per coding guidelines, documentation source should use one sentence per line.

Proposed rewrite with one sentence per line

Line 204 should be split into approximately:

Under lockdown, the high-risk state directories (`skills`, `hooks`, `cron`, `agents`, `extensions`, `plugins`, `workspace`, `memory`, `devices`, `canvas`, `telegram`, `wechat`, `whatsapp`, `platforms`, `weixin`, `profiles`, `skins`) are owned by `root:sandbox` rather than `root:root`, so the OpenClaw gateway (a member of the `sandbox` group) keeps read access to plugin and agent code while the sandbox user is denied write through `chmod -R go-w`.
Secret-bearing directories (`credentials`, `identity`, `pairing`) get a stricter posture: `root:root 700` with `chmod -R go-rwX`, so neither the sandbox user nor the gateway can read them while shields are up.
The mutable-default posture (`sandbox:sandbox 2770`) is restored for both groups on shields-down.
The list is the union of state directories declared by every shipped agent manifest; dirs that aren't present in a given agent's config tree are silently skipped.
Runtime-mutable subtrees (`sessions/`, `memories/`, `logs/`, `cache/`, `plans/`, and `openclaw-weixin/` which is regenerated at image-build time) are intentionally exempt so the agent can keep operating.

Line 205 should be split:

A narrow set of runtime-data subpaths is exempted from the lock so the agent can keep operating — currently `agents/<agent-id>/sessions/`, which the OpenClaw TUI creates and writes session metadata into; the lock helper restores those subpaths to `sandbox:sandbox 2770` after the surrounding tree is locked.

Line 206 appears to already be a single sentence.

As per coding guidelines: "One sentence per line in source (makes diffs readable). Flag paragraphs where multiple sentences appear on the same line."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/security/best-practices.mdx` around lines 204 - 206, Split the
multi-sentence paragraphs into one sentence per line in
docs/security/best-practices.mdx: break the paragraph that begins "Under
lockdown, the high-risk state directories..." into separate lines so each
sentence (the root:sandbox ownership sentence; the secret-bearing directories
sentence; the mutable-default posture sentence; the union-of-state-directories
sentence; the runtime-mutable subtrees sentence) is on its own line, and
likewise split the paragraph that begins "A narrow set of runtime-data
subpaths..." so that the single long sentence about agents/<agent-id>/sessions/
and the lock helper restoring sandbox:sandbox 2770 is on its own line; preserve
exact directory and permission strings (`skills`, `hooks`, `agents`,
`credentials`, `identity`, `pairing`, `sandbox:sandbox 2770`, `root:root 700`,
etc.) and do not change wording other than inserting line breaks.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@docs/security/best-practices.mdx`:
- Line 204: Rewrite the sentence in active voice throughout: have the lock
helper own the high-risk state directories as root:sandbox (instead of "are
owned by root:sandbox"), state that the lock helper lets the OpenClaw gateway (a
member of the sandbox group) keep read access while the lock denies write access
to the sandbox user via chmod -R go-w, and say the lock helper applies root:root
700 with chmod -R go-rwX to secret directories to prevent gateway and sandbox
reads; explicitly state that the lock helper restores sandbox:sandbox 2770 on
shields-down, that the list is the union of state directories declared by every
shipped agent manifest and that the lock helper silently skips dirs not present
in an agent's config tree, and finally mark runtime-mutable subtrees (sessions/,
memories/, logs/, cache/, plans/, openclaw-weixin/) as intentionally exempt so
agents can continue operating.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9a7293e3-fb88-4655-80b7-2f66b6550f45

📥 Commits

Reviewing files that changed from the base of the PR and between bd70291 and 4df0f53.

📒 Files selected for processing (1)

docs/security/best-practices.mdx

github-actions · 2026-05-31T08:40:52Z

Selective E2E Results — ❌ Some jobs failed

Run: 26707810999
Target ref: 4df0f5380c3fb7684bea6c0cf91ff2ed6e2143d9
Workflow ref: main
Requested jobs: shields-config-e2e
Summary: 0 passed, 1 failed, 0 skipped

Job	Result
shields-config-e2e	❌ failure

Failed jobs: shields-config-e2e. Check run artifacts for logs.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/security/best-practices.mdx`:
- Around line 210-211: The docs claim that "Shields-up exempts runtime-mutable
subtrees (`sessions/`, `memories/`, `logs/`, `cache/`, `plans/`, plus
`openclaw-weixin/`)", which contradicts the actual shields behavior and the lock
helper that only preserves `agents/<agent-id>/sessions/` (restored to
`sandbox:sandbox 2770`); update the text to remove the outdated multi-subtree
exemption list and replace it with the accurate statement that the only runtime
carve-out is `agents/<agent-id>/sessions/`, and ensure the sentence references
the lock helper behavior (restores those subpaths to `sandbox:sandbox 2770`) so
operators aren’t misled by the old list.
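
To make the carve-out in this comment concrete, the following sketch checks whether a path falls under `agents/<agent-id>/sessions/`. It assumes paths are relative to the state-dir root and use forward slashes; `isWritableRuntimeSubpath` is a hypothetical name, not a function confirmed to exist in `state-dir-lock.ts`.

```typescript
// Hypothetical helper for illustration; not taken from state-dir-lock.ts.
// Assumes relPath is relative to the state-dir root and uses "/" separators.
function isWritableRuntimeSubpath(relPath: string): boolean {
  const parts = relPath.split("/").filter((p) => p.length > 0);
  // Refuse traversal segments so "agents/../sessions" cannot match.
  if (parts.some((p) => p === "." || p === "..")) {
    return false;
  }
  // Only agents/<agent-id>/sessions and anything beneath it qualifies.
  return parts.length >= 3 && parts[0] === "agents" && parts[2] === "sessions";
}
```

Under this sketch, `agents/main/sessions` qualifies, while `agents/main/memories`, `logs`, and `cache` do not, which matches the reviewer's point that the older multi-subtree list no longer describes the behavior.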

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f8fe9182-7699-41f7-98c2-a3a5ffdd6293

📥 Commits

Reviewing files that changed from the base of the PR and between 4df0f53 and 6d8233d.

📒 Files selected for processing (4)

docs/security/best-practices.mdx
src/lib/shields/index.ts
src/lib/shields/state-dir-lock.ts
test/shields-up-runtime-perms.test.ts

🚧 Files skipped from review as they are similar to previous changes (1)

test/shields-up-runtime-perms.test.ts

github-actions · 2026-05-31T09:05:49Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26708316674
Target ref: 6d8233d2c209eef05577eeba22916f87eda0d115
Workflow ref: main
Requested jobs: shields-config-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job	Result
shields-config-e2e	✅ success

github-actions · 2026-05-31T09:28:12Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26708769299
Target ref: d2d24d77c3dfa08dfdbca85c9272b5feb295dfe7
Workflow ref: main
Requested jobs: shields-config-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job	Result
shields-config-e2e	✅ success

laitingsheng added the fix and Sandbox labels May 24, 2026

coderabbitai Bot reviewed May 24, 2026

Comment thread test/snapshot.test.ts Outdated

coderabbitai Bot reviewed May 24, 2026

Comment thread src/lib/shields/index.ts Outdated

Comment thread test/shields-up-runtime-perms.test.ts

coderabbitai Bot reviewed May 24, 2026

github-actions Bot mentioned this pull request May 24, 2026

fix(sandbox): auto-unlock shields during rebuild #4130

Merged

Merge branch 'main' into fix/shields-up-lockdown-runtime-perms

bd70291

Merge branch 'main' into fix/shields-up-lockdown-runtime-perms

41a8bb3

cv added the v0.0.52 and v0.0.53 release-target labels and removed the v0.0.51 and v0.0.52 release-target labels May 26, 2026

cv previously approved these changes May 27, 2026

ericksoa added the v0.0.55 release-target label and removed the v0.0.53 release-target label May 27, 2026

jyaunches added the R1 and v0.0.56 release-target labels and removed the v0.0.55 release-target and R1 labels May 29, 2026

cv requested a review from ericksoa May 30, 2026 01:17

cv approved these changes May 30, 2026

Merge branch 'main' into fix/shields-up-lockdown-runtime-perms

9574166

Merge branch 'main' into fix/shields-up-lockdown-runtime-perms

ad984e9

laitingsheng added 3 commits May 31, 2026 07:58

fix(shields): verify state-dir lock, expand high-risk set, mock privileged-exec in tests

8085cd1

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

refactor(shields): extract state-dir lock, fail-closed preflight, strip group read on credentials

9a6a635

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

Merge remote-tracking branch 'origin/main' into fix/shields-up-lockdown-runtime-perms

4df0f53

coderabbitai Bot reviewed May 31, 2026

fix(shields): hoist preflight, exit 0 on empty roots, verify+surface restore failures

6d8233d

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

coderabbitai Bot reviewed May 31, 2026

Comment thread docs/security/best-practices.mdx Outdated

fix(shields): surface unlock state-dir issues, split doc exemption wording

d2d24d7

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
