
fix(shields,state): keep gateway readable and runtime sessions writable under shields-up#4155

Open
laitingsheng wants to merge 19 commits into
mainfrom
fix/shields-up-lockdown-runtime-perms


@laitingsheng
Contributor

@laitingsheng laitingsheng commented May 24, 2026

Summary

Under shields-up, the OpenClaw gateway lost sandbox-group access to the plugin and agent state dirs, so the TUI reported "plugin not found" for /nemoclaw and openclaw-weixin and crashed with EACCES: mkdir agents/main/sessions on first launch. This PR locks the high-risk state dirs as root:sandbox (the top-level config dir stays root:root), locks secret-bearing dirs (credentials, identity, pairing) to a stricter root:root 700 with chmod -R go-rwX, restores agents/<id>/sessions to sandbox:sandbox 2770 after the lock, and refuses to lock when the state-dir symlink preflight fails or any state-dir root is itself a symlink.
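As a rough illustration of that posture, the sketch below maps a state dir to the chown/chmod commands the summary describes. The directory lists are an assumed subset drawn from names mentioned elsewhere on this page, and `planStateDir`, `lockCommands`, and `SESSIONS_RESTORE` are hypothetical names, not the state-dir-lock module's API.

```typescript
// Illustrative sketch of the shields-up posture; not the module's real API.
// Both directory lists are an ASSUMED subset of the PR's inventories.
type LockPlan = {
  owner: string; // argument to `chown -R`
  chmods: string[]; // one entry per `chmod` invocation
};

const HIGH_RISK_SUBSET: readonly string[] = ["extensions", "plugins", "skills", "hooks", "agents"];
const CONFIDENTIALITY_SUBSET: readonly string[] = ["credentials", "identity", "pairing"];

// Runtime carve-out re-applied to agents/<id>/sessions after the lock.
const SESSIONS_RESTORE = { owner: "sandbox:sandbox", mode: "2770" } as const;

function planStateDir(dir: string): LockPlan | undefined {
  // Secret-bearing dirs are matched first so they never get the
  // gateway-readable posture.
  if (CONFIDENTIALITY_SUBSET.includes(dir)) {
    return { owner: "root:root", chmods: ["700", "-R go-rwX"] };
  }
  if (HIGH_RISK_SUBSET.includes(dir)) {
    // The gateway keeps r-x through the sandbox group; only write bits are removed.
    return { owner: "root:sandbox", chmods: ["-R go-w"] };
  }
  return undefined; // not part of this lock pass
}

// Renders the shell commands for one state dir. Assumes paths contain no single quotes.
function lockCommands(configDir: string, dir: string): string[] {
  const plan = planStateDir(dir);
  if (plan === undefined) return [];
  const target = `'${configDir}/${dir}'`;
  return [
    `chown -R ${plan.owner} ${target}`,
    ...plan.chmods.map((args) => `chmod ${args} ${target}`),
  ];
}

console.log(lockCommands("/sandbox/.openclaw", "credentials"));
console.log(SESSIONS_RESTORE);
```

Matching the confidentiality list first is a deliberate fail-safe: a name that appears in both inventories gets the stricter posture.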

Related Issue

Fixes #4065

Changes

  • src/lib/shields/state-dir-lock.ts (new) — lock pipeline extracted from the shields monolith: state-dir inventory (HIGH_RISK_STATE_DIRS, CONFIDENTIALITY_STATE_DIRS, WRITABLE_RUNTIME_SUBPATHS), symlink preflight, mutation scripts, post-lock stat verification, and the runtime-subpath restore helper. Preflight, mutation, verification, and restore each emit tab-prefixed marker lines so callers can surface every failure (exec errors, mutation failures, mid-lock symlink swaps, mode/owner drift, restore-time mkdir/chown/chmod failures, restore-time stat mismatches) as lock issues. All embedded scripts end with exit 0 so a trailing [ … ] test cannot leak a non-zero status to the privileged exec layer.
  • src/lib/shields/index.ts: lockAgentConfig now runs preflightStateDirLock before any chmod/chown, so no file mutation is applied when a state-dir root is a symlink. It calls into the new state-dir-lock module through a small stateDirLockExec adapter, which keeps the shields entrypoint focused on shields state transitions.
  • test/shields-up-runtime-perms.test.ts (new) — regression coverage for state-dir lock ownership, the agents/*/sessions restore loop, the preflight symlink guard, and atomicity within the state-dir fan-out when a symlinked root is detected (file-level mutations are now guarded by the hoisted preflight in lockAgentConfig).
  • test/repro-2681-group-writable.test.ts — updates the existing lockAgentConfig assertion to expect root:sandbox in the state-dir-lock shell command shape.
  • docs/security/best-practices.mdx — documents the root:sandbox posture for high-risk dirs, the stricter root:root 700 posture for secret-bearing dirs, the runtime-mutable exemptions, and the symlink preflight contract.
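The marker-line contract can be sketched as follows. The marker vocabulary (`symlink`, `mutation-failed`), the tab-separated field layout, and the helper names are assumptions for illustration; the PR only states that markers are tab-prefixed, that exec errors surface as lock issues, and that every script ends with `exit 0`.

```typescript
// HYPOTHETICAL marker protocol: each issue line is "\t<kind>\t<detail...>".
type Capture = (script: string) => string; // runs `sh -c script`, returns stdout, may throw

function wrapScript(body: string): string {
  // A trailing `exit 0` stops a final `[ ... ]` test from leaking a non-zero
  // status; failures are reported through markers instead.
  return `${body}\nexit 0\n`;
}

function parseMarkers(stdout: string, label: string): string[] {
  const issues: string[] = [];
  for (const line of stdout.split("\n")) {
    if (!line.startsWith("\t")) continue; // ordinary output is not an issue
    const [kind, ...rest] = line.slice(1).split("\t");
    const detail = rest.join(" ");
    switch (kind) {
      case "symlink":
        issues.push(`${label}: state dir root is a symlink: ${detail}`);
        break;
      case "mutation-failed":
        issues.push(`${label}: mutation failed: ${detail}`);
        break;
      default:
        // Unknown markers fail closed instead of being ignored.
        issues.push(`${label}: unrecognized marker ${kind}: ${detail}`);
    }
  }
  return issues;
}

function runScript(capture: Capture, body: string, label: string): string[] {
  try {
    return parseMarkers(capture(wrapScript(body)), label);
  } catch (err) {
    // Exec errors (timeouts, docker failures) become issues rather than crashes.
    const message = err instanceof Error ? err.message : String(err);
    return [`${label}: exec failed: ${message}`];
  }
}
```

Because the exit status is pinned to 0, stdout markers are the only failure channel for mutation errors, while a thrown `capture` still covers the case where the script never ran.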

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • make docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

Summary by CodeRabbit

  • New Features

    • Implements a host-side state-directory lockdown workflow: locks high-risk dirs to root:sandbox, secrets to root:root (700), and restores the runtime-writable session subtrees (agents/*/sessions) to sandbox:sandbox 2770.
  • Bug Fixes

    • shields-up performs a symlink preflight and aborts on symlinked state-dir roots with a clear error; shields-down reliably restores group-writable/setgid modes and ownership.
  • Documentation

    • Expanded lockdown guidance and symlink failure behavior.
  • Tests

    • Added/expanded suites for lockdown, symlink defenses, atomicity, and permission restoration.

…te root-owned subdirs in audit

NemoClaw #4065: shields-up locked HIGH_RISK_STATE_DIRS as root:root,
which stripped sandbox-group ownership from descendants and broke
OpenClaw plugin discovery (`extensions/<plugin>/` became unreachable to
the gateway, which is only granted sandbox-group access). It also left
`agents/main/` non-writable to the sandbox user, so the OpenClaw TUI's
lazy mkdir of `agents/main/sessions/` failed with EACCES on first
launch under lockdown.

Switch the state-dir lock to `root:sandbox` (top-level configDir is
still `root:root`) so the gateway keeps `r-x` via the sandbox group on
descendants stripped to 2750 by `chmod -R go-w`, and restore
`agents/*/sessions` to `sandbox:sandbox 2770` after the main lock loop
so the agent keeps writing session metadata under lockdown.
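The mode arithmetic behind this change can be checked directly. A minimal sketch, assuming the gateway reaches these dirs only through the sandbox group's permission bits; the helper names are hypothetical.

```typescript
// Illustrative octal arithmetic for the modes named above.
const S_ISGID = 0o2000;

// Equivalent of `chmod go-w` on one entry: clear group and other write bits only.
function stripGroupOtherWrite(mode: number): number {
  return mode & ~0o022;
}

// The gateway needs r-x through the group bits to list and enter a directory.
function groupCanReadAndTraverse(mode: number): boolean {
  return (mode & 0o050) === 0o050;
}

function groupCanWrite(mode: number): boolean {
  return (mode & 0o020) !== 0;
}

const before = 0o2770; // sandbox-group dir before shields-up
const after = stripGroupOtherWrite(before);
console.log(after.toString(8)); // 2750
console.log(groupCanReadAndTraverse(after), groupCanWrite(after), (after & S_ISGID) !== 0); // true false true
```

The same 2750 mode under root:root ownership would give the gateway nothing, which is why the fix changes the owning group rather than the mode.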

NemoClaw #4059: pre-backup audit joined per-dir `find` invocations with
`&&`, so a single permission-denied subdir made the whole chain exit 1
and the rebuild treated every state dir as failed. Join with `;` and
wrap each `find` with `|| true` — the audit's real signal is its
stdout (symlink / hardlink / special-file rows); exit codes from
perm-denied root-owned subdirs are noise.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@laitingsheng laitingsheng added the fix and Sandbox labels May 24, 2026
@coderabbitai
Contributor

coderabbitai Bot commented May 24, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1fc1ad8c-f995-49bd-9b47-87a8c736ff8b

📥 Commits

Reviewing files that changed from the base of the PR and between 6d8233d and d2d24d7.

📒 Files selected for processing (2)
  • docs/security/best-practices.mdx
  • src/lib/shields/index.ts
✅ Files skipped from review due to trivial changes (1)
  • docs/security/best-practices.mdx

📝 Walkthrough


Extract state-dir locking into a shared module with a PrivilegedExec adapter, add symlink preflight checks, implement applyStateDirLockMode (lock/unlock, verify, restore writable runtime subpaths), wire into shields up/down, and add tests and docs validating atomicity and symlink safety.

Changes

Shields Lockdown Refactoring and Runtime Permission Hardening

| Layer | File(s) | Summary |
|---|---|---|
| PrivilegedExec interface and directory inventories | src/lib/shields/state-dir-lock.ts | Add PrivilegedExec interface and constants enumerating HIGH_RISK_STATE_DIRS, CONFIDENTIALITY_STATE_DIRS, and WRITABLE_RUNTIME_SUBPATHS. |
| Preflight symlink checks and exported API | src/lib/shields/state-dir-lock.ts | Export preflightStateDirLock, which runs a symlink preflight and returns issue strings; expose the public API surface for state-dir locking. |
| Lock/unlock mutation scripts, verification, and restoration | src/lib/shields/state-dir-lock.ts | Implement applyStateDirLockMode to run lock/unlock shell passes (high-risk + workspace glob, confidentiality), parse marker output for symlink/mutation errors, verify post-lock ownership/modes, and restore runtime-writable subpaths (agents/*/sessions -> sandbox:sandbox 2770) on successful locks. |
| shields index integration and adapters | src/lib/shields/index.ts | Import preflightStateDirLock/applyStateDirLockMode; add stateDirLockExec adapter exposing privileged exec/capture; run preflight before mutation in lockAgentConfig and call applyStateDirLockMode with explicit highRiskOwner and isLocking from shields-up/down. |
| Existing test assertion updates for new ownership semantics | test/repro-2681-group-writable.test.ts | Update shields-down assertions to filter sh -c invocations and assert sandbox:sandbox ownership, g+rwX,o-rwx/2770 semantics and workspace handling; update shields-up assertion to expect root:sandbox owner string. |
| Runtime lock probe, atomicity, and fixture validations | test/shields-up-runtime-perms.test.ts | Add subprocess probe that captures docker-exec argv, stubs stat/lsattr/sha256sum, locates generated state-dir lock/restore sh scripts, asserts root:sandbox semantics and symlink-guard markers, verifies workspace-* preflight ordering and top-level chown root:root calls, tests preflight atomicity (no mutations after symlink detection), and re-executes extracted scripts against fixtures to validate sessions creation and symlink safety. |
| Security docs update | docs/security/best-practices.mdx | Expand host-side Agent Config Directory guidance to require locking high-risk subtrees to root:sandbox with chmod -R go-w, secrets to root:root 700, restore agents/<id>/sessions to sandbox:sandbox 2770, and abort on a symlinked state-dir root with "Config not locked: state dir root is a symlink". |
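The restore step for WRITABLE_RUNTIME_SUBPATHS can be illustrated with a host-side Node analogue. The real module does this with sh scripts inside the sandbox and also chowns to sandbox:sandbox (which needs root); `planSessionRestore` and `applySessionRestore` are hypothetical names. The sketch refuses symlinked `agents` or `agents/<id>` entries rather than following them.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

type RestorePlan = { targets: string[]; refused: string[] };

function isSymlink(p: string): boolean {
  try {
    return fs.lstatSync(p).isSymbolicLink();
  } catch {
    return false; // missing paths are not symlinks
  }
}

// HYPOTHETICAL host-side expansion of "agents/*/sessions".
function planSessionRestore(configDir: string): RestorePlan {
  const plan: RestorePlan = { targets: [], refused: [] };
  const agentsDir = path.join(configDir, "agents");
  if (isSymlink(agentsDir)) {
    plan.refused.push(agentsDir);
    return plan;
  }
  if (!fs.existsSync(agentsDir)) return plan; // nothing to restore yet
  for (const entry of fs.readdirSync(agentsDir, { withFileTypes: true })) {
    const agentDir = path.join(agentsDir, entry.name);
    if (entry.isSymbolicLink()) {
      plan.refused.push(agentDir); // following it would mutate an arbitrary target
      continue;
    }
    if (!entry.isDirectory()) continue;
    const sessions = path.join(agentDir, "sessions");
    if (isSymlink(sessions)) {
      plan.refused.push(sessions);
      continue;
    }
    plan.targets.push(sessions);
  }
  return plan;
}

function applySessionRestore(plan: RestorePlan): void {
  for (const target of plan.targets) {
    fs.mkdirSync(target, { recursive: true });
    // The real pass also runs `chown sandbox:sandbox`, which needs root; omitted here.
    fs.chmodSync(target, 0o2770);
  }
}
```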

Sequence Diagram

sequenceDiagram
  participant CLI as CLI (shields command)
  participant Index as shields/index
  participant StateLock as state-dir-lock module
  participant PrivExec as PrivilegedExec adapter
  participant Sandbox as docker/exec (container)
  participant FS as Host filesystem / fixtures

  CLI->>Index: run lockAgentConfig
  Index->>PrivExec: create stateDirLockExec (exec + capture)
  Index->>StateLock: preflightStateDirLock(privExec, configDir)
  StateLock->>PrivExec: capture sh -c preflight (workspace-* & roots)
  PrivExec->>Sandbox: docker exec -- sh -c preflight
  Sandbox->>PrivExec: stdout markers (symlinked roots or OK)
  StateLock->>Index: return issues[] or []
  alt issues found
    Index->>CLI: throw "Config not locked: ..."
  else no issues
    Index->>StateLock: applyStateDirLockMode(privExec, configDir, owner, isLocking=true)
    StateLock->>PrivExec: capture lock scripts (high-risk, workspace-*, confidentiality)
    PrivExec->>Sandbox: docker exec -- sh -c lock scripts
    Sandbox->>PrivExec: stdout markers (mutation results)
    StateLock->>PrivExec: capture verify + restore scripts
    PrivExec->>Sandbox: docker exec -- sh -c restore agents/*/sessions
    Sandbox->>FS: sessions dirs created with sandbox:sandbox 2770
    StateLock->>Index: return [] (no issues) or issues
  end
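The ordering in the diagram can be sketched with simplified stand-ins. This is not the PR's code: `PrivilegedExec` is reduced to two methods, the preflight and state-dir pass are injected as functions, and `unlockAgentConfig` here throws on state-dir restore issues, which follows the PR Review Advisor's recommendation rather than necessarily the current implementation.

```typescript
// Simplified stand-ins; the real PrivilegedExec and function signatures may differ.
interface PrivilegedExec {
  exec(args: string[]): void; // run one privileged command, throw on failure
  capture(script: string): string; // run `sh -c script`, return stdout
}

type Preflight = (privileged: PrivilegedExec, configDir: string) => string[];
type StateDirPass = (privileged: PrivilegedExec, configDir: string, isLocking: boolean) => string[];

function lockAgentConfig(
  privileged: PrivilegedExec,
  configDir: string,
  preflight: Preflight,
  stateDirPass: StateDirPass,
): void {
  // 1. Refuse before any chown/chmod, so a symlinked root leaves nothing half-locked.
  const preflightIssues = preflight(privileged, configDir);
  if (preflightIssues.length > 0) {
    throw new Error(`Config not locked: ${preflightIssues.join("; ")}`);
  }
  // 2. File-level mutations; the top-level config dir stays root:root.
  privileged.exec(["chown", "root:root", configDir]);
  // 3. State-dir fan-out; any reported issue fails the lock.
  const lockIssues = stateDirPass(privileged, configDir, true);
  if (lockIssues.length > 0) {
    throw new Error(`Config not locked: ${lockIssues.join("; ")}`);
  }
}

function unlockAgentConfig(
  privileged: PrivilegedExec,
  configDir: string,
  stateDirPass: StateDirPass,
): void {
  // Restore failures are thrown, not only logged, so shields-down cannot
  // report success while state dirs stay partially locked.
  const unlockIssues = stateDirPass(privileged, configDir, false);
  if (unlockIssues.length > 0) {
    throw new Error(`Config not unlocked: ${unlockIssues.join("; ")}`);
  }
}
```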

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#4287: Related privileged exec integration that standardizes privileged executor usage consumed by this refactor.
  • NVIDIA/NemoClaw#4130: Modifies shields up/down flow and overlaps on lock/unlock lifecycle integration points.
  • NVIDIA/NemoClaw#4272: Overlaps with shields permission adjustments and touches the same shields index paths.

Suggested reviewers

  • ericksoa
  • cv
  • cjagwani

Poem

🐰 I hopped the sandbox, checked each root,
Guarded symlinks, kept the boot,
Restored the sessions soft and light,
Permissions snug — the TUI's right,
A careful hop—now everything's tight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately captures the main change: keeping gateway readable and runtime sessions writable under shields-up lockdown, which is the core objective of fixing issue #4065. |
| Linked Issues check | ✅ Passed | The PR fully addresses all coding requirements from issue #4065: symlink preflight detection (#4065 requirement), runtime-writable paths restoration (#4065 requirement), selective lockdown for secrets vs high-risk dirs (#4065 requirement), and early failure on symlinked state-dir roots (#4065 requirement). |
| Out of Scope Changes check | ✅ Passed | All changes are in scope: new state-dir-lock.ts module implements lockdown logic, shields/index.ts integrates symlink preflight and runtime-writable restore, tests cover regression and new functionality, and docs explain the security model. |



Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.



@github-actions
Contributor

github-actions Bot commented May 24, 2026

E2E Advisor Recommendation

Required E2E: shields-config-e2e
Optional E2E: openclaw-onboard-security-posture-e2e, hermes-onboard-security-posture-e2e

Dispatch hint: shields-config-e2e

Auto-dispatched E2E: shields-config-e2e via nightly-e2e.yaml at d2d24d77c3dfa08dfdbca85c9272b5feb295dfe7 (nightly run)


E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • shields-config-e2e (medium (~30 min timeout)): Direct coverage for the changed shields lifecycle. It runs a live sandbox through shields status, shields up, config immutability checks, content-seal drift detection, shields down, writable-state restoration, audit trail, timer restore, and double shields-up rejection.

Optional E2E

  • openclaw-onboard-security-posture-e2e (high (~60 min timeout)): Useful adjacent confidence for OpenClaw runtime security posture and config-tree ownership assumptions after changing shields-related permission logic.
  • hermes-onboard-security-posture-e2e (high (~60 min timeout)): Useful adjacent confidence because the new state-dir inventory includes Hermes-specific directories such as plugins, platforms, pairing, profiles, and skins, even though the existing posture job does not fully exercise shields up/down.

New E2E recommendations

  • security-boundaries.shields-state-dir-lock (high): Existing shields-config-e2e validates config and workspace locking, but the new behavior adds many state-dir roots, root:sandbox vs root:root split ownership, confidentiality dirs, symlink-root refusal, and agents/*/sessions restoration. These are currently covered mainly by mocked/unit-style tests rather than a live sandbox E2E.
    • Suggested test: Add a live shields state-dir permissions E2E that creates representative OpenClaw state dirs, runs shields up/down, verifies high-risk dirs are root:sandbox non-writable, credentials/identity/pairing are root:root 700, agents/*/sessions remains sandbox:sandbox 2770, and symlinked state-dir roots cause shields up to fail closed without following the symlink.
  • security-boundaries.hermes-shields (medium): The new inventory explicitly includes Hermes-specific state dirs, but the existing shields-config-e2e appears to exercise the default OpenClaw path. Hermes shields-up/down behavior can diverge because its config path, sensitive files, sticky root mode, plugins/platforms/pairing dirs, and runtime layout differ.
    • Suggested test: Add a Hermes shields lifecycle E2E that onboards Hermes, runs shields up/down, and verifies Hermes config, plugin/runtime, platform, and pairing directory permissions match the new contract.
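For the suggested live E2E, the permission contract could be checked against `stat` output. A sketch, assuming GNU coreutils `stat -c '%U:%G %a %n'` output and an assumed subset of directory names; `verifyStatOutput` and `expectationFor` are hypothetical helpers.

```typescript
// HYPOTHETICAL contract table; directory names are an assumed subset and
// paths are relative to the config dir.
type Expectation = { owner: string; ok: (mode: number) => boolean; describe: string };

const CONFIDENTIAL = new Set(["credentials", "identity", "pairing"]);
const HIGH_RISK = new Set(["extensions", "plugins", "skills", "hooks", "agents"]);

function expectationFor(rel: string): Expectation | undefined {
  if (/^agents\/[^/]+\/sessions$/.test(rel)) {
    return { owner: "sandbox:sandbox", ok: (m) => m === 0o2770, describe: "2770" };
  }
  if (CONFIDENTIAL.has(rel)) {
    return { owner: "root:root", ok: (m) => m === 0o700, describe: "700" };
  }
  if (HIGH_RISK.has(rel)) {
    // Any mode is acceptable as long as group and other cannot write.
    return { owner: "root:sandbox", ok: (m) => (m & 0o022) === 0, describe: "no group/other write" };
  }
  return undefined;
}

// Parses lines of `stat -c '%U:%G %a %n' <paths...>` and returns contract drift.
function verifyStatOutput(configDir: string, stdout: string): string[] {
  const issues: string[] = [];
  for (const line of stdout.split("\n").filter((l) => l.length > 0)) {
    const match = /^(\S+) ([0-7]+) (.+)$/.exec(line);
    if (match === null) {
      issues.push(`unparseable stat line: ${line}`);
      continue;
    }
    const [, owner, modeText, name] = match;
    const rel = name.startsWith(`${configDir}/`) ? name.slice(configDir.length + 1) : name;
    const expected = expectationFor(rel);
    if (expected === undefined) continue; // not covered by this contract
    if (owner !== expected.owner) issues.push(`${rel}: owner ${owner}, expected ${expected.owner}`);
    if (!expected.ok(parseInt(modeText, 8))) {
      issues.push(`${rel}: mode ${modeText}, expected ${expected.describe}`);
    }
  }
  return issues;
}
```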

Dispatch hint

  • Workflow: .github/workflows/nightly-e2e.yaml
  • jobs input: shields-config-e2e

@github-actions
Contributor

github-actions Bot commented May 24, 2026

E2E Scenario Advisor Recommendation

Required scenario E2E: ubuntu-repo-cloud-openclaw
Optional scenario E2E: None

Dispatch required scenario E2E:

  • gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw


E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • ubuntu-repo-cloud-openclaw: Host-side shields lock/unlock code changed, including the new state-dir lock fan-out used by shields-up/down. The ubuntu-repo-cloud-openclaw scenario is the routed Ubuntu/OpenClaw path that includes the supplemental security-shields coverage for shields status/config consistency.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Optional scenario E2E

  • None.

Relevant changed files

  • src/lib/shields/index.ts
  • src/lib/shields/state-dir-lock.ts

@github-actions
Contributor

github-actions Bot commented May 24, 2026

PR Review Advisor

Findings: 4 needs attention, 7 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 10 still apply, 0 new items found

Review findings

🛠️ Needs attention

  • Reject descendant symlink bridges under locked high-risk state dirs (src/lib/shields/state-dir-lock.ts:414): The new preflight, mutation, and verification paths reject symlinked state-dir roots, but they do not reject symlinks below locked executable/config trees such as extensions, plugins, skills, hooks, or agents. A sandbox user can create a descendant symlink before shields-up that points to a still-mutable target; after the root is locked, plugin or agent discovery may still follow that bridge and load sandbox-controlled code while shields-up reports the tree as locked.
    • Recommendation: During shields-up, either reject symlinks anywhere under locked executable/config state dirs before mutating, or freeze/replace them in a way that cannot point at sandbox-writable locations. Add a negative test with a symlink below a high-risk dir pointing to a writable target and verify shields-up refuses or neutralizes it before reporting locked.
    • Evidence: preflightSymlinkedRoots() checks only "$config_dir/$dir" and "$config_dir"/workspace-* roots; verifyStateDirGroup() checks only top-level mode/owner. The tests cover symlinked roots and symlinked agents/<id> restore parents, not descendant symlink bridges inside plugin/agent trees.
  • Fail shields-down when state-dir restore fails (src/lib/shields/index.ts:462): unlockAgentConfig() now captures stateDirUnlockIssues, but appends them to the local errors array that is only logged as a warning. The final throw is based on a separate issues array that verifies only config files and the config root. A state-dir chown/chmod failure can therefore still allow shields-down to report success while high-risk or secret-bearing dirs remain partially locked.
    • Recommendation: Append stateDirUnlockIssues to the verification issues that cause Config not unlocked, or otherwise throw when state-dir restore issues are non-empty. Add negative tests for unlock-time exec failure, emitted mutation-failed markers, and a partially restored state dir whose top-level config file/root checks still pass.
    • Evidence: stateDirUnlockIssues from applyStateDirLockMode(..., false) are pushed into errors at lines 462-464, errors are only logged at lines 466-469, and the throw at line 522 checks only issues collected by later file/config-dir stat checks.
  • Validate the linked TUI and plugin behavior in an actual shields-up sandbox (test/shields-up-runtime-perms.test.ts:156): Issue #4065 ("[Brev][Security] nemoclaw shields up blocks openclaw tui with EACCES on agents/sessions directory and disables /nemoclaw slash command") is about runtime behavior inside a shields-up OpenClaw sandbox: the TUI must start, create its session directory, load the NemoClaw and openclaw-weixin plugins, register /nemoclaw, and report the current shields state. The changed tests assert mocked command shapes and replay shell snippets against local fixtures, but they do not exercise the sandbox user, gateway group membership, plugin discovery, slash-command registration, or TUI startup path.
    • Recommendation: Add or identify targeted runtime/integration validation that runs shields-up in a sandbox and verifies openclaw tui starts, /sandbox/.openclaw/agents/<id>/sessions is writable by the agent, the nemoclaw and openclaw-weixin plugins are discoverable, /nemoclaw shields outputs Shields: UP, high-risk dirs remain non-writable to sandbox-controlled code, symlink-root refusal works, and shields-down restores mutability.
    • Evidence: The linked issue's Expected Result says "TUI starts normally, session created" and "`/nemoclaw shields` outputs "Shields: UP"". The new test suite uses Module._load mocks and local spawnSync("sh", ...) fixture replays rather than a running sandbox/TUI.
  • Split the new state-dir lock monolith (src/lib/shields/state-dir-lock.ts:1): The extraction reduces src/lib/shields/index.ts, but introduces a new 552-line security-sensitive module that combines state-dir inventories, shell construction, privileged mutation parsing, symlink preflight, verification, and runtime carve-out restoration. This concentrates sandbox-escape and policy-bypass logic in one large file, making review and future changes riskier.
    • Recommendation: Split the module into focused units, for example inventory/constants, script builders, result parsing, verification, and runtime carve-out restoration, or otherwise offset the monolith growth so each privileged filesystem responsibility is independently auditable.
    • Evidence: Trusted drift context flags src/lib/shields/state-dir-lock.ts as a new 552-line monolith; the file includes privileged chown -R/chmod -R scripts, symlink checks, verification, and writable runtime subpath restoration.
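One way to address the descendant-symlink finding is to extend the preflight to walk each locked root without following links. This is a hedged host-side sketch using Node's `fs`; the real preflight runs as a shell script in the sandbox (where a `find <root> -type l` check would play the same role), and the function names are hypothetical. A walk like this only narrows the window, so the mutation scripts would still need to re-check, as they already do for roots.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Lists every symlink below root. Dirent entries from withFileTypes use lstat
// semantics, so links are reported and never descended into.
function findDescendantSymlinks(root: string): string[] {
  const found: string[] = [];
  const pending: string[] = [root];
  while (pending.length > 0) {
    const dir = pending.pop() as string;
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isSymbolicLink()) {
        found.push(full);
      } else if (entry.isDirectory()) {
        pending.push(full);
      }
    }
  }
  return found.sort();
}

// HYPOTHETICAL preflight extension covering roots and their descendants.
function preflightDescendantSymlinks(configDir: string, dirs: readonly string[]): string[] {
  const issues: string[] = [];
  for (const dir of dirs) {
    const root = path.join(configDir, dir);
    let rootStat: fs.Stats;
    try {
      rootStat = fs.lstatSync(root);
    } catch {
      continue; // absent state dirs have nothing to lock
    }
    if (rootStat.isSymbolicLink()) {
      issues.push(`state dir root is a symlink: ${root}`);
      continue;
    }
    if (!rootStat.isDirectory()) continue;
    for (const link of findDescendantSymlinks(root)) {
      issues.push(`symlink below locked dir: ${link} -> ${fs.readlinkSync(link)}`);
    }
  }
  return issues;
}
```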

🔎 Worth checking

  • Source-of-truth review needed: Runtime writable subpath restore for agents/*/sessions: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: WRITABLE_RUNTIME_SUBPATHS = ["agents/*/sessions"] and restoreWritableRuntimeSubpaths() restore matching targets to sandbox:sandbox 2770; the linked issue requires real TUI session creation.
  • Source-of-truth review needed: Mutation-time state-dir symlink preflight and root refusal: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: preflightStateDirLock() checks root entries before file mutations and mutation scripts re-check roots; descendant symlink bridges are not covered.
  • Source-of-truth review needed: Tolerant shell marker parsing for state-dir mutation and restore: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: runStateDirLockScript() and restoreWritableRuntimeSubpaths() parse marker lines and catch exec errors, but added tests cover mostly command shape and happy-path fixture behavior.
  • Source-of-truth review needed: Secret-bearing state-dir confidentiality split: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: CONFIDENTIALITY_STATE_DIRS implements credentials, identity, and pairing, but the added test suite contains no direct assertions for those names or final permissions.
  • Source-of-truth review needed: Shields-down state-dir restore issue propagation: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: stateDirUnlockIssues are pushed into errors and logged, while the final throw only checks later issues for config files and config root.
  • Add direct tests for secret-bearing state-dir lockdown (test/shields-up-runtime-perms.test.ts:157): The implementation separates credentials, identity, and pairing into a stricter confidentiality posture, but the tests do not directly assert that these directories lock to root:root 700 and recursively strip group/other access. A later regression could reintroduce sandbox or gateway read access to token-bearing state.
    • Recommendation: Add tests that inspect the confidentiality pass command shape and replay it on a fixture to confirm credentials, identity, and pairing roots become 700 root:root and descendants lose group/other read/write/execute access. Include a negative case showing high-risk code dirs remain gateway-readable while secret dirs do not.
    • Evidence: CONFIDENTIALITY_STATE_DIRS contains credentials, identity, and pairing, and applyStateDirLockMode() uses root:root, 700, and go-rwX; grep of the added test found no direct assertions for those names, owners, modes, or recursive access.
  • Cover tolerant shell marker parsing and partial mutation failures (src/lib/shields/state-dir-lock.ts:446): The PR aggregates shell marker output for exec errors, mutation failures, verification drift, and restore failures, but the new tests do not cover several important negative paths. Without regressions for command-level throw/timeout, emitted mutation-failed, emitted restore-failed, and partial recursive chmod/chown failure with clean top-level stat, this localized tolerant parsing can silently regress.
    • Recommendation: Add focused tests for privileged.capture() throwing, marker lines from each mutation pass, restore mkdir/chown/chmod/stat failures, and a partial recursive permission failure where the top-level state-dir mode/owner still match. Document the removal condition for this tolerant shell fan-out, or reduce the workaround by moving more guarantees to source-side invariants.
    • Evidence: runStateDirLockScript() and restoreWritableRuntimeSubpaths() parse structured marker lines, but the changed tests cover command shape, root symlink handling, and happy-path session creation rather than failure markers or partial recursive failure cases.
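For the secret-bearing dirs, a fixture replay could assert that nothing below the root stays group- or other-accessible. A sketch with Node's `fs`, where `stripGroupOther` is a hypothetical stand-in for `chmod -R go-rwX` (it skips symlinks) and ownership is not checked because `chown root:root` needs root.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Returns every non-symlink path under root whose mode grants group or other
// any access. Symlinks are skipped: their own mode bits are not meaningful on Linux.
function groupOrOtherAccessible(root: string): string[] {
  const leaks: string[] = [];
  const walk = (p: string): void => {
    const st = fs.lstatSync(p);
    if (st.isSymbolicLink()) return;
    if ((st.mode & 0o077) !== 0) leaks.push(`${p} ${(st.mode & 0o7777).toString(8)}`);
    if (st.isDirectory()) {
      for (const name of fs.readdirSync(p)) walk(path.join(p, name));
    }
  };
  walk(root);
  return leaks;
}

// HYPOTHETICAL stand-in for `chmod -R go-rwX`: clears all group/other bits and
// keeps owner and special bits. Symlinks are skipped rather than followed.
function stripGroupOther(p: string): void {
  const st = fs.lstatSync(p);
  if (st.isSymbolicLink()) return;
  fs.chmodSync(p, st.mode & 0o7700);
  if (st.isDirectory()) {
    for (const name of fs.readdirSync(p)) stripGroupOther(path.join(p, name));
  }
}
```

A real regression test would pair this with an ownership assertion (root:root) run as root, plus the negative case the finding asks for: a high-risk dir such as plugins should still pass a group r-x check after its own lock pass.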

🌱 Nice ideas

  • None.
    • Recommendation: Add tests that inspect the confidentiality pass command shape and replay it on a fixture to confirm credentials, identity, and pairing roots become 700 root:root and descendants lose group/other read/write/execute access. Include a negative case showing high-risk code dirs remain gateway-readable while secret dirs do not.
    • Evidence: CONFIDENTIALITY_STATE_DIRS contains credentials, identity, and pairing, and applyStateDirLockMode() uses root:root, 700, and go-rwX; grep of the added test found no direct assertions for those names, owners, modes, or recursive access.
  • Cover tolerant shell marker parsing and partial mutation failures (src/lib/shields/state-dir-lock.ts:446): The PR aggregates shell marker output for exec errors, mutation failures, verification drift, and restore failures, but the new tests do not cover several important negative paths. Without regressions for command-level throw/timeout, emitted mutation-failed, emitted restore-failed, and partial recursive chmod/chown failure with clean top-level stat, this localized tolerant parsing can silently regress.
    • Recommendation: Add focused tests for privileged.capture() throwing, marker lines from each mutation pass, restore mkdir/chown/chmod/stat failures, and a partial recursive permission failure where the top-level state-dir mode/owner still match. Document the removal condition for this tolerant shell fan-out, or reduce the workaround by moving more guarantees to source-side invariants.
    • Evidence: runStateDirLockScript() and restoreWritableRuntimeSubpaths() parse structured marker lines, but the changed tests cover command shape, root symlink handling, and happy-path session creation rather than failure markers or partial recursive failure cases.
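A minimal sketch of the tolerant parsing and the shields-down fix these findings describe, assuming marker lines have the shape `<kind>\t<detail>` (as in the `symlinked-root\t<path>` lines emitted by the lock scripts). The function names and the `exec-error` kind are hypothetical, not the module's API; `assertUnlocked` shows state-dir issues feeding the same throw as the config checks instead of only being logged.

```typescript
// Hypothetical marker-line shape: `<kind>\t<detail>`.
interface LockIssue {
  kind: string;
  detail: string;
}

// Keep only lines whose prefix (text before the first tab) is a known marker kind.
function parseMarkerLines(stdout: string, kinds: ReadonlySet<string>): LockIssue[] {
  const issues: LockIssue[] = [];
  for (const line of stdout.split("\n")) {
    const tab = line.indexOf("\t");
    if (tab <= 0) continue; // no marker prefix: ordinary output
    const kind = line.slice(0, tab);
    if (!kinds.has(kind)) continue; // unknown prefix: ignore
    issues.push({ kind, detail: line.slice(tab + 1) });
  }
  return issues;
}

// A throw or timeout from the privileged exec layer becomes an issue too,
// so it cannot be mistaken for "no markers, therefore success".
function runAndCollect(exec: () => string, kinds: ReadonlySet<string>): LockIssue[] {
  let stdout: string;
  try {
    stdout = exec();
  } catch (err) {
    return [{ kind: "exec-error", detail: err instanceof Error ? err.message : String(err) }];
  }
  return parseMarkerLines(stdout, kinds);
}

// Shields-down fails when either the config checks or the state-dir restore
// reported anything.
function assertUnlocked(configIssues: string[], stateDirIssues: LockIssue[]): void {
  const all = [...configIssues, ...stateDirIssues.map((i) => `${i.kind}: ${i.detail}`)];
  if (all.length > 0) {
    throw new Error(`Config not unlocked: ${all.join("; ")}`);
  }
}
```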

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/lib/state/sandbox.ts (1)

1108-1122: Run the state lifecycle E2E jobs for this audit-semantic change before merge.

This changes backup audit behavior under permission-denied traversal, so it’s worth validating with state-backup-restore-e2e, snapshot-commands-e2e, and rebuild-openclaw-e2e.

As per coding guidelines: "src/lib/state/sandbox.ts: This file manages sandbox state ... E2E test recommendation: state-backup-restore-e2e, snapshot-commands-e2e, rebuild-openclaw-e2e."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/state/sandbox.ts` around lines 1108 - 1122, This change alters the
backup audit behavior in src/lib/state/sandbox.ts (see the auditCmd construction
and the "Pre-backup audit" log) so run the full state lifecycle E2E suites
before merging: execute state-backup-restore-e2e, snapshot-commands-e2e, and
rebuild-openclaw-e2e against a fresh sandbox to validate permission-denied
traversal is tolerated and legitimate rebuilds still succeed; if any test fails,
adjust the auditCmd logic or error handling around the shell-quoted dir mapping
(the code building auditCmd and the surrounding pre-backup audit flow) until the
E2E suites pass.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/snapshot.test.ts`:
- Around line 1056-1171: Update the fake ssh stub used in the two tests ("treats
audit-find exit 1 with empty stdout as a successful audit (NemoClaw `#4059`)" and
"still rejects violations from readable dirs even if a sibling find exits
non-zero") so that when the command contains "find " the script exits non-zero
(e.g., process.exit(1)) unless the command string also contains the tolerant
shape "|| true"; in other words, change the cmd.includes("find ") branch in both
writeExecutable calls to check for cmd.includes("|| true") and only return
non-zero when that tolerant token is absent, preserving existing stdout behavior
for other branches.
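The stub change requested above reduces to a small decision function. A sketch, assuming the stub receives the full remote command string; the function name, stderr text, and path are illustrative:

```typescript
interface FakeSshResult {
  status: number;
  stdout: string;
  stderr: string;
}

// Hypothetical decision logic for the fake `ssh` stub in the audit tests:
// an audit `find` fails like a real permission-denied traversal unless the
// command carries the `|| true` tolerance wrapper.
function fakeSshResponse(cmd: string, findStdout = ""): FakeSshResult {
  if (cmd.includes("find ")) {
    if (!cmd.includes("|| true")) {
      return {
        status: 1,
        stdout: findStdout,
        stderr: "find: '/sandbox/.openclaw/credentials': Permission denied\n",
      };
    }
    // The wrapper masks find's exit status but keeps whatever it printed.
    return { status: 0, stdout: findStdout, stderr: "" };
  }
  return { status: 0, stdout: "", stderr: "" }; // other commands: unchanged
}
```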

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b97cf8fe-d359-487b-8af3-17cb329e2c93

📥 Commits

Reviewing files that changed from the base of the PR and between 8f5db10 and 1ef364e.

📒 Files selected for processing (5)
  • src/lib/shields/index.ts
  • src/lib/state/sandbox.ts
  • test/repro-2681-group-writable.test.ts
  • test/repro-4065-shields-up-runtime-perms.test.ts
  • test/snapshot.test.ts

Comment thread test/snapshot.test.ts Outdated
…file

Address review: scrub `#4065` / `#4059` mentions from production code
comments and test docstrings, and rename the new shields-up regression
test from `repro-4065-…` to `shields-up-runtime-perms.test.ts` so the
filename describes behaviour rather than an issue number.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26355812622
Target ref: 1ef364e6574a9298b239e75e6523d7966b6f3bf9
Workflow ref: main
Requested jobs: shields-config-e2e,snapshot-commands-e2e,rebuild-openclaw-e2e
Summary: 3 passed, 0 failed, 0 skipped

Job Result
rebuild-openclaw-e2e ✅ success
shields-config-e2e ✅ success
snapshot-commands-e2e ✅ success

@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26356219674
Target ref: 15d095f4845aef820a1b3661855c531758f15fc1
Workflow ref: main
Requested jobs: shields-config-e2e,snapshot-commands-e2e,rebuild-openclaw-e2e
Summary: 3 passed, 0 failed, 0 skipped

Job Result
rebuild-openclaw-e2e ✅ success
shields-config-e2e ✅ success
snapshot-commands-e2e ✅ success

…d test

Address review:

1. `restoreWritableRuntimeSubpaths` expanded the full pattern
   `agents/*/sessions` as a single glob. On a fresh sandbox where
   `sessions` does not exist yet, the glob has no matches and the shell
   leaves the literal pattern, which the `*"*"*` guard then drops — so
   `sessions/` was never created and the post-lockdown TUI mkdir still
   failed with EACCES. Split each pattern into a parent glob (expanded
   against the existing tree) plus a leaf to create, so the helper
   always creates the missing leaf inside every existing parent.
2. The two pre-backup audit tests stubbed the SSH fake as always-exit-0
   on `find`, so the `|| true` tolerance wrapper was not actually
   exercised. Make the fake exit non-zero with a Permission-denied
   stderr unless the audit cmd includes `|| true`, so the tests fail
   loudly if the wrapper is dropped.
3. New behavioural test runs the actual restore-helper script body
   against a real filesystem fixture and asserts that
   `agents/main/sessions` is created when only `agents/main` exists
   beforehand.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
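The parent-glob/leaf split from item 1 can be modelled in TypeScript. This is an illustration, not the shell helper: `globToRegExp` is a hypothetical stand-in for shell expansion that supports only `*` within a single path segment, and all function names are assumptions.

```typescript
// Split `agents/*/sessions` into the glob expanded against the existing tree
// (`agents/*`) and the literal leaf to create inside each match (`sessions`).
function splitRuntimePattern(pattern: string): { parentGlob: string; leaf: string } {
  const idx = pattern.lastIndexOf("/");
  if (idx <= 0 || idx === pattern.length - 1) {
    throw new Error(`pattern needs a parent and a leaf: ${pattern}`);
  }
  const leaf = pattern.slice(idx + 1);
  if (/[*?\[]/.test(leaf)) {
    throw new Error(`leaf must be a literal name: ${pattern}`);
  }
  return { parentGlob: pattern.slice(0, idx), leaf };
}

// Minimal stand-in for shell globbing: `*` matches within one path segment.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join("[^/]*");
  return new RegExp(`^${escaped}$`);
}

// Leaf directories the helper would create, given the paths that already exist.
function leavesToCreate(existing: string[], pattern: string): string[] {
  const { parentGlob, leaf } = splitRuntimePattern(pattern);
  const re = globToRegExp(parentGlob);
  return existing.filter((p) => re.test(p)).map((p) => `${p}/${leaf}`);
}
```

On a tree containing only `agents/main`, `leavesToCreate(existing, "agents/*/sessions")` yields `agents/main/sessions`, whereas matching the full pattern directly finds nothing, which is the failure item 1 describes.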
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/shields/index.ts`:
- Around line 490-499: The loop in restoreWritableRuntimeSubpaths
(WRITABLE_RUNTIME_SUBPATHS handling) currently treats symlinked directories as
regular dirs and thus creates/mutates sessions/ on the symlink target; update
the parent existence guard to skip symlinks (e.g., require directory AND not a
symlink) or explicitly check [ -L "$parent" ] and continue when true so
mkdir/chown/chmod are not applied to symlink targets; make this change around
the loop that iterates parents (the for parent in "$@"; do ... [ -d "$parent" ]
|| continue portion) to ensure symlinked runtime parents are ignored.

In `@test/shields-up-runtime-perms.test.ts`:
- Around line 129-135: The test extracts patterns with restoreShell.slice(4)
which includes configDir, causing configDir to be passed twice into spawnSync
and altering the argv layout; change the extraction to restoreShell.slice(5) so
that patterns contains only the glob patterns, leaving the explicit "configDir"
argument in the spawnSync call (referencing restoreShell, script, patterns, and
the spawnSync invocation).
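To see why `slice(5)` is the right cut, it helps to lay out the argv of a `sh -c` call. A sketch, assuming the restore helper is invoked as `sh -c SCRIPT sh CONFIG_DIR PATTERN...` (the layout the `slice(5)` fix implies); the helper names are hypothetical, and `probePositionals` replaces the script with one that echoes `$0` and its positional parameters.

```typescript
import { spawnSync } from "node:child_process";

// Assumed call shape: sh -c SCRIPT sh CONFIG_DIR PATTERN...
// Inside SCRIPT, argv[3] becomes $0, configDir becomes $1, patterns are $2...
function buildRestoreArgv(script: string, configDir: string, patterns: string[]): string[] {
  return ["sh", "-c", script, "sh", configDir, ...patterns];
}

// Index 4 holds configDir, so the patterns start at index 5.
function patternsFromArgv(argv: string[]): string[] {
  return argv.slice(5);
}

// Run the same operands (argv[3] onward) under a probe script that prints them.
function probePositionals(argv: string[]): string {
  const probeScript = `printf '%s|' "$0" "$@"`;
  const res = spawnSync("sh", ["-c", probeScript, ...argv.slice(3)], { encoding: "utf-8" });
  return res.stdout;
}
```

With `configDir = "/sandbox/.openclaw"` and one pattern, the probe prints `sh|/sandbox/.openclaw|agents/*/sessions|`, so `slice(4)` would hand `configDir` to the script a second time.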
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1eb0107a-ac61-4d44-ab10-07c74795f156

📥 Commits

Reviewing files that changed from the base of the PR and between 15d095f and 31ce569.

📒 Files selected for processing (3)
  • src/lib/shields/index.ts
  • test/shields-up-runtime-perms.test.ts
  • test/snapshot.test.ts

Comment thread src/lib/shields/index.ts Outdated
Comment thread test/shields-up-runtime-perms.test.ts
@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26356782985
Target ref: 31ce5693e17db64ad95ce41d75623cfc296c3682
Workflow ref: main
Requested jobs: shields-config-e2e,snapshot-commands-e2e,rebuild-openclaw-e2e
Summary: 3 passed, 0 failed, 0 skipped

Job Result
rebuild-openclaw-e2e ✅ success
shields-config-e2e ✅ success
snapshot-commands-e2e ✅ success

… test argv slice

Address review:

1. The privileged restore helper called `[ -d "$parent" ]` before
   `mkdir`/`chown`/`chmod`, but `[ -d ]` follows symlinks. A pre-lockdown
   agent that swapped `agents/<id>` for a symlink to an arbitrary host
   path could redirect the post-lock `mkdir -p ".../sessions"` and
   `chown -R sandbox:sandbox` through that link and rewrite ownership on
   any directory the privileged exec context can reach. Drop the parent
   (and the target leaf) when either is a symlink, before any mutation.

2. The behavioural test extracted patterns with `slice(4)`, which kept
   the captured `configDir` in the argv passed to bash — so the helper
   ran with `configDir` listed twice and the test argv diverged from
   the real `privilegedSandboxExec` call shape. Use `slice(5)` so only
   the patterns are forwarded.

3. New behavioural test asserts the symlink guard: when
   `agents/<id>` is a symlink to a sibling host directory, the helper
   must not create `sessions/` under either the link target or the link
   itself.

Also reword one comment to avoid contested terminology.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
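A sketch of the symlink guard in TypeScript, using Node's `lstatSync`, which (unlike `statSync` or shell `[ -d ]`) describes a symlink itself rather than its target. The function names and fixture layout are hypothetical; the real guard lives in the privileged shell helper, and any check of this kind still leaves a window between the test and the mutation.

```typescript
import { lstatSync, mkdirSync, mkdtempSync, symlinkSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// True only for a real directory: lstat does not follow a final symlink,
// so a symlink to a directory reports isDirectory() === false.
function isRealDirectory(path: string): boolean {
  try {
    return lstatSync(path).isDirectory();
  } catch {
    return false; // missing or unreadable
  }
}

// Restore `<parent>/<leaf>` only when the parent is a real directory and the
// leaf is either absent (safe to create) or itself a real directory.
function shouldRestore(parent: string, leaf: string): boolean {
  if (!isRealDirectory(parent)) return false;
  try {
    return lstatSync(join(parent, leaf)).isDirectory();
  } catch (err) {
    return (err as { code?: string }).code === "ENOENT";
  }
}

// Fixture: agents/main is a real dir; agents/evil is a symlink to a host dir.
function makeFixture(): string {
  const root = mkdtempSync(join(tmpdir(), "shields-sketch-"));
  const agents = join(root, "agents");
  mkdirSync(join(agents, "main"), { recursive: true });
  mkdirSync(join(root, "host-dir"));
  symlinkSync(join(root, "host-dir"), join(agents, "evil"));
  return agents;
}
```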
Contributor

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/shields-up-runtime-perms.test.ts (1)

135-139: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use sh (not bash) when replaying the restore helper

Both restore-shell captures match sh -c, but both behavioral replays run spawnSync("bash", ["-c", ...]), which can mask /bin/sh compatibility regressions. Switch the replay shell to sh in both replay blocks (around lines 135-139 and 173-177).

♻️ Proposed fix
-    const result = spawnSync(
-      "bash",
-      ["-c", `${script}\n`, "sh", configDir, ...patterns],
-      { encoding: "utf-8", timeout: 5000 },
-    );
+    const result = spawnSync(
+      "sh",
+      ["-c", `${script}\n`, "sh", configDir, ...patterns],
+      { encoding: "utf-8", timeout: 5000 },
+    );

Apply the same change to the second replay block as well.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/shields-up-runtime-perms.test.ts` around lines 135 - 139, In
test/shields-up-runtime-perms.test.ts the replay uses spawnSync("bash", ["-c",
...]) which hides /bin/sh compatibility issues; change both replay invocations
to use spawnSync("sh", ["-c", ...]) instead (update the call sites where
spawnSync is invoked with "bash" in the two replay blocks around the current
spawnSync(...) usages so they call "sh" and keep the same args and options).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ea69a11f-9462-4b87-aa13-0ece1af0192b

📥 Commits

Reviewing files that changed from the base of the PR and between 31ce569 and 8608886.

📒 Files selected for processing (3)
  • src/lib/shields/index.ts
  • test/shields-up-runtime-perms.test.ts
  • test/snapshot.test.ts

@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26357693833
Target ref: 8608886b21b09df026e49326d798f54742b81072
Workflow ref: main
Requested jobs: shields-config-e2e,snapshot-commands-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job Result
shields-config-e2e ✅ success
snapshot-commands-e2e ✅ success

…ed comment

Address review:

1. The behavioural test for the runtime-subpath restore was running the
   captured script body under `bash` rather than `sh`, while the
   production helper invokes it via `sh -c`. A bash-only construct
   slipping into the helper would pass the test but break the real call
   site. Switch the replay to `spawnSync("sh", …)` in both fixtures so
   the test covers exactly what the privileged exec runs.

2. Drop the NC-2227-05 tag from the updated `lockAgentConfig` comment.
   The state-directory ownership story is now self-contained in
   `HIGH_RISK_STATE_DIRS` doc + the helper itself; the issue tag does
   not add information and conflicts with the no-issue-refs-in-comments
   guidance applied across the rest of this PR.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Contributor

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/lib/shields/index.ts (1)

399-458: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Reject symlinked state-dir roots before recursive chmod -R/chown -R lock step

GNU coreutils defaults make chmod -R follow symlink command-line arguments, so if a configDir/<stateDir> or configDir/workspace-* entry is a symlink to a directory, the recursive chmod -R can apply outside configDir during the shields-up lock path. (chown -R defaults to non-dereference for symlink command-line args, but the recursive chmod -R is enough for this to be a security/correctness blocker.)

Suggested hardening
   for (const dirName of HIGH_RISK_STATE_DIRS) {
     const dirPath = `${configDir}/${dirName}`;
+    try {
+      privilegedSandboxExec(sandboxName, [
+        "sh",
+        "-c",
+        '[ -d "$1" ] && [ ! -L "$1" ]',
+        "sh",
+        dirPath,
+      ]);
+    } catch {
+      continue;
+    }
     try {
       privilegedSandboxExec(sandboxName, ["chown", "-R", owner, dirPath]);
     } catch {
       // Directory may not exist for this agent — silently skip
     }
@@
 for dir in "$config_dir"/workspace-*; do
+  [ -L "$dir" ] && continue
   [ -d "$dir" ] || continue
   chown -R "$owner" "$dir" 2>/dev/null || true
   chmod "$dir_mode" "$dir" 2>/dev/null || true
   [ "$clear_setgid" = "1" ] && chmod g-s "$dir" 2>/dev/null || true
   chmod -R "$recursive_mode" "$dir" 2>/dev/null || true
 done

Also applies to: 756-760

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/shields/index.ts` around lines 399 - 458, The recursive chown/chmod
operations in the HIGH_RISK_STATE_DIRS loop and the workspace-* shell block can
follow symlinked roots and affect paths outside configDir; before calling
privilegedSandboxExec for any dirPath or iterating workspace-* entries (symbols:
HIGH_RISK_STATE_DIRS, privilegedSandboxExec, configDir, dirPath, workspace-*
pattern, recursiveMode, dirMode, owner, clearSetgid, isLocking), reject or skip
any entry that is a symlink (use a non-following lstat/test -L check) so
recursive -R operations are only applied to real directories; update both the
TypeScript loop (skip when lstat indicates symlink) and the embedded shell
script (skip entries where [ -L "$dir" ] ) to avoid following symlink roots.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: cf8d1c18-3301-4936-baba-55a23d2cf2e7

📥 Commits

Reviewing files that changed from the base of the PR and between 8608886 and 1262ddf.

📒 Files selected for processing (2)
  • src/lib/shields/index.ts
  • test/shields-up-runtime-perms.test.ts

@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26358552843
Target ref: 1262ddfb28d5b697630c40beb935a162170ca364
Workflow ref: main
Requested jobs: shields-config-e2e,state-backup-restore-e2e,snapshot-commands-e2e
Summary: 3 passed, 0 failed, 0 skipped

Job Result
shields-config-e2e ✅ success
snapshot-commands-e2e ✅ success
state-backup-restore-e2e ✅ success

…p lock

Address review:

1. The shields-up lock loop ran `chown -R`/`chmod -R`/`chmod g-s` on
   each `${configDir}/${dirName}` (and on every `workspace-*` glob hit)
   without rejecting symlinked roots. A pre-lockdown agent that swapped
   e.g. `extensions/` or `workspace-main/` for a symlink to a host path
   could redirect those recursive ownership and mode mutations at an
   attacker-controlled directory. Consolidate the per-state-dir loop
   into a single privileged shell exec that skips symlinks (`[ -L "$path" ]
   && continue`) before any mutation, and add the same guard to the
   existing `workspace-*` shell loop.

2. Drop the `NC-2227-05` issue tag from the state-directory header
   comment for consistency with the rest of this PR.

Updates the regression tests:
- `repro-2681`: assert the unlock fan-out via the new `sh -c` script
  shape (workspace included as an arg, plus the workspace-* glob path
  still present).
- `shields-up-runtime-perms`: assert the state-dir lock and workspace-*
  scripts both contain the `[ -L … ] && continue` guard, and add a
  behavioural fixture that proves a symlinked `extensions/` root is
  skipped (its host target keeps its original mode and file contents).

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26359318121
Target ref: 4bcb560e9a85f575385a155c4809e7c0d17480a1
Workflow ref: main
Requested jobs: shields-config-e2e,snapshot-commands-e2e,state-backup-restore-e2e
Summary: 3 passed, 0 failed, 0 skipped

Job Result
shields-config-e2e ✅ success
snapshot-commands-e2e ✅ success
state-backup-restore-e2e ✅ success

Address review:

Previously, when shields-up encountered a symlinked high-risk state-dir
root (or a symlinked `workspace-*` dir), the privileged lock script
silently skipped it via `[ -L "$path" ] && continue`. That refused to
follow the link — good — but left the dir as-is and reported the lock
as successful. The sandbox would then sit in "shields up (lockdown
active)" status while a state-dir root still pointed at a writable
host path, exactly the security regression the symlink guard was meant
to prevent.

Have the two consolidated lock shell scripts always exit 0 but emit
`symlinked-root\t<path>` on stdout for every symlinked root they
refuse to touch. `applyStateDirLockMode` parses those lines and
returns them as lock failures when invoked under `isLocking=true`.
`lockAgentConfig` now throws "Config not locked: state dir root is a
symlink: …" before any further verification, refusing to acknowledge
shields-up. Unlock is unchanged: skipping symlinked roots is the
correct best-effort behaviour there.

New regression test exercises the end-to-end path: when the captured
shell script reports a symlinked root via the mocked exec, the
`lockAgentConfig` call throws with the expected diagnostic.

Tests for the static script shape are updated to match the new
`if [ -L … ]; then printf …; fi` form instead of the previous
`[ -L … ] && continue`.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
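The always-exit-0, report-on-stdout contract can be exercised against a temp-dir fixture. A sketch only: `LOCK_SCRIPT` is a simplified stand-in for the real lock script (it applies only `chmod -R go-w`), the helper names are hypothetical, and a POSIX `sh` is assumed.

```typescript
import { spawnSync } from "node:child_process";
import { mkdirSync, mkdtempSync, symlinkSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Simplified lock script: report symlinked roots instead of touching them,
// skip missing dirs, and always exit 0.
const LOCK_SCRIPT = `
config_dir="$1"; shift
for name in "$@"; do
  path="$config_dir/$name"
  if [ -L "$path" ]; then
    printf 'symlinked-root\\t%s\\n' "$path"
    continue
  fi
  [ -d "$path" ] || continue
  chmod -R go-w "$path" 2>/dev/null || true
done
exit 0
`;

const MARKER = "symlinked-root\t";

function runLockScript(configDir: string, dirs: string[]): { status: number | null; symlinked: string[] } {
  const res = spawnSync("sh", ["-c", LOCK_SCRIPT, "sh", configDir, ...dirs], { encoding: "utf-8" });
  const symlinked = res.stdout
    .split("\n")
    .filter((line) => line.startsWith(MARKER))
    .map((line) => line.slice(MARKER.length));
  return { status: res.status, symlinked };
}

// Fixture: skills/ is real, extensions/ is a symlink to a sibling host dir.
function makeConfigDir(): string {
  const root = mkdtempSync(join(tmpdir(), "lock-sketch-"));
  mkdirSync(join(root, "skills"));
  mkdirSync(join(root, "host"));
  symlinkSync(join(root, "host"), join(root, "extensions"));
  return root;
}
```

A caller running under `isLocking=true` would treat any entry in `symlinked` as a lock failure; the exit status is always 0, so it carries no signal.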
@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26360739914
Target ref: b95913ad0c78d1efd972054c21318a6901c01446
Workflow ref: main
Requested jobs: shields-config-e2e,state-backup-restore-e2e,snapshot-commands-e2e
Summary: 3 passed, 0 failed, 0 skipped

Job Result
shields-config-e2e ✅ success
snapshot-commands-e2e ✅ success
state-backup-restore-e2e ✅ success

… + doc update

Address review:

1. The inline `[ -L … ] && printf 'symlinked-root\t…'` guard in the
   mutation script ran per-iteration. A symlinked root later in the
   list could still leave earlier (non-symlinked) state dirs already
   reowned to `root:sandbox` by the time the lock helper bailed out.
   shields-up would then report "config not locked" while the tree was
   partially mutated.

   Add a dedicated preflight pass that runs before any `chown`/`chmod`,
   scans every high-risk state-dir root *and* every `workspace-*` dir
   for symlinks, and returns the full list. When `isLocking=true` and
   the preflight finds any symlinked root, `applyStateDirLockMode`
   short-circuits without touching the mutation pass or the
   sessions-restore helper, and `lockAgentConfig` throws `Config not
   locked: state dir root is a symlink: …`. The inline symlink guards
   in the mutation scripts stay for defence-in-depth in case the
   preflight and the mutation observe different fs state.

2. New regression test mocks the preflight script to report a
   symlinked root and asserts (a) `lockAgentConfig` throws with the
   expected diagnostic and (b) no mutation calls (state-dir lock,
   workspace-* lock, or sessions-restore) were ever issued.

3. Add a second regression test that asserts a dedicated preflight
   script (no `chown`/`chmod`, just the `[ -L … ] && printf` checks)
   is present in the recorded call sequence.

4. Update `docs/security/best-practices.mdx` to document the new
   `root:sandbox` state-dir ownership, the `agents/<id>/sessions`
   runtime carve-out, and the hard fail on symlinked state-dir roots.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
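The ordering change in item 1, a read-only preflight that must pass before any mutation runs, can be sketched with an injected exec function so the call sequence is observable. The `Exec` type, function name, and argv values are hypothetical; only the error text follows the diagnostic quoted above.

```typescript
// Hypothetical privileged-exec signature: run argv, return its stdout.
type Exec = (argv: string[]) => string;

const SYMLINK_MARKER = "symlinked-root\t";

function lockWithPreflight(exec: Exec, preflight: string[], mutations: string[][]): void {
  // 1. Read-only pass: symlink checks only, no chown/chmod.
  const symlinked = exec(preflight)
    .split("\n")
    .filter((line) => line.startsWith(SYMLINK_MARKER))
    .map((line) => line.slice(SYMLINK_MARKER.length));

  // 2. Refuse before touching anything, so no state dir is left half-locked.
  if (symlinked.length > 0) {
    throw new Error(`Config not locked: state dir root is a symlink: ${symlinked.join(", ")}`);
  }

  // 3. Mutation passes (state-dir lock, workspace-* lock, sessions restore) run
  //    only after a clean preflight; their own symlink guards stay as a backstop.
  for (const argv of mutations) {
    exec(argv);
  }
}
```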
@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26361443080
Target ref: f099b8b9a3cc84464272a3a34a478064a0f661ff
Workflow ref: main
Requested jobs: shields-config-e2e,state-backup-restore-e2e,snapshot-commands-e2e
Summary: 3 passed, 0 failed, 0 skipped

Job Result
shields-config-e2e ✅ success
snapshot-commands-e2e ✅ success
state-backup-restore-e2e ✅ success

@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26381246289
Target ref: bd702919776d434b9a2f2ac2a752e0ad412b0404
Workflow ref: main
Requested jobs: shields-config-e2e,snapshot-commands-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job Result
shields-config-e2e ✅ success
snapshot-commands-e2e ✅ success

@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26458071131
Target ref: 41a8bb3fa5bed9740ecff6fa7af41576840dc911
Workflow ref: main
Requested jobs: shields-config-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job Result
shields-config-e2e ✅ success

@cv cv added v0.0.52 Release target v0.0.53 Release target and removed v0.0.51 Release target v0.0.52 Release target labels May 26, 2026
cv previously approved these changes May 27, 2026
@ericksoa ericksoa dismissed cv’s stale review May 27, 2026 18:28

Unapproving per maintainer request.

@ericksoa ericksoa added v0.0.55 Release target and removed v0.0.53 Release target labels May 27, 2026
@jyaunches jyaunches added R1 v0.0.56 Release target and removed v0.0.55 Release target R1 labels May 29, 2026
@cv cv requested a review from ericksoa May 30, 2026 01:17
@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26691453416
Target ref: 9574166317f9901ef9d089f525e4b966cf7922ff
Workflow ref: main
Requested jobs: shields-config-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job Result
shields-config-e2e ✅ success

@github-actions
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26691795412
Target ref: ad984e90b36a9cb09ab20c305d180b48fc8f669f
Workflow ref: main
Requested jobs: shields-config-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job Result
shields-config-e2e ✅ success

…leged-exec in tests

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
…ip group read on credentials

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
docs/security/best-practices.mdx (2)

204-204: ⚡ Quick win

Rewrite in active voice.

Line 204 contains multiple passive constructions that should be rewritten in active voice. As per coding guidelines, active voice is required in NemoClaw documentation.

Examples from this line:

  • "are owned by root:sandbox" → "The lock helper owns these directories as root:sandbox"
  • "the sandbox user is denied write" → "the lock denies write access to the sandbox user"
  • "dirs that aren't present... are silently skipped" → "the lock helper silently skips dirs that aren't present"

As per coding guidelines: "Active voice required. Flag passive constructions. Bad: 'A gateway is created by the CLI.' Good: 'The CLI creates a gateway.'"

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/security/best-practices.mdx` at line 204, Rewrite the sentence in active
voice throughout: have the lock helper own the high-risk state directories as
root:sandbox (instead of "are owned by root:sandbox"), state that the lock
helper lets the OpenClaw gateway (a member of the sandbox group) keep read
access while the lock denies write access to the sandbox user via chmod -R go-w,
and say the lock helper applies root:root 700 with chmod -R go-rwX to secret
directories to prevent gateway and sandbox reads; explicitly state that the lock
helper restores sandbox:sandbox 2770 on shields-down, that the list is the union
of state directories declared by every shipped agent manifest and that the lock
helper silently skips dirs not present in an agent's config tree, and finally
mark runtime-mutable subtrees (sessions/, memories/, logs/, cache/, plans/,
openclaw-weixin/) as intentionally exempt so agents can continue operating.

Lines 204-206: ⚡ Quick win

Split into one sentence per line.

Lines 204-206 contain multiple sentences on single lines, which makes diffs harder to review. As per coding guidelines, documentation source should use one sentence per line.

Proposed rewrite with one sentence per line

Line 204 should be split into approximately:

Under lockdown, the high-risk state directories (`skills`, `hooks`, `cron`, `agents`, `extensions`, `plugins`, `workspace`, `memory`, `devices`, `canvas`, `telegram`, `wechat`, `whatsapp`, `platforms`, `weixin`, `profiles`, `skins`) are owned by `root:sandbox` rather than `root:root`, so the OpenClaw gateway (a member of the `sandbox` group) keeps read access to plugin and agent code while the sandbox user is denied write through `chmod -R go-w`.
Secret-bearing directories (`credentials`, `identity`, `pairing`) get a stricter posture: `root:root 700` with `chmod -R go-rwX`, so neither the sandbox user nor the gateway can read them while shields are up.
The mutable-default posture (`sandbox:sandbox 2770`) is restored for both groups on shields-down.
The list is the union of state directories declared by every shipped agent manifest; dirs that aren't present in a given agent's config tree are silently skipped.
Runtime-mutable subtrees (`sessions/`, `memories/`, `logs/`, `cache/`, `plans/`, and `openclaw-weixin/`, which is regenerated at image-build time) are intentionally exempt so the agent can keep operating.

Line 205 should be split:

A narrow set of runtime-data subpaths is exempted from the lock so the agent can keep operating — currently `agents/<agent-id>/sessions/`, which the OpenClaw TUI creates and writes session metadata into; the lock helper restores those subpaths to `sandbox:sandbox 2770` after the surrounding tree is locked.
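As an illustration only, the posture described here (narrowed to the single `agents/<agent-id>/sessions/` carve-out this paragraph names) can be modelled as a path classifier. This is a minimal TypeScript sketch under stated assumptions: the directory lists are copied from the proposed docs text rather than read from `src/lib/shields/state-dir-lock.ts`, and `classifyStatePath`, `Posture`, `HIGH_RISK_DIRS`, and `SECRET_DIRS` are hypothetical names, not that module's exports.

```typescript
// Hypothetical sketch: lists copied from the proposed docs text, not from state-dir-lock.ts.
type Posture =
  | { kind: "locked-readable"; owner: "root:sandbox"; perms: "go-w (recursive)" }
  | { kind: "locked-secret"; owner: "root:root"; perms: "700, go-rwX (recursive)" }
  | { kind: "runtime-writable"; owner: "sandbox:sandbox"; perms: "2770 on the directory" }
  | { kind: "unmanaged" };

const HIGH_RISK_DIRS = new Set<string>([
  "skills", "hooks", "cron", "agents", "extensions", "plugins", "workspace",
  "memory", "devices", "canvas", "telegram", "wechat", "whatsapp",
  "platforms", "weixin", "profiles", "skins",
]);
const SECRET_DIRS = new Set<string>(["credentials", "identity", "pairing"]);

// Classify a path relative to the agent's config dir, as it should look under shields-up.
function classifyStatePath(relPath: string): Posture {
  const parts = relPath.split("/").filter((p) => p.length > 0);
  // The top-level config dir itself is outside this sketch (the PR keeps it root:root).
  if (parts.length === 0) return { kind: "unmanaged" };
  const top = parts[0] ?? "";
  // Carve-out checked first: agents/<agent-id>/sessions and, in this sketch, anything beneath it.
  if (top === "agents" && parts.length >= 3 && parts[2] === "sessions") {
    return { kind: "runtime-writable", owner: "sandbox:sandbox", perms: "2770 on the directory" };
  }
  if (SECRET_DIRS.has(top)) {
    return { kind: "locked-secret", owner: "root:root", perms: "700, go-rwX (recursive)" };
  }
  if (HIGH_RISK_DIRS.has(top)) {
    return { kind: "locked-readable", owner: "root:sandbox", perms: "go-w (recursive)" };
  }
  // Anything else is not touched by the lock in this model.
  return { kind: "unmanaged" };
}
```

The ordering is the point of the sketch: if the carve-out were checked after the high-risk `agents` entry, `agents/main/sessions` would land in the locked `root:sandbox` bucket, which corresponds to the `EACCES: mkdir agents/main/sessions` failure this PR fixes.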

Line 206 appears to already be a single sentence.

As per coding guidelines: "One sentence per line in source (makes diffs readable). Flag paragraphs where multiple sentences appear on the same line."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/security/best-practices.mdx` around lines 204 - 206, Split the
multi-sentence paragraphs into one sentence per line in
docs/security/best-practices.mdx: break the paragraph that begins "Under
lockdown, the high-risk state directories..." into separate lines so each
sentence (the root:sandbox ownership sentence; the secret-bearing directories
sentence; the mutable-default posture sentence; the union-of-state-directories
sentence; the runtime-mutable subtrees sentence) is on its own line, and
likewise split the paragraph that begins "A narrow set of runtime-data
subpaths..." so that the single long sentence about agents/<agent-id>/sessions/
and the lock helper restoring sandbox:sandbox 2770 is on its own line; preserve
exact directory and permission strings (`skills`, `hooks`, `agents`,
`credentials`, `identity`, `pairing`, `sandbox:sandbox 2770`, `root:root 700`,
etc.) and do not change wording other than inserting line breaks.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9a7293e3-fb88-4655-80b7-2f66b6550f45

📥 Commits

Reviewing files that changed from the base of the PR and between bd70291 and 4df0f53.

📒 Files selected for processing (1)
  • docs/security/best-practices.mdx

@github-actions

Selective E2E Results — ❌ Some jobs failed

Run: 26707810999
Target ref: 4df0f5380c3fb7684bea6c0cf91ff2ed6e2143d9
Workflow ref: main
Requested jobs: shields-config-e2e
Summary: 0 passed, 1 failed, 0 skipped

Job | Result
--- | ---
shields-config-e2e | ❌ failure

Failed jobs: shields-config-e2e. Check run artifacts for logs.

…restore failures

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

@coderabbitai (bot) left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/security/best-practices.mdx`:
- Around lines 210-211: The docs claim that "Shields-up exempts runtime-mutable
subtrees (`sessions/`, `memories/`, `logs/`, `cache/`, `plans/`, plus
`openclaw-weixin/`)", which contradicts the actual shields behavior and the lock
helper that only preserves `agents/<agent-id>/sessions/` (restored to
`sandbox:sandbox 2770`); update the text to remove the outdated multi-subtree
exemption list and replace it with the accurate statement that the only runtime
carve-out is `agents/<agent-id>/sessions/`, and ensure the sentence references
the lock helper behavior (restores those subpaths to `sandbox:sandbox 2770`) so
operators aren’t misled by the old list.
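The carve-out this comment describes (restore `agents/<agent-id>/sessions/` to `sandbox:sandbox 2770` after the surrounding tree is locked) could be expressed as a generated shell script, in the spirit of the PR's embedded scripts that emit tab-prefixed marker lines and always end with `exit 0`. This is a minimal sketch, not the PR's code: `buildSessionsRestoreScript`, `shQuote`, the `RESTORE_*` marker names, and the choice to skip a symlinked `sessions` path are all hypothetical.

```typescript
// Hypothetical sketch: function names and RESTORE_* markers are illustrative only.

// POSIX single-quote a string: close the quote, emit an escaped ', reopen.
function shQuote(s: string): string {
  return `'${s.replace(/'/g, `'\\''`)}'`;
}

function buildSessionsRestoreScript(configDir: string, agentIds: string[]): string {
  const lines: string[] = [];
  for (const id of agentIds) {
    // Reject ids that could escape agents/ via path traversal or odd characters.
    if (!/^[A-Za-z0-9._-]+$/.test(id) || id === "." || id === "..") {
      throw new Error(`refusing suspicious agent id: ${id}`);
    }
    const dir = shQuote(`${configDir}/agents/${id}/sessions`);
    // Each failing step prints a tab-prefixed marker to stdout instead of aborting the script.
    lines.push(
      `if [ -L ${dir} ]; then printf '\\tRESTORE_SYMLINK %s\\n' ${dir}; else`,
      `  mkdir -p ${dir} || printf '\\tRESTORE_MKDIR_FAIL %s\\n' ${dir}`,
      `  chown sandbox:sandbox ${dir} || printf '\\tRESTORE_CHOWN_FAIL %s\\n' ${dir}`,
      `  chmod 2770 ${dir} || printf '\\tRESTORE_CHMOD_FAIL %s\\n' ${dir}`,
      `fi`,
    );
  }
  // End with exit 0 so a trailing failed test cannot leak a non-zero status to the caller.
  lines.push("exit 0");
  return lines.join("\n");
}
```

A caller would run the script through its privileged exec layer and treat any stdout line starting with a tab as a lock issue, rather than relying on the exit status.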

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f8fe9182-7699-41f7-98c2-a3a5ffdd6293

📥 Commits

Reviewing files that changed from the base of the PR and between 4df0f53 and 6d8233d.

📒 Files selected for processing (4)
  • docs/security/best-practices.mdx
  • src/lib/shields/index.ts
  • src/lib/shields/state-dir-lock.ts
  • test/shields-up-runtime-perms.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/shields-up-runtime-perms.test.ts

Comment thread: docs/security/best-practices.mdx (outdated)
@github-actions

Selective E2E Results — ✅ All requested jobs passed

Run: 26708316674
Target ref: 6d8233d2c209eef05577eeba22916f87eda0d115
Workflow ref: main
Requested jobs: shields-config-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job | Result
--- | ---
shields-config-e2e | ✅ success

…rding

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@github-actions

Selective E2E Results — ✅ All requested jobs passed

Run: 26708769299
Target ref: d2d24d77c3dfa08dfdbca85c9272b5feb295dfe7
Workflow ref: main
Requested jobs: shields-config-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job | Result
--- | ---
shields-config-e2e | ✅ success


Labels

fix, Sandbox (use this label to identify issues related to the NemoClaw isolated environment based on OpenShell), v0.0.56 (release target)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Brev][Security] nemoclaw shields up blocks openclaw tui with EACCES on agents/sessions directory and disables /nemoclaw slash command

4 participants