fix(onboard): guard config-sync chmod with ownership check #4206

Closed
hunglp6d wants to merge 1 commit into NVIDIA:main from hunglp6d:fix/nightly-e2e-config-sync-chmod-eperm-4f9b749

Conversation

@hunglp6d
Contributor

@hunglp6d hunglp6d commented May 26, 2026

Summary

Guard chmod 700 ~/.nemoclaw and chmod 600 ~/.nemoclaw/config.json in the sandbox config-sync script with an ownership check so the commands only run when the current user owns the target. This prevents EPERM crashes inside OpenClaw sandbox containers where /sandbox/.nemoclaw is root-owned (Dockerfile L733: root:root 1755), fixing all 39 failed OpenClaw E2E jobs in the May 26 nightly run.

Related Issue

Fixes #4207

Changes

  • Wrap chmod 700 ~/.nemoclaw with if [ "$(stat -c '%u' ~/.nemoclaw)" = "$(id -u)" ] so it only runs when the sandbox user owns the directory (host installs). Inside root-owned containers, prepare_filesystem already handles permissions.
  • Wrap chmod 600 ~/.nemoclaw/config.json with the same ownership guard.
  • Existing tests in config-sync.test.ts continue to pass (3/3) since the chmod 700 and chmod 600 strings are still present inside the if-blocks.
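
The guard described above can be sketched as a small shell helper. This is a minimal illustration, not the code generated by buildSandboxConfigSyncScript(): the helper name chmod_if_owned and the temporary demo directory are assumptions made for the example, and `stat -c '%u'` is GNU coreutils syntax (BSD and macOS `stat` use `-f '%u'` instead).

```shell
#!/usr/bin/env bash
set -euo pipefail

# chmod_if_owned MODE PATH
# Hypothetical helper (not part of the PR): applies MODE only when the
# current user owns PATH; otherwise it leaves the permissions untouched.
chmod_if_owned() {
  local mode="$1" target="$2"
  # stat -c '%u' prints the numeric UID of the owner (GNU coreutils).
  if [ "$(stat -c '%u' "$target")" = "$(id -u)" ]; then
    chmod "$mode" "$target"
  fi
}

# Demo on a throwaway directory rather than the real ~/.nemoclaw.
demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT
touch "$demo_dir/config.json"

chmod_if_owned 700 "$demo_dir"
chmod_if_owned 600 "$demo_dir/config.json"

# Show the resulting modes for both targets.
stat -c '%a %n' "$demo_dir" "$demo_dir/config.json"
```

Because the ownership test sits in the `if` condition, a mismatch simply skips the chmod and does not trip `set -e`.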

Root Cause

PR #4054 (commit f1044ad) added unconditional chmod 700 ~/.nemoclaw and chmod 600 ~/.nemoclaw/config.json to buildSandboxConfigSyncScript() in src/lib/onboard/config-sync.ts. The script runs as the sandbox user inside OpenClaw containers where /sandbox/.nemoclaw is root:root 1755 (Dockerfile L733). Under set -euo pipefail, the EPERM from the failed chmod becomes exit 1, piped through openshell sandbox connect, crashing every OpenClaw E2E job at onboard step 7/8.
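
The failure mode can be reproduced without a root-owned directory. The sketch below is an assumption-laden stand-in: failing_chmod and owns_target are hypothetical functions that simulate a chmod hitting EPERM and an ownership check that reports "not the owner". Each fragment runs in a child shell so the demo itself keeps going.

```shell
#!/usr/bin/env bash

# Unguarded: under set -e the failing command aborts the script with status 1.
unguarded_status=0
bash -c 'set -euo pipefail
failing_chmod() { return 1; }   # stand-in for chmod failing with EPERM
failing_chmod
echo "later steps run"' || unguarded_status=$?

# Guarded: the ownership check is false, so the chmod is never attempted.
guarded_status=0
bash -c 'set -euo pipefail
owns_target() { return 1; }     # stand-in for a root-owned directory
failing_chmod() { return 1; }   # stand-in for chmod failing with EPERM
if owns_target; then failing_chmod; fi
echo "later steps run"' || guarded_status=$?

echo "unguarded exit status: $unguarded_status"
echo "guarded exit status: $guarded_status"
```

Only the guarded fragment reaches its final echo, which mirrors why the unguarded script stopped at the chmod instead of continuing.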

Hermes containers are unaffected because Dockerfile.base L74-75 chowns /sandbox to sandbox:sandbox.

Validation

A focused custom-e2e.yaml workflow was run on a sibling branch to confirm this fix repairs the regression. The workflow re-runs only the jobs from the original nightly that this PR targets, on ubuntu-latest, off the same fix commit as this PR.

The validation branch is intentionally not the head of this PR — it carries an extra .github/workflows/custom-e2e.yaml commit that is scaffolding, not part of the fix. Re-run the validation by pushing any commit to the validation branch.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • make docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

AI Disclosure

  • AI-assisted — tool: Claude Code (nemoclaw-diagnosis skill)

Signed-off-by: Hung Le <hple@nvidia.com>

… EPERM

PR NVIDIA#4054 added `chmod 700 ~/.nemoclaw` and `chmod 600 ~/.nemoclaw/config.json`
to the sandbox config-sync script. Inside OpenClaw sandbox containers the
`/sandbox/.nemoclaw` directory is root-owned (Dockerfile L733: root:root 1755
for DAC protection of blueprints/), so the sandbox user cannot chmod it.
Under `set -euo pipefail` the EPERM becomes exit 1, crashing every OpenClaw
E2E job at onboard step 7/8.

Wrap both chmod calls in an ownership check:
  if [ "$(stat -c '%u' <path>)" = "$(id -u)" ]; then chmod ...; fi

This preserves the hardening on host installs (where the user owns the dir)
while skipping the chmod inside root-owned sandbox containers where
prepare_filesystem already handles permissions.

Hermes containers are unaffected because their Dockerfile.base chowns
/sandbox to sandbox:sandbox.

Fixes: NemoClaw nightly-e2e run 26425202514 (39/39 OpenClaw jobs failed)
Refs: NemoClaw#4054, NemoClaw#4009

Signed-off-by: Hung Le <hple@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented May 26, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai Bot commented May 26, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f9d091d8-c7bb-4a2d-9155-df92fe7a809e

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

@hunglp6d
Contributor Author

Closed because a fix, #4199, was merged.

@hunglp6d hunglp6d closed this May 26, 2026

Development

Successfully merging this pull request may close these issues.

nightly-e2e: config-sync chmod EPERM crashes all OpenClaw jobs (run 26425202514)

2 participants