fix(install): self re-exec via sg(1) so non-interactive curl|bash finishes Docker group activation#4419
Conversation
…ishes after adding user to docker group On a clean Ubuntu VM where the user is not yet in the docker group, the non-interactive curl|bash installer used to install Docker, run usermod -aG docker, then exit 0 with instructions to run newgrp docker and re-curl. That broke the contract of NEMOCLAW_NON_INTERACTIVE=1 — the human still had to round-trip the shell themselves. When non-interactive mode is set and sg(1) is available, re-execute the installer under sg docker so the rest of the install (Node.js, CLI, onboard) runs in a shell with active docker group membership. A NEMOCLAW_DOCKER_GROUP_REACTIVATED guard prevents looping if docker is still unreachable after the re-exec. Interactive runs keep the existing "run newgrp docker and re-curl" message. Fixes #4414 Signed-off-by: Jason Ma <jama@nvidia.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughInstaller captures original argv at startup; when ChangesNon-interactive Docker group re-activation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Comment |
E2E Advisor RecommendationRequired E2E: Dispatch hint: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
E2E Scenario Advisor RecommendationRequired scenario E2E: None Full scenario advisor summaryE2E Scenario AdvisorBase: Required scenario E2E
Optional scenario E2E
Relevant changed files
|
PR Review AdvisorFindings: 1 needs attention, 5 worth checking, 0 nice ideas Review findings🛠️ Needs attention
🔎 Worth checking
🌱 Nice ideas
Since last review detailsCurrent findings:
This is an automated advisory review. A human maintainer must make the final merge decision. |
…review) Two PR review fixes for #4414: - Escape backslashes before quotes in the bash-snippet arg builder (CodeQL: incomplete string escaping). The previous regex only escaped `"` and would have let a literal `\` slip through if a future test passed one. - Add a one-shot loop-guard test: with NEMOCLAW_DOCKER_GROUP_REACTIVATED=1 already set, ensure_docker must NOT call sg again — it falls back to the legacy "newgrp docker / re-curl" path. Verified by mutation: if the guard is removed from scripts/install.sh, the new test fails (sg is called a second time and the exit status flips). Signed-off-by: Jason Ma <jama@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/install-docker-group-reexec.test.ts`:
- Around line 126-144: The test for the one-shot guard (the it block using
runEnsureDocker with NEMOCLAW_DOCKER_GROUP_REACTIVATED set) currently only
checks sgArgs and exit status; add an assertion that the user-facing guidance
text is emitted by checking the command output (e.g. examine outcome.stdout or
outcome.output) for the expected fallback instructions such as the "newgrp
docker" hint and the re-curl / installation curl guidance (match substrings like
"newgrp docker" and "curl" to ensure the legacy guidance text is present).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: c8b21984-d20a-4bcc-8e52-9f1af0d6d3b7
📒 Files selected for processing (1)
test/install-docker-group-reexec.test.ts
…review) CodeRabbit feedback on the one-shot loop-guard test: the existing assertions check that sg was not re-invoked and that exit was 0, but not that the actionable user instructions actually got printed. Un-silence the info() stub so its output reaches stdout, and assert "Run: newgrp docker" and the re-curl instruction. Mutation-verified: rewording the newgrp line in scripts/install.sh now fails the test. Signed-off-by: Jason Ma <jama@nvidia.com>
cjagwani
left a comment
There was a problem hiding this comment.
approving the disk-file scope on its merits — fix logic + tests are sound, sg(1) re-exec works cleanly. filing follow-up PR for the literal curl|bash repro per my comment above; the staging-to-disk extension is small enough to land as its own focused change.
…4414) (#4467) ## Summary Stages the installer to a tmpfile when invoked via `curl ... | bash` so the sg(1) re-exec from #4419 has a real script file to point at. Together with #4419, closes #4414. ## Related Issue Fixes #4414 ## Background #4419 added a non-interactive sg(1) re-exec to `ensure_docker` so the installer can finish in a single invocation after `usermod -aG docker`. That re-exec is gated on `[ -f "$self" ]` where `$self = ${BASH_SOURCE[0]:-$0}`. For the literal `curl ... | bash` repro from #4414: - `BASH_SOURCE[0]` is empty - `$0` is `"bash"` - `[ -f "bash" ]` is false - the fix falls through to the legacy `newgrp docker` / re-curl message So #4419 alone doesn't close #4414 for the most common `curl | bash` invocation. Empirically reproduced on a fresh Ubuntu 22.04 brev box: `bash scripts/install.sh --non-interactive ...` hits the fix ✅, but `cat scripts/install.sh | bash -s -- --non-interactive ...` falls through to legacy. ## Changes - `scripts/install.sh` (entry guard, lines ~2486-2505): when `BASH_SOURCE[0]` is empty (pipe mode) and `NEMOCLAW_INSTALLER_STAGED != 1`, mktemp a `/tmp/nemoclaw-installer-XXXXXX` file, curl the canonical URL into it, then `exec bash` on the staged file. The re-entered installer has a real `BASH_SOURCE[0]`, so #4419's sg(1) re-exec succeeds. - `scripts/install.sh` (cleanup setup, lines ~14-22): when re-launched as a staged copy, queue the staged tmpfile for removal on EXIT via `_cleanup_files`. - `test/install-stage-from-stdin.test.ts`: 4 new tests covering the pipe-mode happy path, curl failure fallthrough, one-shot loop guard, and disk-file invocation (no staging). ## Guards - only fires when `BASH_SOURCE[0]` is empty (preserves disk-file path unchanged) - `NEMOCLAW_INSTALLER_STAGED=1` one-shot loop guard - `mktemp` / `curl` / empty-download failures fall through to legacy direct-`main()` - `NEMOCLAW_INSTALLER_URL` env override for testing / staging environments - staged tmpfile auto-cleaned on EXIT - portable `mktemp` template (no `.sh` suffix — BSD mktemp on macOS rejects it) ## Type of Change - [x] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification - [x] `npm test -- test/install-stage-from-stdin.test.ts test/install-preflight.test.ts` — 103 / 103 pass on macOS - [ ] `npx prek run --all-files` — pre-commit hits unrelated `test/cli.test.ts` timeout under macOS Spotlight CPU contention; commit landed with `--no-verify` after individual test verification - [x] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [ ] Docs updated for user-facing behavior changes ## Notes for reviewers - Depends on #4419 for the full fix; this PR alone stages the file but the sg(1) re-exec from #4419 is what consumes it. - The fix adds a network round-trip mid-install (re-curl from `NEMOCLAW_INSTALLER_URL`). `curl | bash` users already accepted network dependency, but flagging for visibility. - `cat install.sh | bash` (no URL source) still falls through to legacy — staging only helps `curl URL | bash` since we need a URL to fetch from. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Installer now safely supports curl|bash one-liners by staging a temporary installer when run from stdin, validating the fetched payload, executing the staged copy with original args, and ensuring temporary files are cleaned up on exit; falls back to the normal path if staging fails. * **Tests** * Added tests covering staging success, download failures, repeated-staging prevention, disk-based invocations, and invalid payload handling. <!-- review_stack_entry_start --> [](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/4467?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack) <!-- review_stack_entry_end --> <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Charan Jagwani <cjagwani@nvidia.com> Co-authored-by: Carlos Villela <cvillela@nvidia.com>
## Summary - Adds the v0.0.56 release notes section with links to the deeper docs pages for installer, status, inference, messaging, policy, and lifecycle changes. - Updates source docs for the remaining release-prep gaps around `uv` in the PyPI preset, compact WhatsApp pairing guidance, and `nemoclaw inference set` command boundaries. - Refreshes generated `nemoclaw-user-*` skills and removes skipped experimental command terms from generated skill surfaces. ## Source summary - #4613 -> `docs/manage-sandboxes/lifecycle.mdx`, `docs/reference/commands.mdx`, `docs/about/release-notes.mdx`: Documents that public installs and `nemoclaw update` follow the maintained `lkg` tag by default. - #4419 -> `docs/about/release-notes.mdx`: Notes that non-interactive Linux installs can reactivate Docker group membership and continue in one installer run when `sg docker` is available. - #4550 -> `docs/reference/commands.mdx`, `docs/about/release-notes.mdx`: Captures live sandbox agent-version probing for status, connect, and upgrade checks. - #4609 -> `docs/inference/use-local-inference.mdx`, `docs/about/release-notes.mdx`: Captures the GPU Docker-driver host-network local-inference reachability gate. - #4607 -> `docs/manage-sandboxes/messaging-channels.mdx`, `docs/reference/commands.mdx`, `docs/about/release-notes.mdx`: Documents compact WhatsApp QR pairing guidance and gateway/session diagnostics. - #4582 -> `docs/manage-sandboxes/messaging-channels.mdx`, `docs/reference/commands.mdx`, `docs/about/release-notes.mdx`: Reflects Slack credential validation before enabling the channel. - #4554 -> `docs/manage-sandboxes/messaging-channels.mdx`, `docs/reference/troubleshooting.mdx`, `docs/about/release-notes.mdx`: Keeps Telegram allowlist alias guidance in the generated user skills and release notes. - #4563 -> `docs/reference/commands.mdx`, `docs/about/release-notes.mdx`: Includes the new `nemoclaw <name> skill remove <skill>` command in command docs and release notes. - #4566 -> `docs/reference/commands.mdx`, `docs/about/release-notes.mdx`: Documents the `nemoclaw inference set` redirect boundary when `--provider` or `--model` is missing. - #4323 -> `docs/reference/commands.mdx`, `docs/about/release-notes.mdx`: Captures per-sandbox status JSON support. - #4506 -> `docs/reference/commands.mdx`, `docs/about/release-notes.mdx`: Captures debug command sandbox-name validation and safer tarball writing. - #4569 -> `docs/network-policy/integration-policy-examples.mdx`, `docs/about/release-notes.mdx`: Documents that the `pypi` preset allows `/usr/local/bin/uv`. - #4579 -> `docs/network-policy/integration-policy-examples.mdx`, `docs/about/release-notes.mdx`: Captures observable Jira preset validation guidance. - #4229 -> `docs/manage-sandboxes/lifecycle.mdx`, `docs/reference/commands.mdx`, `docs/about/release-notes.mdx`: Documents user-data preservation defaults for uninstall. - #4399 -> `docs/reference/commands.mdx`, `docs/about/release-notes.mdx`: Captures CPU-only sandbox intent preservation across rebuilds. - #4058 -> `docs/reference/commands.mdx`, `docs/about/release-notes.mdx`: Captures safer snapshot restore behavior around existing destinations. - #4155 and #4460 -> skipped by `docs/.docs-skip`: Removed skipped experimental command terms from source docs and generated skill evals instead of documenting those features. ## Verification - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx` - `npm run docs` (passes; Fern reports the pre-existing light-mode accent contrast warning) - `rg "permissive mode|shields down|shields up|shields status|config rotate-token|rotate-token" .agents/skills` (no matches) - `npm run build:cli` (run to refresh local CLI artifacts for the pre-push TypeScript hook) - Commit hooks passed, including `NEMOCLAW_* env-var documentation gate`, `Verify docs-to-skills output`, `markdownlint-cli2`, `gitleaks`, and `Test (skills YAML)`. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Documentation** * Expanded Model Router setup with YAML examples, flow diagrams, and credential handling; strengthened agent-config immutability and integrity guidance; messaging channels updated (Telegram aliases, WhatsApp pairing/diagnostics); CLI docs revised (GPU detection, inference set behavior, uninstall/rebuild preservation); overview rebranded to NemoClaw and added v0.0.56 release notes. * **New Features** * Added `nemoclaw <name> channels status` (messaging diagnostics, JSON); added `nemoclaw <name> skill remove`; Hermes no longer marked experimental; DGX Spark quickstart sandbox-name note. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary
On a clean Ubuntu 24.04 VM where the user is not in the docker group, the non-interactive curl|bash installer would install Docker, run
usermod -aG docker, then exit 0 with instructions to runnewgrp dockerand re-curl — breaking theNEMOCLAW_NON_INTERACTIVE=1contract. This PR self-reactivates docker group membership viasg(1)and re-execs the installer so a singlecurl … | bashcompletes the install.Related Issue
Fixes #4414
Changes
scripts/install.sh(ensure_docker): when group refresh is needed ANDinstaller_non_interactiveis true ANDsgis available,exec sg docker -c "exec bash <script> <args>"instead of exiting 0. Guarded byNEMOCLAW_DOCKER_GROUP_REACTIVATED=1so we don't loop if docker is still unreachable after the re-exec.scripts/install.sh(main): capture original argv into_NEMOCLAW_INSTALLER_ARGSat entry soensure_dockercan forward CLI flags (e.g.--non-interactive --yes-i-accept-third-party-software) across the re-exec.test/install-docker-group-reexec.test.ts: new vitest covering both the non-interactive re-exec path (assertssg docker -c …is invoked with the original args preserved) and the interactive fallback (asserts the legacynewgrp docker/re-curl message + exit 0 still apply).NEMOCLAW_NON_INTERACTIVEis unset, the existingnewgrp docker/re-curl instructions still fire.Type of Change
Verification
npx prek run(shellcheck, shfmt, biome, SPDX, gitleaks, etc.) passes onscripts/install.shand the new test filenpx prek run --all-filespasses — 25 pre-existing CLI test failures onmain(ssrf-parity, cli, fetch-guard-patch-regression, generate-openclaw-config, nemoclaw-start, e2e-lib-helpers), unrelated to this change; confirmed by re-running onmainwith this branch stashednpm testpasses — same pre-existing failures as abovetest/install-docker-group-reexec.test.ts)test/install-*.test.tsandtest/install-docker-group-reexec.test.tspass on this branchcurl|bashnow finishes); no user-facing doc edit needednpm run docsbuilds without warnings (doc changes only)Signed-off-by: Jason Ma jama@nvidia.com
Summary by CodeRabbit
New Features
Tests