fix(onboard): use OpenShell gateway user service#4580
Conversation
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (4)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds Linux systemd user-service startup for the OpenShell Docker-driver gateway, gates Debian env-override writes on service presence, integrates package-managed startup into onboarding, expands unit and e2e tests, and updates architecture/CLI docs and TCP-readiness comments. ChangesLinux Gateway User Service Feature
Sequence DiagramssequenceDiagram
participant Onboard as startDockerDriverGateway
participant EnvOverride as writeDockerGatewayDebEnvOverride
participant PkgMgd as startPackageManagedDockerDriverGateway
participant Service as startOpenShellGatewayUserService
participant Systemctl as systemctl --user
participant Gateway as openshell-gateway
Onboard->>Onboard: Compute gatewayEnv from OpenShell --version
Onboard->>EnvOverride: Write DEB gateway.env override
EnvOverride-->>Onboard: wrote boolean
Onboard->>PkgMgd: startPackageManagedDockerDriverGateway(...)
PkgMgd->>PkgMgd: Check unit file exists
alt Unit exists
PkgMgd->>Service: startOpenShellGatewayUserService()
Service->>Systemctl: daemon-reload
Systemctl-->>Service: result
Service->>Systemctl: enable openshell-gateway
Systemctl-->>Service: result
Service->>Systemctl: restart openshell-gateway
Systemctl-->>Gateway: restart signal
Gateway-->>Service: started / failure
alt Service started
Service-->>PkgMgd: started true
PkgMgd-->>Onboard: success
else Service failed
Service-->>PkgMgd: fallbackAllowed / reason
PkgMgd-->>Onboard: false or exit/throw
end
else Unit not found
PkgMgd-->>Onboard: false
end
Onboard->>Onboard: Standalone gateway fallback path
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Comment |
|
🌿 Preview your docs: https://nvidia-preview-pr-4580.docs.buildwithfern.com/nemoclaw |
E2E Advisor RecommendationRequired E2E: Auto-dispatched E2E: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
|
E2E Scenario Advisor RecommendationRequired scenario E2E: Dispatch required scenario E2E:
Full scenario advisor summaryE2E Scenario AdvisorBase: Required scenario E2E
Optional scenario E2E
Relevant changed files
|
PR Review AdvisorFindings: 0 needs attention, 4 worth checking, 0 nice ideas Review findings🛠️ Needs attention
🔎 Worth checking
🌱 Nice ideas
Since last review detailsCurrent findings:
This is an automated advisory review. A human maintainer must make the final merge decision. |
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Selective E2E Results — ❌ Some jobs failedRun: 26717014158
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/onboard.ts`:
- Around line 2414-2471: The helper function
tryStartPackageManagedDockerDriverGateway should be moved out of
src/lib/onboard.ts into a new sibling module (e.g.,
src/lib/onboard-package-managed.ts) so the big block no longer inflates
onboard.ts; keep startDockerDriverGateway() in onboard.ts as the coordinator and
have it import and call the relocated tryStartPackageManagedDockerDriverGateway.
Ensure you preserve the function signature and all referenced symbols
(dockerDriverGatewayService, clearDockerDriverGatewayRuntimeFiles,
registerDockerDriverGatewayEndpoint, runCaptureOpenshell, isGatewayHealthy,
isGatewayTcpReady, verifySandboxBridgeGatewayReachableOrExit, envInt,
sleepSeconds, GATEWAY_NAME) and their imports/exports so behavior is unchanged,
update module imports in onboard.ts, and export the helper from the new module
for testing/consumption.
- Around line 2443-2461: The call to clearDockerDriverGatewayRuntimeFiles
currently runs before the health-poll loop and removes runtime PID/marker files
prematurely; move that cleanup so it only runs after the gateway is confirmed
healthy (i.e., after isGatewayHealthy(status, namedInfo, currentInfo) && await
isGatewayTcpReady() succeeds) — update the code so
clearDockerDriverGatewayRuntimeFiles is invoked after the successful health
checks (for functions/variables involved: clearDockerDriverGatewayRuntimeFiles,
registerDockerDriverGatewayEndpoint, isGatewayHealthy, isGatewayTcpReady,
verifySandboxBridgeGatewayReachableOrExit) so recovery/fallback logic retains
runtime breadcrumbs until the service is truly up.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: ad5de779-9431-49cf-972d-2394b910a053
📒 Files selected for processing (9)
docs/reference/architecture.mdxdocs/reference/commands.mdxscripts/install-openshell.shsrc/lib/onboard.tssrc/lib/onboard/docker-driver-gateway-env.test.tssrc/lib/onboard/docker-driver-gateway-env.tssrc/lib/onboard/docker-driver-gateway-service.test.tssrc/lib/onboard/docker-driver-gateway-service.tstest/install-openshell-version-check.test.ts
Selective E2E Results — ✅ All requested jobs passedRun: 26717115183
|
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Selective E2E Results — ✅ All requested jobs passedRun: 26717180037
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/lib/onboard.ts (1)
2424-2426: Please run one service-path E2E before merge.This branch now short-circuits the legacy startup flow when the package-managed gateway takes ownership, so
openshell-gateway-upgrade-e2eplus one happy-path onboard flow such ascloud-e2eorsandbox-operations-e2ewould give good coverage of the new handoff.As per coding guidelines,
src/lib/onboard.ts: "This file contains core onboarding logic. Changes here affect the full sandbox creation and configuration flow."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/onboard.ts` around lines 2424 - 2426, Run a service-path E2E before merging to validate the new short-circuit in src/lib/onboard.ts: execute the openshell-gateway-upgrade-e2e test plus one happy-path onboarding E2E (cloud-e2e or sandbox-operations-e2e) to confirm dockerDriverGatewayEnv.startPackageManagedDockerDriverGateway correctly takes ownership and that the legacy startup path still behaves when it should; verify the flow around verifySandboxBridgeGatewayReachableOrExit and GATEWAY_NAME, observe logs/errors and ensure no regressions in gateway handoff or sandbox creation before merging.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/lib/onboard.ts`:
- Around line 2424-2426: Run a service-path E2E before merging to validate the
new short-circuit in src/lib/onboard.ts: execute the
openshell-gateway-upgrade-e2e test plus one happy-path onboarding E2E (cloud-e2e
or sandbox-operations-e2e) to confirm
dockerDriverGatewayEnv.startPackageManagedDockerDriverGateway correctly takes
ownership and that the legacy startup path still behaves when it should; verify
the flow around verifySandboxBridgeGatewayReachableOrExit and GATEWAY_NAME,
observe logs/errors and ensure no regressions in gateway handoff or sandbox
creation before merging.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e9cba808-c0cf-473b-950a-5efbb2171a45
📒 Files selected for processing (3)
src/lib/onboard.tssrc/lib/onboard/docker-driver-gateway-env.tssrc/lib/onboard/docker-driver-gateway-service.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- src/lib/onboard/docker-driver-gateway-env.ts
Selective E2E Results — ❌ Some jobs failedRun: 26717266019
|
Selective E2E Results — ❌ Some jobs failedRun: 26718114347
|
Selective E2E Results — ✅ All requested jobs passedRun: 26718192181
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/reference/commands.mdx`:
- Line 1422: Update the description for the environment variable
NEMOCLAW_OPENSHELL_SANDBOX_BIN to use active voice to match the parallel entry
for NEMOCLAW_OPENSHELL_GATEWAY_BIN: replace the passive phrase "passed to the
Linux Docker-driver standalone fallback" with the active construction "used by
the Linux Docker-driver standalone fallback" while keeping the rest of the
sentence unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b6c6b2a5-0d6f-4db2-a49c-f202dd4ad0d1
📒 Files selected for processing (2)
docs/reference/commands.mdxsrc/lib/onboard/docker-driver-gateway-service.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- src/lib/onboard/docker-driver-gateway-service.ts
Selective E2E Results — ✅ All requested jobs passedRun: 26718200252
|
Selective E2E Results — ✅ All requested jobs passedRun: 26718263958
|
|
✅ Brev E2E (full): PASSED on branch |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lib/onboard/docker-driver-gateway-service.ts (1)
145-150:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAllow standalone fallback for bus/user-manager errors on
restarttoo.Limiting
fallbackAllowedtodaemon-reloadfailures turns asystemctl --user restartbus/user-manager outage into a hard stop. Since the caller insrc/lib/onboard.ts(Lines 2411-2427) routes through this helper before the standalone path, that blocks the documented fallback even though the package-managed service is simply unavailable.🔧 Proposed fix
return { attempted: true, - fallbackAllowed: args[0] === "daemon-reload" && userManagerLooksUnavailable(result.reason ?? ""), + fallbackAllowed: userManagerLooksUnavailable(result.reason ?? ""), reason, started: false, };🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/onboard/docker-driver-gateway-service.ts` around lines 145 - 150, The fallbackAllowed check in the runSystemctlUser error handling currently only permits fallback for daemon-reload failures; update that condition to also allow fallback when the invoked command is "restart" and the failure matches the user manager/bus outage detector. In the error branch where runSystemctlUser result.ok is false (inside docker-driver-gateway-service.ts), change the fallbackAllowed expression that references args[0] and userManagerLooksUnavailable(result.reason ?? "") so it returns true for args[0] === "daemon-reload" OR args[0] === "restart" when userManagerLooksUnavailable(...) is true, ensuring restart failures due to bus/user-manager outages permit the standalone fallback.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/lib/onboard/docker-driver-gateway-service.ts`:
- Around line 145-150: The fallbackAllowed check in the runSystemctlUser error
handling currently only permits fallback for daemon-reload failures; update that
condition to also allow fallback when the invoked command is "restart" and the
failure matches the user manager/bus outage detector. In the error branch where
runSystemctlUser result.ok is false (inside docker-driver-gateway-service.ts),
change the fallbackAllowed expression that references args[0] and
userManagerLooksUnavailable(result.reason ?? "") so it returns true for args[0]
=== "daemon-reload" OR args[0] === "restart" when
userManagerLooksUnavailable(...) is true, ensuring restart failures due to
bus/user-manager outages permit the standalone fallback.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: fc5291fc-ff9e-4407-9444-a734dc2c9a01
📒 Files selected for processing (3)
src/lib/onboard/docker-driver-gateway-service.test.tssrc/lib/onboard/docker-driver-gateway-service.tstest/e2e/test-openshell-version-pin.sh
🚧 Files skipped from review as they are similar to previous changes (2)
- src/lib/onboard/docker-driver-gateway-service.test.ts
- test/e2e/test-openshell-version-pin.sh
Selective E2E Results — ✅ All requested jobs passedRun: 26719041225
|
Selective E2E Results — ✅ All requested jobs passedRun: 26719097375
|
Selective E2E Results — ✅ All requested jobs passedRun: 26719158501
|
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
Selective E2E Results — ✅ All requested jobs passedRun: 26720532591
|
Selective E2E Results — ✅ All requested jobs passedRun: 26720778107
|
Summary
Post-Computex / v0.0.60 follow-up for the gateway lifecycle half of #4423. This is intentionally separate from #4578, which remains the v0.0.56 safety hotfix for non-destructive status behavior.
openshell-gatewayuser service when its vendor/package unit is present, writing Docker-driver gateway env before restart.Validation
npm run build:clinpm run typecheck:clinpm run checksnpm run source-shape:checknpm run check:installer-hashbash -n scripts/install-openshell.shbash test/e2e/test-openshell-version-pin.shnpx vitest run src/lib/onboard/docker-driver-gateway-env.test.ts src/lib/onboard/docker-driver-gateway-service.test.ts test/install-openshell-version-check.test.ts test/runner.test.ts test/onboard-gateway-runtime.test.ts test/gateway-state-reconcile-2276.test.tsRefs #4423
Follow-up to #4578
Summary by CodeRabbit