From 04b7523a4b3e0692374bc8229b4eab11be5ee0df Mon Sep 17 00:00:00 2001 From: Aaron Erickson Date: Sun, 31 May 2026 08:39:39 -0700 Subject: [PATCH 1/9] fix(onboard): use OpenShell gateway user service Signed-off-by: Aaron Erickson --- docs/reference/architecture.mdx | 5 +- docs/reference/commands.mdx | 7 +- scripts/install-openshell.sh | 80 +++++++++- src/lib/onboard.ts | 73 ++++++++- .../onboard/docker-driver-gateway-env.test.ts | 31 +++- src/lib/onboard/docker-driver-gateway-env.ts | 12 +- .../docker-driver-gateway-service.test.ts | 95 ++++++++++++ .../onboard/docker-driver-gateway-service.ts | 145 ++++++++++++++++++ test/install-openshell-version-check.test.ts | 98 +++++++++++- 9 files changed, 525 insertions(+), 21 deletions(-) create mode 100644 src/lib/onboard/docker-driver-gateway-service.test.ts create mode 100644 src/lib/onboard/docker-driver-gateway-service.ts diff --git a/docs/reference/architecture.mdx b/docs/reference/architecture.mdx index 91067b2c3a..c291f99c01 100644 --- a/docs/reference/architecture.mdx +++ b/docs/reference/architecture.mdx @@ -77,8 +77,9 @@ graph LR The logical diagram above shows how components relate. This section shows what actually runs where on the host. NemoClaw's default Docker-driver topology does not place the sandbox in an embedded k3s cluster. -On Linux and Apple Silicon macOS, NemoClaw starts the OpenShell Docker-driver gateway and creates the sandbox as a Docker container. -The gateway normally runs as a host process; Linux hosts that need the gateway compatibility patch may run the same gateway binary inside a small container. +On Linux, NemoClaw configures and restarts the package-managed OpenShell gateway user service when it is installed, then creates the sandbox as a Docker container. +If the upstream service is unavailable, NemoClaw falls back to the standalone gateway process used by earlier installs. 
+On Apple Silicon macOS, NemoClaw starts the OpenShell Docker-driver gateway and creates the sandbox as a Docker container. In both Docker-driver modes, the sandbox is a Docker container, not a Kubernetes pod. Legacy non-Docker-driver installs still use the k3s-based gateway path; the diagram below shows the standard Docker-driver topology. diff --git a/docs/reference/commands.mdx b/docs/reference/commands.mdx index d1a45c5a4e..3afc173488 100644 --- a/docs/reference/commands.mdx +++ b/docs/reference/commands.mdx @@ -1253,7 +1253,7 @@ Earlier releases only stopped `openshell forward` processes, so those orphans ac For Local Ollama setups, uninstall also stops matching Ollama auth proxy processes before deleting `~/.nemoclaw` state so stale proxy listeners do not block a later reinstall. -On Linux, uninstall removes `~/.local/state/nemoclaw`, which contains Docker-driver gateway PID files, SQLite data, audit logs, and VM-driver state. +On Linux, uninstall removes `~/.local/state/nemoclaw`, which contains Docker-driver gateway SQLite data, audit logs, VM-driver state, and standalone-fallback gateway PID files. | Flag | Effect | |---|---| @@ -1418,9 +1418,10 @@ These flags toggle optional behaviors during onboarding; set them before running | `NEMOCLAW_SANDBOX_GPU` | `auto`, `1`, or `0` | Controls sandbox GPU passthrough during onboarding. `auto` enables GPU passthrough when an NVIDIA GPU is detected, `1` requires GPU passthrough, and `0` forces CPU-only sandbox creation. | | `NEMOCLAW_SANDBOX_GPU_DEVICE` | OpenShell GPU device selector | Selects the GPU device passed with `openshell sandbox create --gpu-device`. Requires explicit sandbox GPU enablement with `NEMOCLAW_SANDBOX_GPU=1` (or `--sandbox-gpu` for CLI-driven onboarding); otherwise onboarding rejects the selector instead of treating it as an implicit opt-in. | | `NEMOCLAW_DOCKER_GPU_PATCH` | `0` to disable, anything else to keep the default | Controls the Linux Docker-driver GPU sandbox compatibility patch. 
Set to `0` only as an escape hatch when the patch fails and you need onboarding to continue without patching the GPU sandbox container. | -| `NEMOCLAW_OPENSHELL_GATEWAY_BIN` | path | Advanced override for the `openshell-gateway` binary used by the Linux Docker-driver gateway. Defaults to the binary next to `openshell`, then common install paths. | +| `NEMOCLAW_OPENSHELL_GATEWAY_BIN` | path | Advanced override for the `openshell-gateway` binary used by the Linux Docker-driver standalone fallback. Defaults to the binary next to `openshell`, then common install paths. | | `NEMOCLAW_OPENSHELL_SANDBOX_BIN` | path | Advanced override for the `openshell-sandbox` binary passed to the Linux Docker-driver gateway supervisor. Defaults to the binary next to `openshell`, then common install paths. | -| `NEMOCLAW_OPENSHELL_GATEWAY_STATE_DIR` | path | Advanced override for the Linux Docker-driver gateway pid file and SQLite state directory. Defaults to `~/.local/state/nemoclaw/openshell-docker-gateway`. | +| `NEMOCLAW_OPENSHELL_GATEWAY_STATE_DIR` | path | Advanced override for the Linux Docker-driver gateway SQLite state directory and standalone-fallback PID file. Defaults to `~/.local/state/nemoclaw/openshell-docker-gateway`. | +| `NEMOCLAW_OPENSHELL_STANDALONE_INSTALL` | `1` to enable | Advanced fallback that skips the upstream Linux OpenShell package/service installer and installs standalone release binaries instead. | | `NEMOCLAW_WECHAT_QUIET` | `1` to enable | Silences the `[wechat]` diagnostic lines printed during the host-side WeChat QR login (poll status, IDC redirects, swallowed gateway errors), which are visible by default while the experimental WeChat path stabilizes; set `1` once the flow is reliable in your environment. 
| ### Onboard Profiling Traces diff --git a/scripts/install-openshell.sh b/scripts/install-openshell.sh index 6ba0381679..cebdda6360 100755 --- a/scripts/install-openshell.sh +++ b/scripts/install-openshell.sh @@ -130,6 +130,21 @@ required_driver_bins_present() { esac } +openshell_gateway_user_service_present() { + [ "$OS" = "Linux" ] || return 1 + [ -f "${HOME:-}/.config/systemd/user/openshell-gateway.service" ] || \ + [ -f /etc/systemd/user/openshell-gateway.service ] || \ + [ -f /usr/local/lib/systemd/user/openshell-gateway.service ] || \ + [ -f /usr/lib/systemd/user/openshell-gateway.service ] || \ + [ -f /lib/systemd/user/openshell-gateway.service ] +} + +should_require_openshell_gateway_user_service() { + [ "$OS" = "Linux" ] && \ + [ "$RESOLVED_CHANNEL" != "dev" ] && \ + [ "${NEMOCLAW_OPENSHELL_STANDALONE_INSTALL:-}" != "1" ] +} + OPENSHELL_FEATURE_CHECK_ERROR="" openshell_has_required_messaging_features() { @@ -275,6 +290,8 @@ if command -v openshell >/dev/null 2>&1; then warn "openshell $INSTALLED_VERSION is missing Docker-driver binaries — reinstalling pinned OpenShell ${PIN_VERSION}..." elif ! openshell_has_required_messaging_features; then fail "${OPENSHELL_FEATURE_CHECK_ERROR:-openshell $INSTALLED_VERSION is missing required messaging credential rewrite support. Install an OpenShell build that includes provider aliases, WebSocket text rewrite, and request-body credential rewrite.}" + elif should_require_openshell_gateway_user_service && ! openshell_gateway_user_service_present; then + warn "openshell $INSTALLED_VERSION is missing the package-managed gateway user service — reinstalling pinned OpenShell ${PIN_VERSION}..." else info "openshell already installed: $INSTALLED_VERSION (>= $MIN_VERSION, <= $MAX_VERSION, messaging rewrite capable)" exit 0 @@ -287,6 +304,66 @@ fi info "Installing OpenShell from release '$RELEASE_TAG'..." 
+tmpdir="$(mktemp -d)" +trap 'rm -rf "$tmpdir"' EXIT + +install_with_upstream_package_service() { + [ "$OS" = "Linux" ] || return 1 + [ "$RESOLVED_CHANNEL" != "dev" ] || return 1 + [ "${NEMOCLAW_OPENSHELL_STANDALONE_INSTALL:-}" != "1" ] || return 1 + command -v curl >/dev/null 2>&1 || return 1 + + local installer="$tmpdir/openshell-install.sh" + local installer_url="https://raw.githubusercontent.com/NVIDIA/OpenShell/${RELEASE_TAG}/install.sh" + local installer_status=0 + local installed_bin="" + local feature_status=0 + local breaking_ack="${OPENSHELL_ACK_BREAKING_UPGRADE:-}" + + if [ -z "$breaking_ack" ] && [ -n "${NEMOCLAW_OPENSHELL_UPGRADE_PREPARED:-}" ]; then + breaking_ack=1 + fi + + info "Installing OpenShell ${RELEASE_TAG} with the upstream package installer..." + if ! curl -fLsS --retry 3 --max-redirs 5 -o "$installer" "$installer_url"; then + warn "upstream package installer could not be downloaded — falling back to standalone binaries" + return 1 + fi + chmod 755 "$installer" + + OPENSHELL_VERSION="$RELEASE_TAG" \ + OPENSHELL_ACK_BREAKING_UPGRADE="$breaking_ack" \ + sh "$installer" || installer_status=$? + + installed_bin="$(command -v openshell 2>/dev/null || true)" + if [ -n "$installed_bin" ] && required_driver_bins_present && openshell_gateway_user_service_present; then + if openshell_has_required_messaging_features "$installed_bin"; then + if [ "$installer_status" != "0" ]; then + warn "upstream installer returned exit ${installer_status} after installing binaries and the gateway user service; NemoClaw will restart the service during onboarding" + fi + info "$("$installed_bin" --version 2>&1 || echo openshell) installed with upstream package/service support" + return 0 + else + feature_status=$? 
+ if [ "$feature_status" = "2" ]; then + fail "$OPENSHELL_FEATURE_CHECK_ERROR" + fi + warn "${OPENSHELL_FEATURE_CHECK_ERROR:-upstream package install did not provide the required OpenShell messaging features}" + fi + fi + + if [ "$installer_status" != "0" ]; then + warn "upstream package installer failed (exit ${installer_status}) — falling back to standalone binaries" + else + warn "upstream package installer did not provide required Docker-driver binaries and user service — falling back to standalone binaries" + fi + return 1 +} + +if install_with_upstream_package_service; then + exit 0 +fi + case "$OS" in Darwin) case "$ARCH_LABEL" in @@ -332,9 +409,6 @@ case "$OS" in ;; esac -tmpdir="$(mktemp -d)" -trap 'rm -rf "$tmpdir"' EXIT - download_with_curl() { local name local -a curl_progress diff --git a/src/lib/onboard.ts b/src/lib/onboard.ts index 504004f75f..9a3827d06c 100644 --- a/src/lib/onboard.ts +++ b/src/lib/onboard.ts @@ -465,6 +465,8 @@ const { reportDockerDriverGatewayStartFailure } = const dockerDriverGatewayEnv: typeof import("./onboard/docker-driver-gateway-env") = require("./onboard/docker-driver-gateway-env"); const { getDockerDriverGatewayEndpoint } = dockerDriverGatewayEnv; +const dockerDriverGatewayService: typeof import("./onboard/docker-driver-gateway-service") = + require("./onboard/docker-driver-gateway-service"); const dockerDriverGatewayRuntimeMarker: typeof import("./onboard/docker-driver-gateway-runtime-marker") = require("./onboard/docker-driver-gateway-runtime-marker"); const hostGatewayProcess: typeof import("./onboard/host-gateway-process") = @@ -2409,13 +2411,72 @@ async function startGatewayWithOptions( process.env.OPENSHELL_GATEWAY = GATEWAY_NAME; } +async function tryStartPackageManagedDockerDriverGateway({ + exitOnFailure, + skipSandboxBridgeReachability, + verifySandboxBridgeGatewayReachableOrExit, +}: { + exitOnFailure: boolean; + skipSandboxBridgeReachability: boolean; + verifySandboxBridgeGatewayReachableOrExit: ( + 
exitOnFailure: boolean, + options?: { skip?: boolean }, + ) => Promise<void>; +}): Promise<boolean> { + if (!dockerDriverGatewayService.hasOpenShellGatewayUserService()) return false; + + console.log(" Starting OpenShell Docker-driver gateway via upstream user service..."); + const serviceStart = dockerDriverGatewayService.startOpenShellGatewayUserService(); + if (!serviceStart.started) { + const detail = serviceStart.reason ? ` (${serviceStart.reason})` : ""; + if (serviceStart.fallbackAllowed) { + console.warn(` OpenShell gateway user service is unavailable${detail}; using standalone fallback.`); + return false; + } + const message = `OpenShell gateway user service failed to start${detail}.`; + console.error(` ${message}`); + console.error(" Check: systemctl --user status openshell-gateway"); + if (exitOnFailure) process.exit(1); + throw new Error(message); + } + + clearDockerDriverGatewayRuntimeFiles(); + const pollCount = envInt("NEMOCLAW_HEALTH_POLL_COUNT", 30); + const pollInterval = envInt("NEMOCLAW_HEALTH_POLL_INTERVAL", 2); + for (let i = 0; i < pollCount; i += 1) { + if (!registerDockerDriverGatewayEndpoint()) { + if (i < pollCount - 1) sleepSeconds(pollInterval); + continue; + } + const status = runCaptureOpenshell(["status"], { ignoreError: true }); + const namedInfo = runCaptureOpenshell(["gateway", "info", "-g", GATEWAY_NAME], { + ignoreError: true, + }); + const currentInfo = runCaptureOpenshell(["gateway", "info"], { ignoreError: true }); + if (isGatewayHealthy(status, namedInfo, currentInfo) && (await isGatewayTcpReady())) { + await verifySandboxBridgeGatewayReachableOrExit(exitOnFailure, { + skip: skipSandboxBridgeReachability, + }); + console.log(" ✓ OpenShell gateway user service is healthy"); + return true; + } + if (i < pollCount - 1) sleepSeconds(pollInterval); + } + + const message = "OpenShell gateway user service started but did not become healthy."; + console.error(` ${message}`); + console.error(" Check: systemctl --user status openshell-gateway"); + 
if (exitOnFailure) process.exit(1); + throw new Error(message); +} + async function startDockerDriverGateway({ exitOnFailure = true, skipSandboxBridgeReachability = false }: { exitOnFailure?: boolean; skipSandboxBridgeReachability?: boolean } = {}): Promise<void> { - dockerDriverGatewayEnv.writeDockerGatewayDebEnvOverride(() => getDockerDriverGatewayEnv()); const gatewayBin = resolveOpenShellGatewayBinary(); const openshellVersionOutput = runCaptureOpenshell(["--version"], { ignoreError: true, }); const gatewayEnv = getDockerDriverGatewayEnv(openshellVersionOutput); + dockerDriverGatewayEnv.writeDockerGatewayDebEnvOverride(() => gatewayEnv); const stateDir = getDockerDriverGatewayStateDir(); const runtimeIdentity = gatewayBin ? dockerDriverGatewayLaunch.buildDockerDriverGatewayRuntimeIdentity({ gatewayBin, gatewayEnv, stateDir, sandboxBin: resolveOpenShellSandboxBinary() }) : null; const gatewayLaunch = runtimeIdentity?.launch ?? null; @@ -2425,6 +2486,16 @@ async function startDockerDriverGateway({ exitOnFailure = true, skipSandboxBridg const { verifySandboxBridgeGatewayReachableOrExit } = require("./onboard/gateway-sandbox-reachability") as typeof import("./onboard/gateway-sandbox-reachability"); + if ( + await tryStartPackageManagedDockerDriverGateway({ + exitOnFailure, + skipSandboxBridgeReachability, + verifySandboxBridgeGatewayReachableOrExit, + }) + ) { + return; + } + const gatewayStatus = runCaptureOpenshell(["status"], { ignoreError: true }); const gwInfo = runCaptureOpenshell(["gateway", "info", "-g", GATEWAY_NAME], { ignoreError: true, diff --git a/src/lib/onboard/docker-driver-gateway-env.test.ts b/src/lib/onboard/docker-driver-gateway-env.test.ts index 21d82312ca..476281c3da 100644 --- a/src/lib/onboard/docker-driver-gateway-env.test.ts +++ b/src/lib/onboard/docker-driver-gateway-env.test.ts @@ -127,15 +127,16 @@ describe("writeDockerGatewayDebEnvOverride", () => { const existsSpy = vi .spyOn(fs, "existsSync") - .mockImplementation((candidate) => candidate 
=== "/usr/bin/openshell-gateway"); + .mockImplementation((candidate) => candidate === "/usr/lib/systemd/user/openshell-gateway.service"); const homedirSpy = vi.spyOn(os, "homedir").mockReturnValue(tempHome); try { - writeDockerGatewayDebEnvOverride(() => ({ + const wrote = writeDockerGatewayDebEnvOverride(() => ({ OPENSHELL_BIND_ADDRESS: "127.0.0.1", - })); + }), { platform: "linux" }); const envFileContent = fs.readFileSync(envFile, "utf-8"); + expect(wrote).toBe(true); expect(fs.statSync(envDir).mode & 0o777).toBe(0o700); expect(fs.statSync(envFile).mode & 0o777).toBe(0o600); expect(envFileContent).toContain("KEEP_ME=1\n"); @@ -146,4 +147,28 @@ describe("writeDockerGatewayDebEnvOverride", () => { fs.rmSync(tempHome, { recursive: true, force: true }); } }); + + it("does not write service env for standalone gateway binaries", () => { + const tempHome = fs.mkdtempSync(path.join(os.tmpdir(), "nemoclaw-gateway-env-")); + const existsSpy = vi + .spyOn(fs, "existsSync") + .mockImplementation((candidate) => candidate === "/usr/bin/openshell-gateway"); + const homedirSpy = vi.spyOn(os, "homedir").mockReturnValue(tempHome); + + try { + const wrote = writeDockerGatewayDebEnvOverride( + () => ({ + OPENSHELL_BIND_ADDRESS: "127.0.0.1", + }), + { platform: "linux" }, + ); + + expect(wrote).toBe(false); + expect(fs.existsSync(path.join(tempHome, ".config", "openshell", "gateway.env"))).toBe(false); + } finally { + existsSpy.mockRestore(); + homedirSpy.mockRestore(); + fs.rmSync(tempHome, { recursive: true, force: true }); + } + }); }); diff --git a/src/lib/onboard/docker-driver-gateway-env.ts b/src/lib/onboard/docker-driver-gateway-env.ts index 5962d76106..a9036afdf0 100644 --- a/src/lib/onboard/docker-driver-gateway-env.ts +++ b/src/lib/onboard/docker-driver-gateway-env.ts @@ -13,6 +13,7 @@ import { getGatewayHttpsEndpoint, } from "../core/gateway-address"; import { GATEWAY_PORT } from "../core/ports"; +import { hasOpenShellGatewayUserService } from 
"./docker-driver-gateway-service"; export { getGatewayHttpsEndpoint }; @@ -133,13 +134,9 @@ function readTextFileIfPresent(filePath: string): string { export function writeDockerGatewayDebEnvOverride( getOverride: () => Record, -): void { - const servicePaths = [ - "/usr/bin/openshell-gateway", - "/usr/lib/systemd/user/openshell-gateway.service", - "/lib/systemd/user/openshell-gateway.service", - ]; - if (!servicePaths.some((candidate) => fs.existsSync(candidate))) return; + opts: Parameters[0] = {}, +): boolean { + if (!hasOpenShellGatewayUserService(opts)) return false; const override = getOverride(); const envDir = path.join(os.homedir(), ".config", "openshell"); const envFile = path.join(envDir, "gateway.env"); @@ -151,4 +148,5 @@ export function writeDockerGatewayDebEnvOverride( mode: 0o600, }); fs.chmodSync(envFile, 0o600); + return true; } diff --git a/src/lib/onboard/docker-driver-gateway-service.test.ts b/src/lib/onboard/docker-driver-gateway-service.test.ts new file mode 100644 index 0000000000..dce7895fa9 --- /dev/null +++ b/src/lib/onboard/docker-driver-gateway-service.test.ts @@ -0,0 +1,95 @@ +// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
+// SPDX-License-Identifier: Apache-2.0 + +import { describe, expect, it, vi } from "vitest"; + +import { + getOpenShellGatewayUserServicePaths, + hasOpenShellGatewayUserService, + startOpenShellGatewayUserService, + type SpawnSyncLikeResult, +} from "./docker-driver-gateway-service"; + +function spawnResult(status = 0, stderr = ""): SpawnSyncLikeResult { + return { + error: undefined, + status, + stderr, + stdout: "", + }; +} + +describe("docker-driver-gateway-service", () => { + it("detects the upstream OpenShell user service only on Linux", () => { + const homeDir = "/home/nvidia"; + const existsSync = (candidate: string) => + candidate === "/usr/lib/systemd/user/openshell-gateway.service"; + + expect(hasOpenShellGatewayUserService({ existsSync, homeDir, platform: "linux" })).toBe(true); + expect(hasOpenShellGatewayUserService({ existsSync, homeDir, platform: "darwin" })).toBe(false); + expect(getOpenShellGatewayUserServicePaths(homeDir)).toContain( + "/home/nvidia/.config/systemd/user/openshell-gateway.service", + ); + }); + + it("restarts the upstream user service with systemctl --user", () => { + const spawnSyncImpl = vi.fn((_command: string, _args: string[]) => spawnResult()); + + const result = startOpenShellGatewayUserService({ + commandExists: (command) => command === "systemctl", + env: {}, + existsSync: (candidate) => candidate === "/lib/systemd/user/openshell-gateway.service", + platform: "linux", + spawnSyncImpl, + }); + + expect(result).toEqual({ attempted: true, fallbackAllowed: false, started: true }); + expect(spawnSyncImpl.mock.calls.map(([command, args]) => [command, args])).toEqual([ + ["systemctl", ["--user", "daemon-reload"]], + ["systemctl", ["--user", "enable", "openshell-gateway"]], + ["systemctl", ["--user", "restart", "openshell-gateway"]], + ]); + }); + + it("allows standalone fallback when the user systemd manager is unavailable", () => { + const result = startOpenShellGatewayUserService({ + commandExists: () => true, + env: {}, + 
existsSync: () => true, + platform: "linux", + spawnSyncImpl: vi.fn((_command: string, args: string[]) => + Array.isArray(args) && args.includes("daemon-reload") + ? spawnResult(1, "Failed to connect to bus") + : spawnResult(), + ), + }); + + expect(result).toMatchObject({ + attempted: true, + fallbackAllowed: true, + started: false, + }); + expect(result.reason).toContain("Failed to connect to bus"); + }); + + it("does not silently fall back when the installed service fails to restart", () => { + const result = startOpenShellGatewayUserService({ + commandExists: () => true, + env: {}, + existsSync: () => true, + platform: "linux", + spawnSyncImpl: vi.fn((_command: string, args: string[]) => + Array.isArray(args) && args.includes("restart") + ? spawnResult(1, "Job failed") + : spawnResult(), + ), + }); + + expect(result).toMatchObject({ + attempted: true, + fallbackAllowed: false, + started: false, + }); + expect(result.reason).toContain("Job failed"); + }); +}); diff --git a/src/lib/onboard/docker-driver-gateway-service.ts b/src/lib/onboard/docker-driver-gateway-service.ts new file mode 100644 index 0000000000..5c40a0bd2a --- /dev/null +++ b/src/lib/onboard/docker-driver-gateway-service.ts @@ -0,0 +1,145 @@ +// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
+// SPDX-License-Identifier: Apache-2.0 + +import { spawnSync, type SpawnSyncOptions } from "node:child_process"; +import fs from "node:fs"; +import os from "node:os"; +import path from "node:path"; + +export const OPENSHELL_GATEWAY_USER_SERVICE = "openshell-gateway"; + +export interface OpenShellGatewayUserServiceOptions { + commandExists?: (command: string) => boolean; + env?: NodeJS.ProcessEnv; + existsSync?: (filePath: string) => boolean; + homeDir?: string; + platform?: NodeJS.Platform; + spawnSyncImpl?: SpawnSyncLike; +} + +export interface OpenShellGatewayUserServiceStartResult { + attempted: boolean; + fallbackAllowed: boolean; + reason?: string; + started: boolean; +} + +export interface SpawnSyncLikeResult { + error?: Error; + status: number | null; + stderr?: Buffer | string | null; + stdout?: Buffer | string | null; +} + +export type SpawnSyncLike = ( + command: string, + args: string[], + options?: SpawnSyncOptions, +) => SpawnSyncLikeResult; + +export function getOpenShellGatewayUserServicePaths(homeDir = os.homedir()): string[] { + return [ + path.join(homeDir, ".config", "systemd", "user", "openshell-gateway.service"), + "/etc/systemd/user/openshell-gateway.service", + "/usr/local/lib/systemd/user/openshell-gateway.service", + "/usr/lib/systemd/user/openshell-gateway.service", + "/lib/systemd/user/openshell-gateway.service", + ]; +} + +export function hasOpenShellGatewayUserService( + opts: Pick<OpenShellGatewayUserServiceOptions, "existsSync" | "homeDir" | "platform"> = {}, +): boolean { + if ((opts.platform ?? process.platform) !== "linux") return false; + const existsSync = opts.existsSync ?? 
fs.existsSync; + return getOpenShellGatewayUserServicePaths(opts.homeDir).some((candidate) => existsSync(candidate)); +} + +function defaultCommandExists(command: string, env: NodeJS.ProcessEnv): boolean { + return ( + spawnSync("sh", ["-c", 'command -v "$1" >/dev/null 2>&1', "sh", command], { + encoding: "utf-8", + env, + }).status === 0 + ); +} + +function text(value: Buffer | string | null | undefined): string { + if (typeof value === "string") return value; + if (Buffer.isBuffer(value)) return value.toString("utf-8"); + return ""; +} + +function userManagerLooksUnavailable(reason: string): boolean { + return /Failed to connect to bus|No medium found|XDG_RUNTIME_DIR|System has not been booted|Host is down|No such file or directory/i.test( + reason, + ); +} + +function runSystemctlUser( + args: string[], + opts: Required<Pick<OpenShellGatewayUserServiceOptions, "env" | "spawnSyncImpl">>, +): { ok: boolean; reason?: string } { + const result = opts.spawnSyncImpl("systemctl", ["--user", ...args], { + encoding: "utf-8", + env: opts.env, + stdio: ["ignore", "pipe", "pipe"], + } satisfies SpawnSyncOptions); + if (result.error) { + return { ok: false, reason: result.error.message }; + } + if (result.status !== 0) { + const detail = text(result.stderr).trim() || text(result.stdout).trim() || `exit ${String(result.status)}`; + return { ok: false, reason: detail }; + } + return { ok: true }; +} + +export function startOpenShellGatewayUserService( + opts: OpenShellGatewayUserServiceOptions = {}, +): OpenShellGatewayUserServiceStartResult { + const platform = opts.platform ?? process.platform; + if (platform !== "linux") { + return { attempted: false, fallbackAllowed: true, started: false, reason: "not a Linux host" }; + } + const existsSync = opts.existsSync ?? 
fs.existsSync; + if (!hasOpenShellGatewayUserService({ existsSync, homeDir: opts.homeDir, platform })) { + return { + attempted: false, + fallbackAllowed: true, + started: false, + reason: "service unit not installed", + }; + } + + const env = opts.env ?? 
process.env; + const commandExists = opts.commandExists ?? ((command) => defaultCommandExists(command, env)); + if (!commandExists("systemctl")) { + return { + attempted: true, + fallbackAllowed: true, + started: false, + reason: "systemctl is not available", + }; + } + + const spawnSyncImpl = opts.spawnSyncImpl ?? spawnSync; + for (const args of [ + ["daemon-reload"], + ["enable", OPENSHELL_GATEWAY_USER_SERVICE], + ["restart", OPENSHELL_GATEWAY_USER_SERVICE], + ]) { + const result = runSystemctlUser(args, { env, spawnSyncImpl }); + if (!result.ok) { + const reason = `systemctl --user ${args.join(" ")} failed: ${result.reason}`; + return { + attempted: true, + fallbackAllowed: args[0] === "daemon-reload" && userManagerLooksUnavailable(result.reason ?? ""), + reason, + started: false, + }; + } + } + + return { attempted: true, fallbackAllowed: false, started: true }; +} diff --git a/test/install-openshell-version-check.test.ts b/test/install-openshell-version-check.test.ts index 59a329681c..1d32121edb 100644 --- a/test/install-openshell-version-check.test.ts +++ b/test/install-openshell-version-check.test.ts @@ -26,11 +26,13 @@ function runWithInstalledVersion( options: { capability?: boolean; driverBins?: boolean | "gateway" | "gateway-vm"; + gatewayService?: boolean; os?: string; arch?: string; } = {}, ) { const capability = options.capability ?? true; + const hostOs = options.os ?? "Linux"; const tmp = fs.mkdtempSync(path.join(os.tmpdir(), "nemoclaw-openshell-ver-")); try { const fakeBin = path.join(tmp, "bin"); @@ -39,7 +41,7 @@ function runWithInstalledVersion( writeExecutable( path.join(fakeBin, "uname"), `#!/usr/bin/env bash -if [ "\${1:-}" = "-m" ]; then echo "${options.arch ?? "x86_64"}"; else echo "${options.os ?? "Linux"}"; fi`, +if [ "\${1:-}" = "-m" ]; then echo "${options.arch ?? "x86_64"}"; else echo "${hostOs}"; fi`, ); // Fake openshell that reports the given version @@ -88,7 +90,13 @@ exit 1`, exit 1`, ); - if ((options.os ?? 
"Linux") === "Darwin") { + if (hostOs === "Linux" && options.gatewayService !== false) { + const serviceDir = path.join(tmp, ".config", "systemd", "user"); + fs.mkdirSync(serviceDir, { recursive: true }); + fs.writeFileSync(path.join(serviceDir, "openshell-gateway.service"), "[Service]\n"); + } + + if (hostOs === "Darwin") { writeExecutable( path.join(fakeBin, "codesign"), `#!/usr/bin/env bash @@ -112,6 +120,7 @@ exit 0`, return spawnSync("bash", [SCRIPT], { env: { ...process.env, + HOME: tmp, NEMOCLAW_OPENSHELL_CHANNEL: "stable", ...extraEnv, PATH: `${fakeBin}:/usr/bin:/bin`, @@ -130,6 +139,23 @@ describe("install-openshell.sh version check", { timeout: 15_000 }, () => { expect(result.stdout).toMatch(/already installed.*0\.0\.44/); }); + it("reinstalls Linux stable OpenShell when the package-managed gateway user service is missing", () => { + const result = runWithInstalledVersion("0.0.44", {}, { gatewayService: false }); + expect(result.status).not.toBe(0); + expect(result.stdout).toMatch(/missing the package-managed gateway user service/); + expect(result.stdout).toMatch(/Installing OpenShell from release 'v0\.0\.44'/); + }); + + it("allows standalone Linux installs as an explicit fallback", () => { + const result = runWithInstalledVersion( + "0.0.44", + { NEMOCLAW_OPENSHELL_STANDALONE_INSTALL: "1" }, + { gatewayService: false }, + ); + expect(result.status).toBe(0); + expect(result.stdout).toMatch(/already installed.*0\.0\.44/); + }); + it("triggers reinstall when openshell 0.0.44 is missing Docker-driver binaries", () => { const result = runWithInstalledVersion("0.0.44", {}, { driverBins: false, os: "Linux" }); expect(result.status).not.toBe(0); @@ -476,6 +502,74 @@ exit 0`, expect(result.stdout).toMatch(/required dev-channel messaging-rewrite build/); }); + it("uses the upstream Linux package installer by default", () => { + const tmp = fs.mkdtempSync(path.join(os.tmpdir(), "nemoclaw-openshell-package-")); + try { + const fakeBin = path.join(tmp, "bin"); + 
fs.mkdirSync(fakeBin); + + writeExecutable( + path.join(fakeBin, "uname"), + `#!/usr/bin/env bash +if [ "\${1:-}" = "-m" ]; then echo "x86_64"; else echo "Linux"; fi`, + ); + writeExecutable( + path.join(fakeBin, "curl"), + `#!/usr/bin/env bash +out="" +while [ "$#" -gt 0 ]; do + if [ "$1" = "-o" ]; then + shift + out="$1" + fi + shift || true +done +[ -n "$out" ] || exit 1 +cat > "$out" <<'INSTALLER' +#!/usr/bin/env sh +mkdir -p "$NEMOCLAW_FAKE_INSTALL_BIN" "$HOME/.config/systemd/user" +cat > "$NEMOCLAW_FAKE_INSTALL_BIN/openshell" <<'BIN' +#!/usr/bin/env sh +if [ "\${1:-}" = "--version" ]; then echo "openshell 0.0.44"; exit 0; fi +# request-body-credential-rewrite websocket-credential-rewrite +exit 0 +BIN +cat > "$NEMOCLAW_FAKE_INSTALL_BIN/openshell-gateway" <<'BIN' +#!/usr/bin/env sh +exit 0 +BIN +cat > "$NEMOCLAW_FAKE_INSTALL_BIN/openshell-sandbox" <<'BIN' +#!/usr/bin/env sh +exit 0 +BIN +printf '[Service]\\n' > "$HOME/.config/systemd/user/openshell-gateway.service" +chmod 755 "$NEMOCLAW_FAKE_INSTALL_BIN/openshell" "$NEMOCLAW_FAKE_INSTALL_BIN/openshell-gateway" "$NEMOCLAW_FAKE_INSTALL_BIN/openshell-sandbox" +exit "\${NEMOCLAW_FAKE_INSTALL_STATUS:-0}" +INSTALLER +exit 0`, + ); + + const result = spawnSync("bash", [SCRIPT], { + env: { + ...process.env, + HOME: tmp, + NEMOCLAW_FAKE_INSTALL_BIN: fakeBin, + NEMOCLAW_FAKE_INSTALL_STATUS: "17", + NEMOCLAW_OPENSHELL_CHANNEL: "stable", + PATH: `${fakeBin}:/usr/bin:/bin`, + }, + encoding: "utf8", + }); + + expect(result.status, `${result.stdout}\n${result.stderr}`).toBe(0); + expect(result.stdout).toMatch(/upstream package installer/); + expect(result.stdout).toMatch(/installed with upstream package\/service support/); + expect(result.stdout).not.toMatch(/Downloading OpenShell release assets/); + } finally { + fs.rmSync(tmp, { recursive: true, force: true }); + } + }); + it("proceeds to install when openshell is not present", () => { const tmp = fs.mkdtempSync(path.join(os.tmpdir(), "nemoclaw-openshell-noop-")); try { 
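Note on the fallback rule introduced in this patch: `startOpenShellGatewayUserService` permits the standalone fallback only when `systemctl --user daemon-reload` fails with output suggesting that the user systemd manager is unreachable. The sketch below isolates that decision so it can be read on its own. It is illustrative, not the patch's implementation: the names `SystemctlStep`, `StepFailure`, and `fallbackAllowed` are hypothetical, while the regular expression and the daemon-reload condition are copied from the patch.

```typescript
// Hypothetical helper types for illustration; they are not part of the patch.
type SystemctlStep = "daemon-reload" | "enable" | "restart";

interface StepFailure {
  step: SystemctlStep;
  stderr: string;
}

// Copied from userManagerLooksUnavailable() in the patch: messages that
// suggest the per-user systemd manager cannot be reached at all.
const USER_MANAGER_UNAVAILABLE =
  /Failed to connect to bus|No medium found|XDG_RUNTIME_DIR|System has not been booted|Host is down|No such file or directory/i;

// Fallback is allowed only when the very first step fails in a way that
// looks like "no user manager"; any later failure is a hard error.
function fallbackAllowed(failure: StepFailure): boolean {
  return failure.step === "daemon-reload" && USER_MANAGER_UNAVAILABLE.test(failure.stderr);
}

console.log(fallbackAllowed({ step: "daemon-reload", stderr: "Failed to connect to bus" })); // true
console.log(fallbackAllowed({ step: "restart", stderr: "Job failed" })); // false
```

Under this rule a failed `restart` surfaces as an error instead of quietly switching to the standalone gateway, which matches the test "does not silently fall back when the installed service fails to restart".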
From 90a5959cb9b39e7058ef96abd1468cbd46bd8b83 Mon Sep 17 00:00:00 2001 From: Aaron Erickson Date: Sun, 31 May 2026 08:44:41 -0700 Subject: [PATCH 2/9] refactor(onboard): keep gateway service start outside entrypoint Signed-off-by: Aaron Erickson --- src/lib/onboard.ts | 73 +----------------- .../onboard/docker-driver-gateway-service.ts | 74 +++++++++++++++++++ 2 files changed, 77 insertions(+), 70 deletions(-) diff --git a/src/lib/onboard.ts b/src/lib/onboard.ts index 9a3827d06c..8791c40607 100644 --- a/src/lib/onboard.ts +++ b/src/lib/onboard.ts @@ -460,13 +460,11 @@ const { isGatewayTcpReady } = require("./onboard/gateway-tcp-readiness") as typeof import("./onboard/gateway-tcp-readiness"); const { trackChildExit } = require("./onboard/child-exit-tracker") as typeof import("./onboard/child-exit-tracker"); -const { reportDockerDriverGatewayStartFailure } = - require("./onboard/docker-driver-gateway-failure") as typeof import("./onboard/docker-driver-gateway-failure"); +const { reportDockerDriverGatewayStartFailure } = require("./onboard/docker-driver-gateway-failure") as typeof import("./onboard/docker-driver-gateway-failure"); const dockerDriverGatewayEnv: typeof import("./onboard/docker-driver-gateway-env") = require("./onboard/docker-driver-gateway-env"); const { getDockerDriverGatewayEndpoint } = dockerDriverGatewayEnv; -const dockerDriverGatewayService: typeof import("./onboard/docker-driver-gateway-service") = - require("./onboard/docker-driver-gateway-service"); +const dockerDriverGatewayService: typeof import("./onboard/docker-driver-gateway-service") = require("./onboard/docker-driver-gateway-service"); const dockerDriverGatewayRuntimeMarker: typeof import("./onboard/docker-driver-gateway-runtime-marker") = require("./onboard/docker-driver-gateway-runtime-marker"); const hostGatewayProcess: typeof import("./onboard/host-gateway-process") = @@ -2411,65 +2409,6 @@ async function startGatewayWithOptions( process.env.OPENSHELL_GATEWAY = GATEWAY_NAME; } -async 
function tryStartPackageManagedDockerDriverGateway({ - exitOnFailure, - skipSandboxBridgeReachability, - verifySandboxBridgeGatewayReachableOrExit, -}: { - exitOnFailure: boolean; - skipSandboxBridgeReachability: boolean; - verifySandboxBridgeGatewayReachableOrExit: ( - exitOnFailure: boolean, - options?: { skip?: boolean }, - ) => Promise; -}): Promise<boolean> { - if (!dockerDriverGatewayService.hasOpenShellGatewayUserService()) return false; - - console.log(" Starting OpenShell Docker-driver gateway via upstream user service..."); - const serviceStart = dockerDriverGatewayService.startOpenShellGatewayUserService(); - if (!serviceStart.started) { - const detail = serviceStart.reason ? ` (${serviceStart.reason})` : ""; - if (serviceStart.fallbackAllowed) { - console.warn(` OpenShell gateway user service is unavailable${detail}; using standalone fallback.`); - return false; - } - const message = `OpenShell gateway user service failed to start${detail}.`; - console.error(` ${message}`); - console.error(" Check: systemctl --user status openshell-gateway"); - if (exitOnFailure) process.exit(1); - throw new Error(message); - } - - clearDockerDriverGatewayRuntimeFiles(); - const pollCount = envInt("NEMOCLAW_HEALTH_POLL_COUNT", 30); - const pollInterval = envInt("NEMOCLAW_HEALTH_POLL_INTERVAL", 2); - for (let i = 0; i < pollCount; i += 1) { - if (!registerDockerDriverGatewayEndpoint()) { - if (i < pollCount - 1) sleepSeconds(pollInterval); - continue; - } - const status = runCaptureOpenshell(["status"], { ignoreError: true }); - const namedInfo = runCaptureOpenshell(["gateway", "info", "-g", GATEWAY_NAME], { - ignoreError: true, - }); - const currentInfo = runCaptureOpenshell(["gateway", "info"], { ignoreError: true }); - if (isGatewayHealthy(status, namedInfo, currentInfo) && (await isGatewayTcpReady())) { - await verifySandboxBridgeGatewayReachableOrExit(exitOnFailure, { - skip: skipSandboxBridgeReachability, - }); - console.log(" ✓ OpenShell gateway user service is healthy"); 
- return true; - } - if (i < pollCount - 1) sleepSeconds(pollInterval); - } - - const message = "OpenShell gateway user service started but did not become healthy."; - console.error(` ${message}`); - console.error(" Check: systemctl --user status openshell-gateway"); - if (exitOnFailure) process.exit(1); - throw new Error(message); -} - async function startDockerDriverGateway({ exitOnFailure = true, skipSandboxBridgeReachability = false }: { exitOnFailure?: boolean; skipSandboxBridgeReachability?: boolean } = {}): Promise<void> { const gatewayBin = resolveOpenShellGatewayBinary(); const openshellVersionOutput = runCaptureOpenshell(["--version"], { @@ -2486,13 +2425,7 @@ async function startDockerDriverGateway({ exitOnFailure = true, skipSandboxBridg const { verifySandboxBridgeGatewayReachableOrExit } = require("./onboard/gateway-sandbox-reachability") as typeof import("./onboard/gateway-sandbox-reachability"); - if ( - await tryStartPackageManagedDockerDriverGateway({ - exitOnFailure, - skipSandboxBridgeReachability, - verifySandboxBridgeGatewayReachableOrExit, - }) - ) { + if (await dockerDriverGatewayService.startPackageManagedDockerDriverGateway({ clearDockerDriverGatewayRuntimeFiles, exitOnFailure, gatewayName: GATEWAY_NAME, registerDockerDriverGatewayEndpoint, runCaptureOpenshell, skipSandboxBridgeReachability, verifySandboxBridgeGatewayReachableOrExit })) { return; } diff --git a/src/lib/onboard/docker-driver-gateway-service.ts b/src/lib/onboard/docker-driver-gateway-service.ts index 5c40a0bd2a..363d36495b 100644 --- a/src/lib/onboard/docker-driver-gateway-service.ts +++ b/src/lib/onboard/docker-driver-gateway-service.ts @@ -6,6 +6,11 @@ import fs from "node:fs"; import os from "node:os"; import path from "node:path"; +import { sleepSeconds } from "../core/wait"; +import { isGatewayHealthy } from "../state/gateway"; +import { envInt } from "./env"; +import { isGatewayTcpReady } from "./gateway-tcp-readiness"; + export const OPENSHELL_GATEWAY_USER_SERVICE = 
"openshell-gateway"; export interface OpenShellGatewayUserServiceOptions { @@ -37,6 +42,19 @@ export type SpawnSyncLike = ( options?: SpawnSyncOptions, ) => SpawnSyncLikeResult; +export interface PackageManagedDockerDriverGatewayOptions { + clearDockerDriverGatewayRuntimeFiles: () => void; + exitOnFailure: boolean; + gatewayName: string; + registerDockerDriverGatewayEndpoint: () => boolean; + runCaptureOpenshell: (args: string[], opts?: { ignoreError?: boolean }) => string; + skipSandboxBridgeReachability: boolean; + verifySandboxBridgeGatewayReachableOrExit: ( + exitOnFailure: boolean, + options?: { skip?: boolean }, + ) => Promise; +} + export function getOpenShellGatewayUserServicePaths(homeDir = os.homedir()): string[] { return [ path.join(homeDir, ".config", "systemd", "user", "openshell-gateway.service"), @@ -143,3 +161,59 @@ export function startOpenShellGatewayUserService( return { attempted: true, fallbackAllowed: false, started: true }; } + +export async function startPackageManagedDockerDriverGateway({ + clearDockerDriverGatewayRuntimeFiles, + exitOnFailure, + gatewayName, + registerDockerDriverGatewayEndpoint, + runCaptureOpenshell, + skipSandboxBridgeReachability, + verifySandboxBridgeGatewayReachableOrExit, +}: PackageManagedDockerDriverGatewayOptions): Promise { + if (!hasOpenShellGatewayUserService()) return false; + + console.log(" Starting OpenShell Docker-driver gateway via upstream user service..."); + const serviceStart = startOpenShellGatewayUserService(); + if (!serviceStart.started) { + const detail = serviceStart.reason ? 
` (${serviceStart.reason})` : ""; + if (serviceStart.fallbackAllowed) { + console.warn(` OpenShell gateway user service is unavailable${detail}; using standalone fallback.`); + return false; + } + const message = `OpenShell gateway user service failed to start${detail}.`; + console.error(` ${message}`); + console.error(" Check: systemctl --user status openshell-gateway"); + if (exitOnFailure) process.exit(1); + throw new Error(message); + } + + clearDockerDriverGatewayRuntimeFiles(); + const pollCount = envInt("NEMOCLAW_HEALTH_POLL_COUNT", 30); + const pollInterval = envInt("NEMOCLAW_HEALTH_POLL_INTERVAL", 2); + for (let i = 0; i < pollCount; i += 1) { + if (!registerDockerDriverGatewayEndpoint()) { + if (i < pollCount - 1) sleepSeconds(pollInterval); + continue; + } + const status = runCaptureOpenshell(["status"], { ignoreError: true }); + const namedInfo = runCaptureOpenshell(["gateway", "info", "-g", gatewayName], { + ignoreError: true, + }); + const currentInfo = runCaptureOpenshell(["gateway", "info"], { ignoreError: true }); + if (isGatewayHealthy(status, namedInfo, currentInfo) && (await isGatewayTcpReady())) { + await verifySandboxBridgeGatewayReachableOrExit(exitOnFailure, { + skip: skipSandboxBridgeReachability, + }); + console.log(" ✓ OpenShell gateway user service is healthy"); + return true; + } + if (i < pollCount - 1) sleepSeconds(pollInterval); + } + + const message = "OpenShell gateway user service started but did not become healthy."; + console.error(` ${message}`); + console.error(" Check: systemctl --user status openshell-gateway"); + if (exitOnFailure) process.exit(1); + throw new Error(message); +} From 795b3c376c96cd021ac9091c6160b37ab9ae2f51 Mon Sep 17 00:00:00 2001 From: Aaron Erickson Date: Sun, 31 May 2026 08:47:30 -0700 Subject: [PATCH 3/9] refactor(onboard): keep service lifecycle out of entrypoint Signed-off-by: Aaron Erickson --- src/lib/onboard.ts | 8 ++------ src/lib/onboard/docker-driver-gateway-env.ts | 1 + 2 files changed, 3 
insertions(+), 6 deletions(-) diff --git a/src/lib/onboard.ts b/src/lib/onboard.ts index 8791c40607..cb0da8915e 100644 --- a/src/lib/onboard.ts +++ b/src/lib/onboard.ts @@ -464,7 +464,6 @@ const { reportDockerDriverGatewayStartFailure } = require("./onboard/docker-driv const dockerDriverGatewayEnv: typeof import("./onboard/docker-driver-gateway-env") = require("./onboard/docker-driver-gateway-env"); const { getDockerDriverGatewayEndpoint } = dockerDriverGatewayEnv; -const dockerDriverGatewayService: typeof import("./onboard/docker-driver-gateway-service") = require("./onboard/docker-driver-gateway-service"); const dockerDriverGatewayRuntimeMarker: typeof import("./onboard/docker-driver-gateway-runtime-marker") = require("./onboard/docker-driver-gateway-runtime-marker"); const hostGatewayProcess: typeof import("./onboard/host-gateway-process") = @@ -2422,12 +2421,9 @@ async function startDockerDriverGateway({ exitOnFailure = true, skipSandboxBridg const driftGatewayBin = runtimeIdentity?.driftGatewayBin ?? gatewayBin; const driftGatewayEnv = runtimeIdentity?.desiredEnv ?? gatewayEnv; const identityGatewayBin = runtimeIdentity?.identityGatewayBin ?? 
gatewayBin; - const { verifySandboxBridgeGatewayReachableOrExit } = - require("./onboard/gateway-sandbox-reachability") as typeof import("./onboard/gateway-sandbox-reachability"); + const { verifySandboxBridgeGatewayReachableOrExit } = require("./onboard/gateway-sandbox-reachability") as typeof import("./onboard/gateway-sandbox-reachability"); - if (await dockerDriverGatewayService.startPackageManagedDockerDriverGateway({ clearDockerDriverGatewayRuntimeFiles, exitOnFailure, gatewayName: GATEWAY_NAME, registerDockerDriverGatewayEndpoint, runCaptureOpenshell, skipSandboxBridgeReachability, verifySandboxBridgeGatewayReachableOrExit })) { - return; - } + if (await dockerDriverGatewayEnv.startPackageManagedDockerDriverGateway({ clearDockerDriverGatewayRuntimeFiles, exitOnFailure, gatewayName: GATEWAY_NAME, registerDockerDriverGatewayEndpoint, runCaptureOpenshell, skipSandboxBridgeReachability, verifySandboxBridgeGatewayReachableOrExit })) return; const gatewayStatus = runCaptureOpenshell(["status"], { ignoreError: true }); const gwInfo = runCaptureOpenshell(["gateway", "info", "-g", GATEWAY_NAME], { diff --git a/src/lib/onboard/docker-driver-gateway-env.ts b/src/lib/onboard/docker-driver-gateway-env.ts index a9036afdf0..c6f2442496 100644 --- a/src/lib/onboard/docker-driver-gateway-env.ts +++ b/src/lib/onboard/docker-driver-gateway-env.ts @@ -16,6 +16,7 @@ import { GATEWAY_PORT } from "../core/ports"; import { hasOpenShellGatewayUserService } from "./docker-driver-gateway-service"; export { getGatewayHttpsEndpoint }; +export { startPackageManagedDockerDriverGateway } from "./docker-driver-gateway-service"; export const DOCKER_DRIVER_GATEWAY_RUNTIME_ENV_KEYS = [ "OPENSHELL_DRIVERS", From 098007d653201720acea8b9f796c267c9feff776 Mon Sep 17 00:00:00 2001 From: Aaron Erickson Date: Sun, 31 May 2026 08:51:38 -0700 Subject: [PATCH 4/9] fix(installer): verify upstream OpenShell package installer Signed-off-by: Aaron Erickson --- scripts/check-installer-hash.sh | 7 + 
scripts/install-openshell.sh | 52 ++++++-- test/install-openshell-version-check.test.ts | 132 ++++++++++++++++++- 3 files changed, 181 insertions(+), 10 deletions(-) diff --git a/scripts/check-installer-hash.sh b/scripts/check-installer-hash.sh index 362c086c4b..e611e38953 100755 --- a/scripts/check-installer-hash.sh +++ b/scripts/check-installer-hash.sh @@ -7,6 +7,8 @@ # # Checked installers: # 1. Ollama installer — scripts/install.sh (OLLAMA_INSTALL_SHA256) +# 2. OpenShell installer — scripts/install-openshell.sh +# (OPENSHELL_INSTALLER_SHA256_0_0_44) # # Usage: # scripts/check-installer-hash.sh # exit 0 if current, 1 if stale @@ -78,6 +80,11 @@ register "Ollama installer" \ "OLLAMA_INSTALL_SHA256" \ "https://ollama.com/install.sh" +register "OpenShell v0.0.44 installer" \ + "${REPO_ROOT}/scripts/install-openshell.sh" \ + "OPENSHELL_INSTALLER_SHA256_0_0_44" \ + "https://raw.githubusercontent.com/NVIDIA/OpenShell/v0.0.44/install.sh" + # --------------------------------------------------------------------------- # Main # --------------------------------------------------------------------------- diff --git a/scripts/install-openshell.sh b/scripts/install-openshell.sh index cebdda6360..e88a1a6aef 100755 --- a/scripts/install-openshell.sh +++ b/scripts/install-openshell.sh @@ -45,6 +45,7 @@ MAX_VERSION="0.0.44" # (see #3404). The hardcoded value is the fallback for offline runs. PIN_VERSION="$MAX_VERSION" DEV_MIN_VERSION="0.0.44" +OPENSHELL_INSTALLER_SHA256_0_0_44="fabb30f4ad7af2b14e4994420ba10ecbf4a195236166199abe90daeb671c6d70" CHANNEL="${NEMOCLAW_OPENSHELL_CHANNEL:-auto}" case "$CHANNEL" in @@ -307,6 +308,34 @@ info "Installing OpenShell from release '$RELEASE_TAG'..." 
tmpdir="$(mktemp -d)" trap 'rm -rf "$tmpdir"' EXIT +file_sha256() { + local file="$1" + if command -v sha256sum >/dev/null 2>&1; then + sha256sum "$file" | awk '{print $1}' + elif command -v shasum >/dev/null 2>&1; then + shasum -a 256 "$file" | awk '{print $1}' + else + return 1 + fi +} + +verify_file_sha256() { + local file="$1" expected="$2" label="$3" actual + if ! actual="$(file_sha256 "$file")"; then + fail "No SHA-256 tool available (sha256sum/shasum)" + fi + if [ "$actual" != "$expected" ]; then + fail "$label checksum verification failed. Expected $expected but got $actual." + fi +} + +openshell_installer_sha256() { + case "$RELEASE_TAG" in + v0.0.44) printf '%s\n' "$OPENSHELL_INSTALLER_SHA256_0_0_44" ;; + *) return 1 ;; + esac +} + install_with_upstream_package_service() { [ "$OS" = "Linux" ] || return 1 [ "$RESOLVED_CHANNEL" != "dev" ] || return 1 @@ -315,6 +344,7 @@ install_with_upstream_package_service() { local installer="$tmpdir/openshell-install.sh" local installer_url="https://raw.githubusercontent.com/NVIDIA/OpenShell/${RELEASE_TAG}/install.sh" + local expected_installer_sha="" local installer_status=0 local installed_bin="" local feature_status=0 @@ -324,23 +354,31 @@ install_with_upstream_package_service() { breaking_ack=1 fi + if ! expected_installer_sha="$(openshell_installer_sha256)"; then + warn "no pinned checksum for upstream OpenShell package installer ${RELEASE_TAG} — falling back to standalone binaries" + return 1 + fi + info "Installing OpenShell ${RELEASE_TAG} with the upstream package installer..." - if ! curl -fLsS --retry 3 --max-redirs 5 -o "$installer" "$installer_url"; then + if ! 
curl --proto '=https' --tlsv1.2 -fLsS --retry 3 --retry-delay 1 --retry-all-errors --max-redirs 5 -o "$installer" "$installer_url"; then warn "upstream package installer could not be downloaded — falling back to standalone binaries" return 1 fi + verify_file_sha256 "$installer" "$expected_installer_sha" "upstream OpenShell package installer" chmod 755 "$installer" OPENSHELL_VERSION="$RELEASE_TAG" \ OPENSHELL_ACK_BREAKING_UPGRADE="$breaking_ack" \ sh "$installer" || installer_status=$? + if [ "$installer_status" != "0" ]; then + warn "upstream package installer failed (exit ${installer_status}) — falling back to standalone binaries" + return 1 + fi + installed_bin="$(command -v openshell 2>/dev/null || true)" if [ -n "$installed_bin" ] && required_driver_bins_present && openshell_gateway_user_service_present; then if openshell_has_required_messaging_features "$installed_bin"; then - if [ "$installer_status" != "0" ]; then - warn "upstream installer returned exit ${installer_status} after installing binaries and the gateway user service; NemoClaw will restart the service during onboarding" - fi info "$("$installed_bin" --version 2>&1 || echo openshell) installed with upstream package/service support" return 0 else @@ -352,11 +390,7 @@ install_with_upstream_package_service() { fi fi - if [ "$installer_status" != "0" ]; then - warn "upstream package installer failed (exit ${installer_status}) — falling back to standalone binaries" - else - warn "upstream package installer did not provide required Docker-driver binaries and user service — falling back to standalone binaries" - fi + warn "upstream package installer did not provide required Docker-driver binaries and user service — falling back to standalone binaries" return 1 } diff --git a/test/install-openshell-version-check.test.ts b/test/install-openshell-version-check.test.ts index 1d32121edb..14afa9ae85 100644 --- a/test/install-openshell-version-check.test.ts +++ b/test/install-openshell-version-check.test.ts @@ 
-428,6 +428,7 @@ exit 0`, env: { ...process.env, HOME: tmp, + NEMOCLAW_OPENSHELL_STANDALONE_INSTALL: "1", NEMOCLAW_OPENSHELL_CHANNEL: "stable", PATH: `${fakeBin}:${activeBin}:/usr/bin:/bin`, }, @@ -546,6 +547,12 @@ printf '[Service]\\n' > "$HOME/.config/systemd/user/openshell-gateway.service" chmod 755 "$NEMOCLAW_FAKE_INSTALL_BIN/openshell" "$NEMOCLAW_FAKE_INSTALL_BIN/openshell-gateway" "$NEMOCLAW_FAKE_INSTALL_BIN/openshell-sandbox" exit "\${NEMOCLAW_FAKE_INSTALL_STATUS:-0}" INSTALLER +exit 0`, + ); + writeExecutable( + path.join(fakeBin, "sha256sum"), + `#!/usr/bin/env bash +printf '%s %s\\n' 'fabb30f4ad7af2b14e4994420ba10ecbf4a195236166199abe90daeb671c6d70' "\${1:-}" exit 0`, ); @@ -554,7 +561,7 @@ exit 0`, ...process.env, HOME: tmp, NEMOCLAW_FAKE_INSTALL_BIN: fakeBin, - NEMOCLAW_FAKE_INSTALL_STATUS: "17", + NEMOCLAW_FAKE_INSTALL_STATUS: "0", NEMOCLAW_OPENSHELL_CHANNEL: "stable", PATH: `${fakeBin}:/usr/bin:/bin`, }, @@ -570,6 +577,129 @@ exit 0`, } }); + it("fails closed when the upstream Linux package installer checksum mismatches", () => { + const tmp = fs.mkdtempSync(path.join(os.tmpdir(), "nemoclaw-openshell-package-hash-")); + try { + const fakeBin = path.join(tmp, "bin"); + fs.mkdirSync(fakeBin); + + writeExecutable( + path.join(fakeBin, "uname"), + `#!/usr/bin/env bash +if [ "\${1:-}" = "-m" ]; then echo "x86_64"; else echo "Linux"; fi`, + ); + writeExecutable( + path.join(fakeBin, "curl"), + `#!/usr/bin/env bash +out="" +while [ "$#" -gt 0 ]; do + if [ "$1" = "-o" ]; then + shift + out="$1" + fi + shift || true +done +[ -n "$out" ] || exit 1 +printf '%s\\n' '#!/usr/bin/env sh' 'exit 0' > "$out" +exit 0`, + ); + writeExecutable( + path.join(fakeBin, "sha256sum"), + `#!/usr/bin/env bash +printf '%s %s\\n' '0000000000000000000000000000000000000000000000000000000000000000' "\${1:-}" +exit 0`, + ); + + const result = spawnSync("bash", [SCRIPT], { + env: { + ...process.env, + HOME: tmp, + NEMOCLAW_OPENSHELL_CHANNEL: "stable", + PATH: `${fakeBin}:/usr/bin:/bin`, 
+ }, + encoding: "utf8", + }); + + expect(result.status, `${result.stdout}\n${result.stderr}`).toBe(1); + expect(result.stderr).toMatch(/upstream OpenShell package installer checksum verification failed/); + expect(result.stdout).not.toMatch(/Downloading OpenShell release assets/); + } finally { + fs.rmSync(tmp, { recursive: true, force: true }); + } + }); + + it("falls back when the upstream Linux package installer exits nonzero", () => { + const tmp = fs.mkdtempSync(path.join(os.tmpdir(), "nemoclaw-openshell-package-nonzero-")); + try { + const fakeBin = path.join(tmp, "bin"); + fs.mkdirSync(fakeBin); + + writeExecutable( + path.join(fakeBin, "uname"), + `#!/usr/bin/env bash +if [ "\${1:-}" = "-m" ]; then echo "x86_64"; else echo "Linux"; fi`, + ); + writeExecutable( + path.join(fakeBin, "curl"), + `#!/usr/bin/env bash +out="" +while [ "$#" -gt 0 ]; do + if [ "$1" = "-o" ]; then + shift + out="$1" + fi + shift || true +done +[ -n "$out" ] || exit 1 +cat > "$out" <<'INSTALLER' +#!/usr/bin/env sh +mkdir -p "$NEMOCLAW_FAKE_INSTALL_BIN" "$HOME/.config/systemd/user" +cat > "$NEMOCLAW_FAKE_INSTALL_BIN/openshell" <<'BIN' +#!/usr/bin/env sh +if [ "\${1:-}" = "--version" ]; then echo "openshell 0.0.44"; exit 0; fi +# request-body-credential-rewrite websocket-credential-rewrite +exit 0 +BIN +cat > "$NEMOCLAW_FAKE_INSTALL_BIN/openshell-gateway" <<'BIN' +#!/usr/bin/env sh +exit 0 +BIN +cat > "$NEMOCLAW_FAKE_INSTALL_BIN/openshell-sandbox" <<'BIN' +#!/usr/bin/env sh +exit 0 +BIN +printf '[Service]\\n' > "$HOME/.config/systemd/user/openshell-gateway.service" +chmod 755 "$NEMOCLAW_FAKE_INSTALL_BIN/openshell" "$NEMOCLAW_FAKE_INSTALL_BIN/openshell-gateway" "$NEMOCLAW_FAKE_INSTALL_BIN/openshell-sandbox" +exit 17 +INSTALLER +exit 0`, + ); + writeExecutable( + path.join(fakeBin, "sha256sum"), + `#!/usr/bin/env bash +printf '%s %s\\n' 'fabb30f4ad7af2b14e4994420ba10ecbf4a195236166199abe90daeb671c6d70' "\${1:-}" +exit 0`, + ); + + const result = spawnSync("bash", [SCRIPT], { + env: { + 
...process.env, + HOME: tmp, + NEMOCLAW_FAKE_INSTALL_BIN: fakeBin, + NEMOCLAW_OPENSHELL_CHANNEL: "stable", + PATH: `${fakeBin}:/usr/bin:/bin`, + }, + encoding: "utf8", + }); + + expect(result.status, `${result.stdout}\n${result.stderr}`).not.toBe(0); + expect(result.stdout).toMatch(/upstream package installer failed \(exit 17\).*falling back to standalone binaries/); + expect(result.stdout).not.toMatch(/installed with upstream package\/service support/); + } finally { + fs.rmSync(tmp, { recursive: true, force: true }); + } + }); + it("proceeds to install when openshell is not present", () => { const tmp = fs.mkdtempSync(path.join(os.tmpdir(), "nemoclaw-openshell-noop-")); try { From c24aa4e2f798b44cedc7665ffc5750b610d2e28a Mon Sep 17 00:00:00 2001 From: Aaron Erickson Date: Sun, 31 May 2026 09:29:42 -0700 Subject: [PATCH 5/9] refactor(onboard): scope OpenShell gateway service handoff --- docs/reference/commands.mdx | 3 +- scripts/check-installer-hash.sh | 7 - scripts/install-openshell.sh | 114 +-------- .../onboard/docker-driver-gateway-service.ts | 2 +- test/install-openshell-version-check.test.ts | 228 +----------------- 5 files changed, 7 insertions(+), 347 deletions(-) diff --git a/docs/reference/commands.mdx b/docs/reference/commands.mdx index 3afc173488..faa4deae98 100644 --- a/docs/reference/commands.mdx +++ b/docs/reference/commands.mdx @@ -1419,9 +1419,8 @@ These flags toggle optional behaviors during onboarding; set them before running | `NEMOCLAW_SANDBOX_GPU_DEVICE` | OpenShell GPU device selector | Selects the GPU device passed with `openshell sandbox create --gpu-device`. Requires explicit sandbox GPU enablement with `NEMOCLAW_SANDBOX_GPU=1` (or `--sandbox-gpu` for CLI-driven onboarding); otherwise onboarding rejects the selector instead of treating it as an implicit opt-in. | | `NEMOCLAW_DOCKER_GPU_PATCH` | `0` to disable, anything else to keep the default | Controls the Linux Docker-driver GPU sandbox compatibility patch. 
Set to `0` only as an escape hatch when the patch fails and you need onboarding to continue without patching the GPU sandbox container. | | `NEMOCLAW_OPENSHELL_GATEWAY_BIN` | path | Advanced override for the `openshell-gateway` binary used by the Linux Docker-driver standalone fallback. Defaults to the binary next to `openshell`, then common install paths. | -| `NEMOCLAW_OPENSHELL_SANDBOX_BIN` | path | Advanced override for the `openshell-sandbox` binary passed to the Linux Docker-driver gateway supervisor. Defaults to the binary next to `openshell`, then common install paths. | +| `NEMOCLAW_OPENSHELL_SANDBOX_BIN` | path | Advanced override for the `openshell-sandbox` binary passed to the Linux Docker-driver standalone fallback. Defaults to the binary next to `openshell`, then common install paths. | | `NEMOCLAW_OPENSHELL_GATEWAY_STATE_DIR` | path | Advanced override for the Linux Docker-driver gateway SQLite state directory and standalone-fallback PID file. Defaults to `~/.local/state/nemoclaw/openshell-docker-gateway`. | -| `NEMOCLAW_OPENSHELL_STANDALONE_INSTALL` | `1` to enable | Advanced fallback that skips the upstream Linux OpenShell package/service installer and installs standalone release binaries instead. | | `NEMOCLAW_WECHAT_QUIET` | `1` to enable | Silences the `[wechat]` diagnostic lines printed during the host-side WeChat QR login (poll status, IDC redirects, swallowed gateway errors), which are visible by default while the experimental WeChat path stabilizes; set `1` once the flow is reliable in your environment. | ### Onboard Profiling Traces diff --git a/scripts/check-installer-hash.sh b/scripts/check-installer-hash.sh index e611e38953..362c086c4b 100755 --- a/scripts/check-installer-hash.sh +++ b/scripts/check-installer-hash.sh @@ -7,8 +7,6 @@ # # Checked installers: # 1. Ollama installer — scripts/install.sh (OLLAMA_INSTALL_SHA256) -# 2. 
OpenShell installer — scripts/install-openshell.sh -# (OPENSHELL_INSTALLER_SHA256_0_0_44) # # Usage: # scripts/check-installer-hash.sh # exit 0 if current, 1 if stale @@ -80,11 +78,6 @@ register "Ollama installer" \ "OLLAMA_INSTALL_SHA256" \ "https://ollama.com/install.sh" -register "OpenShell v0.0.44 installer" \ - "${REPO_ROOT}/scripts/install-openshell.sh" \ - "OPENSHELL_INSTALLER_SHA256_0_0_44" \ - "https://raw.githubusercontent.com/NVIDIA/OpenShell/v0.0.44/install.sh" - # --------------------------------------------------------------------------- # Main # --------------------------------------------------------------------------- diff --git a/scripts/install-openshell.sh b/scripts/install-openshell.sh index e88a1a6aef..6ba0381679 100755 --- a/scripts/install-openshell.sh +++ b/scripts/install-openshell.sh @@ -45,7 +45,6 @@ MAX_VERSION="0.0.44" # (see #3404). The hardcoded value is the fallback for offline runs. PIN_VERSION="$MAX_VERSION" DEV_MIN_VERSION="0.0.44" -OPENSHELL_INSTALLER_SHA256_0_0_44="fabb30f4ad7af2b14e4994420ba10ecbf4a195236166199abe90daeb671c6d70" CHANNEL="${NEMOCLAW_OPENSHELL_CHANNEL:-auto}" case "$CHANNEL" in @@ -131,21 +130,6 @@ required_driver_bins_present() { esac } -openshell_gateway_user_service_present() { - [ "$OS" = "Linux" ] || return 1 - [ -f "${HOME:-}/.config/systemd/user/openshell-gateway.service" ] || \ - [ -f /etc/systemd/user/openshell-gateway.service ] || \ - [ -f /usr/local/lib/systemd/user/openshell-gateway.service ] || \ - [ -f /usr/lib/systemd/user/openshell-gateway.service ] || \ - [ -f /lib/systemd/user/openshell-gateway.service ] -} - -should_require_openshell_gateway_user_service() { - [ "$OS" = "Linux" ] && \ - [ "$RESOLVED_CHANNEL" != "dev" ] && \ - [ "${NEMOCLAW_OPENSHELL_STANDALONE_INSTALL:-}" != "1" ] -} - OPENSHELL_FEATURE_CHECK_ERROR="" openshell_has_required_messaging_features() { @@ -291,8 +275,6 @@ if command -v openshell >/dev/null 2>&1; then warn "openshell $INSTALLED_VERSION is missing Docker-driver 
binaries — reinstalling pinned OpenShell ${PIN_VERSION}..." elif ! openshell_has_required_messaging_features; then fail "${OPENSHELL_FEATURE_CHECK_ERROR:-openshell $INSTALLED_VERSION is missing required messaging credential rewrite support. Install an OpenShell build that includes provider aliases, WebSocket text rewrite, and request-body credential rewrite.}" - elif should_require_openshell_gateway_user_service && ! openshell_gateway_user_service_present; then - warn "openshell $INSTALLED_VERSION is missing the package-managed gateway user service — reinstalling pinned OpenShell ${PIN_VERSION}..." else info "openshell already installed: $INSTALLED_VERSION (>= $MIN_VERSION, <= $MAX_VERSION, messaging rewrite capable)" exit 0 @@ -305,99 +287,6 @@ fi info "Installing OpenShell from release '$RELEASE_TAG'..." -tmpdir="$(mktemp -d)" -trap 'rm -rf "$tmpdir"' EXIT - -file_sha256() { - local file="$1" - if command -v sha256sum >/dev/null 2>&1; then - sha256sum "$file" | awk '{print $1}' - elif command -v shasum >/dev/null 2>&1; then - shasum -a 256 "$file" | awk '{print $1}' - else - return 1 - fi -} - -verify_file_sha256() { - local file="$1" expected="$2" label="$3" actual - if ! actual="$(file_sha256 "$file")"; then - fail "No SHA-256 tool available (sha256sum/shasum)" - fi - if [ "$actual" != "$expected" ]; then - fail "$label checksum verification failed. Expected $expected but got $actual." 
- fi -} - -openshell_installer_sha256() { - case "$RELEASE_TAG" in - v0.0.44) printf '%s\n' "$OPENSHELL_INSTALLER_SHA256_0_0_44" ;; - *) return 1 ;; - esac -} - -install_with_upstream_package_service() { - [ "$OS" = "Linux" ] || return 1 - [ "$RESOLVED_CHANNEL" != "dev" ] || return 1 - [ "${NEMOCLAW_OPENSHELL_STANDALONE_INSTALL:-}" != "1" ] || return 1 - command -v curl >/dev/null 2>&1 || return 1 - - local installer="$tmpdir/openshell-install.sh" - local installer_url="https://raw.githubusercontent.com/NVIDIA/OpenShell/${RELEASE_TAG}/install.sh" - local expected_installer_sha="" - local installer_status=0 - local installed_bin="" - local feature_status=0 - local breaking_ack="${OPENSHELL_ACK_BREAKING_UPGRADE:-}" - - if [ -z "$breaking_ack" ] && [ -n "${NEMOCLAW_OPENSHELL_UPGRADE_PREPARED:-}" ]; then - breaking_ack=1 - fi - - if ! expected_installer_sha="$(openshell_installer_sha256)"; then - warn "no pinned checksum for upstream OpenShell package installer ${RELEASE_TAG} — falling back to standalone binaries" - return 1 - fi - - info "Installing OpenShell ${RELEASE_TAG} with the upstream package installer..." - if ! curl --proto '=https' --tlsv1.2 -fLsS --retry 3 --retry-delay 1 --retry-all-errors --max-redirs 5 -o "$installer" "$installer_url"; then - warn "upstream package installer could not be downloaded — falling back to standalone binaries" - return 1 - fi - verify_file_sha256 "$installer" "$expected_installer_sha" "upstream OpenShell package installer" - chmod 755 "$installer" - - OPENSHELL_VERSION="$RELEASE_TAG" \ - OPENSHELL_ACK_BREAKING_UPGRADE="$breaking_ack" \ - sh "$installer" || installer_status=$? 
- - if [ "$installer_status" != "0" ]; then - warn "upstream package installer failed (exit ${installer_status}) — falling back to standalone binaries" - return 1 - fi - - installed_bin="$(command -v openshell 2>/dev/null || true)" - if [ -n "$installed_bin" ] && required_driver_bins_present && openshell_gateway_user_service_present; then - if openshell_has_required_messaging_features "$installed_bin"; then - info "$("$installed_bin" --version 2>&1 || echo openshell) installed with upstream package/service support" - return 0 - else - feature_status=$? - if [ "$feature_status" = "2" ]; then - fail "$OPENSHELL_FEATURE_CHECK_ERROR" - fi - warn "${OPENSHELL_FEATURE_CHECK_ERROR:-upstream package install did not provide the required OpenShell messaging features}" - fi - fi - - warn "upstream package installer did not provide required Docker-driver binaries and user service — falling back to standalone binaries" - return 1 -} - -if install_with_upstream_package_service; then - exit 0 -fi - case "$OS" in Darwin) case "$ARCH_LABEL" in @@ -443,6 +332,9 @@ case "$OS" in ;; esac +tmpdir="$(mktemp -d)" +trap 'rm -rf "$tmpdir"' EXIT + download_with_curl() { local name local -a curl_progress diff --git a/src/lib/onboard/docker-driver-gateway-service.ts b/src/lib/onboard/docker-driver-gateway-service.ts index 363d36495b..dd49de012a 100644 --- a/src/lib/onboard/docker-driver-gateway-service.ts +++ b/src/lib/onboard/docker-driver-gateway-service.ts @@ -188,7 +188,6 @@ export async function startPackageManagedDockerDriverGateway({ throw new Error(message); } - clearDockerDriverGatewayRuntimeFiles(); const pollCount = envInt("NEMOCLAW_HEALTH_POLL_COUNT", 30); const pollInterval = envInt("NEMOCLAW_HEALTH_POLL_INTERVAL", 2); for (let i = 0; i < pollCount; i += 1) { @@ -202,6 +201,7 @@ export async function startPackageManagedDockerDriverGateway({ }); const currentInfo = runCaptureOpenshell(["gateway", "info"], { ignoreError: true }); if (isGatewayHealthy(status, namedInfo, currentInfo) 
&& (await isGatewayTcpReady())) { + clearDockerDriverGatewayRuntimeFiles(); await verifySandboxBridgeGatewayReachableOrExit(exitOnFailure, { skip: skipSandboxBridgeReachability, }); diff --git a/test/install-openshell-version-check.test.ts b/test/install-openshell-version-check.test.ts index 14afa9ae85..59a329681c 100644 --- a/test/install-openshell-version-check.test.ts +++ b/test/install-openshell-version-check.test.ts @@ -26,13 +26,11 @@ function runWithInstalledVersion( options: { capability?: boolean; driverBins?: boolean | "gateway" | "gateway-vm"; - gatewayService?: boolean; os?: string; arch?: string; } = {}, ) { const capability = options.capability ?? true; - const hostOs = options.os ?? "Linux"; const tmp = fs.mkdtempSync(path.join(os.tmpdir(), "nemoclaw-openshell-ver-")); try { const fakeBin = path.join(tmp, "bin"); @@ -41,7 +39,7 @@ function runWithInstalledVersion( writeExecutable( path.join(fakeBin, "uname"), `#!/usr/bin/env bash -if [ "\${1:-}" = "-m" ]; then echo "${options.arch ?? "x86_64"}"; else echo "${hostOs}"; fi`, +if [ "\${1:-}" = "-m" ]; then echo "${options.arch ?? "x86_64"}"; else echo "${options.os ?? "Linux"}"; fi`, ); // Fake openshell that reports the given version @@ -90,13 +88,7 @@ exit 1`, exit 1`, ); - if (hostOs === "Linux" && options.gatewayService !== false) { - const serviceDir = path.join(tmp, ".config", "systemd", "user"); - fs.mkdirSync(serviceDir, { recursive: true }); - fs.writeFileSync(path.join(serviceDir, "openshell-gateway.service"), "[Service]\n"); - } - - if (hostOs === "Darwin") { + if ((options.os ?? 
"Linux") === "Darwin") { writeExecutable( path.join(fakeBin, "codesign"), `#!/usr/bin/env bash @@ -120,7 +112,6 @@ exit 0`, return spawnSync("bash", [SCRIPT], { env: { ...process.env, - HOME: tmp, NEMOCLAW_OPENSHELL_CHANNEL: "stable", ...extraEnv, PATH: `${fakeBin}:/usr/bin:/bin`, @@ -139,23 +130,6 @@ describe("install-openshell.sh version check", { timeout: 15_000 }, () => { expect(result.stdout).toMatch(/already installed.*0\.0\.44/); }); - it("reinstalls Linux stable OpenShell when the package-managed gateway user service is missing", () => { - const result = runWithInstalledVersion("0.0.44", {}, { gatewayService: false }); - expect(result.status).not.toBe(0); - expect(result.stdout).toMatch(/missing the package-managed gateway user service/); - expect(result.stdout).toMatch(/Installing OpenShell from release 'v0\.0\.44'/); - }); - - it("allows standalone Linux installs as an explicit fallback", () => { - const result = runWithInstalledVersion( - "0.0.44", - { NEMOCLAW_OPENSHELL_STANDALONE_INSTALL: "1" }, - { gatewayService: false }, - ); - expect(result.status).toBe(0); - expect(result.stdout).toMatch(/already installed.*0\.0\.44/); - }); - it("triggers reinstall when openshell 0.0.44 is missing Docker-driver binaries", () => { const result = runWithInstalledVersion("0.0.44", {}, { driverBins: false, os: "Linux" }); expect(result.status).not.toBe(0); @@ -428,7 +402,6 @@ exit 0`, env: { ...process.env, HOME: tmp, - NEMOCLAW_OPENSHELL_STANDALONE_INSTALL: "1", NEMOCLAW_OPENSHELL_CHANNEL: "stable", PATH: `${fakeBin}:${activeBin}:/usr/bin:/bin`, }, @@ -503,203 +476,6 @@ exit 0`, expect(result.stdout).toMatch(/required dev-channel messaging-rewrite build/); }); - it("uses the upstream Linux package installer by default", () => { - const tmp = fs.mkdtempSync(path.join(os.tmpdir(), "nemoclaw-openshell-package-")); - try { - const fakeBin = path.join(tmp, "bin"); - fs.mkdirSync(fakeBin); - - writeExecutable( - path.join(fakeBin, "uname"), - `#!/usr/bin/env bash -if [ 
"\${1:-}" = "-m" ]; then echo "x86_64"; else echo "Linux"; fi`, - ); - writeExecutable( - path.join(fakeBin, "curl"), - `#!/usr/bin/env bash -out="" -while [ "$#" -gt 0 ]; do - if [ "$1" = "-o" ]; then - shift - out="$1" - fi - shift || true -done -[ -n "$out" ] || exit 1 -cat > "$out" <<'INSTALLER' -#!/usr/bin/env sh -mkdir -p "$NEMOCLAW_FAKE_INSTALL_BIN" "$HOME/.config/systemd/user" -cat > "$NEMOCLAW_FAKE_INSTALL_BIN/openshell" <<'BIN' -#!/usr/bin/env sh -if [ "\${1:-}" = "--version" ]; then echo "openshell 0.0.44"; exit 0; fi -# request-body-credential-rewrite websocket-credential-rewrite -exit 0 -BIN -cat > "$NEMOCLAW_FAKE_INSTALL_BIN/openshell-gateway" <<'BIN' -#!/usr/bin/env sh -exit 0 -BIN -cat > "$NEMOCLAW_FAKE_INSTALL_BIN/openshell-sandbox" <<'BIN' -#!/usr/bin/env sh -exit 0 -BIN -printf '[Service]\\n' > "$HOME/.config/systemd/user/openshell-gateway.service" -chmod 755 "$NEMOCLAW_FAKE_INSTALL_BIN/openshell" "$NEMOCLAW_FAKE_INSTALL_BIN/openshell-gateway" "$NEMOCLAW_FAKE_INSTALL_BIN/openshell-sandbox" -exit "\${NEMOCLAW_FAKE_INSTALL_STATUS:-0}" -INSTALLER -exit 0`, - ); - writeExecutable( - path.join(fakeBin, "sha256sum"), - `#!/usr/bin/env bash -printf '%s %s\\n' 'fabb30f4ad7af2b14e4994420ba10ecbf4a195236166199abe90daeb671c6d70' "\${1:-}" -exit 0`, - ); - - const result = spawnSync("bash", [SCRIPT], { - env: { - ...process.env, - HOME: tmp, - NEMOCLAW_FAKE_INSTALL_BIN: fakeBin, - NEMOCLAW_FAKE_INSTALL_STATUS: "0", - NEMOCLAW_OPENSHELL_CHANNEL: "stable", - PATH: `${fakeBin}:/usr/bin:/bin`, - }, - encoding: "utf8", - }); - - expect(result.status, `${result.stdout}\n${result.stderr}`).toBe(0); - expect(result.stdout).toMatch(/upstream package installer/); - expect(result.stdout).toMatch(/installed with upstream package\/service support/); - expect(result.stdout).not.toMatch(/Downloading OpenShell release assets/); - } finally { - fs.rmSync(tmp, { recursive: true, force: true }); - } - }); - - it("fails closed when the upstream Linux package installer checksum 
mismatches", () => { - const tmp = fs.mkdtempSync(path.join(os.tmpdir(), "nemoclaw-openshell-package-hash-")); - try { - const fakeBin = path.join(tmp, "bin"); - fs.mkdirSync(fakeBin); - - writeExecutable( - path.join(fakeBin, "uname"), - `#!/usr/bin/env bash -if [ "\${1:-}" = "-m" ]; then echo "x86_64"; else echo "Linux"; fi`, - ); - writeExecutable( - path.join(fakeBin, "curl"), - `#!/usr/bin/env bash -out="" -while [ "$#" -gt 0 ]; do - if [ "$1" = "-o" ]; then - shift - out="$1" - fi - shift || true -done -[ -n "$out" ] || exit 1 -printf '%s\\n' '#!/usr/bin/env sh' 'exit 0' > "$out" -exit 0`, - ); - writeExecutable( - path.join(fakeBin, "sha256sum"), - `#!/usr/bin/env bash -printf '%s %s\\n' '0000000000000000000000000000000000000000000000000000000000000000' "\${1:-}" -exit 0`, - ); - - const result = spawnSync("bash", [SCRIPT], { - env: { - ...process.env, - HOME: tmp, - NEMOCLAW_OPENSHELL_CHANNEL: "stable", - PATH: `${fakeBin}:/usr/bin:/bin`, - }, - encoding: "utf8", - }); - - expect(result.status, `${result.stdout}\n${result.stderr}`).toBe(1); - expect(result.stderr).toMatch(/upstream OpenShell package installer checksum verification failed/); - expect(result.stdout).not.toMatch(/Downloading OpenShell release assets/); - } finally { - fs.rmSync(tmp, { recursive: true, force: true }); - } - }); - - it("falls back when the upstream Linux package installer exits nonzero", () => { - const tmp = fs.mkdtempSync(path.join(os.tmpdir(), "nemoclaw-openshell-package-nonzero-")); - try { - const fakeBin = path.join(tmp, "bin"); - fs.mkdirSync(fakeBin); - - writeExecutable( - path.join(fakeBin, "uname"), - `#!/usr/bin/env bash -if [ "\${1:-}" = "-m" ]; then echo "x86_64"; else echo "Linux"; fi`, - ); - writeExecutable( - path.join(fakeBin, "curl"), - `#!/usr/bin/env bash -out="" -while [ "$#" -gt 0 ]; do - if [ "$1" = "-o" ]; then - shift - out="$1" - fi - shift || true -done -[ -n "$out" ] || exit 1 -cat > "$out" <<'INSTALLER' -#!/usr/bin/env sh -mkdir -p 
"$NEMOCLAW_FAKE_INSTALL_BIN" "$HOME/.config/systemd/user" -cat > "$NEMOCLAW_FAKE_INSTALL_BIN/openshell" <<'BIN' -#!/usr/bin/env sh -if [ "\${1:-}" = "--version" ]; then echo "openshell 0.0.44"; exit 0; fi -# request-body-credential-rewrite websocket-credential-rewrite -exit 0 -BIN -cat > "$NEMOCLAW_FAKE_INSTALL_BIN/openshell-gateway" <<'BIN' -#!/usr/bin/env sh -exit 0 -BIN -cat > "$NEMOCLAW_FAKE_INSTALL_BIN/openshell-sandbox" <<'BIN' -#!/usr/bin/env sh -exit 0 -BIN -printf '[Service]\\n' > "$HOME/.config/systemd/user/openshell-gateway.service" -chmod 755 "$NEMOCLAW_FAKE_INSTALL_BIN/openshell" "$NEMOCLAW_FAKE_INSTALL_BIN/openshell-gateway" "$NEMOCLAW_FAKE_INSTALL_BIN/openshell-sandbox" -exit 17 -INSTALLER -exit 0`, - ); - writeExecutable( - path.join(fakeBin, "sha256sum"), - `#!/usr/bin/env bash -printf '%s %s\\n' 'fabb30f4ad7af2b14e4994420ba10ecbf4a195236166199abe90daeb671c6d70' "\${1:-}" -exit 0`, - ); - - const result = spawnSync("bash", [SCRIPT], { - env: { - ...process.env, - HOME: tmp, - NEMOCLAW_FAKE_INSTALL_BIN: fakeBin, - NEMOCLAW_OPENSHELL_CHANNEL: "stable", - PATH: `${fakeBin}:/usr/bin:/bin`, - }, - encoding: "utf8", - }); - - expect(result.status, `${result.stdout}\n${result.stderr}`).not.toBe(0); - expect(result.stdout).toMatch(/upstream package installer failed \(exit 17\).*falling back to standalone binaries/); - expect(result.stdout).not.toMatch(/installed with upstream package\/service support/); - } finally { - fs.rmSync(tmp, { recursive: true, force: true }); - } - }); - it("proceeds to install when openshell is not present", () => { const tmp = fs.mkdtempSync(path.join(os.tmpdir(), "nemoclaw-openshell-noop-")); try { From 3f68c539819164ab6324c26c69a524181c2c3451 Mon Sep 17 00:00:00 2001 From: Aaron Erickson Date: Sun, 31 May 2026 09:33:13 -0700 Subject: [PATCH 6/9] test(e2e): fix OpenShell version pin checksum fixture --- test/e2e/test-openshell-version-pin.sh | 68 +++++++++++++++++++++++--- 1 file changed, 60 insertions(+), 8 deletions(-) diff 
--git a/test/e2e/test-openshell-version-pin.sh b/test/e2e/test-openshell-version-pin.sh index 5fe7496fdb..dd4132ab4e 100755 --- a/test/e2e/test-openshell-version-pin.sh +++ b/test/e2e/test-openshell-version-pin.sh @@ -93,6 +93,29 @@ SH write_executable "$FAKE_BIN/gh" <<'SH' #!/usr/bin/env bash set -euo pipefail +write_asset() { + local asset_name="$1" + local asset_path="$2" + printf 'fake OpenShell release asset: %s\n' "$asset_name" >"$asset_path" +} +sha256_digest() { + if [ -x /usr/bin/sha256sum ]; then + /usr/bin/sha256sum "$1" | awk '{print $1}' + elif [ -x /bin/sha256sum ]; then + /bin/sha256sum "$1" | awk '{print $1}' + elif [ -x /usr/bin/shasum ]; then + /usr/bin/shasum -a 256 "$1" | awk '{print $1}' + else + exit 3 + fi +} +write_checksum() { + local checksum_file="$1" + local asset_name="$2" + local asset_path="$3" + [ -f "$asset_path" ] || write_asset "$asset_name" "$asset_path" + printf '%s %s\n' "$(sha256_digest "$asset_path")" "$asset_name" >"$checksum_file" +} if [ "${1:-}" = "release" ] && [ "${2:-}" = "download" ]; then tag="${3:-}" pattern="" @@ -109,16 +132,19 @@ if [ "${1:-}" = "release" ] && [ "${2:-}" = "download" ]; then mkdir -p "$dir" case "$pattern" in openshell-checksums-sha256.txt) - printf 'ignored openshell-x86_64-unknown-linux-musl.tar.gz\n' > "$dir/$pattern" + asset_name="openshell-x86_64-unknown-linux-musl.tar.gz" + write_checksum "$dir/$pattern" "$asset_name" "$dir/$asset_name" ;; openshell-gateway-checksums-sha256.txt) - printf 'ignored openshell-gateway-x86_64-unknown-linux-gnu.tar.gz\n' > "$dir/$pattern" + asset_name="openshell-gateway-x86_64-unknown-linux-gnu.tar.gz" + write_checksum "$dir/$pattern" "$asset_name" "$dir/$asset_name" ;; openshell-sandbox-checksums-sha256.txt) - printf 'ignored openshell-sandbox-x86_64-unknown-linux-gnu.tar.gz\n' > "$dir/$pattern" + asset_name="openshell-sandbox-x86_64-unknown-linux-gnu.tar.gz" + write_checksum "$dir/$pattern" "$asset_name" "$dir/$asset_name" ;; *) - : > "$dir/$pattern" + 
write_asset "$pattern" "$dir/$pattern" ;; esac exit 0 @@ -129,6 +155,29 @@ SH write_executable "$FAKE_BIN/curl" <<'SH' #!/usr/bin/env bash set -euo pipefail +write_asset() { + local asset_name="$1" + local asset_path="$2" + printf 'fake OpenShell release asset: %s\n' "$asset_name" >"$asset_path" +} +sha256_digest() { + if [ -x /usr/bin/sha256sum ]; then + /usr/bin/sha256sum "$1" | awk '{print $1}' + elif [ -x /bin/sha256sum ]; then + /bin/sha256sum "$1" | awk '{print $1}' + elif [ -x /usr/bin/shasum ]; then + /usr/bin/shasum -a 256 "$1" | awk '{print $1}' + else + exit 3 + fi +} +write_checksum() { + local checksum_file="$1" + local asset_name="$2" + local asset_path="$3" + [ -f "$asset_path" ] || write_asset "$asset_name" "$asset_path" + printf '%s %s\n' "$(sha256_digest "$asset_path")" "$asset_name" >"$checksum_file" +} printf 'curl %s\n' "$*" >> "${DOWNLOAD_LOG:?}" out="" while [ "$#" -gt 0 ]; do @@ -141,16 +190,19 @@ done [ -n "$out" ] || exit 0 case "$(basename "$out")" in openshell-checksums-sha256.txt) - printf 'ignored openshell-x86_64-unknown-linux-musl.tar.gz\n' > "$out" + asset_name="openshell-x86_64-unknown-linux-musl.tar.gz" + write_checksum "$out" "$asset_name" "$(dirname "$out")/$asset_name" ;; openshell-gateway-checksums-sha256.txt) - printf 'ignored openshell-gateway-x86_64-unknown-linux-gnu.tar.gz\n' > "$out" + asset_name="openshell-gateway-x86_64-unknown-linux-gnu.tar.gz" + write_checksum "$out" "$asset_name" "$(dirname "$out")/$asset_name" ;; openshell-sandbox-checksums-sha256.txt) - printf 'ignored openshell-sandbox-x86_64-unknown-linux-gnu.tar.gz\n' > "$out" + asset_name="openshell-sandbox-x86_64-unknown-linux-gnu.tar.gz" + write_checksum "$out" "$asset_name" "$(dirname "$out")/$asset_name" ;; *) - : > "$out" + write_asset "$(basename "$out")" "$out" ;; esac SH From ce9cb666875dc387c8e55ef3c40037aefed04535 Mon Sep 17 00:00:00 2001 From: Aaron Erickson Date: Sun, 31 May 2026 09:42:17 -0700 Subject: [PATCH 7/9] fix(onboard): trust package 
OpenShell gateway units --- .../docker-driver-gateway-service.test.ts | 21 +++++++++++++++---- .../onboard/docker-driver-gateway-service.ts | 13 ++++-------- 2 files changed, 21 insertions(+), 13 deletions(-) diff --git a/src/lib/onboard/docker-driver-gateway-service.test.ts b/src/lib/onboard/docker-driver-gateway-service.test.ts index dce7895fa9..2b833a23a2 100644 --- a/src/lib/onboard/docker-driver-gateway-service.test.ts +++ b/src/lib/onboard/docker-driver-gateway-service.test.ts @@ -21,13 +21,26 @@ function spawnResult(status = 0, stderr = ""): SpawnSyncLikeResult { describe("docker-driver-gateway-service", () => { it("detects the upstream OpenShell user service only on Linux", () => { - const homeDir = "/home/nvidia"; const existsSync = (candidate: string) => candidate === "/usr/lib/systemd/user/openshell-gateway.service"; - expect(hasOpenShellGatewayUserService({ existsSync, homeDir, platform: "linux" })).toBe(true); - expect(hasOpenShellGatewayUserService({ existsSync, homeDir, platform: "darwin" })).toBe(false); - expect(getOpenShellGatewayUserServicePaths(homeDir)).toContain( + expect(hasOpenShellGatewayUserService({ existsSync, platform: "linux" })).toBe(true); + expect(hasOpenShellGatewayUserService({ existsSync, platform: "darwin" })).toBe(false); + expect(getOpenShellGatewayUserServicePaths()).toEqual([ + "/usr/local/lib/systemd/user/openshell-gateway.service", + "/usr/lib/systemd/user/openshell-gateway.service", + "/lib/systemd/user/openshell-gateway.service", + ]); + }); + + it("ignores stale per-user service units so standalone fallback remains available", () => { + const existsSync = vi.fn( + (candidate: string) => + candidate === "/home/nvidia/.config/systemd/user/openshell-gateway.service", + ); + + expect(hasOpenShellGatewayUserService({ existsSync, platform: "linux" })).toBe(false); + expect(existsSync.mock.calls.flat()).not.toContain( "/home/nvidia/.config/systemd/user/openshell-gateway.service", ); }); diff --git 
a/src/lib/onboard/docker-driver-gateway-service.ts b/src/lib/onboard/docker-driver-gateway-service.ts index dd49de012a..f6ffa7c673 100644 --- a/src/lib/onboard/docker-driver-gateway-service.ts +++ b/src/lib/onboard/docker-driver-gateway-service.ts @@ -3,8 +3,6 @@ import { spawnSync, type SpawnSyncOptions } from "node:child_process"; import fs from "node:fs"; -import os from "node:os"; -import path from "node:path"; import { sleepSeconds } from "../core/wait"; import { isGatewayHealthy } from "../state/gateway"; @@ -17,7 +15,6 @@ export interface OpenShellGatewayUserServiceOptions { commandExists?: (command: string) => boolean; env?: NodeJS.ProcessEnv; existsSync?: (filePath: string) => boolean; - homeDir?: string; platform?: NodeJS.Platform; spawnSyncImpl?: SpawnSyncLike; } @@ -55,10 +52,8 @@ export interface PackageManagedDockerDriverGatewayOptions { ) => Promise<void>; } -export function getOpenShellGatewayUserServicePaths(homeDir = os.homedir()): string[] { +export function getOpenShellGatewayUserServicePaths(): string[] { return [ - path.join(homeDir, ".config", "systemd", "user", "openshell-gateway.service"), - "/etc/systemd/user/openshell-gateway.service", "/usr/local/lib/systemd/user/openshell-gateway.service", "/usr/lib/systemd/user/openshell-gateway.service", "/lib/systemd/user/openshell-gateway.service", @@ -66,11 +61,11 @@ export function getOpenShellGatewayUserServicePaths(homeDir = os.homedir()): str } export function hasOpenShellGatewayUserService( - opts: Pick<OpenShellGatewayUserServiceOptions, "existsSync" | "homeDir" | "platform"> = {}, + opts: Pick<OpenShellGatewayUserServiceOptions, "existsSync" | "platform"> = {}, ): boolean { if ((opts.platform ?? process.platform) !== "linux") return false; const existsSync = opts.existsSync ?? 
fs.existsSync; - return getOpenShellGatewayUserServicePaths(opts.homeDir).some((candidate) => existsSync(candidate)); + return getOpenShellGatewayUserServicePaths().some((candidate) => existsSync(candidate)); } function defaultCommandExists(command: string, env: NodeJS.ProcessEnv): boolean { @@ -121,7 +116,7 @@ export function startOpenShellGatewayUserService( return { attempted: false, fallbackAllowed: true, started: false, reason: "not a Linux host" }; } const existsSync = opts.existsSync ?? fs.existsSync; - if (!hasOpenShellGatewayUserService({ existsSync, homeDir: opts.homeDir, platform })) { + if (!hasOpenShellGatewayUserService({ existsSync, platform })) { return { attempted: false, fallbackAllowed: true, From 59b38da051a840236ef0d5de60bd900f64ac4008 Mon Sep 17 00:00:00 2001 From: Aaron Erickson Date: Sun, 31 May 2026 10:09:51 -0700 Subject: [PATCH 8/9] docs(onboard): address OpenShell sandbox override wording --- docs/reference/commands.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/reference/commands.mdx b/docs/reference/commands.mdx index faa4deae98..40aabc4f66 100644 --- a/docs/reference/commands.mdx +++ b/docs/reference/commands.mdx @@ -1419,7 +1419,7 @@ These flags toggle optional behaviors during onboarding; set them before running | `NEMOCLAW_SANDBOX_GPU_DEVICE` | OpenShell GPU device selector | Selects the GPU device passed with `openshell sandbox create --gpu-device`. Requires explicit sandbox GPU enablement with `NEMOCLAW_SANDBOX_GPU=1` (or `--sandbox-gpu` for CLI-driven onboarding); otherwise onboarding rejects the selector instead of treating it as an implicit opt-in. | | `NEMOCLAW_DOCKER_GPU_PATCH` | `0` to disable, anything else to keep the default | Controls the Linux Docker-driver GPU sandbox compatibility patch. Set to `0` only as an escape hatch when the patch fails and you need onboarding to continue without patching the GPU sandbox container. 
| | `NEMOCLAW_OPENSHELL_GATEWAY_BIN` | path | Advanced override for the `openshell-gateway` binary used by the Linux Docker-driver standalone fallback. Defaults to the binary next to `openshell`, then common install paths. | -| `NEMOCLAW_OPENSHELL_SANDBOX_BIN` | path | Advanced override for the `openshell-sandbox` binary passed to the Linux Docker-driver standalone fallback. Defaults to the binary next to `openshell`, then common install paths. | +| `NEMOCLAW_OPENSHELL_SANDBOX_BIN` | path | Advanced override for the `openshell-sandbox` binary used by the Linux Docker-driver standalone fallback. Defaults to the binary next to `openshell`, then common install paths. | | `NEMOCLAW_OPENSHELL_GATEWAY_STATE_DIR` | path | Advanced override for the Linux Docker-driver gateway SQLite state directory and standalone-fallback PID file. Defaults to `~/.local/state/nemoclaw/openshell-docker-gateway`. | | `NEMOCLAW_WECHAT_QUIET` | `1` to enable | Silences the `[wechat]` diagnostic lines printed during the host-side WeChat QR login (poll status, IDC redirects, swallowed gateway errors), which are visible by default while the experimental WeChat path stabilizes; set `1` once the flow is reliable in your environment. | From 7523a1ebf586a3749bd3b091f2d22c64add9b021 Mon Sep 17 00:00:00 2001 From: Aaron Erickson Date: Sun, 31 May 2026 11:10:25 -0700 Subject: [PATCH 9/9] fix(onboard): tighten OpenShell service handoff Signed-off-by: Aaron Erickson --- docs/reference/architecture.mdx | 4 +- .../docker-driver-gateway-service.test.ts | 200 +++++++++++++++++- .../onboard/docker-driver-gateway-service.ts | 122 +++++++++-- src/lib/onboard/gateway-tcp-readiness.ts | 8 +- 4 files changed, 308 insertions(+), 26 deletions(-) diff --git a/docs/reference/architecture.mdx b/docs/reference/architecture.mdx index c291f99c01..d1ec4189c8 100644 --- a/docs/reference/architecture.mdx +++ b/docs/reference/architecture.mdx @@ -78,7 +78,9 @@ The logical diagram above shows how components relate. 
This section shows what actually runs where on the host. NemoClaw's default Docker-driver topology does not place the sandbox in an embedded k3s cluster. On Linux, NemoClaw configures and restarts the package-managed OpenShell gateway user service when it is installed, then creates the sandbox as a Docker container. -If the upstream service is unavailable, NemoClaw falls back to the standalone gateway process used by earlier installs. +NemoClaw treats that service as authoritative only when `systemctl --user show openshell-gateway` reports a package/vendor unit path and an `openshell-gateway` `ExecStart`. +Per-user units, partial units, and user-manager or bus outages do not take over gateway ownership; NemoClaw falls back to the standalone gateway process used by earlier installs. +That compatibility fallback remains until supported upgrade paths no longer include pre-service OpenShell installs and the package-managed handoff has direct nightly coverage. On Apple Silicon macOS, NemoClaw starts the OpenShell Docker-driver gateway and creates the sandbox as a Docker container. In both Docker-driver modes, the sandbox is a Docker container, not a Kubernetes pod. Legacy non-Docker-driver installs still use the k3s-based gateway path; the diagram below shows the standard Docker-driver topology. 
diff --git a/src/lib/onboard/docker-driver-gateway-service.test.ts b/src/lib/onboard/docker-driver-gateway-service.test.ts index 2b833a23a2..6b1b6e9e81 100644 --- a/src/lib/onboard/docker-driver-gateway-service.test.ts +++ b/src/lib/onboard/docker-driver-gateway-service.test.ts @@ -6,16 +6,39 @@ import { describe, expect, it, vi } from "vitest"; import { getOpenShellGatewayUserServicePaths, hasOpenShellGatewayUserService, + startPackageManagedDockerDriverGateway, startOpenShellGatewayUserService, type SpawnSyncLikeResult, } from "./docker-driver-gateway-service"; -function spawnResult(status = 0, stderr = ""): SpawnSyncLikeResult { +const STATUS_CONNECTED = ` +Server Status + +Gateway: nemoclaw +Server: https://127.0.0.1:8080/ +Connected +`; + +const GATEWAY_INFO = ` +Gateway Info + +Gateway: nemoclaw +Gateway endpoint: https://127.0.0.1:8080/ +`; + +function trustedShowOutput(fragmentPath = "/lib/systemd/user/openshell-gateway.service"): string { + return [ + `FragmentPath=${fragmentPath}`, + "ExecStart={ path=/usr/bin/openshell-gateway ; argv[]=/usr/bin/openshell-gateway ; }", + ].join("\n"); +} + +function spawnResult(status = 0, stderr = "", stdout = ""): SpawnSyncLikeResult { return { error: undefined, status, stderr, - stdout: "", + stdout, }; } @@ -45,8 +68,10 @@ describe("docker-driver-gateway-service", () => { ); }); - it("restarts the upstream user service with systemctl --user", () => { - const spawnSyncImpl = vi.fn((_command: string, _args: string[]) => spawnResult()); + it("restarts the upstream user service with systemctl --user after validating identity", () => { + const spawnSyncImpl = vi.fn((_command: string, args: string[]) => + args.includes("show") ? 
spawnResult(0, "", trustedShowOutput()) : spawnResult(), + ); const result = startOpenShellGatewayUserService({ commandExists: (command) => command === "systemctl", @@ -59,6 +84,16 @@ describe("docker-driver-gateway-service", () => { expect(result).toEqual({ attempted: true, fallbackAllowed: false, started: true }); expect(spawnSyncImpl.mock.calls.map(([command, args]) => [command, args])).toEqual([ ["systemctl", ["--user", "daemon-reload"]], + [ + "systemctl", + [ + "--user", + "show", + "openshell-gateway", + "--property=FragmentPath", + "--property=ExecStart", + ], + ], ["systemctl", ["--user", "enable", "openshell-gateway"]], ["systemctl", ["--user", "restart", "openshell-gateway"]], ]); @@ -71,7 +106,7 @@ describe("docker-driver-gateway-service", () => { existsSync: () => true, platform: "linux", spawnSyncImpl: vi.fn((_command: string, args: string[]) => - Array.isArray(args) && args.includes("daemon-reload") + args.includes("daemon-reload") ? spawnResult(1, "Failed to connect to bus") : spawnResult(), ), @@ -85,17 +120,38 @@ describe("docker-driver-gateway-service", () => { expect(result.reason).toContain("Failed to connect to bus"); }); + it("allows standalone fallback when restart loses the user systemd manager", () => { + const result = startOpenShellGatewayUserService({ + commandExists: () => true, + env: {}, + existsSync: () => true, + platform: "linux", + spawnSyncImpl: vi.fn((_command: string, args: string[]) => { + if (args.includes("show")) return spawnResult(0, "", trustedShowOutput()); + if (args.includes("restart")) return spawnResult(1, "Failed to connect to bus"); + return spawnResult(); + }), + }); + + expect(result).toMatchObject({ + attempted: true, + fallbackAllowed: true, + started: false, + }); + expect(result.reason).toContain("Failed to connect to bus"); + }); + it("does not silently fall back when the installed service fails to restart", () => { const result = startOpenShellGatewayUserService({ commandExists: () => true, env: {}, 
existsSync: () => true, platform: "linux", - spawnSyncImpl: vi.fn((_command: string, args: string[]) => - Array.isArray(args) && args.includes("restart") - ? spawnResult(1, "Job failed") - : spawnResult(), - ), + spawnSyncImpl: vi.fn((_command: string, args: string[]) => { + if (args.includes("show")) return spawnResult(0, "", trustedShowOutput()); + if (args.includes("restart")) return spawnResult(1, "Job failed"); + return spawnResult(); + }), }); expect(result).toMatchObject({ @@ -105,4 +161,128 @@ describe("docker-driver-gateway-service", () => { }); expect(result.reason).toContain("Job failed"); }); + + it("falls back instead of trusting an unverified service identity", () => { + const spawnSyncImpl = vi.fn((_command: string, args: string[]) => { + if (args.includes("show")) { + return spawnResult( + 0, + "", + [ + "FragmentPath=/home/nvidia/.config/systemd/user/openshell-gateway.service", + "ExecStart={ path=/usr/bin/openshell-gateway ; argv[]=/usr/bin/openshell-gateway ; }", + ].join("\n"), + ); + } + return spawnResult(); + }); + + const result = startOpenShellGatewayUserService({ + commandExists: () => true, + env: {}, + existsSync: () => true, + platform: "linux", + spawnSyncImpl, + }); + + expect(result).toMatchObject({ + attempted: true, + fallbackAllowed: true, + started: false, + }); + expect(result.reason).toContain("not the package-managed OpenShell gateway"); + expect(spawnSyncImpl.mock.calls.map(([, args]) => args.join(" "))).not.toContain( + "--user restart openshell-gateway", + ); + }); + + it("uses the package-managed service only after endpoint, metadata, and gRPC health are ready", async () => { + const events: string[] = []; + let registerCount = 0; + const registerDockerDriverGatewayEndpoint = vi.fn(() => { + events.push("register"); + registerCount += 1; + return registerCount >= 2; + }); + + await expect( + startPackageManagedDockerDriverGateway({ + clearDockerDriverGatewayRuntimeFiles: () => events.push("clear"), + exitOnFailure: false, 
+ gatewayName: "nemoclaw", + hasOpenShellGatewayUserService: () => true, + healthPollCount: 3, + healthPollInterval: 0, + isDockerDriverGatewayReady: async () => { + events.push("ready"); + return true; + }, + registerDockerDriverGatewayEndpoint, + runCaptureOpenshell: (args) => (args[0] === "status" ? STATUS_CONNECTED : GATEWAY_INFO), + sleepSeconds: () => events.push("sleep"), + skipSandboxBridgeReachability: false, + startOpenShellGatewayUserService: () => ({ + attempted: true, + fallbackAllowed: false, + started: true, + }), + verifySandboxBridgeGatewayReachableOrExit: async () => { + events.push("verify"); + }, + }), + ).resolves.toBe(true); + + expect(events).toEqual(["register", "sleep", "register", "ready", "clear", "verify"]); + }); + + it("falls back to standalone when package-managed service startup is unavailable", async () => { + const registerDockerDriverGatewayEndpoint = vi.fn(() => true); + + await expect( + startPackageManagedDockerDriverGateway({ + clearDockerDriverGatewayRuntimeFiles: vi.fn(), + exitOnFailure: false, + gatewayName: "nemoclaw", + hasOpenShellGatewayUserService: () => true, + registerDockerDriverGatewayEndpoint, + runCaptureOpenshell: vi.fn(), + skipSandboxBridgeReachability: false, + startOpenShellGatewayUserService: () => ({ + attempted: true, + fallbackAllowed: true, + reason: "user manager unavailable", + started: false, + }), + verifySandboxBridgeGatewayReachableOrExit: vi.fn(), + }), + ).resolves.toBe(false); + + expect(registerDockerDriverGatewayEndpoint).not.toHaveBeenCalled(); + }); + + it("keeps standalone runtime breadcrumbs when service health never becomes ready", async () => { + const clearDockerDriverGatewayRuntimeFiles = vi.fn(); + + await expect( + startPackageManagedDockerDriverGateway({ + clearDockerDriverGatewayRuntimeFiles, + exitOnFailure: false, + gatewayName: "nemoclaw", + hasOpenShellGatewayUserService: () => true, + healthPollCount: 1, + isDockerDriverGatewayReady: async () => false, + 
registerDockerDriverGatewayEndpoint: () => true, + runCaptureOpenshell: (args) => (args[0] === "status" ? STATUS_CONNECTED : GATEWAY_INFO), + skipSandboxBridgeReachability: false, + startOpenShellGatewayUserService: () => ({ + attempted: true, + fallbackAllowed: false, + started: true, + }), + verifySandboxBridgeGatewayReachableOrExit: vi.fn(), + }), + ).rejects.toThrow("did not become healthy"); + + expect(clearDockerDriverGatewayRuntimeFiles).not.toHaveBeenCalled(); + }); }); diff --git a/src/lib/onboard/docker-driver-gateway-service.ts b/src/lib/onboard/docker-driver-gateway-service.ts index f6ffa7c673..2abe157bd3 100644 --- a/src/lib/onboard/docker-driver-gateway-service.ts +++ b/src/lib/onboard/docker-driver-gateway-service.ts @@ -3,11 +3,12 @@ import { spawnSync, type SpawnSyncOptions } from "node:child_process"; import fs from "node:fs"; +import path from "node:path"; import { sleepSeconds } from "../core/wait"; import { isGatewayHealthy } from "../state/gateway"; import { envInt } from "./env"; -import { isGatewayTcpReady } from "./gateway-tcp-readiness"; +import { isDockerDriverGatewayHttpReady } from "./gateway-http-readiness"; export const OPENSHELL_GATEWAY_USER_SERVICE = "openshell-gateway"; @@ -43,15 +44,26 @@ export interface PackageManagedDockerDriverGatewayOptions { clearDockerDriverGatewayRuntimeFiles: () => void; exitOnFailure: boolean; gatewayName: string; + hasOpenShellGatewayUserService?: () => boolean; + healthPollCount?: number; + healthPollInterval?: number; + isDockerDriverGatewayReady?: () => Promise<boolean>; registerDockerDriverGatewayEndpoint: () => boolean; runCaptureOpenshell: (args: string[], opts?: { ignoreError?: boolean }) => string; + sleepSeconds?: (seconds: number) => void; skipSandboxBridgeReachability: boolean; + startOpenShellGatewayUserService?: () => OpenShellGatewayUserServiceStartResult; verifySandboxBridgeGatewayReachableOrExit: ( exitOnFailure: boolean, options?: { skip?: boolean }, ) => Promise<void>; } +interface 
OpenShellGatewayUserServiceIdentity { + execStart: string; + fragmentPath: string; +} + export function getOpenShellGatewayUserServicePaths(): string[] { return [ "/usr/local/lib/systemd/user/openshell-gateway.service", @@ -92,7 +104,7 @@ function userManagerLooksUnavailable(reason: string): boolean { function runSystemctlUser( args: string[], opts: Required<Pick<OpenShellGatewayUserServiceOptions, "env" | "spawnSyncImpl">>, -): { ok: boolean; reason?: string } { +): { ok: boolean; reason?: string; stdout?: string } { const result = opts.spawnSyncImpl("systemctl", ["--user", ...args], { encoding: "utf-8", env: opts.env, @@ -105,7 +117,65 @@ function runSystemctlUser( const detail = text(result.stderr).trim() || text(result.stdout).trim() || `exit ${String(result.status)}`; return { ok: false, reason: detail }; } - return { ok: true }; + return { ok: true, stdout: text(result.stdout) }; +} + +function parseSystemctlShowProperties(output: string): Record<string, string> { + const properties: Record<string, string> = {}; + for (const line of output.split(/\r?\n/)) { + const separator = line.indexOf("="); + if (separator <= 0) continue; + properties[line.slice(0, separator)] = line.slice(separator + 1).trim(); + } + return properties; +} + +function isTrustedOpenShellGatewayUserServiceIdentity( + identity: OpenShellGatewayUserServiceIdentity, +): boolean { + const fragmentPath = path.normalize(identity.fragmentPath.trim()); + const trustedUnit = getOpenShellGatewayUserServicePaths().some( + (candidate) => path.normalize(candidate) === fragmentPath, + ); + if (!trustedUnit) return false; + return /\bopenshell-gateway\b/.test(identity.execStart); +} + +function readTrustedOpenShellGatewayUserServiceIdentity( + opts: Required<Pick<OpenShellGatewayUserServiceOptions, "env" | "spawnSyncImpl">>, +): { fallbackAllowed: boolean; ok: boolean; reason?: string } { + const result = runSystemctlUser( + ["show", OPENSHELL_GATEWAY_USER_SERVICE, "--property=FragmentPath", "--property=ExecStart"], + opts, + ); + if (!result.ok) { + return { + fallbackAllowed: userManagerLooksUnavailable(result.reason ?? 
""), + ok: false, + reason: `systemctl --user show ${OPENSHELL_GATEWAY_USER_SERVICE} failed: ${result.reason}`, + }; + } + + const properties = parseSystemctlShowProperties(result.stdout ?? ""); + const identity = { + execStart: properties.ExecStart ?? "", + fragmentPath: properties.FragmentPath ?? "", + }; + if (!identity.fragmentPath || !identity.execStart) { + return { + fallbackAllowed: true, + ok: false, + reason: "service identity is incomplete", + }; + } + if (!isTrustedOpenShellGatewayUserServiceIdentity(identity)) { + return { + fallbackAllowed: true, + ok: false, + reason: `service identity is not the package-managed OpenShell gateway (${identity.fragmentPath})`, + }; + } + return { fallbackAllowed: false, ok: true }; } export function startOpenShellGatewayUserService( @@ -137,8 +207,30 @@ export function startOpenShellGatewayUserService( } const spawnSyncImpl = opts.spawnSyncImpl ?? spawnSync; + for (const args of [["daemon-reload"]]) { + const result = runSystemctlUser(args, { env, spawnSyncImpl }); + if (!result.ok) { + const reason = `systemctl --user ${args.join(" ")} failed: ${result.reason}`; + return { + attempted: true, + fallbackAllowed: userManagerLooksUnavailable(result.reason ?? ""), + reason, + started: false, + }; + } + } + + const identity = readTrustedOpenShellGatewayUserServiceIdentity({ env, spawnSyncImpl }); + if (!identity.ok) { + return { + attempted: true, + fallbackAllowed: identity.fallbackAllowed, + reason: identity.reason, + started: false, + }; + } + for (const args of [ - ["daemon-reload"], ["enable", OPENSHELL_GATEWAY_USER_SERVICE], ["restart", OPENSHELL_GATEWAY_USER_SERVICE], ]) { @@ -147,7 +239,7 @@ export function startOpenShellGatewayUserService( const reason = `systemctl --user ${args.join(" ")} failed: ${result.reason}`; return { attempted: true, - fallbackAllowed: args[0] === "daemon-reload" && userManagerLooksUnavailable(result.reason ?? ""), + fallbackAllowed: userManagerLooksUnavailable(result.reason ?? 
""), reason, started: false, }; @@ -161,15 +253,21 @@ export async function startPackageManagedDockerDriverGateway({ clearDockerDriverGatewayRuntimeFiles, exitOnFailure, gatewayName, + hasOpenShellGatewayUserService: hasOpenShellGatewayUserServiceImpl = hasOpenShellGatewayUserService, + healthPollCount, + healthPollInterval, + isDockerDriverGatewayReady = isDockerDriverGatewayHttpReady, registerDockerDriverGatewayEndpoint, runCaptureOpenshell, + sleepSeconds: sleepSecondsImpl = sleepSeconds, skipSandboxBridgeReachability, + startOpenShellGatewayUserService: startOpenShellGatewayUserServiceImpl = startOpenShellGatewayUserService, verifySandboxBridgeGatewayReachableOrExit, }: PackageManagedDockerDriverGatewayOptions): Promise { - if (!hasOpenShellGatewayUserService()) return false; + if (!hasOpenShellGatewayUserServiceImpl()) return false; console.log(" Starting OpenShell Docker-driver gateway via upstream user service..."); - const serviceStart = startOpenShellGatewayUserService(); + const serviceStart = startOpenShellGatewayUserServiceImpl(); if (!serviceStart.started) { const detail = serviceStart.reason ? ` (${serviceStart.reason})` : ""; if (serviceStart.fallbackAllowed) { @@ -183,11 +281,11 @@ export async function startPackageManagedDockerDriverGateway({ throw new Error(message); } - const pollCount = envInt("NEMOCLAW_HEALTH_POLL_COUNT", 30); - const pollInterval = envInt("NEMOCLAW_HEALTH_POLL_INTERVAL", 2); + const pollCount = healthPollCount ?? envInt("NEMOCLAW_HEALTH_POLL_COUNT", 30); + const pollInterval = healthPollInterval ?? 
envInt("NEMOCLAW_HEALTH_POLL_INTERVAL", 2); for (let i = 0; i < pollCount; i += 1) { if (!registerDockerDriverGatewayEndpoint()) { - if (i < pollCount - 1) sleepSeconds(pollInterval); + if (i < pollCount - 1) sleepSecondsImpl(pollInterval); continue; } const status = runCaptureOpenshell(["status"], { ignoreError: true }); @@ -195,7 +293,7 @@ export async function startPackageManagedDockerDriverGateway({ ignoreError: true, }); const currentInfo = runCaptureOpenshell(["gateway", "info"], { ignoreError: true }); - if (isGatewayHealthy(status, namedInfo, currentInfo) && (await isGatewayTcpReady())) { + if (isGatewayHealthy(status, namedInfo, currentInfo) && (await isDockerDriverGatewayReady())) { clearDockerDriverGatewayRuntimeFiles(); await verifySandboxBridgeGatewayReachableOrExit(exitOnFailure, { skip: skipSandboxBridgeReachability, @@ -203,7 +301,7 @@ export async function startPackageManagedDockerDriverGateway({ console.log(" ✓ OpenShell gateway user service is healthy"); return true; } - if (i < pollCount - 1) sleepSeconds(pollInterval); + if (i < pollCount - 1) sleepSecondsImpl(pollInterval); } const message = "OpenShell gateway user service started but did not become healthy."; diff --git a/src/lib/onboard/gateway-tcp-readiness.ts b/src/lib/onboard/gateway-tcp-readiness.ts index 0a8a0e3bce..7071ebc033 100644 --- a/src/lib/onboard/gateway-tcp-readiness.ts +++ b/src/lib/onboard/gateway-tcp-readiness.ts @@ -2,7 +2,7 @@ // SPDX-License-Identifier: Apache-2.0 /** - * Host-level TCP readiness probe for the OpenShell Docker-driver gateway. + * Host-level TCP readiness probe for the standalone OpenShell Docker-driver gateway. * * Plain TCP connect to the local gateway endpoint — semantic-free, just asks * "is anyone listening?". Used by `startDockerDriverGateway` in `onboard.ts` @@ -21,8 +21,10 @@ * * There is a peer module `gateway-http-readiness.ts` (introduced by #3312) * that exposes `isGatewayHttpReady` — a stronger HTTP-level probe used on - * the K3s path. 
It cannot be reused on the Docker-driver path because the - * two gateway types expose different HTTP routes for the root path: + * the K3s path — and `isDockerDriverGatewayHttpReady`, the package-managed + * Docker-driver handoff probe for the gRPC health endpoint. The root-path + * probe cannot be reused on the Docker-driver path because the two gateway + * types expose different HTTP routes for the root path: * * - K3s gateway answers `GET /` with 200/401 via a dispatcher catch-all. * - Docker-driver gateway returns 404 for `GET /`; only routes under