
fix(cli): emit docker_unreachable header upfront in sandbox status#4388

Open
laitingsheng wants to merge 2 commits into main from fix/4313-status-docker-unreachable-header

Conversation

@laitingsheng laitingsheng commented May 28, 2026

Summary

nemoclaw <name> status now probes the host Docker daemon upfront on docker-driver sandboxes and prints Failure layer: docker_unreachable — Docker daemon is not reachable. as the first line of stdout when the daemon is stopped. The misleading host-side Inference: healthy line (which hits the remote provider directly and bypasses the local stack) is suppressed in that case, and the command exits with code 1. The same layer header is no longer emitted twice when the downstream gateway-state branches also classify the failure as docker_unreachable.

Related Issue

Fixes #4313.

Changes

  • src/lib/actions/sandbox/gateway-failure-classifier.ts exports a new isDockerDaemonReachable() helper that wraps the existing dockerInfo({ ignoreError, timeout: 3000 }) probe.
  • src/lib/actions/sandbox/status.ts adds isDockerDaemonUnreachableForStatus(sb, probe?) (gated on sb.openshellDriver === "docker") which delegates to the shared helper. showSandboxStatus evaluates it after the sandbox-registry lookup, prints the layer header verbatim as the first stdout line, sets process.exitCode = 1, and suppresses the host-side inference probe. printGatewayFailureLayerHeader takes an alreadyPrintedDockerUnreachable flag so downstream gateway-state failure branches do not duplicate the header when the classifier also returns docker_unreachable.
  • src/lib/actions/sandbox/status.test.ts covers the new helper with null sandbox, non-docker driver, docker driver with reachable probe, and docker driver with unreachable probe.
  • test/cli.test.ts adds two integration tests: one drives alpha status with a docker stub that exits 1 and asserts exit code 1, that stdout starts with the verbatim layer header, that Inference: healthy is absent, that the header appears exactly once, and that the header precedes the Sandbox: alpha line; the other registers the sandbox with openshellDriver: "vm" and asserts exit code 0, presence of the Provider/Model/Inference lines, and absence of the docker header.
  • docs/reference/commands.mdx documents the docker-driver stopped-daemon behaviour, including the verbatim layer header, exit-code contract, and the inference-probe suppression.
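
The driver gating described for isDockerDaemonUnreachableForStatus can be sketched roughly as below. This is a minimal illustration, not the code in status.ts: the SandboxEntry shape is a trimmed-down assumption, and the probe is a required parameter here, whereas the PR makes it optional and falls back to the shared helper.

```typescript
// Hypothetical, trimmed-down sandbox record; the real registry entry has more fields.
interface SandboxEntry {
  name: string;
  openshellDriver?: string;
}

// Returns true when the Docker daemon answers the probe.
type DockerProbe = () => boolean;

function isDockerDaemonUnreachableForStatus(
  sb: SandboxEntry | null,
  probe: DockerProbe,
): boolean {
  // Missing record or non-docker driver: the Docker check does not apply.
  if (sb === null || sb.openshellDriver !== "docker") return false;
  // Docker driver: unreachable exactly when the probe fails.
  return !probe();
}

// Stub probes standing in for the real daemon check.
const reachableProbe: DockerProbe = () => true;
const unreachableProbe: DockerProbe = () => false;

// The four cases listed for status.test.ts.
const nullSandbox = isDockerDaemonUnreachableForStatus(null, unreachableProbe);
const vmDriver = isDockerDaemonUnreachableForStatus(
  { name: "alpha", openshellDriver: "vm" },
  unreachableProbe,
);
const dockerUp = isDockerDaemonUnreachableForStatus(
  { name: "alpha", openshellDriver: "docker" },
  reachableProbe,
);
const dockerDown = isDockerDaemonUnreachableForStatus(
  { name: "alpha", openshellDriver: "docker" },
  unreachableProbe,
);
```

Injecting the probe keeps the unit tests independent of whether a Docker daemon happens to be running on the test host.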

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

Summary by CodeRabbit

  • Bug Fixes

    • Detects when the host Docker daemon is unreachable and marks sandbox status as failed; command now exits non‑zero and shows a dedicated "docker_unreachable" failure layer.
    • Suppresses the host-side inference health line when Docker is unreachable to avoid misleading output.
  • Documentation

    • Status command docs updated to describe the Docker-unreachable behavior and resulting output.
  • Tests

    • Added tests covering status behavior when Docker is unreachable and when it is not.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

coderabbitai Bot commented May 28, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a0d1eeae-5ce7-475d-b3dc-1593f5cc2ca8

📥 Commits

Reviewing files that changed from the base of the PR and between 5811f8d and 2da0d96.

📒 Files selected for processing (4)
  • docs/reference/commands.mdx
  • src/lib/actions/sandbox/gateway-failure-classifier.ts
  • src/lib/actions/sandbox/status.ts
  • test/cli.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lib/actions/sandbox/status.ts

📝 Walkthrough

Implements upfront Docker daemon reachability detection in nemoclaw <sandbox> status. For sandboxes recorded with openshellDriver: "docker", the command now emits a docker_unreachable failure-layer header first, suppresses host-side inference probing, and sets a non-zero exit code when the host Docker daemon is unreachable.
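
The ordering described here (header first, inference suppressed, no duplicate header) can be sketched as a pure function. Everything in it is illustrative: renderStatus, StatusInput, and StatusResult are hypothetical names, the real command writes to stdout and sets process.exitCode instead of returning a value, and only the layer prefix of the header is reproduced.

```typescript
// Layer prefix only; the real header also carries a message after the layer name.
const DOCKER_UNREACHABLE_PREFIX = "Failure layer: docker_unreachable";

// Hypothetical inputs for the sketch.
interface StatusInput {
  sandboxName: string;
  dockerUnreachable: boolean; // result of the upfront probe
  gatewayFailureLayer: string | null; // layer returned by the downstream classifier
  inferenceLine: string | null; // host-side inference line, if one was produced
}

interface StatusResult {
  lines: string[];
  exitCode: number;
}

function renderStatus(input: StatusInput): StatusResult {
  const lines: string[] = [];
  let exitCode = 0;

  // 1. The docker_unreachable header goes first and forces a failing exit code.
  if (input.dockerUnreachable) {
    lines.push(DOCKER_UNREACHABLE_PREFIX);
    exitCode = 1;
  }

  lines.push(`Sandbox: ${input.sandboxName}`);

  // 2. The host-side inference line is dropped when Docker is down, because
  //    it bypasses the local stack and would look healthy regardless.
  if (!input.dockerUnreachable && input.inferenceLine !== null) {
    lines.push(input.inferenceLine);
  }

  // 3. Downstream failure branches skip the header if it was already printed.
  //    Exit codes for other failure layers are outside this sketch.
  if (input.gatewayFailureLayer !== null) {
    const alreadyPrinted =
      input.dockerUnreachable && input.gatewayFailureLayer === "docker_unreachable";
    if (!alreadyPrinted) {
      lines.push(`Failure layer: ${input.gatewayFailureLayer}`);
    }
  }

  return { lines, exitCode };
}
```

Returning the lines instead of printing them is a choice made for the sketch so the ordering can be checked without capturing stdout.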

Changes

Docker Unreachable Detection

| Layer / File(s) | Summary |
| --- | --- |
| Docker reachability helper and unit tests / src/lib/actions/sandbox/status.test.ts | Adds imports and unit tests for isDockerDaemonUnreachableForStatus(): covers null sandbox, non-docker driver, docker driver with reachable and unreachable probe results. |
| Classifier export: Docker daemon probe / src/lib/actions/sandbox/gateway-failure-classifier.ts | Exports isDockerDaemonReachable() which delegates to defaultDockerInfo() to expose Docker daemon reachability as a boolean helper. |
| Status flow: inference gating and failure output / src/lib/actions/sandbox/status.ts | Adds isDockerDaemonUnreachableForStatus() (exported, injectable probe). showSandboxStatus computes dockerUnreachable early; when true prints Failure layer: docker_unreachable…, sets process.exitCode = 1, and forces inferenceHealth = null. Updates printGatewayFailureLayerHeader and its call sites to accept the dockerUnreachable flag. |
| Integration tests: docker and non-docker scenarios / test/cli.test.ts | End-to-end tests assert Docker-driver sandbox with unreachable Docker prints the docker_unreachable header (once, first), suppresses “Inference: healthy”, and exits 1; non-docker driver with Docker unreachable does not print the header and still shows Inference: output. |
| Docs: status command note / docs/reference/commands.mdx | Documents the new docker_unreachable first-line behavior, inference suppression, and non-zero exit for Docker-driver sandboxes when the host Docker daemon is unreachable. |
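
A boolean reachability probe of the kind summarized above might look like the following sketch. It is an assumption-heavy stand-in: the PR's helper wraps the project's existing dockerInfo call, whose internals are not shown in this thread, so this version shells out to `docker info` directly through Node's spawnSync with a 3000 ms timeout. commandSucceeds is a hypothetical helper name.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical helper: true only when the command exits 0 within the timeout.
function commandSucceeds(command: string, args: string[], timeoutMs: number): boolean {
  const result = spawnSync(command, args, { stdio: "ignore", timeout: timeoutMs });
  // result.error is set when the binary is missing or the timeout fired;
  // result.status is null in those cases and non-zero on ordinary failure.
  return result.error === undefined && result.status === 0;
}

// Assumed stand-in for the exported helper; the real one delegates to the
// project's own Docker wrapper rather than calling the CLI like this.
function isDockerDaemonReachable(): boolean {
  return commandSucceeds("docker", ["info"], 3000);
}
```

Treating a missing binary, a timeout, and a non-zero exit the same way keeps the helper a plain boolean, which is all the status flow needs to decide whether to print the header.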

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#4020: Also modifies src/lib/actions/sandbox/status.ts/gateway failure handling for docker_unreachable detection.
  • NVIDIA/NemoClaw#4180: Related status flow changes; adds Docker container health reporting logic that intersects with status output.

Suggested labels

NemoClaw CLI, Docker, Sandbox, documentation, bug

Suggested reviewers

  • cv
  • cjagwani
  • jyaunches

Poem

🐰 I hopped to check the Docker sea,
Bumped the daemon — "not with me!"
A header shouts: "docker_unreachable" first,
No stale inference to make things worse.
Exit one, clear as morning sun, debugging done.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(cli): emit docker_unreachable header upfront in sandbox status' clearly and concisely describes the main change: emitting a docker_unreachable header early in the sandbox status output when Docker is unreachable. |
| Linked Issues check | ✅ Passed | The PR fully implements all coding requirements from issue #4313: probes Docker daemon upfront for docker-driver sandboxes, emits the verbatim docker_unreachable header before other output, suppresses misleading Inference output, exits with code 1, and includes comprehensive unit and integration tests covering all scenarios. |
| Out of Scope Changes check | ✅ Passed | All changes directly address the requirements in issue #4313: added Docker reachability check, docker_unreachable header emission, inference suppression, exit code handling, and test coverage; no unrelated modifications detected. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Comment @coderabbitai help to get the list of available commands and usage tips.


github-actions Bot commented May 28, 2026

E2E Advisor Recommendation

Required E2E: sandbox-operations-e2e, diagnostics-e2e
Optional E2E: gateway-health-honest-e2e, sandbox-survival-e2e

Dispatch hint: sandbox-operations-e2e,diagnostics-e2e

Auto-dispatched E2E: sandbox-operations-e2e, diagnostics-e2e via nightly-e2e.yaml at 2da0d96682cf4f98ce9e44d2a5c7dca4c73bc331 (nightly run)

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • sandbox-operations-e2e (high; provisions live sandbox(es), around 60 minute job budget): Most directly exercises real sandbox lifecycle operations including nemoclaw <name> status, status fields, OpenClaw process recovery via status, and gateway recovery paths affected by the status/gateway-failure changes.
  • diagnostics-e2e (medium-high; provisions a live sandbox, around 30-60 minute job budget): Covers user-facing diagnostics including host-side nemoclaw <name> status output for model/provider reporting, ensuring the status changes do not regress normal healthy reporting while adding the Docker-unreachable path.

Optional E2E

  • gateway-health-honest-e2e (medium; sabotages gateway startup without full sandbox onboarding): Adjacent regression coverage for honest gateway health/failure reporting. It does not specifically stop Docker after sandbox creation, but it is relevant to failure-layer honesty and avoiding false healthy gateway signals.
  • sandbox-survival-e2e (high; live sandbox with gateway restart flow, around 30 minute job budget): Useful broader confidence for sandbox/gateway restart survival and status/inference after gateway restarts, but less targeted than sandbox-operations for this PR.

New E2E recommendations

  • docker-driver status when Docker daemon stops after onboarding (high): Existing E2E coverage exercises normal status and gateway recovery, but there is no exact live E2E that creates/records a docker-driver sandbox, makes the Docker daemon unreachable, then asserts nemoclaw <name> status prints Failure layer: docker_unreachable first, exits non-zero, suppresses stale Inference: healthy, and avoids duplicate failure-layer headers.
    • Suggested test: Add a targeted regression E2E, e.g. docker-unreachable-status-e2e, that uses a fixture or controlled daemon shim to simulate Docker becoming unreachable for an existing docker-driver sandbox and validates the new status contract end-to-end.
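
The assertions such a regression test would make can be sketched as a small output-contract checker. This is only an illustration of the checks listed above: checkDockerUnreachableContract is a hypothetical name, the input is whatever stdout and exit code the test harness captured, and the header is matched by its layer prefix instead of the full verbatim text.

```typescript
// Layer prefix of the header; the full header carries a message after it.
const HEADER_PREFIX = "Failure layer: docker_unreachable";

// Returns a list of contract violations; an empty list means the output conforms.
function checkDockerUnreachableContract(stdout: string, exitCode: number): string[] {
  const violations: string[] = [];
  const lines = stdout.split("\n");

  // The command must fail when Docker is unreachable.
  if (exitCode === 0) {
    violations.push("expected a non-zero exit code");
  }
  // The header must be the very first thing on stdout.
  if (!stdout.startsWith(HEADER_PREFIX)) {
    violations.push("header is not the first stdout line");
  }
  // The header must not be duplicated by downstream failure branches.
  const headerCount = lines.filter((line) => line.includes(HEADER_PREFIX)).length;
  if (headerCount !== 1) {
    violations.push(`header printed ${headerCount} times`);
  }
  // The stale host-side inference line must be suppressed.
  if (stdout.includes("Inference: healthy")) {
    violations.push("stale Inference: healthy line present");
  }
  return violations;
}
```

Collecting every violation, instead of stopping at the first, gives a failing run a complete picture of which parts of the contract regressed.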

Dispatch hint

  • Workflow: nightly-e2e.yaml
  • jobs input: sandbox-operations-e2e,diagnostics-e2e


github-actions Bot commented May 28, 2026

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • None. No scenario workflow, scenario metadata, scenario runtime, or validation-suite files changed.

Optional scenario E2E

  • None.

Relevant changed files

  • None.


github-actions Bot commented May 28, 2026

PR Review Advisor

Findings: 1 needs attention, 0 worth checking, 0 nice ideas
Since last review: 4 prior items resolved, 1 still applies, 0 new items found

Review findings

🛠️ Needs attention

  • Offset or extract the added status.ts hotspot growth (src/lib/actions/sandbox/status.ts:111): This PR fixes the reported status behavior, but it still adds another diagnostic path directly into src/lib/actions/sandbox/status.ts, which is already a large and frequently changed CLI/status hotspot. The deterministic drift check reports this file growing from 439 to 476 lines (+37), and multiple open PRs overlap this same status/test surface. That increases merge drift and makes future status behavior harder to reason about.

🔎 Worth checking

  • None.

🌱 Nice ideas

  • None.
Since last review details

Current findings:

  • Offset or extract the added status.ts hotspot growth (src/lib/actions/sandbox/status.ts:111): This PR fixes the reported status behavior, but it still adds another diagnostic path directly into src/lib/actions/sandbox/status.ts, which is already a large and frequently changed CLI/status hotspot. The deterministic drift check reports this file growing from 439 to 476 lines (+37), and multiple open PRs overlap this same status/test surface. That increases merge drift and makes future status behavior harder to reason about.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
test/cli.test.ts (1)

1003-1052: ⚡ Quick win

Strengthen the non-docker-driver assertion coverage.

This test currently proves only header absence, not that inference probing remains enabled. Please also assert normal (non-error) status behavior for this path.

Suggested diff
   const r = runWithEnv("alpha status", {
     HOME: home,
     PATH: `${localBin}:${process.env.PATH || ""}`,
   });

+  expect(r.code).toBe(0);
+  expect(r.out).toContain("Inference:");
   expect(r.out).not.toContain(
     "Failure layer: docker_unreachable",
   );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/cli.test.ts` around lines 1003 - 1052, The test "sandbox <name> status
preserves Inference probe when openshellDriver is not docker" currently only
asserts the absence of "Failure layer: docker_unreachable"; update the test
(around runWithEnv("alpha status") / writeSandboxRegistry) to also assert that
inference probing and normal status output are present by checking r.out
contains the inference probe lines (e.g., "Gateway inference:" and "Provider:
openai-api" or "Model: gpt-4o-mini") and that normal gateway status appears
(e.g., "Status: Connected"), using the same runWithEnv result and existing
expect APIs so you verify probing remains enabled in non-docker openshellDriver
cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/cli.test.ts`:
- Around line 1003-1052: The test "sandbox <name> status preserves Inference
probe when openshellDriver is not docker" currently only asserts the absence of
"Failure layer: docker_unreachable"; update the test (around runWithEnv("alpha
status") / writeSandboxRegistry) to also assert that inference probing and
normal status output are present by checking r.out contains the inference probe
lines (e.g., "Gateway inference:" and "Provider: openai-api" or "Model:
gpt-4o-mini") and that normal gateway status appears (e.g., "Status:
Connected"), using the same runWithEnv result and existing expect APIs so you
verify probing remains enabled in non-docker openshellDriver cases.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8b9c4ead-b564-4518-bbce-7e09c2c07300

📥 Commits

Reviewing files that changed from the base of the PR and between 1daf081 and 5811f8d.

📒 Files selected for processing (3)
  • src/lib/actions/sandbox/status.test.ts
  • src/lib/actions/sandbox/status.ts
  • test/cli.test.ts


Selective E2E Results — ✅ All requested jobs passed

Run: 26551292928
Target ref: 5811f8d5a3327e07e126109875c8ba9256504454
Workflow ref: main
Requested jobs: sandbox-operations-e2e
Summary: 1 passed, 0 failed, 0 skipped

| Job | Result |
| --- | --- |
| sandbox-operations-e2e | ✅ success |

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

Selective E2E Results — ✅ All requested jobs passed

Run: 26552024714
Target ref: 2da0d96682cf4f98ce9e44d2a5c7dca4c73bc331
Workflow ref: main
Requested jobs: sandbox-operations-e2e,diagnostics-e2e
Summary: 2 passed, 0 failed, 0 skipped

| Job | Result |
| --- | --- |
| diagnostics-e2e | ✅ success |
| sandbox-operations-e2e | ✅ success |

@laitingsheng added the Sandbox and v0.0.55 (Release target) labels May 28, 2026

Labels

  • fix
  • Sandbox (Use this label to identify issues related to the NemoClaw isolated environment based on OpenShell.)
  • v0.0.55 (Release target)
