fix(onboard): classify Docker GPU patch Error-phase failure (#4316) #4407

Merged
cv merged 12 commits into NVIDIA:main from yimoj:fix/4316-docker-gpu-error-diagnostics
Jun 2, 2026


@yimoj
Contributor

@yimoj yimoj commented May 28, 2026

Summary

The Docker GPU patch path can leave the patched container dead or exited and the sandbox in the Error phase. NemoClaw previously reported a generic "did not become ready" + "GPU proof failed" combination without enough container-level signal to identify which patched create option broke the sandbox. This PR distinguishes Error-phase and patched-container failures from genuine proof failures, short-circuits the readiness and supervisor-reconnect waits when the sandbox enters a terminal phase, and writes actionable diagnostics to disk so users can see the patched container's exit code and error directly.

Related Issue

Fixes #4316

Changes

  • Add isSandboxInErrorPhase / getSandboxFailurePhase in src/lib/state/gateway.ts to recognize Error / Failed / CrashLoopBackOff rows from openshell sandbox list.
  • Short-circuit waitForCreatedSandboxReadyWithTrace and waitForOpenShellSupervisorReconnect when the sandbox enters a terminal phase instead of burning the full timeout window.
  • Add captureDockerGpuPatchSandboxSnapshot and classifyDockerGpuPatchFailure in src/lib/onboard/docker-gpu-patch.ts. The classifier distinguishes patched_container_failed, sandbox_error_phase, supervisor_unreachable, and proof_failure based on the sandbox phase + patched container State.
  • Wire the new snapshot/classification into printDockerGpuPatchFailureAndExit, printDockerGpuReadinessFailure, and printDockerGpuProofFailure; write patched-container-state.json alongside existing diagnostics and surface a failure_kind= line in summary.txt.
  • Skip the GPU proof step in onboard.ts when the sandbox is already in a terminal phase so users see the real lifecycle failure instead of an openshell sandbox exec-refused error.
  • Plumb dockerCapture through docker-gpu-sandbox-create and onboard.ts so diagnostics work in every patch entry point.
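As a rough illustration of the first bullet, the sketch below pulls a terminal phase for one sandbox out of list output. It is hypothetical throughout: the real `openshell sandbox list` column layout is not shown in this PR, so the sketch assumes one whitespace-separated row per sandbox with the name in the first column, and `extractFailurePhase` / `isInTerminalPhase` are illustrative names, not the exports in src/lib/state/gateway.ts.

```typescript
// Terminal phases named in this PR; exact-match comparison is an assumption.
const TERMINAL_PHASES = ["Error", "Failed", "CrashLoopBackOff"] as const;
type TerminalPhase = (typeof TERMINAL_PHASES)[number];

/**
 * Hypothetical stand-in for getSandboxFailurePhase: returns the terminal
 * phase reported for `sandboxName`, or null if the row is absent or live.
 * ASSUMPTION: one whitespace-separated row per sandbox, name in column 0.
 */
function extractFailurePhase(
  listOutput: string,
  sandboxName: string,
): TerminalPhase | null {
  for (const rawLine of listOutput.split(/\r?\n/)) {
    const columns = rawLine.trim().split(/\s+/);
    // Match on the name column so another sandbox's Error row is ignored.
    if (columns[0] !== sandboxName) continue;
    // Search only the non-name columns for a terminal phase token.
    for (const column of columns.slice(1)) {
      const phase = TERMINAL_PHASES.find((candidate) => candidate === column);
      if (phase !== undefined) return phase;
    }
    return null;
  }
  return null;
}

// Boolean view, in the spirit of isSandboxInErrorPhase.
function isInTerminalPhase(listOutput: string, sandboxName: string): boolean {
  return extractFailurePhase(listOutput, sandboxName) !== null;
}
```

Comparing whole columns, and skipping the name column, means a sandbox whose name merely contains "Error" is not mistaken for a failed one.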

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npm test (CLI + plugin) — relevant suites pass; pre-existing flakes (config-sync chmod, some cli.test.ts 5s timeouts on overloaded hosts) reproduce on upstream/main and are unrelated.
  • ./node_modules/.bin/tsc -p tsconfig.cli.json passes.
  • Tests added or updated for new or changed behavior (src/lib/onboard/docker-gpu-patch.test.ts).
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes — no doc surface changed; diagnostics are written to ~/.nemoclaw/onboard-failures/... as before, just with more fields.
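Per the Changes list, summary.txt gains a failure_kind= line and a new patched-container-state.json file sits next to the existing diagnostics. A minimal sketch of rendering those two files as strings, assuming a key=value layout for summary.txt; every key other than failure_kind, the JSON shape, and the name `renderDiagnosticsFiles` are hypothetical. Writing the files under ~/.nemoclaw/onboard-failures/... is left to the caller.

```typescript
// Failure kinds named in this PR.
type FailureKind =
  | "patched_container_failed"
  | "sandbox_error_phase"
  | "supervisor_unreachable"
  | "proof_failure";

// Hypothetical subset of a docker inspect State object.
interface ContainerState {
  Status: string;
  ExitCode: number;
  Error: string;
}

// Hypothetical input; the real snapshot type carries more fields.
interface DiagnosticsInput {
  sandboxName: string;
  failureKind: FailureKind;
  sandboxPhase: string | null;
  patchedContainerId: string | null;
  patchedContainerState: ContainerState | null;
}

/** Renders file name -> contents; writing them to disk is left to the caller. */
function renderDiagnosticsFiles(input: DiagnosticsInput): Map<string, string> {
  // Only failure_kind= is confirmed by the PR; the other keys are illustrative.
  const summary = [
    `sandbox=${input.sandboxName}`,
    `failure_kind=${input.failureKind}`,
    `sandbox_phase=${input.sandboxPhase ?? "unknown"}`,
    `patched_container_id=${input.patchedContainerId ?? "none"}`,
  ];
  const files = new Map<string, string>();
  files.set("summary.txt", summary.join("\n") + "\n");
  // In this sketch the state file is skipped when the container could not be inspected.
  if (input.patchedContainerState !== null) {
    const body = { id: input.patchedContainerId, state: input.patchedContainerState };
    files.set("patched-container-state.json", JSON.stringify(body, null, 2) + "\n");
  }
  return files;
}
```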

Notes

  • Host environment for this fix had Docker but no NVIDIA GPU, so the regression was reproduced and validated hermetically (mocked openshell + docker adapters with the reporter's failure signatures). The hermetic test exercises the new short-circuit and classification, and the build output matches the live-system flow.
  • Codex review (5 rounds) is now clean on the code; only the local triage scratch note (issue-4316.md) was flagged and is intentionally not committed.

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>

Summary by CodeRabbit

  • New Features

    • Enhanced GPU sandbox diagnostics with standardized failure summaries and state files for clearer post-mortems.
    • Allow caller-controlled GPU proof verification so callers can run custom verification and failure handling.
  • Bug Fixes

    • Faster failure detection — readiness and reconnect waits abort early when sandboxes enter terminal error phases.
    • More precise, classified readiness/failure messages distinguishing terminal phases from timeouts.
  • Tests

    • Added regression tests for phase detection, diagnostics capture/classification, and GPU-proof flows.

The Docker GPU patch can leave the patched container in a dead/exited
state with the sandbox in Error phase. Onboarding previously reported a
generic "did not become ready" + "GPU proof failed" combo without enough
container-level signal to identify which patched create option broke
the sandbox; the readiness and supervisor-reconnect waits also burned
their full timeout windows even when the sandbox had already entered a
terminal phase (NVIDIA#4316).

Add an Error/Failed/CrashLoopBackOff phase classifier in state/gateway,
short-circuit `waitForCreatedSandboxReadyWithTrace` and
`waitForOpenShellSupervisorReconnect` when the sandbox enters a
terminal failure phase, and introduce a snapshot + classifier in
docker-gpu-patch that distinguishes patched_container_failed,
sandbox_error_phase, supervisor_unreachable, and proof_failure. The
print helpers surface the new classification plus a
patched-container-state.json artifact alongside the existing
diagnostics. Skip the GPU proof entirely when the sandbox is already in
a terminal phase so users see the actual lifecycle failure instead of
an exec-refused error.

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
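The short-circuit described in this commit message can be pictured as a polling loop that checks two signals per tick. A minimal sketch, assuming synchronous, injected `isReady`, `getFailurePhase`, and `sleep` callbacks (so tests need not wait on a clock); `waitForReadyOrTerminal` is a hypothetical name, and the `ReadinessResult` shape echoes the `{ ready, reason }` idea from the review thread rather than the PR's actual `CreatedSandboxReadinessResult` type.

```typescript
type ReadinessResult =
  | { ready: true }
  | { ready: false; reason: "terminal_failure_phase"; phase: string }
  | { ready: false; reason: "timeout" };

interface WaitOptions {
  timeoutSecs: number;
  pollIntervalSecs: number;
  isReady: () => boolean;
  // Returns a terminal phase such as "Error", or null while the sandbox is live.
  getFailurePhase: () => string | null;
  sleep: (secs: number) => void;
}

function waitForReadyOrTerminal(opts: WaitOptions): ReadinessResult {
  if (!(opts.pollIntervalSecs > 0)) {
    throw new RangeError("pollIntervalSecs must be positive");
  }
  let elapsed = 0;
  while (elapsed < opts.timeoutSecs) {
    if (opts.isReady()) return { ready: true };
    const phase = opts.getFailurePhase();
    // A terminal phase will not recover on its own, so stop polling now
    // instead of sleeping through the rest of the timeout window.
    if (phase !== null) {
      return { ready: false, reason: "terminal_failure_phase", phase };
    }
    opts.sleep(opts.pollIntervalSecs);
    elapsed += opts.pollIntervalSecs;
  }
  return { ready: false, reason: "timeout" };
}
```

Returning a tagged result instead of a bare boolean is what lets the caller tell an early Error/Failed/CrashLoopBackOff exit apart from a genuine timeout.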
@coderabbitai
Contributor

coderabbitai Bot commented May 28, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 197198e2-0a0a-4cb7-af58-497c1e6ad9eb

📥 Commits

Reviewing files that changed from the base of the PR and between 254f578 and 460704d.

📒 Files selected for processing (5)
  • src/lib/onboard.ts
  • src/lib/onboard/docker-gpu-patch.test.ts
  • src/lib/onboard/docker-gpu-sandbox-create.ts
  • src/lib/onboard/sandbox-readiness-tracing.ts
  • src/lib/state/gateway.ts
💤 Files with no reviewable changes (1)
  • src/lib/state/gateway.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lib/onboard/docker-gpu-patch.test.ts

📝 Walkthrough


Detects terminal sandbox error phases during Docker-GPU onboarding, short‑circuits readiness and supervisor‑reconnect waits, captures sandbox/container snapshots, classifies failures, and writes enriched diagnostics; includes regression tests for parsing, short‑circuits, classification, and output files.

Changes

GPU Patch Error-Phase Detection and Diagnostics

| Layer | File(s) | Summary |
| --- | --- | --- |
| Sandbox error-phase detection helpers | src/lib/state/gateway.ts | Adds TERMINAL_SANDBOX_FAILURE_PHASES and exports getSandboxFailurePhase(output, sandboxName) to extract terminal failure tokens from openshell sandbox output. |
| Readiness tracer integration | src/lib/onboard/sandbox-readiness-tracing.ts | waitForCreatedSandboxReadyWithTrace now returns CreatedSandboxReadinessResult, accepts a sandbox-failure extractor, emits terminal_failure_phase, and early-returns when a terminal phase is observed; adds formatCreatedSandboxReadinessFailureMessage and printReadinessFailure. |
| Diagnostics types and parsing | src/lib/onboard/docker-gpu-patch.ts | Adds exported types (DockerContainerState, DockerGpuPatchSandboxSnapshot, DockerGpuPatchFailureClassification) and parsers for sandbox list/get and docker inspect to support structured snapshots. |
| Patch error handling, classification, and diagnostics | src/lib/onboard/docker-gpu-patch.ts | Adds captureDockerGpuPatchSandboxSnapshot and classifyDockerGpuPatchFailure; failure printers capture snapshot/classification, accept dockerCapture, and collectDockerGpuPatchDiagnostics writes enriched summary.txt and patched-container-state.json. |
| Supervisor reconnect short-circuit | src/lib/onboard/docker-gpu-patch.ts | waitForOpenShellSandboxExec and reconnect waiters short-circuit when sandbox list shows a terminal error phase. |
| Sandbox create: deps wiring and accessor | src/lib/onboard/docker-gpu-sandbox-create.ts | Requires and threads dockerCapture through diagnostics paths, exposes printReadinessFailureIfEnabled() and verifyGpuOrExit() on the DockerGpuSandboxCreatePatch surface, and builds DockerGpuPatchFailureContext for diagnostics. |
| Onboarding flow integration | src/lib/onboard.ts | Passes gatewayState.getSandboxFailurePhase into readiness tracing, uses sandboxReadinessTracing.printReadinessFailure on readiness failure, routes readiness proof failures to dockerGpuCreatePatch.printReadinessFailureIfEnabled(), and passes dockerGpuCreatePatch.verifyGpuOrExit into verifyGpuSandboxAfterReady. |
| Local GPU verification wrapper and tests | src/lib/onboard/docker-gpu-local-inference.ts, src/lib/onboard/docker-gpu-local-inference.test.ts | Adds optional verifyGpuOrExit parameter to verifyGpuSandboxAfterReady and tests verifying delegation and suppression of duplicate diagnostics when the wrapper handles failures. |
| Regression tests for #4316 | src/lib/onboard/docker-gpu-patch.test.ts | New tests validate terminal-phase detection from sandbox list, readiness/reconnect short-circuits, sandbox-list vs get precedence, snapshot contents, multiple classification scenarios, diagnostics behavior with/without dockerCapture, and filesystem outputs. |
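The parsing and classification rows of the table can be sketched together. This assumes `docker inspect <id>` stdout is a JSON array whose first element carries a `State` object with `Status`, `Running`, `ExitCode`, and `Error` fields (fields Docker does report), and that the four failure kinds are checked in the order container, sandbox phase, supervisor, proof. That precedence, and the names `parseContainerState` and `classifyFailure`, are assumptions for illustration rather than the merged `classifyDockerGpuPatchFailure`.

```typescript
type FailureKind =
  | "patched_container_failed"
  | "sandbox_error_phase"
  | "supervisor_unreachable"
  | "proof_failure";

// Subset of the State object that `docker inspect` reports for a container.
interface ContainerState {
  Status: string; // e.g. "running", "exited", "dead"
  Running: boolean;
  ExitCode: number;
  Error: string;
}

const TERMINAL_PHASES = new Set(["Error", "Failed", "CrashLoopBackOff"]);

/** Parses `docker inspect <id>` stdout (a JSON array) into a ContainerState. */
function parseContainerState(inspectStdout: string): ContainerState | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(inspectStdout);
  } catch {
    return null; // not JSON at all
  }
  // An empty array means the container no longer exists.
  const first: unknown = Array.isArray(parsed) ? parsed[0] : undefined;
  if (typeof first !== "object" || first === null) return null;
  const state = (first as { State?: unknown }).State;
  if (typeof state !== "object" || state === null) return null;
  const s = state as Record<string, unknown>;
  return {
    Status: typeof s.Status === "string" ? s.Status : "unknown",
    Running: s.Running === true,
    ExitCode: typeof s.ExitCode === "number" ? s.ExitCode : -1,
    Error: typeof s.Error === "string" ? s.Error : "",
  };
}

/** Hypothetical classifier; the precedence order is an assumption. */
function classifyFailure(
  sandboxPhase: string | null,
  container: ContainerState | null,
  supervisorReachable: boolean,
): FailureKind {
  // A dead or exited patched container is the most specific signal.
  const containerDown =
    container !== null &&
    !container.Running &&
    (container.Status === "exited" || container.Status === "dead");
  if (containerDown) return "patched_container_failed";
  if (sandboxPhase !== null && TERMINAL_PHASES.has(sandboxPhase)) {
    return "sandbox_error_phase";
  }
  if (!supervisorReachable) return "supervisor_unreachable";
  return "proof_failure";
}
```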

Sequence Diagrams

```mermaid
sequenceDiagram
  participant OnboardFlow as Onboard Flow
  participant ReadinessTracer as Readiness Tracer
  participant GatewayHelpers as Gateway Helpers
  participant DockerGpuPatch as Docker-GPU Patch
  OnboardFlow->>ReadinessTracer: waitForCreatedSandboxReadyWithTrace(getSandboxFailurePhase)
  ReadinessTracer->>GatewayHelpers: parse sandbox list/get for sandboxName
  alt Terminal Error Phase
    GatewayHelpers-->>ReadinessTracer: failurePhase
    ReadinessTracer-->>OnboardFlow: terminal_failure_phase (failurePhase)
    OnboardFlow->>DockerGpuPatch: printReadinessFailureIfEnabled() / collect diagnostics
  else Ready
    GatewayHelpers-->>ReadinessTracer: ready
    ReadinessTracer-->>OnboardFlow: ready
    OnboardFlow->>DockerGpuPatch: verifyGpuOrExit(verifyDirectSandboxGpu)
  end
```

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly Related PRs

  • NVIDIA/NemoClaw#4609: Modifies GPU proof/verification flow and verifyGpuSandboxAfterReady behavior in related areas.

Suggested Reviewers

  • ericksoa

🐰 I hopped through logs where phases flick and fade,
I sniffed for "Error" tokens the list display made,
I captured a snapshot, put reasons in a file,
I stopped waiting when Error came and logged the while,
Hop, patch, report — then onboarding may smile!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.59%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(onboard): classify Docker GPU patch Error-phase failure (#4316)' clearly identifies the main change: classifying Docker GPU patch Error-phase failures to address issue #4316, which is the core objective of the pull request. |
| Linked Issues check | ✅ Passed | The PR addresses all primary coding objectives from #4316: it detects sandbox Error phases via getSandboxFailurePhase, short-circuits readiness waits on terminal phases, classifies failures into distinct kinds (patched_container_failed, sandbox_error_phase, supervisor_unreachable, proof_failure), captures actionable diagnostics (patched-container-state.json, failure_kind in summary.txt), and integrates the classification pipeline into failure reporters. |
| Out of Scope Changes check | ✅ Passed | All code changes align with the stated objectives: failure classification and diagnostics capture for Docker GPU patch flows, readiness-tracing improvements, gateway phase detection, and sandbox-create integration. No unrelated refactoring or feature additions are present. |



Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/lib/onboard.ts (1)

3643-3648: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Differentiate terminal failure from timeout.

Once isSandboxInErrorPhase is wired into this wait, ready === false no longer means "timed out". The fallback message at Line 3662 still says the sandbox "did not become ready within ...", so an early Error/Failed/CrashLoopBackOff exit is still reported as a timeout.

Suggested fix
```diff
   const ready = sandboxReadinessTracing.waitForCreatedSandboxReadyWithTrace({
     sandboxName,
     timeoutSecs: sandboxReadyTimeoutSecs,
     runCaptureOpenshell,
     isSandboxReady,
     isSandboxInErrorPhase,
     sleep: sleepSeconds,
   });
+  const failurePhase = !ready
+    ? getSandboxFailurePhase(
+        runCaptureOpenshell(["sandbox", "list"], { ignoreError: true }),
+        sandboxName,
+      )
+    : null;
 
   const restoreBackupPath =
     pendingStateRestore?.manifest?.backupPath ?? pendingStateRestoreBackupPath;
 
   if (!ready) {
@@
-    console.error(
-      `  Sandbox '${sandboxName}' was created but did not become ready within ${sandboxReadyTimeoutSecs}s.`,
-    );
+    console.error(
+      failurePhase
+        ? `  Sandbox '${sandboxName}' entered ${failurePhase} before it became ready.`
+        : `  Sandbox '${sandboxName}' was created but did not become ready within ${sandboxReadyTimeoutSecs}s.`,
+    );
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/onboard.ts` around lines 3643 - 3648, The current call to
sandboxReadinessTracing.waitForCreatedSandboxReadyWithTrace (assigned to ready)
treats any non-true result as a timeout; update the wait function to return a
richer result (e.g., { ready: boolean, reason: 'timeout'|'error'|'other',
details?: any }) or otherwise surface why it failed (use isSandboxInErrorPhase
internally), then change the caller logic that inspects ready to branch: when
ready === true proceed, when reason === 'error' log/throw a clear "sandbox
entered error phase" message including details, and when reason === 'timeout'
keep the existing timeout fallback message. Ensure references to ready,
sandboxReadinessTracing.waitForCreatedSandboxReadyWithTrace, and
isSandboxInErrorPhase are used so the caller can distinguish terminal failures
from timeouts.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 3680-3688: The code repeatedly constructs the same Docker GPU
diagnostics payload object ({ runCaptureOpenshell, dockerCapture:
docker.dockerCapture, context: { sandboxName, newContainerId:
dockerGpuCreatePatch.patchedContainerId(), selectedMode:
dockerGpuCreatePatch.selectedMode() } }) in three places; extract that into a
small local helper/closure (e.g., buildDockerGpuDiagPayload or dockerGpuDiag())
and replace each inline object with a call to that helper so the assembly uses
the single function and reduces file growth and duplication around
runCaptureOpenshell, docker.dockerCapture, sandboxName,
dockerGpuCreatePatch.patchedContainerId(), and
dockerGpuCreatePatch.selectedMode().

In `@src/lib/onboard/docker-gpu-patch.ts`:
- Around line 1361-1376: The current logic sets sandboxPhase from
parseSandboxPhaseFromGetOutput(getOutput) and only uses
parseSandboxPhaseFromListOutput(listOutput, sandboxName) as a fallback; change
it so the list result takes precedence: after obtaining listOutput and
sandboxListLine, if findSandboxListLine(listOutput, sandboxName) found the
sandbox then overwrite sandboxPhase with
parseSandboxPhaseFromListOutput(listOutput, sandboxName) regardless of whether
sandboxPhase was previously set; update the block around
deps.runCaptureOpenshell([... "sandbox", "list" ...]) to prefer the list-derived
phase (affecting sandboxPhase, listOutput, sandboxListLine) so
classifyDockerGpuPatchFailure(...) sees the up-to-date phase.
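The first inline comment asks for the repeated Docker GPU diagnostics payload to be built in one place. A sketch of that closure, with hypothetical type names (`RunCapture`, `DockerCapture`, `DockerGpuDiagPayload`, `PatchAccessors`) standing in for onboard.ts's real dependencies; `makeDockerGpuDiag` is illustrative and not necessarily the shape the PR adopted.

```typescript
// Hypothetical signatures standing in for onboard.ts's real capture helpers.
type RunCapture = (args: string[], opts?: { ignoreError?: boolean }) => string;
type DockerCapture = (args: string[]) => string;

interface DockerGpuDiagPayload {
  runCaptureOpenshell: RunCapture;
  dockerCapture: DockerCapture;
  context: {
    sandboxName: string;
    newContainerId: string | null;
    selectedMode: string | null;
  };
}

// Only the two accessors the payload needs from the create patch.
interface PatchAccessors {
  patchedContainerId(): string | null;
  selectedMode(): string | null;
}

function makeDockerGpuDiag(
  runCaptureOpenshell: RunCapture,
  dockerCapture: DockerCapture,
  sandboxName: string,
  patch: PatchAccessors,
): () => DockerGpuDiagPayload {
  // The accessors are read on every call, so each failure path sees the
  // patch's current container ID and mode rather than a stale snapshot.
  return () => ({
    runCaptureOpenshell,
    dockerCapture,
    context: {
      sandboxName,
      newContainerId: patch.patchedContainerId(),
      selectedMode: patch.selectedMode(),
    },
  });
}
```

Each of the three call sites would then pass `dockerGpuDiag()` instead of an inline object literal.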


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 349a50ef-a1dc-4a4e-bdb7-a1afae5f6794

📥 Commits

Reviewing files that changed from the base of the PR and between 78909ec and 8174f66.

📒 Files selected for processing (6)
  • src/lib/onboard.ts
  • src/lib/onboard/docker-gpu-patch.test.ts
  • src/lib/onboard/docker-gpu-patch.ts
  • src/lib/onboard/docker-gpu-sandbox-create.ts
  • src/lib/onboard/sandbox-readiness-tracing.ts
  • src/lib/state/gateway.ts

The previous commit added the Error-phase short-circuit and proof-vs-
readiness distinction inline in onboard.ts, but `src/lib/onboard.ts` is
under a codebase-growth guardrail that blocks net growth in the top-
level entrypoint. Move the GPU readiness-failure print block and the
"skip proof on terminal phase" check into `DockerGpuSandboxCreatePatch`
helpers so onboard.ts shrinks while the diagnostics surface stays the
same.

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
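This commit moves the "skip proof on terminal phase" check into DockerGpuSandboxCreatePatch helpers. A hedged sketch of that gate: consult the failure phase first and only run the GPU proof when the sandbox is not in a terminal phase. `runGpuProofUnlessTerminal` and the `ProofOutcome` shape are hypothetical; the PR exposes this behavior through `verifyGpuOrExit`, whose exact signature is not shown here.

```typescript
type ProofOutcome =
  | { kind: "skipped_terminal_phase"; phase: string }
  | { kind: "passed" }
  | { kind: "failed"; detail: string };

/**
 * Hypothetical gate: runs the GPU proof only when the sandbox is not in a
 * terminal phase, so the reported failure is the lifecycle one.
 */
function runGpuProofUnlessTerminal(
  getFailurePhase: () => string | null,
  runProof: () => { ok: boolean; detail: string },
): ProofOutcome {
  const phase = getFailurePhase();
  // Exec into a sandbox in a terminal phase is refused, so a proof attempt
  // would only replace the real failure with an exec-refused error.
  if (phase !== null) return { kind: "skipped_terminal_phase", phase };
  const result = runProof();
  return result.ok
    ? { kind: "passed" }
    : { kind: "failed", detail: result.detail };
}
```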
Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
src/lib/onboard/docker-gpu-sandbox-create.ts (1)

230-240: ⚡ Quick win

Consider including oldContainerId in the failure context for diagnostic completeness.

The buildFailureContext helper constructs a context object for readiness and proof failure diagnostics but does not include oldContainerId, whereas the inline context built for supervisor reconnect failures (line 156) does include it. This inconsistency means readiness/proof failure diagnostics will be missing the old container ID, which could be valuable for comparing before/after container state in diagnostic outputs like patched-container-state.json.

📋 Proposed fix to include oldContainerId
```diff
 function buildFailureContext(
   sandboxName: string,
   result: DockerGpuPatchResult | null,
 ): DockerGpuPatchFailureContext {
   return {
     sandboxName,
+    oldContainerId: result?.oldContainerId ?? null,
     newContainerId: result?.newContainerId ?? null,
     backupContainerName: result?.backupContainerName ?? null,
     selectedMode: result?.mode ?? null,
   };
 }
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/onboard/docker-gpu-sandbox-create.ts` around lines 230 - 240, The
failure context returned by buildFailureContext is missing oldContainerId;
update the buildFailureContext function to include oldContainerId (e.g., set
oldContainerId: result?.oldContainerId ?? null) so it matches the inline
supervisor reconnect context and ensures DockerGpuPatchFailureContext
(type/interface) includes oldContainerId for diagnostics such as
patched-container-state.json.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d58e524f-f855-4156-83bf-ebecfa0ae37f

📥 Commits

Reviewing files that changed from the base of the PR and between 8174f66 and f307b0a.

📒 Files selected for processing (2)
  • src/lib/onboard.ts
  • src/lib/onboard/docker-gpu-sandbox-create.ts

yimoj added 2 commits May 28, 2026 05:55
…taleness

CodeRabbit and Codex feedback on NVIDIA#4316:

- The readiness-failure message in onboard.ts still reported "did not
  become ready within Xs" even after `waitForCreatedSandboxReadyWithTrace`
  learned to short-circuit on Error/Failed/CrashLoopBackOff. When that
  short-circuit fires, the message should call out the terminal phase
  instead of blaming the timeout.
- `captureDockerGpuPatchSandboxSnapshot` initially preferred the
  `sandbox get` phase and only fell back to `sandbox list`. The list
  view can be the fresher signal when a transition just happened, but
  blindly overriding the get phase would mask a terminal `get` behind
  a stale live `list` row. Rank phases as terminal > live >
  intermediate/none and let the higher-ranked signal win, with ties
  going to the list output as the broader gateway view.

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
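The phase-ranking rule in this commit message can be written as a small pure function. In this sketch the terminal set comes from the PR, but which phases count as "live" (`Ready`, `Running`) is a guess, `pickSandboxPhase` is a hypothetical name, and falling back to the `get` phase when the list has no row is a choice made here, not something the commit specifies.

```typescript
const TERMINAL_PHASES = new Set(["Error", "Failed", "CrashLoopBackOff"]);
// ASSUMPTION: which phase names count as "live" is a guess for illustration.
const LIVE_PHASES = new Set(["Ready", "Running"]);

// terminal (2) > live (1) > intermediate or missing (0)
function phaseRank(phase: string | null): number {
  if (phase === null) return 0;
  if (TERMINAL_PHASES.has(phase)) return 2;
  if (LIVE_PHASES.has(phase)) return 1;
  return 0;
}

/** Picks the phase to classify on from `sandbox get` and `sandbox list` readings. */
function pickSandboxPhase(
  getPhase: string | null,
  listPhase: string | null,
): string | null {
  const getRank = phaseRank(getPhase);
  const listRank = phaseRank(listPhase);
  if (getRank > listRank) return getPhase;
  if (listRank > getRank) return listPhase;
  // Tie: prefer the list row as the broader gateway view, falling back to the
  // get phase only when the list had no row for this sandbox.
  return listPhase ?? getPhase;
}
```

Ranking rather than blanket precedence is what keeps a terminal `get` from being masked by a stale live `list` row, and vice versa.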
`buildFailureContext` was missing `oldContainerId`, but the supervisor-
reconnect failure path already passes the full before/after pair. Add
`oldContainerId` so the readiness and proof failure diagnostics
(patched-container-state.json, docker-network-summary.txt) get the
same original-container reference for comparison. CodeRabbit nitpick
on NVIDIA#4316.

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
@wscurran
Contributor

@jyaunches jyaunches added R1 v0.0.56 Release target and removed v0.0.55 labels May 29, 2026
@cv cv enabled auto-merge (squash) May 30, 2026 01:04
@yimoj
Contributor Author

yimoj commented May 30, 2026

@coderabbitai resume

@coderabbitai
Contributor

coderabbitai Bot commented May 30, 2026

✅ Actions performed

Reviews resumed.

@yimoj yimoj added v0.0.60 Release target and removed v0.0.60 Release target labels Jun 1, 2026
@cv cv added v0.0.57 Release target and removed v0.0.56 Release target labels Jun 1, 2026
cv and others added 2 commits June 1, 2026 15:35
… guardrail

The post-merge conflict resolution expanded the GPU patch deps literal
to multi-line form and inlined the readiness-failure message branching,
putting onboard.ts at +7 net lines and tripping codebase-growth-
guardrails.

Compress the deps literal back to one line and move the readiness-
failure message formatter into sandbox-readiness-tracing.ts where it
sits next to the CreatedSandboxReadinessResult contract it formats.

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
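This commit moves the readiness-failure message formatter next to the `CreatedSandboxReadinessResult` contract. A sketch of such a formatter, reusing the two message strings from CodeRabbit's suggested fix earlier in this thread; the `ReadinessResult` union and the `formatReadinessFailure` name are hypothetical stand-ins for the real types and `formatCreatedSandboxReadinessFailureMessage` in sandbox-readiness-tracing.ts.

```typescript
type ReadinessResult =
  | { ready: true }
  | { ready: false; reason: "terminal_failure_phase"; phase: string }
  | { ready: false; reason: "timeout" };

/** Returns the operator-facing line for a failed wait, or null when ready. */
function formatReadinessFailure(
  sandboxName: string,
  timeoutSecs: number,
  result: ReadinessResult,
): string | null {
  if (result.ready) return null;
  if (result.reason === "terminal_failure_phase") {
    // Name the phase so an early Error exit is not reported as a timeout.
    return `  Sandbox '${sandboxName}' entered ${result.phase} before it became ready.`;
  }
  return `  Sandbox '${sandboxName}' was created but did not become ready within ${timeoutSecs}s.`;
}
```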
auto-merge was automatically disabled June 1, 2026 23:21

Head branch was pushed to by a user without write access

@cv cv merged commit 451f26f into NVIDIA:main Jun 2, 2026
29 checks passed
@wscurran wscurran added area: packaging Packages, images, registries, installers, or distribution area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery bug-fix PR fixes a bug or regression platform: container Affects Docker, containerd, Podman, or images and removed area: packaging Packages, images, registries, installers, or distribution labels Jun 3, 2026

Labels

area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery bug-fix PR fixes a bug or regression platform: container Affects Docker, containerd, Podman, or images v0.0.57 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

nemoclaw onboarding with gpu fails because sandbox transitions to Error phase after Docker GPU patch

4 participants