fix(onboard): escape Ollama loop on daemon/runner failure (#4365)#4401
Conversation
|
Caution Review failedPull request was closed or merged during review Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughClassifies Ollama runner/daemon crashes at inference time (isOllamaRunnerCrash), surfaces a daemonFailure flag through validation and proxy types, detects pinned Ollama providers, and centralizes onboarding probe-failure handling to exit, abort, or return to provider selection. ChangesOllama Daemon Failure Detection & Onboarding Escape
Sequence Diagram(s)sequenceDiagram
participant validateOllamaModel
participant isOllamaRunnerCrash
participant ValidationResult
participant prepareOllamaModel
validateOllamaModel->>isOllamaRunnerCrash: check errText
isOllamaRunnerCrash-->>validateOllamaModel: true if runner/daemon crash pattern
validateOllamaModel->>ValidationResult: set daemonFailure: true
ValidationResult-->>prepareOllamaModel: propagate daemonFailure
sequenceDiagram
participant prepareOllamaModel as Prepare
participant handleOllamaProbeFailure as Handler
participant isOllamaProviderPinned as Pinned
participant selectAndValidateOllamaModel as Caller
Prepare->>Handler: probe (ok, message, daemonFailure?)
Handler->>Pinned: check pinned state
alt pinned
Handler-->>Caller: process.exit(1)
else not pinned
alt non-interactive
Handler-->>Caller: abort (process.exit)
else interactive
Handler-->>Caller: "back-to-selection" or "continue"
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lib/onboard/ollama-startup.ts (1)
94-95:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winPinned-provider error text is overly specific to
ollama.On Line 94, the message always reports
NEMOCLAW_PROVIDER=ollama, but the pinned set also includes other Ollama-routing keys (Lines 23-28). This can mislead debugging when a different pinned key is set.Suggested fix
- " NEMOCLAW_PROVIDER=ollama is pinned but Ollama is unreachable; refusing to loop on provider selection.", + " NEMOCLAW_PROVIDER pins an Ollama onboarding path but Ollama is unreachable; refusing to loop on provider selection.",🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/onboard/ollama-startup.ts` around lines 94 - 95, The error message in the pinned-provider check is hardcoded to "NEMOCLAW_PROVIDER=ollama" which is misleading; change the message to interpolate the actual pinned key/value used (read from NEMOCLAW_PROVIDER or the local variable holding the pinned provider, e.g., pinnedProvider/pinnedKey) so it reports the real pinned setting when Ollama is unreachable rather than always saying "ollama". Ensure the updated text is used where the pinned-provider check occurs (the startup/pinning logic in ollama-startup.ts).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/onboard.ts`:
- Around line 3927-3952: Extract the daemon-failure branch that checks
probe.daemonFailure into a new helper module (e.g.,
src/lib/onboard/ollama-startup.ts) that exports a function like
handleOllamaDaemonFailure(probe, selectedModel, context) which contains the
isOllamaProviderPinned() check, the console.error/process.exit call, the
isNonInteractive()/abortNonInteractive call, the console messages, and returns {
outcome: "back-to-selection" } as before; in src/lib/onboard.ts replace the full
branch with a compact call site that invokes handleOllamaDaemonFailure(probe,
selectedModel, /* pass any needed helpers/context */) and forwards its result,
preserving behavior and imports but reducing onboard.ts line growth.
In `@src/lib/onboard/ollama-startup.test.ts`:
- Around line 117-144: The test currently calls restore() at the end and can
leak environment/module state if an assertion throws; wrap the body that mutates
process.env.NEMOCLAW_PROVIDER and calls isOllamaProviderPinned() inside a
try/finally and move restore() into the finally block so restore() always runs;
specifically update the test case that iterates over the cases (where
process.env.NEMOCLAW_PROVIDER is set/deleted) to ensure restore() is invoked in
the finally, leaving the loop and expect calls unchanged.
---
Outside diff comments:
In `@src/lib/onboard/ollama-startup.ts`:
- Around line 94-95: The error message in the pinned-provider check is hardcoded
to "NEMOCLAW_PROVIDER=ollama" which is misleading; change the message to
interpolate the actual pinned key/value used (read from NEMOCLAW_PROVIDER or the
local variable holding the pinned provider, e.g., pinnedProvider/pinnedKey) so
it reports the real pinned setting when Ollama is unreachable rather than always
saying "ollama". Ensure the updated text is used where the pinned-provider check
occurs (the startup/pinning logic in ollama-startup.ts).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: fbb6e725-f0c2-4869-9298-3235d82ef7a4
📒 Files selected for processing (7)
src/lib/inference/local.test.tssrc/lib/inference/local.tssrc/lib/inference/ollama/proxy.tssrc/lib/onboard.tssrc/lib/onboard/ollama-startup.test.tssrc/lib/onboard/ollama-startup.tstest/onboard-ollama-autostart.test.ts
| it("isOllamaProviderPinned recognises every Ollama-using provider key (#4365)", () => { | ||
| // Mirror the matching logic in providers.getNonInteractiveProvider so a | ||
| // user setting NEMOCLAW_PROVIDER to any of the Ollama-using keys still | ||
| // triggers the pinned-provider escape paths. Without this, a casing | ||
| // variant or an install-* pin would let the wizard return to the | ||
| // selection menu and immediately re-pin to the same Ollama action, | ||
| // reintroducing the #4365 loop. | ||
| const cases: Array<[string | undefined, boolean]> = [ | ||
| ["ollama", true], | ||
| ["OLLAMA", true], | ||
| [" Ollama ", true], | ||
| [" ollama\n", true], | ||
| ["install-ollama", true], | ||
| ["INSTALL-OLLAMA", true], | ||
| ["install-windows-ollama", true], | ||
| ["start-windows-ollama", true], | ||
| ["build", false], | ||
| ["openai", false], | ||
| ["", false], | ||
| [undefined, false], | ||
| ]; | ||
| for (const [value, expected] of cases) { | ||
| if (value === undefined) delete process.env.NEMOCLAW_PROVIDER; | ||
| else process.env.NEMOCLAW_PROVIDER = value; | ||
| expect(isOllamaProviderPinned(), `pin=${JSON.stringify(value)}`).toBe(expected); | ||
| } | ||
| restore(); | ||
| }); |
There was a problem hiding this comment.
Use try/finally for cleanup in this test case.
If an assertion fails in this block, restore() is skipped and can leak env/module state into subsequent tests.
Proposed fix
it("isOllamaProviderPinned recognises every Ollama-using provider key (`#4365`)", () => {
@@
- for (const [value, expected] of cases) {
- if (value === undefined) delete process.env.NEMOCLAW_PROVIDER;
- else process.env.NEMOCLAW_PROVIDER = value;
- expect(isOllamaProviderPinned(), `pin=${JSON.stringify(value)}`).toBe(expected);
- }
- restore();
+ try {
+ for (const [value, expected] of cases) {
+ if (value === undefined) delete process.env.NEMOCLAW_PROVIDER;
+ else process.env.NEMOCLAW_PROVIDER = value;
+ expect(isOllamaProviderPinned(), `pin=${JSON.stringify(value)}`).toBe(expected);
+ }
+ } finally {
+ restore();
+ }
});📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| it("isOllamaProviderPinned recognises every Ollama-using provider key (#4365)", () => { | |
| // Mirror the matching logic in providers.getNonInteractiveProvider so a | |
| // user setting NEMOCLAW_PROVIDER to any of the Ollama-using keys still | |
| // triggers the pinned-provider escape paths. Without this, a casing | |
| // variant or an install-* pin would let the wizard return to the | |
| // selection menu and immediately re-pin to the same Ollama action, | |
| // reintroducing the #4365 loop. | |
| const cases: Array<[string | undefined, boolean]> = [ | |
| ["ollama", true], | |
| ["OLLAMA", true], | |
| [" Ollama ", true], | |
| [" ollama\n", true], | |
| ["install-ollama", true], | |
| ["INSTALL-OLLAMA", true], | |
| ["install-windows-ollama", true], | |
| ["start-windows-ollama", true], | |
| ["build", false], | |
| ["openai", false], | |
| ["", false], | |
| [undefined, false], | |
| ]; | |
| for (const [value, expected] of cases) { | |
| if (value === undefined) delete process.env.NEMOCLAW_PROVIDER; | |
| else process.env.NEMOCLAW_PROVIDER = value; | |
| expect(isOllamaProviderPinned(), `pin=${JSON.stringify(value)}`).toBe(expected); | |
| } | |
| restore(); | |
| }); | |
| it("isOllamaProviderPinned recognises every Ollama-using provider key (`#4365`)", () => { | |
| // Mirror the matching logic in providers.getNonInteractiveProvider so a | |
| // user setting NEMOCLAW_PROVIDER to any of the Ollama-using keys still | |
| // triggers the pinned-provider escape paths. Without this, a casing | |
| // variant or an install-* pin would let the wizard return to the | |
| // selection menu and immediately re-pin to the same Ollama action, | |
| // reintroducing the `#4365` loop. | |
| const cases: Array<[string | undefined, boolean]> = [ | |
| ["ollama", true], | |
| ["OLLAMA", true], | |
| [" Ollama ", true], | |
| [" ollama\n", true], | |
| ["install-ollama", true], | |
| ["INSTALL-OLLAMA", true], | |
| ["install-windows-ollama", true], | |
| ["start-windows-ollama", true], | |
| ["build", false], | |
| ["openai", false], | |
| ["", false], | |
| [undefined, false], | |
| ]; | |
| try { | |
| for (const [value, expected] of cases) { | |
| if (value === undefined) delete process.env.NEMOCLAW_PROVIDER; | |
| else process.env.NEMOCLAW_PROVIDER = value; | |
| expect(isOllamaProviderPinned(), `pin=${JSON.stringify(value)}`).toBe(expected); | |
| } | |
| } finally { | |
| restore(); | |
| } | |
| }); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/lib/onboard/ollama-startup.test.ts` around lines 117 - 144, The test
currently calls restore() at the end and can leak environment/module state if an
assertion throws; wrap the body that mutates process.env.NEMOCLAW_PROVIDER and
calls isOllamaProviderPinned() inside a try/finally and move restore() into the
finally block so restore() always runs; specifically update the test case that
iterates over the cases (where process.env.NEMOCLAW_PROVIDER is set/deleted) to
ensure restore() is invoked in the finally, leaving the loop and expect calls
unchanged.
When Ollama is unreachable or its model runner crashes mid-onboarding, the wizard kept the user inside Ollama-only loops. The interactive default autostart-timeout path silently returned to provider selection, and a "model runner has unexpectedly stopped" probe error re-prompted for another Ollama tag that would hit the same daemon failure. Detect Ollama runner crashes during the local probe via a new `daemonFailure` flag on ValidationResult and isOllamaRunnerCrash matcher. selectAndValidateOllamaModel escapes to provider selection on daemonFailure; pinned-provider (NEMOCLAW_PROVIDER of any Ollama key, normalized for case/whitespace) and non-interactive runs exit instead of looping. The autostart-timeout path now prints a clear "pick a non-Ollama provider" hint before returning to selection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
The codebase-growth-guardrails CI check requires src/lib/onboard.ts to stay net-neutral or smaller per PR. The initial fix added 26 lines of daemon-failure handling inline. Move that logic into the focused src/lib/onboard/ollama-probe-failure.ts module (allowed under EXTRACTION_DIR) behind a single dispatcher, handleOllamaProbeFailure, that the selectAndValidateOllamaModel inner loop calls. The dispatcher consolidates both the existing model-level failure path and the new (NVIDIA#4365) daemon-failure escape — pinned-provider exit, non-interactive abort, and interactive back-to-selection — so the onboard.ts diff is now net negative versus main. Behavior is unchanged from the previous commit; added a focused unit-test surface for each of the four branches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
CodeRabbit flagged that the pinned-provider error string hard-coded "NEMOCLAW_PROVIDER=ollama" even though isOllamaProviderPinned also matches install-ollama, install-windows-ollama, and start-windows-ollama (plus case/ whitespace variants of any of those). When the wizard exits on a pinned- Ollama unreachable / runner-crash failure, the literal message could disagree with the user's actual env value and mislead debugging. Replace the hard-coded "NEMOCLAW_PROVIDER=ollama is pinned" prefix with "NEMOCLAW_PROVIDER pins onboarding to Ollama" so the message stays accurate across every Ollama-routing pin variant. Test assertions updated to match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
47c1c4b to
02fffaa
Compare
Summary
When Ollama is stopped before
nemoclaw onboardruns or its model runner crashes mid-onboarding, the wizard kept the user inside Ollama-only loops (issue #4365):model runner has unexpectedly stoppedprobe error insideselectAndValidateOllamaModelre-prompted for another Ollama tag that would crash the same broken daemon.This change detects Ollama runner crashes during the local probe (
isOllamaRunnerCrash+ newdaemonFailureflag onValidationResult) and:NEMOCLAW_PROVIDERpins any Ollama-using key (ollama,install-ollama,install-windows-ollama,start-windows-ollama, normalized for case/whitespace via the newisOllamaProviderPinnedhelper, mirroringgetNonInteractiveProvider).abortNonInteractivefor non-interactive runs.Pick a non-Ollama provider in the next menu — re-selecting Local Ollama would hit the same timeouthint after an autostart timeout in interactive default mode.The
--no-ollama-autostartfallback behavior is preserved.Test plan
npx vitest run src/lib/inference/local.test.ts src/lib/onboard/ollama-startup.test.ts test/onboard-ollama-autostart.test.ts— 77/77 pass, including:isOllamaRunnerCrashmatcher /daemonFailureflag unit tests inlocal.test.ts.src/lib/onboard/ollama-startup.test.tscovering autostart-timeout steer hint, pinned-provider exit, ready short-circuit, andisOllamaProviderPinnednormalization (casing/whitespace + all four pinned keys).test/onboard-ollama-autostart.test.tsdriving the runner-crash escape end-to-end through the wizard (and Scenario H exercising the casing-variant pin).cd nemoclaw && npm test— plugin tests green.codex review --uncommitted— clean (two earlier P2 findings fixed: provider normalization and broader Ollama key set).🤖 Generated with Claude Code
Signed-off-by: Yimo Jiang yimoj@nvidia.com
Summary by CodeRabbit
New Features
Tests