fix(cli): make channels add atomic when policy preset is missing#4577
fix(cli): make channels add atomic when policy preset is missing#4577laitingsheng wants to merge 8 commits into
Conversation
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (4)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 WalkthroughWalkthroughThe CLI now requires the channel's built-in network policy preset to load and validates parsed ChangesPreset Application Prerequisite for Channel Add
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Comment |
E2E Advisor RecommendationRequired E2E: Dispatch hint: Auto-dispatched E2E: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
E2E Scenario Advisor RecommendationRequired scenario E2E: Dispatch required scenario E2E:
Full scenario advisor summaryE2E Scenario AdvisorBase: Required scenario E2E
Optional scenario E2E
Relevant changed files
|
PR Review AdvisorFindings: 1 needs attention, 6 worth checking, 0 nice ideas Review findings🛠️ Needs attention
🔎 Worth checking
🌱 Nice ideas
Since last review detailsCurrent findings:
This is an automated advisory review. A human maintainer must make the final merge decision. |
Selective E2E Results — ✅ All requested jobs passedRun: 26711436094
|
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
|
🌿 Preview your docs: https://nvidia-preview-pr-4577.docs.buildwithfern.com/nemoclaw |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/manage-sandboxes/messaging-channels.mdx`:
- Around line 167-168: The two-sentence paragraph describing `channels add`
should be split so each sentence is on its own line and the em dash before the
remediation instruction should be replaced with a period; edit the block that
starts with "`channels add` requires the matching built-in network policy preset
YAML..." so the first sentence ends after "sandbox is never left advertising a
channel without a matching network policy." and the second sentence begins with
"If the file is present..." and ends after "rebuild prompt.", then convert the
em dash in the remediation sentence after "apply itself fails" into a period and
put "restore the preset YAML and re-run `nemoclaw <sandbox> channels add
<channel>`." on its own line.
In `@docs/reference/commands.mdx`:
- Line 726: The sentence in the docs line about NemoClaw packs multiple
sentences onto one line and uses an em dash; split that paragraph into separate
lines with one sentence per line and replace the em dash with a period, e.g.
break the long line starting with "NemoClaw requires..." into three separate
sentences/lines describing (1) missing preset YAML behavior, (2) behavior when
the file is present and apply occurs, and (3) recovery instruction starting with
"restore the preset YAML and re-run `nemoclaw <name> channels add <channel>`".
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 0fd7308a-cb1a-4806-aa4d-dd5969adb849
📒 Files selected for processing (4)
docs/manage-sandboxes/messaging-channels.mdxdocs/reference/commands.mdxsrc/lib/actions/sandbox/policy-channel.tstest/channels-add-preset.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- src/lib/actions/sandbox/policy-channel.ts
Selective E2E Results — ✅ All requested jobs passedRun: 26711992585
|
… style Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
docs/manage-sandboxes/messaging-channels.mdx (1)
168-170: 💤 Low valueConsider varying sentence structure to improve readability.
Three consecutive sentences begin with "If", creating a repetitive pattern. While grammatically correct, varying the structure would improve flow.
💡 Possible rewrite
-If the file is missing, the command exits with a non-zero status before any token prompt, registry write, or rebuild prompt, so the sandbox is never left advertising a channel without a matching network policy. -If the file is present, `channels add` applies the preset to the sandbox automatically before the rebuild so the bridge has egress to its upstream API. -If the apply itself fails, the command exits with a non-zero status without prompting for rebuild. +When the file is missing, the command exits with a non-zero status before any token prompt, registry write, or rebuild prompt, so the sandbox is never left advertising a channel without a matching network policy. +When the file is present, `channels add` applies the preset to the sandbox automatically before the rebuild so the bridge has egress to its upstream API. +Should the apply itself fail, the command exits with a non-zero status without prompting for rebuild.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/manage-sandboxes/messaging-channels.mdx` around lines 168 - 170, Reword the three consecutive sentences starting with "If" to vary sentence openings and improve flow: keep the same facts about the missing file (command exits non-zero before prompts so sandbox isn't left advertising a channel), the present file behavior (`channels add` auto-applies the preset before rebuild so bridge has egress), and the failed-apply behavior (command exits non-zero without prompting for rebuild), but change sentence structure — e.g., start one sentence with the consequence, another with the condition using a different phrasing, and the third as a dependent clause — updating the lines that contain "If the file is missing", "`channels add` applies", and "If the apply itself fails".
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/manage-sandboxes/messaging-channels.mdx`:
- Line 168: Rewrite the passive sentence in the given paragraph to active voice:
replace "so the sandbox is never left advertising a channel without a matching
network policy" with an active construction such as "so the command prevents the
sandbox from advertising a channel without a matching network policy." Update
the sentence that begins "If the file is missing, the command exits..." in the
messaging-channels.mdx content to use this active phrasing while preserving
meaning.
---
Nitpick comments:
In `@docs/manage-sandboxes/messaging-channels.mdx`:
- Around line 168-170: Reword the three consecutive sentences starting with "If"
to vary sentence openings and improve flow: keep the same facts about the
missing file (command exits non-zero before prompts so sandbox isn't left
advertising a channel), the present file behavior (`channels add` auto-applies
the preset before rebuild so bridge has egress), and the failed-apply behavior
(command exits non-zero without prompting for rebuild), but change sentence
structure — e.g., start one sentence with the consequence, another with the
condition using a different phrasing, and the third as a dependent clause —
updating the lines that contain "If the file is missing", "`channels add`
applies", and "If the apply itself fails".
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 070d214f-3545-41cc-80ef-5431b1b49e8a
📒 Files selected for processing (4)
docs/manage-sandboxes/messaging-channels.mdxdocs/reference/commands.mdxsrc/lib/actions/sandbox/policy-channel.tstest/channels-add-preset.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- src/lib/actions/sandbox/policy-channel.ts
- docs/reference/commands.mdx
…ay rollback Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Selective E2E Results — ✅ All requested jobs passedRun: 26712645489
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
docs/manage-sandboxes/messaging-channels.mdx (1)
170-170: ⚡ Quick winAdd article before "rebuild" for consistency.
The phrase "without prompting for rebuild" treats "rebuild" as an uncountable noun, but elsewhere in this document "rebuild" consistently takes an article as a countable noun: "the rebuild" (lines 169, 172, 177, 236, 238), "a manual rebuild" (line 184).
📝 Suggested fix
-When the apply step itself fails after the registry write, NemoClaw rolls back the bridge providers, the `messagingChannels` entry, and the persisted credentials, then exits without prompting for rebuild. +When the apply step itself fails after the registry write, NemoClaw rolls back the bridge providers, the `messagingChannels` entry, and the persisted credentials, then exits without prompting for a rebuild.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/manage-sandboxes/messaging-channels.mdx` at line 170, The sentence containing "without prompting for rebuild" should include the definite article to match the document's consistent usage; update that phrase to "without prompting for the rebuild" in the same sentence that starts "When the apply step itself fails after the registry write, NemoClaw rolls back the bridge providers, the `messagingChannels` entry, and the persisted credentials, then exits without prompting for rebuild" so it reads "...then exits without prompting for the rebuild."
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@docs/manage-sandboxes/messaging-channels.mdx`:
- Line 170: The sentence containing "without prompting for rebuild" should
include the definite article to match the document's consistent usage; update
that phrase to "without prompting for the rebuild" in the same sentence that
starts "When the apply step itself fails after the registry write, NemoClaw
rolls back the bridge providers, the `messagingChannels` entry, and the
persisted credentials, then exits without prompting for rebuild" so it reads
"...then exits without prompting for the rebuild."
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: d82988eb-fc01-4d45-af25-0438299b37ab
📒 Files selected for processing (4)
docs/manage-sandboxes/messaging-channels.mdxdocs/reference/commands.mdxsrc/lib/actions/sandbox/policy-channel.tstest/channels-add-preset.test.ts
✅ Files skipped from review due to trivial changes (1)
- docs/reference/commands.mdx
🚧 Files skipped from review as they are similar to previous changes (2)
- src/lib/actions/sandbox/policy-channel.ts
- test/channels-add-preset.test.ts
Selective E2E Results — ✅ All requested jobs passedRun: 26713016536
|
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Selective E2E Results — ✅ All requested jobs passedRun: 26713559497
|
…d re-add Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lib/actions/sandbox/policy-channel.ts (1)
810-828:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winMake
--dry-runhonor the new preset preflight.Right now
--dry-runreturns beforeloadPreset/parsePresetPolicyKeysrun, so it exits 0 even when the realchannels addwould now fail on a missing or malformed built-in preset. That makes the dry-run contract lie about whether the command is actually runnable.Suggested change
- if (dryRun) { - console.log(` --dry-run: would enable channel '${canonical}' for '${sandboxName}'.`); - return; - } - const presetContent = policies.loadPreset(canonical); const presetPolicyKeys = presetContent === null ? [] : policies.parsePresetPolicyKeys(presetContent); if (presetContent === null || presetPolicyKeys.length === 0) { if (presetContent !== null && presetPolicyKeys.length === 0) { console.error( ` Preset YAML for channel '${canonical}' has no parseable entries under 'network_policies:'.`, ); } console.error( ` Restore the preset YAML and re-run: ${CLI_NAME} ${sandboxName} channels add ${canonical}`, ); process.exit(1); } + + if (dryRun) { + console.log(` --dry-run: would enable channel '${canonical}' for '${sandboxName}'.`); + return; + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/actions/sandbox/policy-channel.ts` around lines 810 - 828, The dry-run branch returns early and skips the new preset preflight, so dryRun in the channel add flow can report success even when policies.loadPreset(canonical) or policies.parsePresetPolicyKeys(...) would fail; modify the control flow in the function handling the --dry-run (check the dryRun variable) so that it still calls policies.loadPreset(canonical) and policies.parsePresetPolicyKeys(presetContent) and performs the same validation/error handling (including the console.error messages and process.exit(1) path using sandboxName, canonical, and CLI_NAME) but only skips the actual side-effect of enabling the channel—i.e., run the preset preflight before returning for dryRun.
🧹 Nitpick comments (1)
test/channels-add-preset.test.ts (1)
1374-1388: ⚡ Quick winValidate preset contents, not just file presence.
This guard still passes if
<channel>.yamlexists but is malformed or has no parseablenetwork_policies, which are the exact failure modesaddSandboxChannel()now treats as fatal. Parsing each on-disk preset here would make the source-of-truth test actually cover the new contract.♻️ Tighten the source-of-truth check
it("every channel registered in KNOWN_CHANNELS ships a matching preset YAML on disk", () => { const { knownChannelNames } = require(path.join(repoRoot, "dist", "lib", "sandbox", "channels.js")) as { knownChannelNames: () => string[]; }; + const { parsePresetPolicyKeys } = require(path.join(repoRoot, "dist", "lib", "policy", "index.js")) as { + parsePresetPolicyKeys: (content: string) => string[]; + }; const presetDir = path.join(repoRoot, "nemoclaw-blueprint", "policies", "presets"); const missing: string[] = []; for (const name of knownChannelNames()) { const file = path.join(presetDir, `${name}.yaml`); - if (!fs.existsSync(file)) missing.push(file); + if (!fs.existsSync(file)) { + missing.push(file); + continue; + } + const content = fs.readFileSync(file, "utf8"); + if (parsePresetPolicyKeys(content).length === 0) { + missing.push(`${file} (no parseable network_policies)`); + } } assert.deepEqual( missing, [], `every channel in KNOWN_CHANNELS must have a matching preset YAML; missing: ${missing.join(", ")}`,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/channels-add-preset.test.ts` around lines 1374 - 1388, The test currently only checks that a preset file exists for each channel; instead, load and parse each YAML preset from presetDir for every name from knownChannelNames(), and assert that parsing succeeds and the resulting object contains a non-empty network_policies (or the exact key/shape addSandboxChannel() expects). Update the loop that builds missing to also read fs.readFileSync(file, "utf8"), parse with the same YAML loader used in production (or require the preset parser used by addSandboxChannel()), and push file onto missing (or a new failures list) if parsing throws or network_policies is absent/empty; finally assert failures is empty with a helpful message referencing addSandboxChannel()'s contract.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/actions/sandbox/policy-channel.ts`:
- Around line 924-945: The rollback currently calls
onboardProviders.upsertMessagingProviders which can call process.exit on failure
and therefore prevents the registry restore; change this to a non-exiting,
best-effort provider-restore path (e.g. call a non-exiting variant like
onboardProviders.upsertMessagingProvidersBestEffort(...) or pass a { bestEffort:
true } flag to upsertMessagingProviders) inside a try/catch/finally so that any
gateway restore errors are caught and result in
residual.push("gateway-providers"), and ensure registry.updateSandbox(...) is
always executed (move it to finally or after the catch) so the prior registry
state is restored even when the gateway restore flakes.
---
Outside diff comments:
In `@src/lib/actions/sandbox/policy-channel.ts`:
- Around line 810-828: The dry-run branch returns early and skips the new preset
preflight, so dryRun in the channel add flow can report success even when
policies.loadPreset(canonical) or policies.parsePresetPolicyKeys(...) would
fail; modify the control flow in the function handling the --dry-run (check the
dryRun variable) so that it still calls policies.loadPreset(canonical) and
policies.parsePresetPolicyKeys(presetContent) and performs the same
validation/error handling (including the console.error messages and
process.exit(1) path using sandboxName, canonical, and CLI_NAME) but only skips
the actual side-effect of enabling the channel—i.e., run the preset preflight
before returning for dryRun.
---
Nitpick comments:
In `@test/channels-add-preset.test.ts`:
- Around line 1374-1388: The test currently only checks that a preset file
exists for each channel; instead, load and parse each YAML preset from presetDir
for every name from knownChannelNames(), and assert that parsing succeeds and
the resulting object contains a non-empty network_policies (or the exact
key/shape addSandboxChannel() expects). Update the loop that builds missing to
also read fs.readFileSync(file, "utf8"), parse with the same YAML loader used in
production (or require the preset parser used by addSandboxChannel()), and push
file onto missing (or a new failures list) if parsing throws or network_policies
is absent/empty; finally assert failures is empty with a helpful message
referencing addSandboxChannel()'s contract.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 8e731a0f-5c07-40f2-adb2-85f42e707d2a
📒 Files selected for processing (5)
docs/manage-sandboxes/messaging-channels.mdxdocs/reference/commands.mdxsrc/lib/actions/sandbox/policy-channel.tssrc/lib/policy/index.tstest/channels-add-preset.test.ts
✅ Files skipped from review due to trivial changes (1)
- docs/manage-sandboxes/messaging-channels.mdx
🚧 Files skipped from review as they are similar to previous changes (1)
- docs/reference/commands.mdx
Selective E2E Results — ✅ All requested jobs passedRun: 26714552896
|
…ilure safety Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Selective E2E Results — ✅ All requested jobs passedRun: 26715309625
|
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Selective E2E Results — ✅ All requested jobs passedRun: 26715901484
|
Summary
nemoclaw <sandbox> channels add <channel>reachedprocess.exit(0)and wrote the channel intosandboxes.jsoneven when the matching policy preset YAML was missing on disk, leaving the sandbox advertising a bridge that had no network policy. The add flow now parses the preset YAML (file present andnetwork_policies:section yields at least one entry viaparsePresetPolicyKeys) before--dry-runreturns, before snapshotting registry/credentials state, and before token acquisition, and rolls back the gateway/registry/credential mutations when the post-write preset application fails. Rollback is best-effort: gateway-side steps that fail (provider detach/delete, or re-upsert during a re-add restore) print aRollback could not fully clean <surfaces>warning so the operator can clean up manually.Related Issue
Fixes #4548
Changes
addSandboxChannelruns thepolicies.loadPreset(canonical)+policies.parsePresetPolicyKeys(...)preflight before--dry-runreturns, before snapshotting registry/credentials state, and before token acquisition, so a missing, unloadable, or unparseable preset (no usable entries undernetwork_policies:) exits non-zero with a restore-and-re-run hint on stderr without prompting the operator or contacting the gateway.applyChannelAddToGatewayAndRegistry. On apply failure the newrollbackChannelAddhelper handles two cases. A fresh add clears credentials then callsapplyChannelRemoveToGatewayAndRegistry(best-effort, prints residual warning before returning). A failed re-add of an already-enabled channel restores the prior registry entry first (so a later gateway-upsert exit cannot orphan the channel), prints thegateway-providersresidual warning, then restores on-disk credentials and best-effort re-upserts the prior bridge providers.applyChannelRemoveToGatewayAndRegistrygrew an optional{ bestEffort: true }mode that downgrades each gateway-sideprocess.exit(1)to a log-and-continue so the registry rollback finishes even when openshell detach or delete fails; returns{ ok, residual }.applyChannelPresetIfAvailableno longer short-circuits viapolicies.listPresets(); the filesystem scan conflated "channel has no preset" with "preset file missing on disk", which is what let the original bug pass silently.docs/manage-sandboxes/messaging-channels.mdx,docs/reference/commands.mdx): describe the missing-or-unparseable-preset abort, the best-effort fresh-add rollback with explicit residual warning, the re-add restoration semantics, and narrow--dry-runto "validate channel name and matching policy preset without prompting for credentials, contacting the gateway, or rebuilding"; one sentence per line per docs style guide.test/channels-add-preset.test.ts: addedpresetFileMissing,presetMissingNetworkPolicies,presetMalformedYamlfixture toggles plus regression tests covering each abort/rollback path (missing YAML for both telegram and whatsapp/QR, missingnetwork_policies:section, malformed YAML body, applyPreset-after-write rollback, rollback resilience when openshell detach fails, prior-config restoration on a failed re-add, and registry restoration ordering when the re-upsert during re-add rollback itself throws); the source-of-truth test now parses every preset viaparsePresetPolicyKeysinstead of just checking file existence.Type of Change
Verification
npx prek run --all-filespassesnpm testpassesnpm run docsbuilds without warnings (doc changes only)Signed-off-by: Tinson Lai tinsonl@nvidia.com
Summary by CodeRabbit
New Features
Bug Fixes
Refactor
Documentation
Tests