feat: Add system subclusters and kernel facet service #803

Open
rekmarks wants to merge 47 commits into main from rekm/system-vats-redux

@rekmarks rekmarks commented Feb 3, 2026

Adds support for "system subclusters": statically declared subclusters that are launched at kernel initialization and persist across kernel restarts. Through the KernelFacet, system subclusters can receive powerful kernel services that are not available to normal vats. In summary:

  • System subcluster persistence: System subclusters now persist across kernel restarts like regular subclusters
    • Orphaned system subclusters (config removed) are deleted on boot without starting their vats
  • KernelFacet: Privileged API for system vats to interact with the kernel (launch subclusters, queue messages, etc.)
    • Basically, an expanded version of the previously introduced "kernel facade" (for background CapTP purposes)
    • Introducing this as a service necessitated eagerly dispatching kernel service invocations, which would otherwise deadlock the current crank if the called kernel method waited for the current crank to complete
  • Controller vat: Moved Omnium controllers to a TypeScript controller-vat that runs inside the kernel
  • Globals endowments: Added globals config to allow vats to receive specific globals (like Date) in their SES Compartment
  • Kernel subcluster representation: The kernel now stores a name -> id mapping for subcluster vats, which facilitates identifying the bootstrap vat of a launched subcluster
    • Previously, only an array of vat ids was stored
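
The persistence and orphan-cleanup behavior described in the first bullet can be sketched roughly as follows. This is an illustration under stated assumptions, not the kernel's code: `reconcileSystemSubclusters`, the `BootPlan` shape, and the `Map`-based stand-in for the persisted name-to-subcluster mapping are all hypothetical.

```typescript
// Hypothetical sketch: decide what to do with each system subcluster at boot.
type SubclusterId = string;

type BootPlan = {
  restored: Record<string, SubclusterId>; // persisted and still configured
  orphaned: SubclusterId[]; // persisted, but the config was removed
  toLaunch: string[]; // configured, but never launched before
};

function reconcileSystemSubclusters(
  persisted: Map<string, SubclusterId>,
  configuredNames: string[],
): BootPlan {
  // Duplicate names are rejected up front.
  const configured = new Set<string>();
  for (const name of configuredNames) {
    if (configured.has(name)) {
      throw new Error(`Duplicate system subcluster name: ${name}`);
    }
    configured.add(name);
  }

  const plan: BootPlan = { restored: {}, orphaned: [], toLaunch: [] };
  persisted.forEach((id, name) => {
    if (configured.has(name)) {
      plan.restored[name] = id;
    } else {
      // Orphans are only marked for deletion; their vats are never started.
      plan.orphaned.push(id);
    }
  });
  for (const name of configuredNames) {
    if (!persisted.has(name)) {
      plan.toLaunch.push(name);
    }
  }
  return plan;
}

const plan = reconcileSystemSubclusters(
  new Map([
    ['controller', 's1'],
    ['legacy', 's2'],
  ]),
  ['controller', 'metrics'],
);
```

Here `controller` would be restored, `legacy` would be deleted as an orphan, and `metrics` would be launched once the run queue has started.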

Note

High Risk
Touches core kernel initialization, persistence, service invocation, and CapTP bootstrap semantics; mistakes could break subcluster restore/cleanup, message routing, or introduce deadlocks/race conditions during kernel service calls.

Overview
Adds system subclusters to @metamask/ocap-kernel: a new systemSubclusters init option persists a name→subcluster mapping in the store, restores or deletes orphaned system subclusters on boot (without starting their vats), launches new ones after the run-queue starts, exposes getSystemSubclusterRoot, and clears this state on reset.

Replaces the CapTP-exposed KernelFacade with a KernelFacet kernel service (makeKernelFacet), wiring CapTP bootstrap to kernel.provideFacet() and making kernel-service invocation non-blocking (promise-chained) to avoid crank deadlocks; this cascades into API changes like SubclusterLaunchResult.rootKref (renamed from bootstrapRootKref) and a subcluster vats record keyed by vat name.
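
To illustrate why non-blocking invocation avoids the deadlock, here is a minimal sketch. It assumes a service method that waits for the current crank to finish; `invokeServiceNonBlocking`, `runCrank`, and the `events` log are hypothetical names, not the kernel's API.

```typescript
// Hypothetical sketch: call the service method eagerly, but never await its
// result inside the crank. The outcome is delivered through a callback later.
type ServiceMethod = (...args: unknown[]) => unknown;
type Outcome = { ok: boolean; value: unknown };

const events: string[] = [];

function invokeServiceNonBlocking(
  method: ServiceMethod,
  args: unknown[],
  onSettled: (outcome: Outcome) => void,
): void {
  let result: Promise<unknown>;
  try {
    result = Promise.resolve(method(...args));
  } catch (error) {
    // Synchronous throws are folded into the same promise chain.
    result = Promise.reject(error);
  }
  result.then(
    (value) => onSettled({ ok: true, value }),
    (error) => onSettled({ ok: false, value: error }),
  );
  events.push('dispatched');
}

// A promise that resolves once the current crank has completed.
let finishCrank: () => void = () => undefined;
const crankDone = new Promise<void>((resolve) => {
  finishCrank = () => resolve();
});

// A service method that cannot finish until the crank is over. Awaiting it
// from inside the crank would therefore never return.
const launchSubcluster: ServiceMethod = async () => {
  await crankDone;
  return 'launched';
};

function runCrank(): void {
  events.push('crank:start');
  invokeServiceNonBlocking(launchSubcluster, [], (outcome) => {
    events.push(`settled:${String(outcome.value)}`);
  });
  events.push('crank:end');
  finishCrank();
}

runCrank();
```

When `runCrank()` returns, `events` contains `crank:start`, `dispatched`, and `crank:end`; the `settled:launched` entry is appended only afterwards, once the promise chain runs.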

Expands vat configuration with an allowlisted globals array (e.g., Date) that injects selected globals into worker endowments, and adds a Vite define for process.env.NODE_ENV when bundling vats.
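
A rough sketch of how an allowlisted `globals` array could be resolved into endowments is shown below. The `allowedGlobals` map mirrors the snippet under review; `resolveGlobalEndowments` and the choice to throw on unknown names are assumptions made for illustration.

```typescript
// Map of allowed global names to their values.
const allowedGlobals: Record<string, unknown> = {
  Date: globalThis.Date,
};

// Hypothetical helper: turn a vat's requested global names into endowments.
function resolveGlobalEndowments(requested: string[]): Record<string, unknown> {
  const endowments: Record<string, unknown> = {};
  for (const name of requested) {
    // Own-property check so inherited names such as "toString" or
    // "constructor" can never slip through the allowlist.
    if (!Object.prototype.hasOwnProperty.call(allowedGlobals, name)) {
      // Assumption: unknown names are rejected rather than silently ignored.
      throw new Error(`Global "${name}" is not allowlisted`);
    }
    endowments[name] = allowedGlobals[name];
  }
  return endowments;
}

const endowments = resolveGlobalEndowments(['Date']);
```

A vat that requests `['Date']` would receive only `Date`; a request for any other name fails under this sketch.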

Updates Omnium Gatherum to run controllers via a kernel-launched controller vat/system subcluster and route caplet operations through queueMessage, adjusts extension/runtime typings and E2E expectations for shifted krefs, and adds NodeJS E2E coverage for system subcluster lifecycle/persistence/reload behavior.

Written by Cursor Bugbot for commit 31868d8. This will update automatically on new commits. Configure here.

@rekmarks rekmarks force-pushed the rekm/system-vats-redux branch from f58f8c0 to 079a10a Compare February 3, 2026 20:08
rekmarks commented Feb 3, 2026

@cursor review

github-actions bot commented Feb 3, 2026

Coverage Report

| Status | Category | Percentage | Covered / Total |
| --- | --- | --- | --- |
| 🔵 | Lines | 77.87% (⬇️ -0.31%) | 6227 / 7996 |
| 🔵 | Statements | 77.82% (⬇️ -0.32%) | 6327 / 8130 |
| 🔵 | Functions | 76.2% (⬇️ -0.42%) | 1576 / 2068 |
| 🔵 | Branches | 77.74% (⬇️ -0.64%) | 2285 / 2939 |

File Coverage

Changed Files

| File | Stmts | Branches | Functions | Lines | Uncovered Lines |
| --- | --- | --- | --- | --- | --- |
| packages/cli/src/vite/vat-bundler.ts | 0% (🟰 ±0%) | 0% (🟰 ±0%) | 0% (🟰 ±0%) | 0% (🟰 ±0%) | 18-60 |
| packages/extension/src/global.d.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |
| packages/kernel-browser-runtime/src/background-captp.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |
| packages/kernel-browser-runtime/src/index.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |
| packages/kernel-browser-runtime/src/types.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |
| packages/kernel-browser-runtime/src/kernel-worker/kernel-worker.ts | 0% (🟰 ±0%) | 0% (🟰 ±0%) | 0% (🟰 ±0%) | 0% (🟰 ±0%) | 26-111 |
| packages/kernel-browser-runtime/src/kernel-worker/captp/index.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |
| packages/kernel-browser-runtime/src/kernel-worker/captp/kernel-captp.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |
| packages/kernel-browser-runtime/src/rpc-handlers/launch-subcluster.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |
| packages/ocap-kernel/src/Kernel.ts | 92.47% (⬇️ -1.03%) | 82.75% (⬇️ -2.25%) | 90.47% (⬇️ -1.83%) | 92.47% (⬇️ -1.03%) | 121, 246-249, 266, 414, 482, 552, 596 |
| packages/ocap-kernel/src/KernelRouter.ts | 90.16% (🟰 ±0%) | 75.38% (🟰 ±0%) | 100% (🟰 ±0%) | 90.16% (🟰 ±0%) | 110, 163, 175, 225, 252-261, 268, 314, 329, 332 |
| packages/ocap-kernel/src/KernelServiceManager.ts | 93.87% (⬇️ -6.13%) | 88.88% (⬇️ -11.12%) | 100% (🟰 ±0%) | 93.87% (⬇️ -6.13%) | 178-183 |
| packages/ocap-kernel/src/index.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |
| packages/ocap-kernel/src/kernel-facet.ts | 100% | 100% | 100% | 100% | |
| packages/ocap-kernel/src/types.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |
| packages/ocap-kernel/src/store/methods/subclusters.ts | 98.8% (⬇️ -1.20%) | 86.66% (⬇️ -2.62%) | 96.15% (⬇️ -3.85%) | 98.76% (⬇️ -1.24%) | 258 |
| packages/ocap-kernel/src/vats/SubclusterManager.ts | 95.62% (⬇️ -2.88%) | 83.92% (⬇️ -10.52%) | 100% (🟰 ±0%) | 95.56% (⬇️ -2.94%) | 187-189, 193-195, 259, 311-313, 486, 491-493, 497-499 |
| packages/ocap-kernel/src/vats/VatManager.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |
| packages/ocap-kernel/src/vats/VatSupervisor.ts | 72.72% (⬇️ -1.92%) | 42.42% (⬇️ -2.40%) | 58.33% (🟰 ±0%) | 72.72% (⬇️ -1.92%) | 126, 137, 145, 183, 221-225, 236, 245-246, 267-269, 272, 276-278, 310-312, 329, 346-354 |
| packages/omnium-gatherum/src/background.ts | 0% (🟰 ±0%) | 0% (🟰 ±0%) | 0% (🟰 ±0%) | 0% (🟰 ±0%) | 21-275 |
| packages/omnium-gatherum/src/global.d.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |
| packages/omnium-gatherum/src/offscreen.ts | 0% (🟰 ±0%) | 0% (🟰 ±0%) | 0% (🟰 ±0%) | 0% (🟰 ±0%) | 20-122 |
| packages/omnium-gatherum/src/controllers/index.ts | 100% (⬆️ +100.00%) | 100% (🟰 ±0%) | 100% (⬆️ +100.00%) | 100% (⬆️ +100.00%) | |
| packages/omnium-gatherum/src/controllers/caplet/caplet-controller.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |
| packages/omnium-gatherum/src/controllers/caplet/index.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |
| packages/omnium-gatherum/src/controllers/caplet/types.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |
| packages/omnium-gatherum/src/controllers/storage/index.ts | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | 100% (🟰 ±0%) | |
| packages/omnium-gatherum/src/vats/controller-vat.ts | 0% | 0% | 0% | 0% | 58-196 |
| packages/omnium-gatherum/src/vats/storage/baggage-adapter.ts | 100% | 100% | 100% | 100% | |

Generated in workflow #3598 for commit 48ff5d1 by the Vitest Coverage Report Action

rekmarks commented Feb 3, 2026

@cursor review

@rekmarks rekmarks force-pushed the rekm/system-vats-redux branch from 2d79ecf to 0a66993 Compare February 5, 2026 00:01
rekmarks commented Feb 5, 2026

@cursor review

@rekmarks rekmarks changed the title from "feat: Add system vats support with KernelFacet" to "feat: Add system subclusters and kernel facet service" Feb 5, 2026
@rekmarks rekmarks marked this pull request as ready for review February 5, 2026 02:20
@rekmarks rekmarks requested a review from a team as a code owner February 5, 2026 02:20
Comment on lines +23 to +28
// TODO: Remove this define block and add a process shim to VatSupervisor
// workerEndowments instead. This injects into ALL bundles but is only needed
// for libraries like immer that check process.env.NODE_ENV.
define: {
  'process.env.NODE_ENV': JSON.stringify('production'),
},
Member Author

Going to address in a follow-up. This requires changes to how we bundle vats with Vite, which are best not added to this PR.

Comment on lines +124 to +127
if (isConsoleForwardMessage(message)) {
  handleConsoleForwardMessage(message);
Member Author

We accidentally broke omnium when we introduced console forwarding because we forgot to instrument its background and offscreen for it, causing the stream handler to blow up.

Contributor

Sounds like we need some omnium e2e tests

Comment on lines +298 to +305
// Map of allowed global names to their values
const allowedGlobals: Record<string, unknown> = {
  Date: globalThis.Date,
};
Member Author

See: #813

@rekmarks rekmarks requested a review from sirtimid February 6, 2026 20:46
rekmarks commented Feb 6, 2026

Stack


Managed by gh-stack

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

rekmarks and others added 3 commits February 6, 2026 15:03
Implement system vats that are launched at kernel initialization and have
access to privileged kernel services. Key changes:

- Add SystemVatConfig type and getSystemVatRoot method to Kernel
- Launch system vats after queue starts to avoid deadlock
- Terminate and relaunch existing system vat subclusters on restart
- Add bootstrap-vat.js for Omnium system services with CapletController
- Add baggage-backed storage adapter for vat persistence
- Pass systemVats config via URL params from offscreen to kernel worker
- Update background.ts to use system vat for caplet operations
- Add process.env.NODE_ENV replacement in vat bundler for SES compatibility
- Simplify kernel-facet.ts by removing SystemVatManager
- Add duplicate name check in KernelServiceManager.registerKernelServiceObject

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Rename bootstrap-vat.js to bootstrap-vat.ts with full type annotations
- Export Baggage type from baggage-adapter.ts
- Make logger optional throughout controller hierarchy
- Simplify defineMethods to take array of method names instead of object map
- Update background.ts to use simplified method names (install, uninstall, etc.)
- Update package.json build script to reference .ts file

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
rekmarks and others added 29 commits February 6, 2026 15:03
- Add proper TypeScript types for KernelFacet, BootstrapServices, VatParameters
- Use types from @metamask/ocap-kernel (Baggage, ClusterConfig, etc.)
- Remove JSDoc type annotations in favor of TypeScript types

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Throw errors instead of silently recovering when a persisted system
subcluster has an empty vats array or missing root object. These
conditions indicate database corruption and should fail fast.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…estart

The controller vat created a new PromiseKit on every initialization but
only resolved it in bootstrap(). Since bootstrap() is not called during
resuscitation (kernel restart), all caplet methods would hang.

Fix by restoring kernelFacet from baggage and initializing the
CapletController immediately in buildRootObject when available.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
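
The fix described in this commit can be sketched as follows. This is a simplified model, not the controller vat itself: a plain `Map` stands in for baggage, and `KernelFacetLike`, `isInitialized`, and the `ping` method are hypothetical.

```typescript
// Hypothetical sketch: restore a dependency from baggage so that methods do
// not hang when bootstrap() is skipped after a kernel restart.
type KernelFacetLike = { ping: () => string };
type BaggageLike = Map<string, unknown>; // stand-in for real vat baggage

function buildRootObject(baggage: BaggageLike) {
  let resolveFacet: (facet: KernelFacetLike) => void = () => undefined;
  const facetPromise = new Promise<KernelFacetLike>((resolve) => {
    resolveFacet = resolve;
  });
  let initialized = false;

  const initialize = (facet: KernelFacetLike): void => {
    resolveFacet(facet);
    initialized = true;
  };

  // Resuscitation path: bootstrap() will not be called again, so resolve the
  // promise immediately if a facet was saved during an earlier bootstrap.
  const saved = baggage.get('kernelFacet') as KernelFacetLike | undefined;
  if (saved !== undefined) {
    initialize(saved);
  }

  return {
    bootstrap(facet: KernelFacetLike): void {
      baggage.set('kernelFacet', facet);
      initialize(facet);
    },
    isInitialized: (): boolean => initialized,
    // Every method waits on the promise, which is why an unresolved promise
    // made all caplet methods hang.
    ping: async (): Promise<string> => (await facetPromise).ping(),
  };
}

const baggage: BaggageLike = new Map();
const firstRoot = buildRootObject(baggage);
const readyBeforeBootstrap = firstRoot.isInitialized();
firstRoot.bootstrap({ ping: () => 'pong' });

// Simulated restart: same baggage, fresh root object, no bootstrap() call.
const restartedRoot = buildRootObject(baggage);
```

In this model the first root object is not ready until `bootstrap()` runs, while the root object built after the simulated restart is ready immediately.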
Add a TODO comment noting that the define block should be replaced
with a process shim in VatSupervisor workerEndowments.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix baggage adapter to use actual delete() instead of null tombstones
- Rename root to rootObject in KernelFacetLaunchResult for clarity
- Add subclusterId format validation in Kernel.getSubcluster()
- Add duplicate system subcluster name detection at kernel init
- Clarify comment in controller-vat resuscitation path

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
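
The first item above, deleting instead of writing null tombstones, can be illustrated with a small sketch. A `Map` stands in for baggage and `makeStorageAdapter` is a hypothetical name; the real adapter's interface may differ.

```typescript
// Hypothetical sketch of a storage adapter backed by a baggage-like store.
type BaggageLike = Map<string, unknown>; // stand-in for real vat baggage

function makeStorageAdapter(baggage: BaggageLike) {
  return {
    get: (key: string): unknown => baggage.get(key),
    set: (key: string, value: unknown): void => {
      baggage.set(key, value);
    },
    // Remove the entry outright. Writing null instead would leave the key
    // visible to has() and keys(), so callers would have to filter tombstones.
    delete: (key: string): void => {
      baggage.delete(key);
    },
    has: (key: string): boolean => baggage.has(key),
    keys: (): string[] => Array.from(baggage.keys()),
  };
}

const storage = makeStorageAdapter(new Map());
storage.set('caplet:a', { installed: true });
storage.set('caplet:b', { installed: true });
storage.delete('caplet:a');
```

After the delete, only `caplet:b` remains listed by `keys()`.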
Async kernel service invocations can cause multiple concurrent connection
attempts when processing many messages, which triggers the default rate
limiter. Increase maxConnectionAttemptsPerMinute to avoid interference
with the queue limit test.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace the KernelFacade type and makeKernelFacade factory (from
kernel-browser-runtime) with KernelFacet and makeKernelFacet (from
ocap-kernel). The kernel facet is now a thin delegate layer over the
kernel, with the only additions being ping() and getVatRoot().

Key changes:
- Add missing methods to KernelFacet (ping, pingVat,
  getSystemSubclusterRoot, reset, queueMessage)
- Add Kernel.provideFacet() for idempotent facet creation, replacing
  the boolean flag and #ensureKernelFacetRegistered()
- Move throw-on-missing logic for getSystemSubclusterRoot into Kernel
- Rename bootstrapRootKref to rootKref in SubclusterLaunchResult
- Remove KernelFacade type, makeKernelFacade, KernelFacetLaunchResult,
  and LaunchResult from kernel-browser-runtime
- Update all consumers (omnium-gatherum, extension) to use KernelFacet

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add Kernel.getPresence(kref, iface = 'Kernel Object') as a public
method that wraps kslot(). Remove getVatRoot from KernelFacet and
replace it with getPresence, which is now a delegated dependency
rather than a standalone kslot call.

Update controller-vat.ts to call E(kernelFacet).getPresence(kref,
'vatRoot') instead of E(kernelFacet).getVatRoot(kref).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace individual method declarations with a spread of the deps
object. Since every method except ping() is a direct delegate, the
facet is now just `makeDefaultExo('kernelFacet', { ...deps, ping })`.

Simplify tests accordingly — use plain functions instead of vi.fn()
mocks (which get frozen by harden()).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…string, VatId>

Replace the array-based vat storage with a name-keyed record, making the
vat name→ID relationship explicit and eliminating the fragile index-based
bootstrap vat lookup in Kernel.ts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
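
As a rough illustration of the name-keyed record, the sketch below looks up the bootstrap vat by name instead of by array position. The `SubclusterSketch` shape, including where the bootstrap vat's name is stored, is an assumption for this example.

```typescript
type VatId = string;

// Previously the vats were a plain VatId[] and the bootstrap vat was found by
// index. With a record, the name -> ID relationship is explicit.
type SubclusterSketch = {
  id: string;
  bootstrap: string; // assumed: name of the bootstrap vat from the config
  vats: Record<string, VatId>;
};

function getBootstrapVatId(subcluster: SubclusterSketch): VatId {
  const vatId = subcluster.vats[subcluster.bootstrap];
  if (vatId === undefined) {
    throw new Error(`Subcluster ${subcluster.id} has no bootstrap vat`);
  }
  return vatId;
}

const bootstrapVatId = getBootstrapVatId({
  id: 's1',
  bootstrap: 'main',
  // Key order no longer matters for finding the bootstrap vat.
  vats: { worker: 'v2', main: 'v1' },
});
```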
… CapTP integration test

Delegate each vi.fn() mock through a wrapper function before passing to
makeKernelFacet, so harden() freezes the wrappers instead of the original
mock instances, keeping vitest call tracking intact.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
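
The wrapper technique can be demonstrated without vitest or SES. In this sketch `deepFreeze` is only a stand-in for `harden()`, and `makeMock` is a hand-rolled stand-in for `vi.fn()`; both are assumptions that model the relevant behavior (a mock that records calls on itself, and a freeze that reaches everything attached to the object).

```typescript
// Stand-in for harden(): recursively freeze an object graph.
function deepFreeze<T>(value: T): T {
  const target: unknown = value;
  if ((typeof target === 'object' && target !== null) || typeof target === 'function') {
    if (!Object.isFrozen(target)) {
      Object.freeze(target);
      for (const key of Object.keys(target)) {
        deepFreeze((target as Record<string, unknown>)[key]);
      }
    }
  }
  return value;
}

// Stand-in for vi.fn(): a function that records its calls on itself.
type Mock = ((...args: unknown[]) => void) & { calls: unknown[][] };

function makeMock(): Mock {
  const mock: Mock = Object.assign(
    (...args: unknown[]): void => {
      mock.calls.push(args);
    },
    { calls: [] as unknown[][] },
  );
  return mock;
}

// Passing the mock directly: freezing reaches the call log, so recording fails.
const directMock = makeMock();
const directFacet = deepFreeze({ ping: directMock });
let directCallThrew = false;
try {
  directFacet.ping('a');
} catch {
  directCallThrew = true;
}

// Passing a wrapper: only the wrapper is frozen, and the mock keeps tracking.
const trackedMock = makeMock();
const wrappedFacet = deepFreeze({
  ping: (...args: unknown[]): void => trackedMock(...args),
});
wrappedFacet.ping('b');
```

The wrapper works because the mock is captured in a closure rather than attached as a property, so the recursive freeze never reaches it.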
reloadSubcluster() creates a new subcluster with a new ID, but was not
updating #systemSubclusterRoots or the persisted systemSubcluster.*
mappings. This left stale mappings that caused 'has no bootstrap vat'
errors on subsequent kernel restarts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
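
A minimal sketch of the bug and its fix, assuming a purely in-memory registry: `SystemSubclusterRegistry` and its ID scheme are hypothetical, and the real change also updates the tracked roots and the persisted mappings.

```typescript
type SubclusterId = string;

// Hypothetical in-memory model of the system subcluster name -> ID mapping.
class SystemSubclusterRegistry {
  private readonly nameToId = new Map<string, SubclusterId>();

  private nextId = 1;

  launch(name: string): SubclusterId {
    const id = this.allocateId();
    this.nameToId.set(name, id);
    return id;
  }

  // Reloading replaces the subcluster with a new one under a new ID, so any
  // system subcluster name pointing at the old ID must be re-pointed.
  reload(oldId: SubclusterId): SubclusterId {
    const newId = this.allocateId();
    const entry = Array.from(this.nameToId.entries()).find(
      ([, id]) => id === oldId,
    );
    if (entry !== undefined) {
      this.nameToId.set(entry[0], newId);
    }
    return newId;
  }

  getId(name: string): SubclusterId | undefined {
    return this.nameToId.get(name);
  }

  private allocateId(): SubclusterId {
    const id = `s${this.nextId}`;
    this.nextId += 1;
    return id;
  }
}

const registry = new SystemSubclusterRegistry();
const originalId = registry.launch('controller');
const reloadedId = registry.reload(originalId);
```

Without the re-pointing step in `reload()`, `getId('controller')` would keep returning the ID of a subcluster that no longer exists.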
…o SubclusterManager

System subcluster state and logic (persist/restore/cleanup mappings,
launch new named subclusters, track roots) belongs in SubclusterManager
which already owns subcluster CRUD, termination, and reload. This moves
~140 lines out of Kernel.ts into SubclusterManager, keeping Kernel as a
thin orchestration layer that delegates to its managers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…registration

The kernelFacet kernel service now takes ko3, shifting all vat root
ko IDs by 1. Update hardcoded ko references in control-panel,
object-registry, and remote-comms e2e tests accordingly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…usters

reloadAllSubclusters bypasses reloadSubcluster and has its own loop
that calls addSubcluster + launchVatsForSubcluster directly, so it
never updated the in-memory systemSubclusterRoots map or persisted
mappings. After a reload-all, getSystemSubclusterRoot() would return
stale krefs pointing to deleted objects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace local Baggage type definition with the one exported from
ocap-kernel, which includes keys() for native iteration. This
eliminates the manual __storage_keys__ tracking in the baggage adapter.
Also replace local LaunchResult type with SubclusterLaunchResult from
ocap-kernel, and remove dead resuscitation guard in controller-vat
bootstrap.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Construct a fallback Logger in controller-vat when vatPowers.logger is
not provided, ensuring a real Logger is always passed downstream. This
makes the logger property non-optional in ControllerConfig,
Controller, ControllerStorage, and CapletController, eliminating
optional chaining on logger calls throughout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nally

Instead of the caller manually binding each method, makeKernelFacet
now takes the kernel instance directly and iterates over a const array
of method names to bind them. This reduces the call site in Kernel.ts
from 12 lines to 1.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
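
The binding approach might look roughly like the sketch below. `FakeKernel`, the three method names, and `makeKernelFacetSketch` are illustrative only, and `Object.freeze` stands in for the exo construction used in the real facet.

```typescript
// Hypothetical method list; the real facet exposes a different, longer set.
const facetMethodNames = ['pingVat', 'queueMessage', 'reset'] as const;
type FacetMethodName = (typeof facetMethodNames)[number];

class FakeKernel {
  private resetCount = 0;

  pingVat(vatId: string): string {
    return `pong from ${vatId}`;
  }

  queueMessage(target: string, method: string): string {
    return `${target}.${method}`;
  }

  reset(): number {
    this.resetCount += 1;
    return this.resetCount;
  }
}

type KernelFacetSketch = Pick<FakeKernel, FacetMethodName> & {
  ping: () => string;
};

function makeKernelFacetSketch(kernel: FakeKernel): KernelFacetSketch {
  const facet: Record<string, unknown> = { ping: (): string => 'pong' };
  for (const name of facetMethodNames) {
    // bind() keeps `this` pointing at the kernel once the method is detached.
    const method = kernel[name] as (...args: never[]) => unknown;
    facet[name] = method.bind(kernel);
  }
  // The record is built dynamically, so the precise type is asserted here.
  return Object.freeze(facet) as unknown as KernelFacetSketch;
}

const facet = makeKernelFacetSketch(new FakeKernel());
const { reset } = facet; // detached, but still bound to the kernel
const firstReset = reset();
const secondReset = reset();
```

The call site shrinks to a single `makeKernelFacetSketch(kernel)` call, and adding a delegated method means adding one name to the array.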
Replace the positional (resetStorage, mnemonicOrOptions) parameters
with a single options object. resetStorage defaults to true since
nearly every call site uses that value.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
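
The shape of this refactor, sketched with hypothetical names (the commit message does not name the function, and `mnemonic` is only a guess at one of the options):

```typescript
// Hypothetical options type replacing positional (resetStorage, mnemonicOrOptions).
type MakeKernelOptions = {
  resetStorage?: boolean;
  mnemonic?: string;
};

function resolveMakeKernelOptions({
  resetStorage = true, // most call sites want a fresh store
  mnemonic,
}: MakeKernelOptions = {}): { resetStorage: boolean; mnemonic?: string } {
  return { resetStorage, mnemonic };
}

const defaults = resolveMakeKernelOptions();
const keepStorage = resolveMakeKernelOptions({ resetStorage: false });
```

Call sites that previously passed `true` positionally can now omit the argument entirely, and the rarer `resetStorage: false` case is self-describing.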
Apply vitest eslint config to `**/test/**/*` in addition to
`**/*.test.ts` files, so non-test-named files under test directories
also get the right rules. Remove now-unnecessary eslint-disable
comments in system-vat.ts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The omnium.caplet type was declared as Promisified<CapletControllerFacet>
but the implementation routes through queueMessage, returning raw
CapData instead of deserialized values. Replace with explicit method
signatures using QueueMessageResult, and add the missing
callCapletMethod and getCapletRoot methods.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
On restart with an empty systemSubclusters array, the kernel facet was
never registered because provideFacet() was guarded by configs.length > 0.
Persisted run queue items targeting the kernel facet kref would cause
invokeKernelService to throw, crashing the kernel queue.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
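
A simplified model of the fix: facet registration is idempotent and happens unconditionally at init, so persisted messages that target the facet can still be routed. `KernelSketch` and the kref values are illustrative, not the kernel's real internals.

```typescript
// Hypothetical sketch of idempotent, unconditional facet registration.
class KernelSketch {
  private facetKref: string | undefined;

  private nextKo = 3;

  private readonly services = new Map<string, string>();

  // Safe to call any number of times; registers the facet at most once.
  provideFacet(): string {
    if (this.facetKref === undefined) {
      this.facetKref = `ko${this.nextKo}`;
      this.nextKo += 1;
      this.services.set(this.facetKref, 'kernelFacet');
    }
    return this.facetKref;
  }

  init(systemSubclusterNames: string[]): number {
    // Register even when no system subclusters are configured: the run queue
    // may still hold messages addressed to the facet from a previous session.
    this.provideFacet();
    return systemSubclusterNames.length;
  }

  invokeKernelService(kref: string): string {
    const name = this.services.get(kref);
    if (name === undefined) {
      throw new Error(`No kernel service registered for ${kref}`);
    }
    return name;
  }
}

const kernel = new KernelSketch();
kernel.init([]); // empty systemSubclusters
const facetKref = kernel.provideFacet();
const serviceName = kernel.invokeKernelService(facetKref);
```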
@rekmarks rekmarks force-pushed the rekm/system-vats-redux branch from 48ff5d1 to 31868d8 Compare February 6, 2026 23:04