
Replace JS worker's rendezvous channel with unbounded queue#4704

Open
joshua-spacetime wants to merge 1 commit into master from joshua/js-worker-queue

Conversation

@joshua-spacetime
Collaborator

@joshua-spacetime joshua-spacetime commented Mar 24, 2026

Description of Changes

Previously, a module’s JS worker thread was fed through a zero-capacity channel. That made every request handoff a rendezvous between the async producer task and the single JS worker thread. Under high concurrency, that synchronous handoff showed up as hot flume/lock/wakeup stacks on the critical path: the JS worker thread.

This patch brings V8 execution in line with WASM, which also uses an unbounded request queue.

Changing that handoff to an unbounded queue decouples request producers from the JS worker. Producers can enqueue work without synchronizing directly with the worker on every request, and the worker can keep draining queued requests without paying the rendezvous cost each time. This shortens the critical path, reduces scheduler/locking overhead, and increases throughput.

API and ABI breaking changes

None

Expected complexity level and risk

2

Testing

Manual performance testing

@joshua-spacetime joshua-spacetime changed the title from "test: make js worker queue unbounded" to "Replace JS worker's rendezvous channel with bounded queue" Mar 25, 2026
@joshua-spacetime joshua-spacetime marked this pull request as ready for review March 25, 2026 06:49
@joshua-spacetime joshua-spacetime changed the title from "Replace JS worker's rendezvous channel with bounded queue" to "Replace JS worker's rendezvous channel with unbounded queue" Mar 25, 2026
Base automatically changed from joshua/v8-heap-metrics to master March 25, 2026 22:37
@Centril Centril self-requested a review March 27, 2026 11:31
/// remain single-consumer. Recovery waits on [`Self::wait_exited`] before spawning
/// the replacement worker, so there is no overlap where two workers can both drain
/// the queue.
struct JsWorkerState {
Contributor

Suggested change
struct JsWorkerState {
#[derive(Default)]
struct JsWorkerState {

Comment on lines +499 to +505
fn new() -> Arc<Self> {
Arc::new(Self {
trapped: AtomicBool::new(false),
exited: AtomicBool::new(false),
exited_notify: Notify::new(),
})
}
Contributor

All of these are Default.

Suggested change
fn new() -> Arc<Self> {
Arc::new(Self {
trapped: AtomicBool::new(false),
exited: AtomicBool::new(false),
exited_notify: Notify::new(),
})
}

let (result_tx, result_rx) = oneshot::channel();
let trapped = Arc::new(AtomicBool::new(false));
let worker_trapped = trapped.clone();
let worker_state = JsWorkerState::new();
Contributor

Suggested change
let worker_state = JsWorkerState::new();
let worker_state: Arc<JsWorkerState> = <_>::default();

Comment on lines +315 to +318
#[name = spacetime_worker_v8_instance_lane_queue_length]
#[help = "The number of queued requests waiting for a database's JS instance lane worker"]
#[labels(database_identity: Identity)]
pub v8_instance_lane_queue_length: IntGaugeVec,
Contributor

Now that you've discovered the issue and resolved it, I think we can remove these metrics as they mostly add (a little bit of) cost and complexity.

Comment on lines 269 to 315
pub async fn create_instance(&self) -> JsInstance {
// We use a rendezvous channel for pooled instances, because they are checked
// out one request at a time and subsequently returned to the pool, unlike the
// long lived instance used for executing reducers which isn't checked out but
// fed through a queue.
let request_queue = JsWorkerQueue::bounded(0);
let program = self.program.clone();
let common = self.common.clone();
let load_balance_guard = self.load_balance_guard.clone();
let core_pinner = self.core_pinner.clone();
let heap_policy = self.heap_policy;

// This has to be done in a blocking context because of `blocking_recv`.
let (_, instance) = spawn_instance_worker(
program,
Either::Left(common),
load_balance_guard,
core_pinner,
heap_policy,
request_queue,
)
.await
.expect("`spawn_instance_worker` should succeed when passed `ModuleCommon`");
instance
}

async fn create_lane_instance(&self) -> JsInstance {
let program = self.program.clone();
let common = self.common.clone();
let load_balance_guard = self.load_balance_guard.clone();
let core_pinner = self.core_pinner.clone();
let heap_policy = self.heap_policy;
let request_queue = self.lane_queue.clone();

// This has to be done in a blocking context because of `blocking_recv`.
let (_, instance) = spawn_instance_worker(
program,
Either::Left(common),
load_balance_guard,
core_pinner,
heap_policy,
request_queue,
)
.await
.expect("`spawn_instance_worker` should succeed when passed `ModuleCommon`");
instance
}
Contributor

It would have made reviewing easier and the diff smaller if this code had not been duplicated. Please dedup these into a common base method taking request_queue.

Comment on lines +433 to +435
/// Async callers enqueue [`JsWorkerRequest`] values here and wait on their
/// per-request one-shot replies. The dedicated JS worker thread drains this
/// queue and executes those requests on the isolate.
Contributor

Suggested change
/// Async callers enqueue [`JsWorkerRequest`] values here and wait on their
/// per-request one-shot replies. The dedicated JS worker thread drains this
/// queue and executes those requests on the isolate.
/// Async callers enqueue [`JsWorkerRequest`] values here
/// and wait on their per-request one-shot replies sent back via [`JsReplyTx`].
/// The dedicated JS worker thread, spawned in [`spawn_instance_worker`]
/// drains this queue and executes those requests on the isolate.

Comment on lines +507 to +528
fn trapped(&self) -> bool {
self.trapped.load(Ordering::Relaxed)
}

fn exited(&self) -> bool {
self.exited.load(Ordering::Relaxed)
}

fn needs_recovery(&self) -> bool {
self.trapped() || self.exited()
}

fn mark_trapped(&self) {
self.trapped.store(true, Ordering::Relaxed);
}

fn mark_exited(&self) {
self.exited.store(true, Ordering::Relaxed);
self.exited_notify.notify_waiters();
}

async fn wait_exited(&self) {
Contributor

Please add doc comments to these.

id: u64,
request_tx: flume::Sender<JsWorkerRequest>,
trapped: Arc<AtomicBool>,
request_queue: Arc<JsWorkerQueue>,
Contributor

Why is the Arc necessary here? The inner type is Clone and consists of cheaply cloneable types where each is already wrapped in Arc.

Comment on lines +962 to +965
if active.needs_recovery() {
self.replace_active_if_current(&active).await;
active = self.active_instance();
}
Contributor

Can we enter this method with a bad instance now? Some commentary on this would be good in the code.

Comment on lines +1317 to +1319
if trapped {
worker_state_in_thread.mark_trapped();
}
Contributor

One thing that would be good to comment on is why we need to do worker_state_in_thread.mark_trapped(); when we already do that in send_request. Ostensibly, we are setting the flag twice now.

@Centril
Contributor

Centril commented Mar 27, 2026

On phoenix nap, using the rust client, I get:

  • TS module: 156.5k TPS
  • Rust module: 168.5k TPS

The difference with this PR is about 12k TPS 🚀
