This document is the north-star design for adding async/await to RunMat by making evaluation a
Rust Future end-to-end. It is written to minimize hacks, avoid “sentinel” control-flow, and keep
RunMat as a thin, clean language/runtime layer that leverages Rust’s established async semantics
(Future, Waker, executors), while preserving MATLAB compatibility for existing synchronous code.
This doc is paired with docs/ARCH_ASYNC_PLAN.md, which tracks an incremental rollout plan.
Goals
- Evaluation is a Future: all RunMat evaluation can be driven via `poll`, returning `Pending` at suspension points and `Ready` at completion.
- Typed suspension: no "pending as string" or implicit sentinel error messages.
- Host-neutral: the core async model works in native CLI, GUI, Jupyter, and WASM.
- Deterministic semantics: suspension is explicit (primarily via `await`), with predictable atomic regions between awaits.
- Good performance: "sync-only" code remains fast; async overhead is near-zero when unused.
- Clean layering: crates do not pull host dependencies (Tokio, `web_sys`) unintentionally.
Non-goals (v1)
- Preemptive scheduling (cooperative only).
- Implicit async I/O (no hidden yields without an explicit await point, except internal awaits that are semantically transparent and carefully audited).
- Full async JIT compilation on day 1 (interpret-first is acceptable).
RunMat evaluation becomes a Future whose state is the interpreter (and later JIT coroutine) state:
- `ExecuteFuture` owns the VM state (instruction pointer, frames, operand stack).
- `poll()` runs until one of:
  - Completion: produces a final `ExecutionResult`.
  - Error: produces a typed `RunMatError`.
  - Suspension: returns `Poll::Pending` after registering a waker.
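These three outcomes map directly onto Rust's `Future` protocol. A minimal, self-contained sketch (here `ExecuteFuture`'s "VM state" is just a step counter, and `ExecutionResult`/`RunMatError` are illustrative stand-ins, not RunMat's real types):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Illustrative stand-ins; the real types live in RunMat's crates.
#[derive(Debug, PartialEq)]
struct ExecutionResult(i64);
#[derive(Debug, PartialEq)]
struct RunMatError(String);

// Toy ExecuteFuture: each poll either completes (Ready) or suspends
// (Pending) after arranging to be woken again.
struct ExecuteFuture {
    remaining_steps: u32,
}

impl Future for ExecuteFuture {
    type Output = Result<ExecutionResult, RunMatError>;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining_steps == 0 {
            Poll::Ready(Ok(ExecutionResult(42)))
        } else {
            self.remaining_steps -= 1;
            // The real VM hands cx.waker() to the pending awaitable; here we
            // wake immediately so the driver simply polls again.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

// Minimal polling driver with a no-op waker, standing in for a host executor.
fn drive<F: Future>(fut: F) -> F::Output {
    fn clone_raw(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &NOOP_VTABLE)
    }
    fn noop(_: *const ()) {}
    static NOOP_VTABLE: RawWakerVTable = RawWakerVTable::new(clone_raw, noop, noop, noop);

    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &NOOP_VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Driving the toy future with `drive(ExecuteFuture { remaining_steps: 3 })` polls through three `Pending` suspensions before yielding `Ok(ExecutionResult(42))`.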
The rule for determinism:
- User-visible suspension points are explicit via `await(...)` in RunMat code (and async blocks).
- Internal suspension points (e.g., waiting for WebGPU `map_async`) are modeled as internal awaitables and must be semantically transparent to users (no reordering surprises).
This gives predictable execution regions between awaits and avoids “hidden” re-entrancy.
A key compatibility constraint is that builtins keep stable signatures/return types:
- Builtins like `isempty` always complete immediately.
- Builtins like `webread`/GPU readback remain language-synchronous ("evaluate expression, produce value, then continue"), but the evaluator may return `Poll::Pending` while waiting for host events.
This avoids “two builtins per builtin” and avoids config-driven return-type changes. Concurrency is introduced by running multiple tasks, not by builtins returning tasks conditionally.
This section defines what lives where, and what should not leak across layers.
- Owns: source text → AST.
- Does not own: runtime state, values, IO, executors.
- Owns: AST → HIR, HIR transforms, validation.
- Does not own: execution, async scheduling, host APIs.
- Owns: bytecode format and interpreter engine.
- Exposes: a pollable interpreter core (“run until yield”).
- Must not depend on: Tokio, JS/wasm bindings, GPU providers.
The VM should understand only:
- “execute instruction stream”
- “hit an await opcode / awaitable”
- “produce Completed / Pending(awaitable-id) / Error”
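That tri-state result can be modeled as a small enum, so the embedder matches on structure rather than on sentinel strings. A sketch (names like `StepOutcome` and `AwaitableId` are illustrative placeholders, not existing RunMat API):

```rust
// Illustrative stand-ins, not existing RunMat API.
type Value = f64;
#[derive(Debug, PartialEq)]
struct RunMatError(String);
#[derive(Debug, Clone, Copy, PartialEq)]
struct AwaitableId(u64);

// What one "run until yield" step of the VM can produce.
#[derive(Debug, PartialEq)]
enum StepOutcome {
    Completed(Value),
    Pending(AwaitableId),
    Error(RunMatError),
}

// The embedder matches on the outcome; no string sentinels involved.
fn describe(outcome: &StepOutcome) -> &'static str {
    match outcome {
        StepOutcome::Completed(_) => "done",
        StepOutcome::Pending(_) => "suspended",
        StepOutcome::Error(_) => "failed",
    }
}
```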
- Owns: builtin functions and semantic helpers.
- Builtins must not block the host. Under the futures-based engine, they may cause the evaluation to return `Pending` while waiting for host events, while keeping language semantics sequential.
- `await` is an intrinsic or opcode (recommended as an opcode for performance and clear semantics).
- Owns:
  - `RunMatSession` public API and workspace model
  - execution planning, cancellation wiring, profiling/tracing
  - integration glue between the VM + runtime + GC + async substrate
- Exposes:
  - `execute_async(...) -> ExecuteFuture`
  - convenience wrappers for native hosts (e.g., `execute_blocking` using a host executor)
- Owns: GPU kernels, provider abstractions, and GPU-backed awaitables.
- Must not know about the VM; it can expose awaitables that wake when GPU work completes.
We introduce a new crate runmat-async as the single home for:
- typed yielding/suspension types (what used to be “control-flow”)
- executor trait(s)
- task identifiers/handles
- “awaitable value” representations used by the runtime/core
- deterministic local executor for tests (either in this crate or a submodule)
We should fold (or delete) the prior runmat-control-flow crate into runmat-async to avoid dual
control-flow APIs.
- `runmat-wasm`: WASM bindings and host adapter to `runmat-async` (JS timers, promise wakeups).
- Desktop/worker TypeScript: thin transport layer only; should not implement policy beyond "drive the runtime/executor".
We want:
- a stable dependency boundary for the VM/runtime/core to share async semantics
- to avoid pulling host dependencies (Tokio, `web_sys`) into core crates
- `TaskId` (copyable id)
- `TaskHandle` (language-level value referencing a task)
This is intentionally minimal and host-neutral:
- `spawn(fut) -> TaskId`
- `wake(task_id)` (optional if wakers directly queue tasks)
- `register_timer(deadline, task_id)` (or a timer awaitable)
- `poll_task(task_id, cx) -> Poll<Result<Value, RunMatError>>`
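A possible shape for this surface, with a trivial single-threaded implementation for illustration (all names here, including `RunMatExecutor` and `LocalExecutor`, are assumptions, not RunMat's actual trait):

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};
use std::time::Instant;

// Illustrative stand-ins for RunMat's real value and error types.
type Value = f64;
#[derive(Debug)]
struct RunMatError(String);
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct TaskId(u64);

type TaskFuture = Pin<Box<dyn Future<Output = Result<Value, RunMatError>>>>;

// Host-neutral executor surface; hosts back it with Tokio, a JS event loop,
// or a deterministic test executor.
trait RunMatExecutor {
    fn spawn(&mut self, fut: TaskFuture) -> TaskId;
    fn wake(&mut self, task_id: TaskId);
    fn register_timer(&mut self, deadline: Instant, task_id: TaskId);
    fn poll_task(&mut self, task_id: TaskId, cx: &mut Context<'_>)
        -> Poll<Result<Value, RunMatError>>;
}

// Trivial single-threaded implementation.
#[derive(Default)]
struct LocalExecutor {
    next_id: u64,
    tasks: HashMap<TaskId, TaskFuture>,
}

impl RunMatExecutor for LocalExecutor {
    fn spawn(&mut self, fut: TaskFuture) -> TaskId {
        let id = TaskId(self.next_id);
        self.next_id += 1;
        self.tasks.insert(id, fut);
        id
    }
    fn wake(&mut self, _task_id: TaskId) {} // wakers queue tasks directly here
    fn register_timer(&mut self, _deadline: Instant, _task_id: TaskId) {}
    fn poll_task(&mut self, task_id: TaskId, cx: &mut Context<'_>)
        -> Poll<Result<Value, RunMatError>> {
        self.tasks.get_mut(&task_id).expect("unknown task").as_mut().poll(cx)
    }
}

// No-op waker so the example can be driven by hand.
fn noop_waker() -> Waker {
    fn clone_raw(_: *const ()) -> RawWaker { RawWaker::new(std::ptr::null(), &VTABLE) }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone_raw, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}
```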
We can refine the trait, but the key is: RunMat does not hardcode Tokio.
At the language/runtime boundary, `await(x)` needs a protocol:
- if `x` is a `TaskHandle`, poll that task
- else if `x` is a native awaitable wrapper (`Value::Awaitable(...)`), poll that
- else error: "not awaitable"
Implementation strategy:
- store `Pin<Box<dyn Future<Output = Result<Value, RunMatError>> + 'session>>` for tasks
- store non-task awaitables similarly, or via a small vtable if we need to avoid trait objects
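The dispatch itself can be sketched as a simple match (the `Value` variants here are illustrative placeholders; RunMat's real `Value` enum is richer):

```rust
// Illustrative Value enum; only the variants relevant to await(x).
#[derive(Debug, PartialEq)]
enum Value {
    Num(f64),
    TaskHandle(u64),
    Awaitable(u64),
}

#[derive(Debug, PartialEq)]
enum AwaitTarget {
    Task(u64),   // poll the task with this id
    Native(u64), // poll the native awaitable with this id
}

// Resolve what await(x) should poll, or report "not awaitable".
fn await_target(v: &Value) -> Result<AwaitTarget, String> {
    match v {
        Value::TaskHandle(id) => Ok(AwaitTarget::Task(*id)),
        Value::Awaitable(id) => Ok(AwaitTarget::Native(*id)),
        _ => Err("not awaitable".to_string()),
    }
}
```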
Cancellation is best modeled as:
- a cancellation token (shared state + waker list)
- cancellation checks at poll boundaries and await points
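A minimal sketch of such a token, assuming shared state behind an `Arc` (the names `CancelToken`/`CancelInner` are illustrative):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::task::Waker;

// Cancellation token: a shared flag plus a waker list. Tasks check it at
// poll boundaries and await points; cancel() wakes every suspended waiter.
#[derive(Clone, Default)]
struct CancelToken {
    inner: Arc<CancelInner>,
}

#[derive(Default)]
struct CancelInner {
    cancelled: AtomicBool,
    wakers: Mutex<Vec<Waker>>,
}

impl CancelToken {
    fn cancel(&self) {
        self.inner.cancelled.store(true, Ordering::SeqCst);
        // Wake every suspended task so it can observe cancellation.
        for w in self.inner.wakers.lock().unwrap().drain(..) {
            w.wake();
        }
    }
    fn is_cancelled(&self) -> bool {
        self.inner.cancelled.load(Ordering::SeqCst)
    }
    // Called from poll when the task suspends while cancellation is possible.
    fn register(&self, waker: &Waker) {
        self.inner.wakers.lock().unwrap().push(waker.clone());
    }
}
```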
- `ASYNC_CREATE <closure>`: create a task (lazy or eager; we recommend lazy-by-default).
- `AWAIT`: pop awaitable; if ready push value; if pending yield from interpreter poll.
Optional but useful:
- `YIELD` (explicit cooperative yield; mainly for runtime fairness/testing)
The VM should expose something like:
`fn poll_execute(&mut self, cx: &mut Context<'_>) -> Poll<InterpreterDone>`
Where `InterpreterDone` is either:
- Completed(value(s))
- Error(RunMatError)
When it hits `AWAIT` on a pending awaitable, it returns `Poll::Pending` and ensures the awaitable has registered `cx.waker()`.
The VM state that must live across yields includes:
- operand stack
- call frames
- instruction pointer(s)
- “spilled” locals across await boundaries
This state is owned by `ExecuteFuture` or by a VM struct owned by it.
This is the highest-risk correctness topic in async runtimes.
No raw pointers to GC-managed objects may live across an await.
Instead:
- store stable handles (indices/ids) into the GC arena
- maintain an explicit “async frame root set” for all live handles
Across any suspension, the following must remain rooted:
- stack values
- locals
- captured closure env values
- temporary values that might be used after resuming
- `ExecuteFuture` owns an `AsyncFrame` object:
  - contains a `Vec<GcHandle>` (or equivalent) for all rooted values
  - is registered as a GC root for the lifetime of the future/task
This should integrate with existing GC root registration infrastructure (not duplicate it).
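A sketch of the frame-owned root set under these assumptions (`GcHandle` and `AsyncFrame` are illustrative; the real version hooks into RunMat's existing root registration rather than standing alone):

```rust
// Stable handle into the GC arena; never a raw pointer across an await.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct GcHandle(usize);

// Root set owned by the ExecuteFuture, registered with the GC for the
// lifetime of the task so every handle it holds stays live across awaits.
#[derive(Default)]
struct AsyncFrame {
    roots: Vec<GcHandle>,
}

impl AsyncFrame {
    // Root a value that must survive a suspension (stack slot, local, etc.).
    fn root(&mut self, h: GcHandle) {
        if !self.roots.contains(&h) {
            self.roots.push(h);
        }
    }
    // The GC calls this during marking to treat each rooted handle as live.
    fn trace(&self, mark: &mut dyn FnMut(GcHandle)) {
        for &h in &self.roots {
            mark(h);
        }
    }
}
```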
- Suspension must not be representable as an error string.
- Error adaptation layers must not be able to “wrap away” suspension.
At the Rust level:
- `RunMatError` (real errors)
- `Poll::Pending` (suspension)
At the language level:
`await` either returns a value or raises the awaited task's error.
For external requests (stdin/UI), await(input(...)) produces a pending request awaitable that the
host fulfills; the task is woken and resumes.
WASM is the forcing function: you cannot block the event loop. The correct model is:
- `execute_async` returns a JS `Promise` (from the TS bindings) that awaits completion of the underlying `ExecuteFuture`
- internal awaitables (GPU map/readback) wake via JS callbacks → waker → resume poll
- external awaitables (stdin prompt) are surfaced to JS/UI; resolution triggers wake
Key property: no synchronous busy-wait in WASM.
Native hosts can choose:
- `execute_blocking` (implemented via a local executor or Tokio's `block_on`)
- `execute_async` integrated into an existing async runtime
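For hosts without an async runtime, a local `execute_blocking` can be as small as a thread-parking poll loop. A sketch using the standard library's `Wake` trait, no Tokio required (the function name matches the API above; the body is an assumption):

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Waker that unparks the thread blocked inside execute_blocking.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// Drive a future to completion on the current thread: park between polls
// instead of busy-waiting, and resume when the waker fires.
fn execute_blocking<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(), // woken by ThreadWaker::wake
        }
    }
}
```

Note the key property: the calling thread sleeps while the evaluation is suspended, so there is no spin loop even for long waits.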
The worker should be a thin transport:
- start execution
- forward stdout events
- forward external pending requests to UI
- fulfill external requests when UI responds
It should not implement “special GPU loops”; internal awaitables are executor-driven.
GPU readback is an internal awaitable:
- scheduling `map_async` returns an awaitable/future that:
  - registers a waker
  - wakes on callback completion
  - yields mapped bytes to the provider code to decode into host tensors
This avoids:
- blocking in wasm
- host polling loops
- leaking GPU details into unrelated runtime code
Providers should expose async-friendly operations by returning either:
- immediate values, or
- awaitables (internal) representing pending completion
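One way such an internal awaitable can look: a shared slot that the host callback (e.g. a `map_async` completion handler) fills, waking whichever task awaits it. A sketch; names like `ReadbackAwaitable` are illustrative, not RunMat API:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Waker};

// Shared completion slot: bytes once ready, plus the waker to notify.
#[derive(Default)]
struct ReadbackState {
    bytes: Option<Vec<u8>>,
    waker: Option<Waker>,
}

#[derive(Clone, Default)]
struct ReadbackAwaitable {
    state: Arc<Mutex<ReadbackState>>,
}

impl ReadbackAwaitable {
    // Called from the host callback when the GPU mapping completes.
    fn complete(&self, bytes: Vec<u8>) {
        let mut s = self.state.lock().unwrap();
        s.bytes = Some(bytes);
        if let Some(w) = s.waker.take() {
            w.wake();
        }
    }
}

impl Future for ReadbackAwaitable {
    type Output = Vec<u8>;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let mut s = self.state.lock().unwrap();
        match s.bytes.take() {
            Some(bytes) => Poll::Ready(bytes),
            None => {
                s.waker = Some(cx.waker().clone()); // wake on callback completion
                Poll::Pending
            }
        }
    }
}

// No-op waker so the example can be polled by hand.
fn noop_waker() -> Waker {
    use std::task::{RawWaker, RawWakerVTable};
    fn clone_raw(_: *const ()) -> RawWaker { RawWaker::new(std::ptr::null(), &VTABLE) }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone_raw, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}
```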
This repository currently contains transitional mechanisms to avoid event-loop starvation:
- “pending request” plumbing and host auto-drain loops
- error-sentinel preservation in a few wrapper layers
These are temporary and should be removed once:
- the interpreter is a future
- suspension is typed (`Pending` via wakers)
- the host uses a real executor adapter
In particular, any solution that encodes suspension as a string/sentinel is considered transitional and must be removed as part of the futures architecture rollout.
The plan in docs/ARCH_ASYNC_PLAN.md describes how to retire the transitional mechanisms safely.