From bc0a256d9c5af37d7207e23b1450a65b2c3c0c0f Mon Sep 17 00:00:00 2001 From: Cursor Agent Date: Sun, 1 Mar 2026 11:33:58 +0000 Subject: [PATCH 1/5] docs: add recursive includes exploration note Co-authored-by: Sam Willis --- docs/guides/recursive-includes-exploration.md | 212 ++++++++++++++++++ 1 file changed, 212 insertions(+) create mode 100644 docs/guides/recursive-includes-exploration.md diff --git a/docs/guides/recursive-includes-exploration.md b/docs/guides/recursive-includes-exploration.md new file mode 100644 index 000000000..8ee79bdb4 --- /dev/null +++ b/docs/guides/recursive-includes-exploration.md @@ -0,0 +1,212 @@ +# Recursive includes exploration + +Status: draft design note +Branch context: `cursor/recursive-includes-exploration-41e3` (includes subqueries in `select`, `toArray`, nested includes, per-parent aggregate/order/limit behavior) + +## Goal + +Support recursive hierarchical projection (adjacency-list style), while preserving the current includes performance model: + +- one query-graph branch per include declaration (not one query per parent row), +- fan-out/materialization outside the query graph, +- incremental/live updates. 
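For intuition, this is the projection the feature computes, sketched as a naive full recomputation over adjacency-list rows (illustrative only; roots are assumed to carry `parentId: null`, and the engine must of course maintain this incrementally rather than rebuild it):

```ts
interface Node {
  id: number
  parentId: number | null
}

interface TreeNode extends Node {
  children: Array<TreeNode>
}

// Naive semantic target: group rows by parentId, then recurse from the roots.
// The whole point of the design notes below is to get this result without
// recomputation and without one query per parent row.
function buildTree(rows: Array<Node>): Array<TreeNode> {
  const byParent = new Map<number | null, Array<Node>>()
  for (const row of rows) {
    const siblings = byParent.get(row.parentId) ?? []
    siblings.push(row)
    byParent.set(row.parentId, siblings)
  }
  const expand = (row: Node): TreeNode => ({
    ...row,
    children: (byParent.get(row.id) ?? []).map(expand),
  })
  return (byParent.get(null) ?? []).map(expand)
}
```

A cyclic `parentId` chain would make this naive version recurse forever, which is one reason cycle policy shows up repeatedly below.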
+ +Example target (conceptual): + +```ts +interface Node { + id: number + parentId: number +} +``` + +```ts +useLiveQuery((q) => { + const children = withQuery( + { pn: nodesCollection }, + q + .from({ cn: nodesCollection }) + .where(({ pn, cn }) => eq(cn.parentId, pn.id)) + .select(({ cn }) => ({ + ...cn, + children: children(cn), + })), + ) + + return q.from({ pn: nodesCollection }).select(({ pn }) => ({ + ...pn, + children: children(pn), + })) +}) +``` + +## How includes work today (important baseline) + +### 1) Builder/IR phase + +- In `buildNestedSelect` / `buildIncludesSubquery` (`packages/db/src/query/builder/index.ts`): + - subquery values inside `.select(...)` are converted to `IncludesSubquery` IR nodes, + - one correlation condition is extracted from child `.where(...)` (`eq(parentRef, childRef)`), + - correlation predicate is removed from child query and stored as: + - `correlationField` (parent side), + - `childCorrelationField` (child side), + - optional `toArray(...)` is carried via `materializeAsArray`. + +### 2) Compiler phase + +- In `compileQuery` (`packages/db/src/query/compiler/index.ts`): + - includes are extracted from select, + - each includes child query is compiled once (recursively) with a parent key stream, + - child input is inner-joined against parent correlation keys, + - child output tuple is `[result, orderByIndex, correlationKey]`, + - select includes entries are replaced with placeholders; real fan-out is deferred to output layer. + +This is the core "single branch, external fan-out" shape. 
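As a rough sketch of why that tuple shape is enough for external fan-out (helper names hypothetical, not engine code): the single child stream is indexed by correlation key once, and every parent binds its children by lookup rather than by owning a query branch:

```ts
type ChildTuple<T> = [result: T, orderByIndex: number | undefined, correlationKey: unknown]

// Index the single shared child stream by correlation key once.
function indexByCorrelation<T>(tuples: Array<ChildTuple<T>>): Map<unknown, Array<T>> {
  const index = new Map<unknown, Array<T>>()
  for (const [result, , key] of tuples) {
    const bucket = index.get(key) ?? []
    bucket.push(result)
    index.set(key, bucket)
  }
  return index
}

// Each parent then binds its children via key lookup, not via its own query branch.
function attachChildren<P extends { id: number }, C>(
  parents: Array<P>,
  childIndex: Map<unknown, Array<C>>,
): Array<P & { children: Array<C> }> {
  return parents.map((p) => ({ ...p, children: childIndex.get(p.id) ?? [] }))
}
```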

### 3) Output/materialization phase

- In `CollectionConfigBuilder` (`packages/db/src/query/live/collection-config-builder.ts`):
  - one output callback per includes entry accumulates pending child deltas in a nested `Map` keyed by correlation key,
  - `flushIncludesState` creates/updates/disposes child Collections,
  - `correlationToParentKeys` reverse index attaches one child Collection to all matching parents,
  - nested includes are handled with buffered bottom-up flushing,
  - `toArray` re-emits parent rows with array snapshots when child content changes.

### Why this is fast

- Query graph does not duplicate child queries per parent.
- Per-parent binding happens in output routing by correlation key.
- This is exactly the property we should preserve for recursion.

## Why recursive includes are not directly possible yet

1. **No fixed-point query representation in builder/IR**
   - Current includes require a concrete child `QueryBuilder` now.
   - There is no notion of a "self call" node in the IR.

2. **Compilation assumes acyclic query references**
   - The `compileQuery` cache prevents duplicate work, but not cyclic query construction.
   - A true self-referential query would recurse indefinitely without additional cycle handling.

3. **Nested includes depth is static today**
   - Existing nested includes are explicit finite nesting in the AST (`project -> issue -> comment`).
   - Recursive trees need unbounded/depth-dynamic expansion.

4. **Output flushing is level-structured**
   - Current nested buffering/routing works for known levels.
   - Recursive trees need dynamic level creation and pruning.

## Option space

### Option A: Depth-limited unrolling (MVP-friendly)

Idea:
- Introduce recursive syntax, but require `maxDepth`.
- Compile by unrolling into N nested includes nodes.

Pros:
- Reuses almost all of the current implementation.
- Predictable complexity and easy testing.

Cons:
- Not true recursion (cutoff behavior).
- Query/IR grows with depth.
+- Not ideal for unknown depth trees. + +### Option B: Per-node dynamic child query instances + +Idea: +- At runtime create child query/subscription per discovered node. + +Pros: +- Easy to reason about. + +Cons: +- Violates performance goal (effectively N queries/subscriptions). +- High memory/loadSubset pressure on large trees. +- Hard to optimize globally. + +Conclusion: likely reject. + +### Option C: Shared recursive edge stream + recursive fan-out state (recommended medium-term) + +Idea: +- Compile a recursive declaration into one shared child/edge stream (same "one branch" principle). +- Maintain recursive adjacency/materialization state outside query graph: + - `childrenByParentKey`, + - reverse links for impacted-ancestor propagation, + - per-parent child Collection/array materialization. +- Recursively attach children using the same stream/state, not new query branches. + +Pros: +- Preserves core includes performance model. +- Supports unbounded depth. +- Keeps incremental/reactive behavior centralized in output layer. + +Cons: +- Non-trivial runtime/state-engine work. +- Needs explicit cycle policy and update semantics. + +### Option D: New query-graph recursive operator (transitive closure/fixpoint) + +Idea: +- Add dedicated incremental operator in `@tanstack/db-ivm` for recursive traversal. + +Pros: +- Most declarative and potentially most powerful long-term. + +Cons: +- Highest implementation complexity/risk. +- More invasive engine work before shipping user value. + +## Recommended staged plan + +### Phase 0: API and semantics RFC + +Decide: +- allowed graph shape (tree only vs DAG), +- cycle behavior (error, truncate, or dedupe by node key), +- ordering/limit semantics (`orderBy/limit` per parent at each depth), +- identity semantics (shared node object across paths vs per-path copy), +- whether recursion requires stable correlation key (likely yes). 
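To make the cycle-behavior decision concrete, here is a standalone sketch of the "dedupe by node key" option (illustrative helper, not engine code): a per-root visited set guarantees termination even on a cycle like `A -> B -> A`:

```ts
// childrenByParent: adjacency index, parentId -> child ids.
function descendantsOf(
  rootId: number,
  childrenByParent: Map<number, Array<number>>,
): Set<number> {
  // Dedupe by node key: each node is emitted at most once per root.
  const seen = new Set<number>()
  const queue = [rootId]
  while (queue.length > 0) {
    const current = queue.shift()!
    for (const child of childrenByParent.get(current) ?? []) {
      if (seen.has(child)) continue // cycle or DAG re-entry: already emitted, skip
      seen.add(child)
      queue.push(child)
    }
  }
  return seen
}
```

Note that under a cycle the root can surface as its own descendant; whether that is an error, a truncation point, or acceptable output is exactly the Phase 0 decision.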
+ +### Phase 1: Ship depth-limited recursion (Option A) + +- Good for early user feedback and type-system validation. +- Keeps current architecture almost unchanged. +- Enables concrete UX/API iteration (`withQuery` vs dedicated `recursiveInclude(...)` API). + +### Phase 2: Build shared recursive materializer (Option C) + +- Add a recursive includes IR node that represents a fixed-point/self call. +- Compile one child branch per declaration. +- Extend output-layer state machine to dynamic-depth traversal and impacted-ancestor propagation. +- Preserve existing non-recursive includes behavior as-is. + +### Phase 3 (optional/long-term): evaluate graph-level operator (Option D) + +- If runtime-layer complexity or performance ceilings appear, move recursion core into IVM. + +## Open questions to resolve early + +1. **Cycle policy**: What should happen on `A -> B -> A`? +2. **DAG duplication**: If node `X` is reachable from two parents, share instance or duplicate per path? +3. **Move semantics**: Parent change (`parentId` update) should re-home full subtree incrementally. +4. **Result keying**: Need robust key serialization for correlation values. +5. **Interplay with `toArray`**: re-emit boundaries and batching strategy for deep updates. +6. **Parent-referencing child filters**: align recursion design with parent-filtering includes work. + +## Practical next step + +Build a small RFC/POC on top of this branch with: + +- API sketch (including TypeScript inference expectations), +- Phase-1 depth-limited prototype (`maxDepth`), +- benchmark scenarios: + - deep chain, + - wide tree, + - subtree move, + - frequent leaf insert/delete. + +That gives fast signal on ergonomics and correctness before committing to full fixed-point execution. 
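The benchmark fixtures can be sketched up front (shapes only; `Row` and the helpers are illustrative, and real benchmarks would drive these rows through live queries):

```ts
interface Row {
  id: number
  parentId: number | null
}

// Deep chain: node i is the child of node i - 1.
function deepChain(depth: number): Array<Row> {
  return Array.from({ length: depth }, (_, i) => ({
    id: i,
    parentId: i === 0 ? null : i - 1,
  }))
}

// Wide tree: one root with `width` direct children.
function wideTree(width: number): Array<Row> {
  const rows: Array<Row> = [{ id: 0, parentId: null }]
  for (let i = 1; i <= width; i++) rows.push({ id: i, parentId: 0 })
  return rows
}

// Subtree move: re-home one node (and implicitly its whole subtree).
function moveSubtree(rows: Array<Row>, nodeId: number, newParentId: number): Array<Row> {
  return rows.map((r) => (r.id === nodeId ? { ...r, parentId: newParentId } : r))
}
```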
From 9556ff95f1191f840b1e56f911d1d32563b648ee Mon Sep 17 00:00:00 2001 From: Cursor Agent Date: Sun, 1 Mar 2026 11:55:51 +0000 Subject: [PATCH 2/5] docs: expand recursive includes option D design Co-authored-by: Sam Willis --- docs/guides/recursive-includes-exploration.md | 302 +++++++++++++----- 1 file changed, 224 insertions(+), 78 deletions(-) diff --git a/docs/guides/recursive-includes-exploration.md b/docs/guides/recursive-includes-exploration.md index 8ee79bdb4..229d97619 100644 --- a/docs/guides/recursive-includes-exploration.md +++ b/docs/guides/recursive-includes-exploration.md @@ -97,116 +97,262 @@ This is the core "single branch, external fan-out" shape. - Current nested buffering/routing works for known levels. - Recursive trees need dynamic level creation and pruning. -## Option space +## Option status after this exploration -### Option A: Depth-limited unrolling (MVP-friendly) +- **Option A (depth-limited unrolling):** compelling MVP route and syntax-compatible with a stronger future implementation. +- **Option B (per-node dynamic queries):** rejected. +- **Option C (output-layer recursive materializer):** currently less compelling given desire to solve recursion at the IVM graph level. +- **Option D (new recursive IVM operator):** most compelling long-term direction. -Idea: -- Introduce recursive syntax, but require `maxDepth`. -- Compile by unrolling into N nested includes nodes. +The rest of this document focuses on how to make Option D practical in `db-ivm`, while avoiding global multidimensional time unless absolutely required. -Pros: -- Reuses almost all current implementation. -- Predictable complexity and easy testing. +## Option D deep dive: recursive operator in `@tanstack/db-ivm` -Cons: -- Not true recursion (cutoff behavior). -- Query/IR grows with depth. -- Not ideal for unknown depth trees. +### Key observations from the current codebases -### Option B: Per-node dynamic child query instances +1. 
`db-ivm` intentionally removed version/frontier machinery and runs until local quiescence (`D2.run()` loops while operators have pending work). +2. The original `d2ts` has `iterate` based on: + - version extension/truncation (`Version.extend()` / `truncate()`), + - per-iteration step (`applyStep()`), + - frontier coordination in `FeedbackOperator`. +3. The DBSP paper explicitly supports recursion (including non-monotonic recursion) and models recursive incrementalization with nested time dimensions. -Idea: -- At runtime create child query/subscription per discovered node. +So we have a useful tension: -Pros: -- Easy to reason about. +- Differential-style multidimensional time is expressive and principled. +- `db-ivm` is intentionally much simpler. +- We want recursion now, but do not want to pay the full complexity tax upfront. -Cons: -- Violates performance goal (effectively N queries/subscriptions). -- High memory/loadSubset pressure on large trees. -- Hard to optimize globally. +### What Differential/DBSP are telling us (and what to borrow) -Conclusion: likely reject. +From Differential and DBSP, the durable ideas to keep are: -### Option C: Shared recursive edge stream + recursive fan-out state (recommended medium-term) +1. **Recursion should be fixed-point computation over deltas**, not repeated full recomputation. +2. **Semi-naive style propagation** (only newly discovered tuples drive next iteration) is essential. +3. **Strict feedback / convergence discipline** is mandatory to avoid non-termination. +4. **Two notions of progress exist conceptually**: + - outer progress (incoming transaction/update), + - inner progress (loop iteration). -Idea: -- Compile a recursive declaration into one shared child/edge stream (same "one branch" principle). -- Maintain recursive adjacency/materialization state outside query graph: - - `childrenByParentKey`, - - reverse links for impacted-ancestor propagation, - - per-parent child Collection/array materialization. 
-- Recursively attach children using the same stream/state, not new query branches. +The implementation question is whether we must expose both dimensions in the public runtime timestamp model. -Pros: -- Preserves core includes performance model. -- Supports unbounded depth. -- Keeps incremental/reactive behavior centralized in output layer. +### Can we avoid global multidimensional time? -Cons: -- Non-trivial runtime/state-engine work. -- Needs explicit cycle policy and update semantics. +**Yes, as a first-class engineering step**: keep one external time dimension (current `db-ivm` behavior), and model the recursion iteration dimension as *internal operator state*. -### Option D: New query-graph recursive operator (transitive closure/fixpoint) +Think of this as "local nested time" instead of "global timestamp vectors". -Idea: -- Add dedicated incremental operator in `@tanstack/db-ivm` for recursive traversal. +- External graph: unchanged, still versionless from the API perspective. +- Recursive operator internals: + - own work queue, + - own iteration counter/depth, + - own convergence checks. -Pros: -- Most declarative and potentially most powerful long-term. +This gives most of the practical value without changing every operator or stream type. -Cons: -- Highest implementation complexity/risk. -- More invasive engine work before shipping user value. +## Proposed operator shape (first pass) -## Recommended staged plan +### Conceptual API -### Phase 0: API and semantics RFC +```ts +recursiveFixpoint({ + roots, // stream of root entities / correlation keys + edges, // stream of adjacency edges + expand, // one-step expansion function (join-like) + options: { + maxDepth?: number, + cyclePolicy: 'dedupe-node' | 'allow-paths' | 'error', + deletionMode: 'recompute-affected' | 'support-counts', + }, +}) +``` + +For tree includes, `expand` is typically "follow `parentId -> id` edge one hop". 
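A sketch of that one-hop `expand` over a prebuilt adjacency index (types and names are placeholders; the operator would own and maintain the index):

```ts
interface ReachFact {
  rootKey: number
  nodeKey: number
  depth: number
}

// edgeIndex: parentNodeKey -> child node keys (the parentId -> id adjacency).
function expandOneHop(
  wave: Array<ReachFact>,
  edgeIndex: Map<number, Array<number>>,
): Array<ReachFact> {
  const next: Array<ReachFact> = []
  for (const fact of wave) {
    for (const childKey of edgeIndex.get(fact.nodeKey) ?? []) {
      // Same root scope, one level deeper; dedupe lives in the operator state,
      // so this step stays a pure join-like expansion.
      next.push({ rootKey: fact.rootKey, nodeKey: childKey, depth: fact.depth + 1 })
    }
  }
  return next
}
```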
+ +### Output contract for includes + +Emit tuples keyed by child identity, with payload that includes: + +- `correlationKey` (root/parent scope key for fan-out), +- `nodeKey` (child key), +- `depth`, +- optional `parentNodeKey` (for deterministic tree reconstruction), +- optional stable order token. + +This stays compatible with current includes output routing (`correlationKey` fan-out remains outside graph). + +## Internal algorithm sketch (no global multidimensional time) + +### State + +Per recursive operator instance: + +- `edgeIndex`: parentNodeKey -> children +- `reverseEdgeIndex`: childNodeKey -> parents (for deletes) +- `rootsIndex`: active roots +- `reachable`: map `(rootKey, nodeKey) -> state` + - at minimum: present/not-present, depth + - for robust deletions: support count / witness set +- `frontierQueue`: pending delta tuples for next expansion wave + +### Insert propagation (semi-naive) + +1. Ingest root/edge inserts as delta. +2. Seed `frontierQueue` with new reachable facts. +3. Loop until queue empty: + - pop wave, + - expand one hop via `edgeIndex`, + - apply cycle/dedupe policy, + - emit only net-new tuples, + - enqueue only newly-added tuples for next wave. + +This is standard semi-naive fixed-point iteration inside one operator run. + +### Delete propagation: two viable modes + +#### Mode 1: recompute-affected (simpler, good first cut) + +- On edge/root delete, identify affected roots/subgraph. +- Retract previously emitted tuples for affected scope. +- Recompute fixed point for that affected scope from current base data. + +Tradeoff: +- simpler correctness, +- potentially expensive on large deletions. + +#### Mode 2: support-counts / witnesses (full incremental) + +- Track derivation support per `(root,node)` tuple. +- Inserts increment support and may cross 0 -> positive (emit insert). +- Deletes decrement support and may cross positive -> 0 (emit delete), then cascade. 
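The 0-crossing discipline can be sketched as a small support table keyed by a serialized `(root, node)` pair (illustrative only; real state would live inside the operator):

```ts
type Delta = { key: string; emit: `insert` | `delete` } | null

// Support counts per (root,node) fact; emissions happen only on 0-crossings.
class SupportTable {
  private counts = new Map<string, number>()

  add(key: string): Delta {
    const next = (this.counts.get(key) ?? 0) + 1
    this.counts.set(key, next)
    return next === 1 ? { key, emit: `insert` } : null // crossed 0 -> positive
  }

  remove(key: string): Delta {
    const next = (this.counts.get(key) ?? 0) - 1
    if (next <= 0) {
      this.counts.delete(key)
      return { key, emit: `delete` } // crossed positive -> 0; caller cascades
    }
    this.counts.set(key, next)
    return null
  }
}
```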
+ +Tradeoff: +- best incremental behavior, +- more state and complexity (especially for DAGs with many alternative paths). + +## Cycle, DAG, and depth semantics + +### Cycle policy + +Recommended default: `dedupe-node` by `(rootKey,nodeKey)`. + +- Guarantees termination on finite graphs. +- Produces one materialized node per root, not one row per path. -Decide: -- allowed graph shape (tree only vs DAG), -- cycle behavior (error, truncate, or dedupe by node key), -- ordering/limit semantics (`orderBy/limit` per parent at each depth), -- identity semantics (shared node object across paths vs per-path copy), -- whether recursion requires stable correlation key (likely yes). +Alternative `allow-paths` is much heavier (potential explosion), and should be opt-in. -### Phase 1: Ship depth-limited recursion (Option A) +### Depth handling (the "inject depth per iteration" idea) -- Good for early user feedback and type-system validation. -- Keeps current architecture almost unchanged. -- Enables concrete UX/API iteration (`withQuery` vs dedicated `recursiveInclude(...)` API). +Depth can be treated as the operator's internal iteration coordinate: -### Phase 2: Build shared recursive materializer (Option C) +- `depth=0` at root seed (or `1` at first child hop; pick one and document), +- each expansion increments depth by 1. -- Add a recursive includes IR node that represents a fixed-point/self call. -- Compile one child branch per declaration. -- Extend output-layer state machine to dynamic-depth traversal and impacted-ancestor propagation. -- Preserve existing non-recursive includes behavior as-is. +This supports: -### Phase 3 (optional/long-term): evaluate graph-level operator (Option D) +- optional `maxDepth` stopping criterion (Option A compatibility), +- deterministic breadth-first layering, +- future APIs that expose depth/path metadata. -- If runtime-layer complexity or performance ceilings appear, move recursion core into IVM. 
+Important: with dedupe-by-node, keep the minimal depth seen for each `(root,node)`. -## Open questions to resolve early +## Why this is syntax-compatible with Option A -1. **Cycle policy**: What should happen on `A -> B -> A`? -2. **DAG duplication**: If node `X` is reachable from two parents, share instance or duplicate per path? -3. **Move semantics**: Parent change (`parentId` update) should re-home full subtree incrementally. -4. **Result keying**: Need robust key serialization for correlation values. -5. **Interplay with `toArray`**: re-emit boundaries and batching strategy for deep updates. -6. **Parent-referencing child filters**: align recursion design with parent-filtering includes work. +If we introduce recursive query syntax now, we can compile it in two different ways without API break: -## Practical next step +1. **MVP path**: unroll to `maxDepth` nested includes (Option A). +2. **Future path**: compile to `recursiveFixpoint` operator (Option D). -Build a small RFC/POC on top of this branch with: +Same user syntax, different backend strategy. -- API sketch (including TypeScript inference expectations), -- Phase-1 depth-limited prototype (`maxDepth`), -- benchmark scenarios: +## Integration points in TanStack DB + +### IR / builder + +Add a recursive include IR form (placeholder naming): + +- `RecursiveIncludesSubquery`: + - base child query, + - self reference marker, + - correlation metadata, + - options (`maxDepth`, cycle policy, etc.). + +### Compiler + +When recursive IR is detected: + +- emit one recursive operator branch in `compileQuery`, +- continue returning child rows with correlation metadata, +- keep select placeholder behavior (as done for includes now). + +### Output layer + +Largely unchanged core principle: + +- still fan out by correlation key in `flushIncludesState`, +- recursive operator only changes what child stream arrives, not where fan-out happens. 
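Tying the compiler point back to the syntax-compatibility claim, one entry point can choose between the two backends; a sketch (all names are placeholders):

```ts
interface RecursiveIncludeIR {
  maxDepth?: number
  cyclePolicy: `dedupe-node` | `allow-paths` | `error`
}

type Strategy = { kind: `unroll`; levels: number } | { kind: `fixpoint` }

// Same user syntax, different backend: unroll while only Option A exists,
// emit one recursive branch once the fixpoint operator is available.
function chooseStrategy(ir: RecursiveIncludeIR, fixpointAvailable: boolean): Strategy {
  if (!fixpointAvailable) {
    if (ir.maxDepth === undefined) {
      throw new Error(`recursive include requires maxDepth until the fixpoint operator ships`)
    }
    return { kind: `unroll`, levels: ir.maxDepth }
  }
  return { kind: `fixpoint` }
}
```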
+ +## Concrete phased plan (A now, D in parallel) + +### Phase A1 (MVP) + +- Implement depth-limited recursive syntax with explicit `maxDepth`. +- Compile by unrolling. +- Land tests for: - deep chain, - wide tree, - subtree move, - - frequent leaf insert/delete. + - cycle handling under `maxDepth`. + +### Phase D0 (operator spike, behind flag) + +- Add internal `recursiveFixpoint` operator with: + - inserts + updates, + - delete handling via recompute-affected mode. +- Tree-first semantics (`dedupe-node`, stable keys). +- Benchmark against Option A at moderate depths. + +### Phase D1 (full incremental deletes) + +- Add support counts / witnesses. +- Expand to robust DAG behavior. +- Add stress tests for high churn and subtree re-parenting. + +### Phase D2 (only if needed) + +- Revisit whether global multidimensional time/frontiers are necessary. +- Only escalate if concrete workloads show correctness/performance gaps that local iteration cannot close cleanly. + +## Risks and mitigations + +1. **Delete complexity in DAGs** + - Mitigation: start with recompute-affected mode; gate support-count mode later. + +2. **State growth** + - Mitigation: strict dedupe policy by default; expose safeguards (`maxDepth`, optional per-root limits). + +3. **Non-termination under permissive path semantics** + - Mitigation: default `dedupe-node`; explicit opt-in for path semantics with hard limits. + +4. **Ordering instability across recursive updates** + - Mitigation: define deterministic order contract early (e.g., by depth then key, or explicit `orderBy` semantics per level). + +## Open questions to lock before implementation + +1. Node identity semantics for DAGs: + - one instance per `(root,node)` or per path? +2. Parent-child ordering semantics at each depth. +3. Whether subtree moves must be strongly incremental in v1 of Option D. +4. How much recursion metadata should be exposed (`depth`, `path`, `ancestor`). +5. 
Hard bounds for safe execution (depth, node-count, iteration-count). + +## References used in this exploration -That gives fast signal on ergonomics and correctness before committing to full fixed-point execution. +- Current TanStack DB includes pipeline: + - `packages/db/src/query/compiler/index.ts` + - `packages/db/src/query/live/collection-config-builder.ts` +- `d2ts` iterative machinery (pre-simplification reference): + - `packages/d2ts/src/operators/iterate.ts` + - `packages/d2ts/src/order.ts` +- DBSP paper (arXiv 2203.16684): + - abstract and sections 5-6 discuss recursion, fixed points, and nested time dimensions. From f74fa140986114d018a39fe3602c3ed6eb764115 Mon Sep 17 00:00:00 2001 From: Cursor Agent Date: Sun, 1 Mar 2026 12:07:44 +0000 Subject: [PATCH 3/5] docs: add recursive pipeline sketch and option E analysis Co-authored-by: Sam Willis --- docs/guides/recursive-includes-exploration.md | 224 +++++++++++++++++- 1 file changed, 223 insertions(+), 1 deletion(-) diff --git a/docs/guides/recursive-includes-exploration.md b/docs/guides/recursive-includes-exploration.md index 229d97619..6459ea785 100644 --- a/docs/guides/recursive-includes-exploration.md +++ b/docs/guides/recursive-includes-exploration.md @@ -181,6 +181,83 @@ Emit tuples keyed by child identity, with payload that includes: This stays compatible with current includes output routing (`correlationKey` fan-out remains outside graph). +### Compiled pipeline sketch (Option D) + +Below is a concrete **pseudo-code sketch** of what compilation could emit for a +recursive include, using current `db-ivm`-style streams and operators. 
+ +```ts +// Input collection stream: [nodeKey, nodeRow] +const nodesInput = inputs.cn + +// 1) Parent/root keys from already-filtered parent pipeline +// Shape: [correlationKey, { rootNodeKey }] +const parentKeys = parentPipeline.pipe( + map(([_, parentNsRow]) => { + const root = parentNsRow.pn + return [root.id, { rootNodeKey: root.id }] as const + }), +) + +// 2) One shared adjacency stream for recursion +// Shape: [parentId, { nodeKey, node }] +const edgesByParent = nodesInput.pipe( + map(([nodeKey, node]) => [node.parentId, { nodeKey, node }] as const), +) + +// 3) Seed stream (depth 0 or 1, depending on chosen convention) +// Shape: [correlationKey, RecursiveRow] +const seed = parentKeys.pipe( + map(([correlationKey, { rootNodeKey }]) => [ + correlationKey, + { + nodeKey: rootNodeKey, + parentNodeKey: null, + depth: 0, + }, + ] as const), +) + +// 4) Recursive fixed-point operator (new in Option D) +// Emits only net-new / net-removed recursive rows incrementally. +const recursiveRows = recursiveFixpoint({ + seed, + edgesByParent, + maxDepth, // optional (for syntax compatibility with Option A) + cyclePolicy: `dedupe-node`, // dedupe by (correlationKey,nodeKey) + deletionMode: `recompute-affected`, // first implementation +}) + +// 5) Join recursive rows back to base node rows to project final shape +// Shape: [childNodeKey, [childResult, orderByIndex?, correlationKey]] +const includesChildPipeline = recursiveRows.pipe( + // pseudocode for lookup/join; real implementation can use join/index operator + map(([_corr, rr]) => [rr.nodeKey, rr] as const), + join(nodesInput, `inner`), + map(([childNodeKey, [rr, nodeRow]]) => [ + childNodeKey, + [ + { + ...nodeRow, + __depth: rr.depth, + __parentNodeKey: rr.parentNodeKey, + }, + undefined, // orderBy index slot (kept for compatibility) + rr.correlationKey, + ], + ] as const), +) +``` + +And then this stream plugs into the existing includes output machinery: + +- `pendingChildChanges` accumulation, +- 
`correlationToParentKeys` fan-out, +- `flushIncludesState` materialization into child collections/arrays. + +So Option D changes the **source of child rows**, while preserving the existing +"single graph branch + external fan-out" architecture. + ## Internal algorithm sketch (no global multidimensional time) ### State @@ -292,7 +369,117 @@ Largely unchanged core principle: - still fan out by correlation key in `flushIncludesState`, - recursive operator only changes what child stream arrives, not where fan-out happens. -## Concrete phased plan (A now, D in parallel) +## Option E deep dive: reintroduce global time + frontiers + +Option E means bringing back the "version + frontier" execution model (as in +`d2ts`) into `db-ivm`, then implementing recursion on top of that. + +Why it is worth considering: + +- explicit transaction tracking and ordering, +- stronger global progress semantics, +- cleaner foundation for multiple future iterative operators. + +### E1: global **single-dimensional** time + frontiers + +Use one global version coordinate (transaction epoch), and frontiers to mark +completion of each epoch: + +- input data arrives as `(version, delta)`, +- frontier advance means "no more data < frontier", +- all operators become version-aware again. + +Recursion can still use a local operator loop internally (like Option D), but +its outputs are tagged with the same outer version. + +Sketch: + +```ts +const graph = new D2({ initialFrontier: 0 }) +const nodes = graph.newInput() + +const parentKeys = compileParent(... ) // stream of [corrKey, root] +const recursiveRows = parentKeys.pipe( + recursiveFixpoint({ /* same logic as Option D */ }), +) + +graph.finalize() + +// transaction N +nodes.sendData(42, nodeDelta) +nodes.sendFrontier(43) +graph.run() +``` + +What E1 brings: + +- stable transaction boundaries system-wide, +- better observability/debuggability ("which epoch produced this row"), +- easier consistency rules across multiple inputs. 
+ +Cost of E1: + +- all streams/operators/messages become versioned again, +- frontier correctness needs to be restored across graph execution, +- substantial migration in `db-ivm` and compiler glue. + +### E2: global **multidimensional** time + frontiers (Differential-like) + +Use timestamps like `[txn, iter]` (or equivalent lattice tuples), with +antichain frontiers. + +This supports explicit iterative scopes: + +- entering recursion extends time: `[txn] -> [txn, 0]`, +- feedback increments iteration: `[txn, i] -> [txn, i + 1]`, +- leaving recursion truncates back to `[txn]`. + +Sketch (conceptual): + +```ts +const recursiveRows = parentKeys.pipe( + iterate((loop) => + loop.pipe( + expandOneHop(edgesByParent), + dedupeByNode(), + consolidate(), + ), + ), +) +``` + +What E2 brings: + +- principled semantics for nested iteration and recursion, +- strong alignment with Differential/DBSP theory, +- best long-term substrate for advanced recursive/incremental operators. + +Cost of E2: + +- largest complexity increase (time lattice + antichain logic everywhere), +- high implementation and maintenance burden, +- likely much slower path to user-visible value. 
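For a concrete feel of what E2's timestamps entail, a minimal sketch of the `[txn, iter]` product partial order (function names illustrative): some timestamps are incomparable, which is why frontiers must become antichains rather than single values:

```ts
type Time = [txn: number, iter: number]

// Product partial order: a <= b iff both coordinates are <=.
// Note that [1, 5] and [2, 0] are incomparable.
const lessEqual = (a: Time, b: Time): boolean => a[0] <= b[0] && a[1] <= b[1]

// Least upper bound (join): coordinate-wise max.
const join = (a: Time, b: Time): Time => [Math.max(a[0], b[0]), Math.max(a[1], b[1])]
```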
+ +## Option D vs Option E: what each really brings + +| Dimension | Option D (local recursive operator, versionless graph) | Option E1 (global single time + frontiers) | Option E2 (global multidimensional time) | +| --- | --- | --- | --- | +| Primary benefit | Fastest path to recursive includes | Strong transaction semantics | Most general recursive semantics | +| Scope of change | Mostly one operator + compiler wiring | Whole runtime message model | Whole runtime + time lattice model | +| Transaction tracking | Implicit / external to graph | Explicit and native | Explicit and native | +| Recursion semantics | Strong enough for tree/DAG with careful state | Similar to D unless iterate scopes added | First-class iterative scopes | +| Delivery risk | Low-medium | Medium-high | High | +| Performance overhead | Lowest base overhead | Moderate (version/frontier plumbing) | Highest (timestamp lattice machinery) | +| Future extensibility | Good, but local to recursive op | Better global control | Best theoretical headroom | +| Best use case | Ship recursive includes soon | Need explicit epoch correctness now | Need full Differential-like model | + +### Practical interpretation + +- If the goal is "ship recursive includes for common use cases quickly", **D wins**. +- If transaction-epoch correctness inside IVM is a hard requirement now, **E1 becomes compelling**. +- If we expect many recursive/time-nested operators and need a canonical model, **E2 is architecturally strongest** but expensive. + +## Concrete phased plan (A now, D/E decision gate) ### Phase A1 (MVP) @@ -323,6 +510,29 @@ Largely unchanged core principle: - Revisit whether global multidimensional time/frontiers are necessary. - Only escalate if concrete workloads show correctness/performance gaps that local iteration cannot close cleanly. +### Phase E0 (parallel design spike) + +- Specify minimal version/frontier contract needed for transaction tracking. 
+- Decide whether E1 alone is enough, or if E2 is actually required. +- Prototype cost estimate: + - number of operators touched, + - expected perf/memory delta, + - migration impact on db compiler/live query code. + +### Decision gate: choose D-only vs D+E1 vs E2 + +Evaluate with concrete workloads: + +- high-frequency transactions across multiple inputs, +- recursive subtree churn (insert/delete/move), +- observability/debugging needs by transaction. + +Choose: + +1. **D-only** if correctness/perf targets are met without global time. +2. **D + E1** if transaction tracking and epoch semantics are required system-wide. +3. **E2** only if E1 cannot satisfy recursive/iterative semantics needed by roadmap. + ## Risks and mitigations 1. **Delete complexity in DAGs** @@ -337,6 +547,15 @@ Largely unchanged core principle: 4. **Ordering instability across recursive updates** - Mitigation: define deterministic order contract early (e.g., by depth then key, or explicit `orderBy` semantics per level). +5. **Runtime-wide migration risk for Option E** + - Mitigation: do E0 spike first; quantify exact operator/runtime churn before committing. + +6. **Frontier/liveness bugs if global time returns** + - Mitigation: add invariant checks and dedicated tests for monotonic frontier advancement and quiescence. + +7. **Higher steady-state overhead with versioned messages** + - Mitigation: benchmark E1 against D on representative live-query workloads before deciding. + ## Open questions to lock before implementation 1. Node identity semantics for DAGs: @@ -345,6 +564,9 @@ Largely unchanged core principle: 3. Whether subtree moves must be strongly incremental in v1 of Option D. 4. How much recursion metadata should be exposed (`depth`, `path`, `ancestor`). 5. Hard bounds for safe execution (depth, node-count, iteration-count). +6. If Option E is chosen, should transaction tracking use: + - single-dimensional epochs only, or + - multidimensional `[txn, iter]` timestamps? 
## References used in this exploration From b552f1d8a49da88674995ec9f4856e74396e5bd4 Mon Sep 17 00:00:00 2001 From: Cursor Agent Date: Sun, 1 Mar 2026 12:46:26 +0000 Subject: [PATCH 4/5] docs: add query engine evolution exploration guide Co-authored-by: Sam Willis --- .../query-engine-evolution-exploration.md | 436 ++++++++++++++++++ 1 file changed, 436 insertions(+) create mode 100644 docs/guides/query-engine-evolution-exploration.md diff --git a/docs/guides/query-engine-evolution-exploration.md b/docs/guides/query-engine-evolution-exploration.md new file mode 100644 index 000000000..72663f461 --- /dev/null +++ b/docs/guides/query-engine-evolution-exploration.md @@ -0,0 +1,436 @@ +# Query engine evolution exploration + +Status: draft design note +Branch context: `cursor/recursive-includes-exploration-41e3` +Related: [Recursive includes exploration](./recursive-includes-exploration.md) + +## Goal + +Explore an evolution of the TanStack DB query engine with these requirements: + +1. **On-demand collections emit row version/LSN metadata** so queries can provide stable transactional guarantees. +2. **One mutable query graph for all live queries** (compile + match + amend), instead of one graph per live query collection. +3. **Minimize join index state** by treating it as computed-graph (CG) cache and re-fetching via up-queries when needed. +4. **Route up-queries through operators** so they can be transformed/composed by the query engine. +5. **Introduce single-dimensional or multidimensional version/time** so joins can withhold output until required up-queries are satisfied. + +This note is intentionally architecture-first and code-adjacent. + +--- + +## Current baseline (important context) + +### 1) Query graph lifetime is per live query collection + +- `CollectionConfigBuilder` builds a query IR and compiles a `D2` graph for that live query. +- `compileBasePipeline()` creates `new D2()` and input streams per alias. 
+- Graph/pipeline caches live within that builder instance and are reset on sync cleanup. + +Implication: there is no shared global graph structure across independent live queries today. + +### 2) `loadSubset` is invoked directly from subscriptions and join lazy-loading + +- `CollectionSubscription.requestSnapshot()` / `requestLimitedSnapshot()` call `collection._sync.loadSubset(...)`. +- Lazy join loading in the compiler (`joins.ts`) calls `lazySourceSubscription.requestSnapshot(...)` when join keys are missing. +- This means up-query intent is generated in query code paths, but execution is pushed directly to source sync handlers. + +Implication: up-query planning is not a first-class operator pipeline stage. + +### 3) Core `db-ivm` runtime is versionless + +- `@tanstack/db-ivm` streams carry diffs (`MultiSet`) without explicit timestamps/frontiers. +- `D2.run()` drives operators until quiescence based on pending input queues. +- There is no graph-level frontier contract in the current runtime. + +### 4) Version/tx tracking exists in adapters, not in core query graph semantics + +- Electric adapter has `txid` tracking (`awaitTxId`, `awaitMatch`, `up-to-date`, `subset-end` controls). +- This gives useful consistency behavior for that source, but it is source-specific and not exposed as a general graph contract. + +--- + +## Desired semantics + +The requested direction implies these invariants. + +### A. Stable transactional visibility token per query + +Each query result should have a monotonic "stable through" token (epoch/frontier-like), so consumers can reason about how complete the result is. + +### B. No premature join emission when data is missing + +If a join row depends on missing side data and an up-query was issued, the join must not emit a "final" row for that dependency until the up-query is satisfied for the required version horizon. + +### C. 
Eventual convergence despite sparse version streams + +We may skip intermediate versions in emitted rows, but eventual output must converge to the correct result once enough data has arrived. + +### D. Bounded state with controlled re-fetch + +Join state should be treated as a cache. Evict aggressively where safe, and rely on operator-routed up-queries for replay/fill. + +--- + +## Proposed architecture + +### 1) Versioned row contract for on-demand collections + +Introduce a normalized row-stamp shape that can travel through sync adapters and operators. + +```ts +type RowStamp = { + sourceId: string + lsn?: bigint | number | string + epoch?: number +} +``` + +Possible representation choices: + +- **Non-breaking first step:** put stamp in `ChangeMessage.metadata`. +- **Stronger typing later:** add first-class typed version fields in internal message envelopes. + +Key idea: LSN/source version and graph-level epoch are related but not identical. + +- `lsn` answers "which source commit did this row reflect?" +- `epoch` answers "which global graph progress point did this message enter?" + +### Source frontier signal + +To make gating composable, sources should also expose a monotonic frontier/high-watermark (explicitly or by convention), e.g.: + +```ts +type SourceFrontier = { + sourceId: string + stableLsn?: bigint | number | string + stableEpoch?: number +} +``` + +--- + +### 2) One mutable global query graph + +Introduce a graph manager (conceptual name: `GlobalQueryGraphManager`) that owns one runtime graph and supports attach/detach of logical queries. + +### Attach flow (conceptual) + +1. Normalize query IR into canonical operator fragments. +2. Fingerprint each fragment (operator kind + normalized args + upstream fingerprints). +3. Reuse existing nodes when fingerprint matches; create only missing nodes. +4. Attach query sink to terminal node(s) and increment refcounts. +5. Return current snapshot plus stability token. + +### Detach flow + +1. 
Decrement sink reference count. +2. Garbage-collect unreachable nodes/operators and operator-local caches. +3. Optionally keep warm caches for a short TTL if churn is high. + +### Why this matters + +- Natural sharing of common subplans (especially joins/filters/order nodes). +- Shared backpressure and consistent frontier accounting. +- Foundation for query-level and global-level up-query coalescing. + +--- + +### 3) Up-queries routed through operators + +Move from "subscription directly calls source `loadSubset`" to "operators emit up-query needs and an up-query router executes them." + +### New internal message types (conceptual) + +```ts +type UpQueryNeed = { + needId: string + sourceAlias: string + load: LoadSubsetOptions + requiredEpoch?: number + requiredLsn?: bigint | number | string + requestedByOperatorId: number +} + +type UpQueryAck = { + needId: string + satisfied: boolean + satisfiedEpoch?: number + satisfiedLsn?: bigint | number | string + error?: unknown +} +``` + +### Flow + +1. Join/lookup/index operator detects a hole (missing row/state). +2. Operator emits `UpQueryNeed` into an internal control stream. +3. Up-query router/planner: + - deduplicates needs, + - coalesces by source + compatible predicates, + - rewrites into richer `LoadSubsetOptions` when possible. +4. Sync bridge executes source-specific `loadSubset`. +5. Source changes re-enter graph as normal row deltas with stamps. +6. Router emits `UpQueryAck` and advances obligation state. + +This makes up-queries composable and observable within the graph itself. + +--- + +### 4) Partial-state joins (minimum practical state) + +Treat join indexes as a hierarchy of caches, not durable truth. + +### State tiers + +1. **Obligation state (required):** + - unresolved needs, + - required epoch/lsn horizon, + - pending keys. +2. **Key skeleton state (small):** + - key existence, + - minimal join attributes, + - refcount/last-access metadata. +3. 
**Full row cache (evictable):** + - only hot rows required by active outputs/windows. + +### Eviction strategy + +- Evict cold full rows first. +- Keep small key skeleton and obligations. +- Re-issue up-query when evicted row is needed again. + +### Safety rule + +Eviction is safe only if unresolved dependencies are tracked via obligations so output gating remains correct. + +--- + +### 5) Time/version model options + +#### Option S1: global single-dimensional time + source frontiers + +Representation: + +- Global epoch `e: number` for graph progress ordering. +- Source-local LSNs for provenance. +- Frontier map: `source -> stableEpoch/stableLsn`. + +Pros: + +- Much lower complexity than full multidimensional time. +- Enough to express "do not emit join output until up-query for epoch `e` is satisfied." +- Good fit for immediate transactional tracking goals. + +Cons: + +- Less expressive for nested iterative operators/recursive fixed points. +- May require conservative gating in complex feedback cases. + +#### Option S2: multidimensional time + antichain frontiers + +Representation: + +- Version vectors (for example `[epoch, iter]`, potentially more dimensions). +- Antichain frontiers as in Differential-style progress tracking. + +Pros: + +- Strongest formal model for recursion/feedback and concurrent iterative subcomputations. +- More precise progress and less conservative gating in advanced plans. + +Cons: + +- Significant complexity tax across streams/operators/runtime APIs. +- Larger cognitive and implementation overhead for debugging/tooling. + +#### Practical recommendation + +For this evolution, **S1 is the recommended first target**: + +- satisfies transactional up-query gating goals, +- keeps runtime changes tractable, +- remains compatible with later expansion to S2 if recursion/frontier precision demands it. 
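The S1 gating rule ("do not emit join output until the up-query for epoch `e` is satisfied") can be made concrete with a small sketch. All names here are illustrative, not part of the current codebase; the key property is that a query's stability token is the minimum stable epoch across the sources it reads.

```ts
// Hypothetical S1 frontier shape (not repo code).
type Frontier = { sourceId: string; stableEpoch: number }

// The graph can only vouch for progress that ALL inputs have reached,
// so the query-level stable token is the minimum across sources.
function stableThrough(frontiers: Frontier[]): number {
  if (frontiers.length === 0) return 0
  return Math.min(...frontiers.map((f) => f.stableEpoch))
}

// Gating: a join candidate that required epoch `e` may emit only once
// every contributing source has advanced at least to `e`.
function canEmit(requiredEpoch: number, frontiers: Frontier[]): boolean {
  return stableThrough(frontiers) >= requiredEpoch
}
```

The min-over-sources rule is also what makes S1 conservative: one slow source holds back the whole query's stable token, which is the kind of coarse gating that S2's finer-grained frontiers relax.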
+
+---
+
+## Compiled pipeline sketch (operator-routed up-query + gating)
+
+Pseudo-code showing the desired shape:
+
+```ts
+// Source streams emit row deltas + stamps.
+const users = sourceInput(`users`) // [key, row, stamp]
+const orders = sourceInput(`orders`) // [key, row, stamp]
+
+const usersById = users.pipe(
+  indexBy(([, row]) => row.id, { evictable: true }),
+)
+
+// Join operator can emit:
+// - joined rows when right side is present
+// - UpQueryNeed when right side is missing
+const joinResult = orders.pipe(
+  lookupJoinWithUpquery({
+    rightIndex: usersById,
+    rightKey: (orderRow) => orderRow.userId,
+    makeNeed: (orderRow, stamp) => ({
+      sourceAlias: `users`,
+      load: { where: eq(ref(`id`), val(orderRow.userId)), limit: 1 },
+      requiredEpoch: stamp.epoch,
+    }),
+  }),
+)
+
+const needs = joinResult.needs
+const candidates = joinResult.rows
+
+const upqueryAcks = needs.pipe(
+  coalesceNeeds(),
+  routeToSyncLoadSubset(), // executes via source sync adapters
+)
+
+const stableRows = candidates.pipe(
+  gateByObligations({
+    acks: upqueryAcks,
+    sourceFrontiers: sourceFrontierStream,
+    canEmit: (candidate, obligationState) =>
+      obligationState.isSatisfied(candidate.requiredNeeds, candidate.requiredEpoch),
+  }),
+)
+
+const output = stableRows.pipe(projectFinalShape())
+```
+
+Important property: the query graph itself carries both data and control obligations.
+
+---
+
+## Query output contract (stable transactional guarantees)
+
+Expose query-level stability metadata alongside rows:
+
+```ts
+type QueryStability = {
+  stableEpoch?: number
+  sourceFrontiers: Record<string, SourceFrontier>
+  pendingObligations: number
+}
+```
+
+Interpretation:
+
+- Rows are guaranteed consistent through `stableEpoch` / frontiers.
+- If `pendingObligations > 0`, output may still be incomplete due to outstanding up-queries.
+- As obligations resolve and frontiers advance, snapshots converge.
+
+This matches the intended "may miss versions now, eventually answer correctly" model.
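The `pendingObligations` counter implies some minimal bookkeeping in the output gate. A hypothetical sketch of that bookkeeping (names are illustrative, not repo code):

```ts
// Tracks outstanding UpQueryNeed ids so a gate can report
// `pendingObligations` and decide when candidate rows become stable.
class ObligationTracker {
  private pending = new Set<string>()

  // Called when an operator emits an UpQueryNeed.
  open(needId: string): void {
    this.pending.add(needId)
  }

  // Called when the router emits the matching UpQueryAck.
  ack(needId: string): void {
    this.pending.delete(needId)
  }

  get pendingObligations(): number {
    return this.pending.size
  }

  // A candidate row is satisfiable once none of the needs it depends on
  // are still outstanding.
  isSatisfied(requiredNeeds: string[]): boolean {
    return requiredNeeds.every((id) => !this.pending.has(id))
  }
}
```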
+ +--- + +## Mapping to Noria concepts + +Noria-inspired concept mapping: + +- **Partial materialization / holes** -> partial-state join with explicit obligations. +- **Upqueries** -> operator-emitted `UpQueryNeed` + router + ack stream. +- **Replay/backfill paths** -> source reload via `loadSubset` through graph control plane. +- **Consistency progress** -> frontier/stability tokens at query outputs. + +This preserves the core spirit (minimum resident state + on-demand replay) while fitting TanStack DB's live query model. + +--- + +## Likely implementation touchpoints in this repo + +If this direction is implemented incrementally, the likely first touchpoints are: + +- `packages/db/src/query/live/collection-config-builder.ts` + - transition from per-live-query graph ownership toward global graph manager integration. +- `packages/db/src/query/compiler/index.ts` and `packages/db/src/query/compiler/joins.ts` + - emit/operator plans for up-query control streams and obligation-gated joins. +- `packages/db/src/collection/subscription.ts` + - migrate direct snapshot-triggered up-query calls toward routed control-plane hooks. +- `packages/db/src/collection/sync.ts` and `packages/db/src/types.ts` + - extend `loadSubset` contract for explicit acknowledgements/frontier metadata. +- `packages/db-ivm/src/*` + - add internal message shape support for stamps/frontiers and control streams. +- `packages/query-db-collection/src/query.ts` and adapter packages (`electric-db-collection`, etc.) + - provide source frontier/version metadata and ack semantics from concrete sync implementations. + +--- + +## Phased rollout plan + +### Phase 0: instrumentation and invariants + +- Add metrics for: + - join index memory, + - loadSubset volume and latency, + - duplicate up-query ratio, + - time-to-stable for live queries. +- Add invariant checks around monotonic stability token progression. 
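The monotonicity invariant from Phase 0 can be enforced with a tiny guard. A sketch under assumed names (the real check would live wherever stability tokens are published):

```ts
// Asserts that a query's stability token never moves backward.
class MonotonicEpochCheck {
  private last = -Infinity

  observe(epoch: number): void {
    if (epoch < this.last) {
      throw new Error(`stability token regressed: ${epoch} < ${this.last}`)
    }
    this.last = epoch
  }
}
```

Repeating the same epoch is allowed (progress may stall); only regression is an invariant violation.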
+ +### Phase 1: operator-routed up-query control plane (no global epoch yet) + +- Introduce `UpQueryNeed`/`UpQueryAck` internal streams. +- Route existing lazy join loading through router operator instead of direct calls. +- Keep current runtime mostly intact; gate only by ack completion. + +### Phase 2: global mutable query graph manager + +- Implement canonical node matching and sink attach/detach. +- Move per-query compilation into "compile-and-merge" against global graph. +- Add reference-counted node lifecycle and cleanup. + +### Phase 3: single-dimensional time/frontier (Option S1) + +- Add epoch stamping and source frontier propagation. +- Upgrade obligation gate from "ack only" to "ack + frontier >= required epoch". +- Expose query stability token to consumers. + +### Phase 4: partial-state join eviction + +- Introduce tiered join caches (obligation + key skeleton + evictable full rows). +- Add adaptive eviction policy and anti-thrash controls. + +### Phase 5 (optional): multidimensional time (Option S2) + +- Re-evaluate only if recursion/feedback precision requires it. +- Keep S1 as default for most plans. + +--- + +## Tests needed for confidence + +1. **No premature join rows** + - Join with missing side row must not emit final row until up-query resolves. +2. **Monotonic stability** + - `stableEpoch`/frontier tokens must never move backward. +3. **Eventual convergence** + - Delayed up-query responses eventually produce the correct snapshot. +4. **Eviction safety** + - Evict/reload cycles preserve correctness under concurrent writes. +5. **Shared graph correctness** + - Query attach/detach does not leak state or cross-contaminate outputs. + +--- + +## Open questions + +1. Should `LoadSubsetFn` return an explicit ack payload (not only `Promise`) for stronger obligation accounting? +2. Do we expose stability metadata in `useLiveQuery` APIs directly, or via debug/internal API first? +3. 
How should cross-source transactional guarantees be defined when sources provide incomparable LSN domains? +4. What are the minimal frontier semantics required for non-Electric sources? +5. Which operators should become up-query-capable first (join, order/limit, recursive operator)? + +--- + +## Recommended direction + +Build toward: + +1. **operator-routed up-queries**, +2. **single global mutable graph**, and +3. **single-dimensional epoch/frontier semantics (S1)** as the default transaction-tracking layer. + +This reaches the requested guarantees with a manageable complexity step, while leaving a clear path to multidimensional time (S2) if and when recursion/frontier precision truly requires it. From 394e0af7325a5d32a56674708818d495f0d71ee6 Mon Sep 17 00:00:00 2001 From: Cursor Agent Date: Sun, 1 Mar 2026 13:07:48 +0000 Subject: [PATCH 5/5] docs: add recursive-focused time model decision analysis Co-authored-by: Sam Willis --- .../query-engine-evolution-exploration.md | 63 ++++++++++++++----- 1 file changed, 49 insertions(+), 14 deletions(-) diff --git a/docs/guides/query-engine-evolution-exploration.md b/docs/guides/query-engine-evolution-exploration.md index 72663f461..209619222 100644 --- a/docs/guides/query-engine-evolution-exploration.md +++ b/docs/guides/query-engine-evolution-exploration.md @@ -243,13 +243,36 @@ Cons: - Significant complexity tax across streams/operators/runtime APIs. - Larger cognitive and implementation overhead for debugging/tooling. +#### Critical assessment for recursive includes and aggregates + +When we evaluate S1 vs S2 specifically for recursive query workloads (includes that recurse, per-level aggregates, and multi-level includes), the trade-off is less about "can we make it work?" and more about "where does complexity live?" 
+ +| Dimension | S1 (single dimension) | S2 (multidimensional) | +|---|---|---| +| Recursive fixed-point progress | Requires operator-local iteration bookkeeping layered on top of global epoch | Native representation of outer txn + inner iter progress | +| Recursive aggregates with retractions/deletes | Correctness often needs conservative barriers or localized recompute | Delta + frontier semantics are explicit, reducing ad-hoc recompute paths | +| Multi-level include composition | Can become conservative/global when one branch is slow | Supports finer-grained partial progress across branches/subtrees | +| Up-query gating under out-of-order arrivals | Feasible but tends toward custom gate logic per operator | Unified obligation/frontier reasoning across operators | +| Future expressive operators (topK/limits within recursion, advanced feedback) | Higher risk of semantic corner cases and bespoke fixes | Better long-term foundation for expressive recursive plans | + +Critical observation: + +- **S1 minimizes early runtime complexity but shifts complexity into operator-specific logic over time.** +- **S2 increases early runtime complexity but centralizes semantics, which usually lowers total complexity for expressive recursive evolution.** + #### Practical recommendation -For this evolution, **S1 is the recommended first target**: +For this evolution, with a goal of expressive recursive queries, **S2 should be the default target**: -- satisfies transactional up-query gating goals, -- keeps runtime changes tractable, -- remains compatible with later expansion to S2 if recursion/frontier precision demands it. +- it gives the cleanest semantic model for recursive includes, aggregates, and multi-level composition, +- it avoids paying a migration tax later when S1 abstractions start to leak, +- it provides clearer correctness invariants for up-query gating and eventual convergence. + +Adopt S2 with implementation guardrails: + +1. 
Keep the external API simple (query-level stable token/frontier summary), even if internal time is multidimensional. +2. Scope V1 to minimal required dimensions (`[txn, iter]`), while keeping internal types extensible. +3. Implement only the first recursive-capable operators initially (join/lookup + recursive/aggregate path), then expand. --- @@ -382,21 +405,28 @@ If this direction is implemented incrementally, the likely first touchpoints are - Move per-query compilation into "compile-and-merge" against global graph. - Add reference-counted node lifecycle and cleanup. -### Phase 3: single-dimensional time/frontier (Option S1) +### Phase 3: multidimensional time/frontier core (Option S2) + +- Add version vectors (initially `[txn, iter]`) and antichain frontier plumbing in runtime/operator messages. +- Upgrade obligation gate from "ack only" to multidimensional frontier-aware satisfaction checks. +- Expose a simplified query stability token/frontier summary to consumers. + +### Phase 4: recursive includes and aggregates on S2 -- Add epoch stamping and source frontier propagation. -- Upgrade obligation gate from "ack only" to "ack + frontier >= required epoch". -- Expose query stability token to consumers. +- Implement recursive include operator path against S2 frontier semantics. +- Implement recursive aggregate correctness tests (insert/update/delete/retract scenarios). +- Validate multi-level include behavior with mixed loaded/unloaded branches. -### Phase 4: partial-state join eviction +### Phase 5: partial-state join eviction - Introduce tiered join caches (obligation + key skeleton + evictable full rows). - Add adaptive eviction policy and anti-thrash controls. +- Ensure eviction/reload correctness under recursive pipelines. -### Phase 5 (optional): multidimensional time (Option S2) +### Phase 6 (optional): dimension expansion beyond `[txn, iter]` -- Re-evaluate only if recursion/feedback precision requires it. -- Keep S1 as default for most plans. 
+- Add extra dimensions only when required by concrete operators/use-cases. +- Keep dimensionality minimal by default to control complexity. --- @@ -412,6 +442,10 @@ If this direction is implemented incrementally, the likely first touchpoints are - Evict/reload cycles preserve correctness under concurrent writes. 5. **Shared graph correctness** - Query attach/detach does not leak state or cross-contaminate outputs. +6. **Recursive aggregate correctness** + - Aggregates over recursive includes remain correct under inserts, updates, and retractions. +7. **Multi-level include progress isolation** + - Slow/deep branches do not unnecessarily block stable emission of unrelated branches. --- @@ -422,6 +456,7 @@ If this direction is implemented incrementally, the likely first touchpoints are 3. How should cross-source transactional guarantees be defined when sources provide incomparable LSN domains? 4. What are the minimal frontier semantics required for non-Electric sources? 5. Which operators should become up-query-capable first (join, order/limit, recursive operator)? +6. Under what concrete conditions do we need dimensions beyond `[txn, iter]`? --- @@ -431,6 +466,6 @@ Build toward: 1. **operator-routed up-queries**, 2. **single global mutable graph**, and -3. **single-dimensional epoch/frontier semantics (S1)** as the default transaction-tracking layer. +3. **multidimensional version/frontier semantics (S2)** as the internal default for recursive correctness. -This reaches the requested guarantees with a manageable complexity step, while leaving a clear path to multidimensional time (S2) if and when recursion/frontier precision truly requires it. +Given the goal of iterating quickly on expressive recursive queries (includes with aggregates and multi-level includes), taking on S2 complexity early is likely the lower total-cost path.
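As a closing illustration of what committing to S2 entails, here is a minimal sketch of antichain-frontier completeness over `[txn, iter]` timestamps, in the Differential style. Names are illustrative; the obligation gate in Phase 3 would ask a completeness question like this instead of comparing a single epoch.

```ts
// A timestamp in the minimal S2 model: [txn, iter].
type Ts = [number, number]

// Product partial order: a <= b when both coordinates are <=.
const leq = (a: Ts, b: Ts): boolean => a[0] <= b[0] && a[1] <= b[1]

// A frontier is an antichain (mutually incomparable timestamps).
// Timestamp `t` is complete -- no further updates can arrive at `t` --
// once NO frontier element is <= t.
function isComplete(t: Ts, frontier: Ts[]): boolean {
  return !frontier.some((f) => leq(f, t))
}
```

For example, with frontier `[[2, 0], [1, 2]]`, timestamp `[1, 1]` is complete (neither frontier element is below it), while `[1, 2]` and `[3, 5]` may still receive updates.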