From bc0a256d9c5af37d7207e23b1450a65b2c3c0c0f Mon Sep 17 00:00:00 2001 From: Cursor Agent Date: Sun, 1 Mar 2026 11:33:58 +0000 Subject: [PATCH 1/5] docs: add recursive includes exploration note Co-authored-by: Sam Willis --- docs/guides/recursive-includes-exploration.md | 212 ++++++++++++++++++ 1 file changed, 212 insertions(+) create mode 100644 docs/guides/recursive-includes-exploration.md diff --git a/docs/guides/recursive-includes-exploration.md b/docs/guides/recursive-includes-exploration.md new file mode 100644 index 000000000..8ee79bdb4 --- /dev/null +++ b/docs/guides/recursive-includes-exploration.md @@ -0,0 +1,212 @@ +# Recursive includes exploration + +Status: draft design note +Branch context: `cursor/recursive-includes-exploration-41e3` (includes subqueries in `select`, `toArray`, nested includes, per-parent aggregate/order/limit behavior) + +## Goal + +Support recursive hierarchical projection (adjacency-list style), while preserving the current includes performance model: + +- one query-graph branch per include declaration (not one query per parent row), +- fan-out/materialization outside the query graph, +- incremental/live updates. 
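For intuition, this is the projection the feature computes, sketched as a naive full recomputation over adjacency-list rows (illustrative only; roots are assumed to carry `parentId: null`, and the engine must of course maintain this incrementally rather than rebuild it):

```ts
interface Node {
  id: number
  parentId: number | null
}

interface TreeNode extends Node {
  children: Array<TreeNode>
}

// Naive semantic target: group rows by parentId, then recurse from the roots.
// The whole point of the design notes below is to get this result without
// recomputation and without one query per parent row.
function buildTree(rows: Array<Node>): Array<TreeNode> {
  const byParent = new Map<number | null, Array<Node>>()
  for (const row of rows) {
    const siblings = byParent.get(row.parentId) ?? []
    siblings.push(row)
    byParent.set(row.parentId, siblings)
  }
  const expand = (row: Node): TreeNode => ({
    ...row,
    children: (byParent.get(row.id) ?? []).map(expand),
  })
  return (byParent.get(null) ?? []).map(expand)
}
```

A cyclic `parentId` chain would make this naive version recurse forever, which is one reason cycle policy shows up repeatedly below.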
+ +Example target (conceptual): + +```ts +interface Node { + id: number + parentId: number +} +``` + +```ts +useLiveQuery((q) => { + const children = withQuery( + { pn: nodesCollection }, + q + .from({ cn: nodesCollection }) + .where(({ pn, cn }) => eq(cn.parentId, pn.id)) + .select(({ cn }) => ({ + ...cn, + children: children(cn), + })), + ) + + return q.from({ pn: nodesCollection }).select(({ pn }) => ({ + ...pn, + children: children(pn), + })) +}) +``` + +## How includes work today (important baseline) + +### 1) Builder/IR phase + +- In `buildNestedSelect` / `buildIncludesSubquery` (`packages/db/src/query/builder/index.ts`): + - subquery values inside `.select(...)` are converted to `IncludesSubquery` IR nodes, + - one correlation condition is extracted from child `.where(...)` (`eq(parentRef, childRef)`), + - correlation predicate is removed from child query and stored as: + - `correlationField` (parent side), + - `childCorrelationField` (child side), + - optional `toArray(...)` is carried via `materializeAsArray`. + +### 2) Compiler phase + +- In `compileQuery` (`packages/db/src/query/compiler/index.ts`): + - includes are extracted from select, + - each includes child query is compiled once (recursively) with a parent key stream, + - child input is inner-joined against parent correlation keys, + - child output tuple is `[result, orderByIndex, correlationKey]`, + - select includes entries are replaced with placeholders; real fan-out is deferred to output layer. + +This is the core "single branch, external fan-out" shape. 
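As a rough sketch of why that tuple shape is enough for external fan-out (helper names hypothetical, not engine code): the single child stream is indexed by correlation key once, and every parent binds its children by lookup rather than by owning a query branch:

```ts
type ChildTuple<T> = [result: T, orderByIndex: number | undefined, correlationKey: unknown]

// Index the single shared child stream by correlation key once.
function indexByCorrelation<T>(tuples: Array<ChildTuple<T>>): Map<unknown, Array<T>> {
  const index = new Map<unknown, Array<T>>()
  for (const [result, , key] of tuples) {
    const bucket = index.get(key) ?? []
    bucket.push(result)
    index.set(key, bucket)
  }
  return index
}

// Each parent then binds its children via key lookup, not via its own query branch.
function attachChildren<P extends { id: number }, C>(
  parents: Array<P>,
  childIndex: Map<unknown, Array<C>>,
): Array<P & { children: Array<C> }> {
  return parents.map((p) => ({ ...p, children: childIndex.get(p.id) ?? [] }))
}
```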

### 3) Output/materialization phase

- In `CollectionConfigBuilder` (`packages/db/src/query/live/collection-config-builder.ts`):
  - one output callback per includes entry accumulates pending child deltas in a nested `Map` keyed by correlation key,
  - `flushIncludesState` creates/updates/disposes child Collections,
  - `correlationToParentKeys` reverse index attaches one child Collection to all matching parents,
  - nested includes are handled with buffered bottom-up flushing,
  - `toArray` re-emits parent rows with array snapshots when child content changes.

### Why this is fast

- Query graph does not duplicate child queries per parent.
- Per-parent binding happens in output routing by correlation key.
- This is exactly the property we should preserve for recursion.

## Why recursive includes are not directly possible yet

1. **No fixed-point query representation in builder/IR**
   - Current includes require a concrete child `QueryBuilder` now.
   - There is no notion of a "self call" node in the IR.

2. **Compilation assumes acyclic query references**
   - The `compileQuery` cache prevents duplicate work, but not cyclic query construction.
   - A true self-referential query would recurse indefinitely without additional cycle handling.

3. **Nested includes depth is static today**
   - Existing nested includes are explicit finite nesting in the AST (`project -> issue -> comment`).
   - Recursive trees need unbounded/depth-dynamic expansion.

4. **Output flushing is level-structured**
   - Current nested buffering/routing works for known levels.
   - Recursive trees need dynamic level creation and pruning.

## Option space

### Option A: Depth-limited unrolling (MVP-friendly)

Idea:
- Introduce recursive syntax, but require `maxDepth`.
- Compile by unrolling into N nested includes nodes.

Pros:
- Reuses almost all of the current implementation.
- Predictable complexity and easy testing.

Cons:
- Not true recursion (cutoff behavior).
- Query/IR grows with depth.
+- Not ideal for unknown depth trees. + +### Option B: Per-node dynamic child query instances + +Idea: +- At runtime create child query/subscription per discovered node. + +Pros: +- Easy to reason about. + +Cons: +- Violates performance goal (effectively N queries/subscriptions). +- High memory/loadSubset pressure on large trees. +- Hard to optimize globally. + +Conclusion: likely reject. + +### Option C: Shared recursive edge stream + recursive fan-out state (recommended medium-term) + +Idea: +- Compile a recursive declaration into one shared child/edge stream (same "one branch" principle). +- Maintain recursive adjacency/materialization state outside query graph: + - `childrenByParentKey`, + - reverse links for impacted-ancestor propagation, + - per-parent child Collection/array materialization. +- Recursively attach children using the same stream/state, not new query branches. + +Pros: +- Preserves core includes performance model. +- Supports unbounded depth. +- Keeps incremental/reactive behavior centralized in output layer. + +Cons: +- Non-trivial runtime/state-engine work. +- Needs explicit cycle policy and update semantics. + +### Option D: New query-graph recursive operator (transitive closure/fixpoint) + +Idea: +- Add dedicated incremental operator in `@tanstack/db-ivm` for recursive traversal. + +Pros: +- Most declarative and potentially most powerful long-term. + +Cons: +- Highest implementation complexity/risk. +- More invasive engine work before shipping user value. + +## Recommended staged plan + +### Phase 0: API and semantics RFC + +Decide: +- allowed graph shape (tree only vs DAG), +- cycle behavior (error, truncate, or dedupe by node key), +- ordering/limit semantics (`orderBy/limit` per parent at each depth), +- identity semantics (shared node object across paths vs per-path copy), +- whether recursion requires stable correlation key (likely yes). 
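To make the cycle-behavior decision concrete, here is a standalone sketch of the "dedupe by node key" option (illustrative helper, not engine code): a per-root visited set guarantees termination even on a cycle like `A -> B -> A`:

```ts
// childrenByParent: adjacency index, parentId -> child ids.
function descendantsOf(
  rootId: number,
  childrenByParent: Map<number, Array<number>>,
): Set<number> {
  // Dedupe by node key: each node is emitted at most once per root.
  const seen = new Set<number>()
  const queue = [rootId]
  while (queue.length > 0) {
    const current = queue.shift()!
    for (const child of childrenByParent.get(current) ?? []) {
      if (seen.has(child)) continue // cycle or DAG re-entry: already emitted, skip
      seen.add(child)
      queue.push(child)
    }
  }
  return seen
}
```

Note that under a cycle the root can surface as its own descendant; whether that is an error, a truncation point, or acceptable output is exactly the Phase 0 decision.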
+ +### Phase 1: Ship depth-limited recursion (Option A) + +- Good for early user feedback and type-system validation. +- Keeps current architecture almost unchanged. +- Enables concrete UX/API iteration (`withQuery` vs dedicated `recursiveInclude(...)` API). + +### Phase 2: Build shared recursive materializer (Option C) + +- Add a recursive includes IR node that represents a fixed-point/self call. +- Compile one child branch per declaration. +- Extend output-layer state machine to dynamic-depth traversal and impacted-ancestor propagation. +- Preserve existing non-recursive includes behavior as-is. + +### Phase 3 (optional/long-term): evaluate graph-level operator (Option D) + +- If runtime-layer complexity or performance ceilings appear, move recursion core into IVM. + +## Open questions to resolve early + +1. **Cycle policy**: What should happen on `A -> B -> A`? +2. **DAG duplication**: If node `X` is reachable from two parents, share instance or duplicate per path? +3. **Move semantics**: Parent change (`parentId` update) should re-home full subtree incrementally. +4. **Result keying**: Need robust key serialization for correlation values. +5. **Interplay with `toArray`**: re-emit boundaries and batching strategy for deep updates. +6. **Parent-referencing child filters**: align recursion design with parent-filtering includes work. + +## Practical next step + +Build a small RFC/POC on top of this branch with: + +- API sketch (including TypeScript inference expectations), +- Phase-1 depth-limited prototype (`maxDepth`), +- benchmark scenarios: + - deep chain, + - wide tree, + - subtree move, + - frequent leaf insert/delete. + +That gives fast signal on ergonomics and correctness before committing to full fixed-point execution. 
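The benchmark fixtures can be sketched up front (shapes only; `Row` and the helpers are illustrative, and real benchmarks would drive these rows through live queries):

```ts
interface Row {
  id: number
  parentId: number | null
}

// Deep chain: node i is the child of node i - 1.
function deepChain(depth: number): Array<Row> {
  return Array.from({ length: depth }, (_, i) => ({
    id: i,
    parentId: i === 0 ? null : i - 1,
  }))
}

// Wide tree: one root with `width` direct children.
function wideTree(width: number): Array<Row> {
  const rows: Array<Row> = [{ id: 0, parentId: null }]
  for (let i = 1; i <= width; i++) rows.push({ id: i, parentId: 0 })
  return rows
}

// Subtree move: re-home one node (and implicitly its whole subtree).
function moveSubtree(rows: Array<Row>, nodeId: number, newParentId: number): Array<Row> {
  return rows.map((r) => (r.id === nodeId ? { ...r, parentId: newParentId } : r))
}
```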
From 9556ff95f1191f840b1e56f911d1d32563b648ee Mon Sep 17 00:00:00 2001 From: Cursor Agent Date: Sun, 1 Mar 2026 11:55:51 +0000 Subject: [PATCH 2/5] docs: expand recursive includes option D design Co-authored-by: Sam Willis --- docs/guides/recursive-includes-exploration.md | 302 +++++++++++++----- 1 file changed, 224 insertions(+), 78 deletions(-) diff --git a/docs/guides/recursive-includes-exploration.md b/docs/guides/recursive-includes-exploration.md index 8ee79bdb4..229d97619 100644 --- a/docs/guides/recursive-includes-exploration.md +++ b/docs/guides/recursive-includes-exploration.md @@ -97,116 +97,262 @@ This is the core "single branch, external fan-out" shape. - Current nested buffering/routing works for known levels. - Recursive trees need dynamic level creation and pruning. -## Option space +## Option status after this exploration -### Option A: Depth-limited unrolling (MVP-friendly) +- **Option A (depth-limited unrolling):** compelling MVP route and syntax-compatible with a stronger future implementation. +- **Option B (per-node dynamic queries):** rejected. +- **Option C (output-layer recursive materializer):** currently less compelling given desire to solve recursion at the IVM graph level. +- **Option D (new recursive IVM operator):** most compelling long-term direction. -Idea: -- Introduce recursive syntax, but require `maxDepth`. -- Compile by unrolling into N nested includes nodes. +The rest of this document focuses on how to make Option D practical in `db-ivm`, while avoiding global multidimensional time unless absolutely required. -Pros: -- Reuses almost all current implementation. -- Predictable complexity and easy testing. +## Option D deep dive: recursive operator in `@tanstack/db-ivm` -Cons: -- Not true recursion (cutoff behavior). -- Query/IR grows with depth. -- Not ideal for unknown depth trees. +### Key observations from the current codebases -### Option B: Per-node dynamic child query instances +1. 
`db-ivm` intentionally removed version/frontier machinery and runs until local quiescence (`D2.run()` loops while operators have pending work). +2. The original `d2ts` has `iterate` based on: + - version extension/truncation (`Version.extend()` / `truncate()`), + - per-iteration step (`applyStep()`), + - frontier coordination in `FeedbackOperator`. +3. The DBSP paper explicitly supports recursion (including non-monotonic recursion) and models recursive incrementalization with nested time dimensions. -Idea: -- At runtime create child query/subscription per discovered node. +So we have a useful tension: -Pros: -- Easy to reason about. +- Differential-style multidimensional time is expressive and principled. +- `db-ivm` is intentionally much simpler. +- We want recursion now, but do not want to pay the full complexity tax upfront. -Cons: -- Violates performance goal (effectively N queries/subscriptions). -- High memory/loadSubset pressure on large trees. -- Hard to optimize globally. +### What Differential/DBSP are telling us (and what to borrow) -Conclusion: likely reject. +From Differential and DBSP, the durable ideas to keep are: -### Option C: Shared recursive edge stream + recursive fan-out state (recommended medium-term) +1. **Recursion should be fixed-point computation over deltas**, not repeated full recomputation. +2. **Semi-naive style propagation** (only newly discovered tuples drive next iteration) is essential. +3. **Strict feedback / convergence discipline** is mandatory to avoid non-termination. +4. **Two notions of progress exist conceptually**: + - outer progress (incoming transaction/update), + - inner progress (loop iteration). -Idea: -- Compile a recursive declaration into one shared child/edge stream (same "one branch" principle). -- Maintain recursive adjacency/materialization state outside query graph: - - `childrenByParentKey`, - - reverse links for impacted-ancestor propagation, - - per-parent child Collection/array materialization. 
-- Recursively attach children using the same stream/state, not new query branches. +The implementation question is whether we must expose both dimensions in the public runtime timestamp model. -Pros: -- Preserves core includes performance model. -- Supports unbounded depth. -- Keeps incremental/reactive behavior centralized in output layer. +### Can we avoid global multidimensional time? -Cons: -- Non-trivial runtime/state-engine work. -- Needs explicit cycle policy and update semantics. +**Yes, as a first-class engineering step**: keep one external time dimension (current `db-ivm` behavior), and model the recursion iteration dimension as *internal operator state*. -### Option D: New query-graph recursive operator (transitive closure/fixpoint) +Think of this as "local nested time" instead of "global timestamp vectors". -Idea: -- Add dedicated incremental operator in `@tanstack/db-ivm` for recursive traversal. +- External graph: unchanged, still versionless from the API perspective. +- Recursive operator internals: + - own work queue, + - own iteration counter/depth, + - own convergence checks. -Pros: -- Most declarative and potentially most powerful long-term. +This gives most of the practical value without changing every operator or stream type. -Cons: -- Highest implementation complexity/risk. -- More invasive engine work before shipping user value. +## Proposed operator shape (first pass) -## Recommended staged plan +### Conceptual API -### Phase 0: API and semantics RFC +```ts +recursiveFixpoint({ + roots, // stream of root entities / correlation keys + edges, // stream of adjacency edges + expand, // one-step expansion function (join-like) + options: { + maxDepth?: number, + cyclePolicy: 'dedupe-node' | 'allow-paths' | 'error', + deletionMode: 'recompute-affected' | 'support-counts', + }, +}) +``` + +For tree includes, `expand` is typically "follow `parentId -> id` edge one hop". 
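A sketch of that one-hop `expand` over a prebuilt adjacency index (types and names are placeholders; the operator would own and maintain the index):

```ts
interface ReachFact {
  rootKey: number
  nodeKey: number
  depth: number
}

// edgeIndex: parentNodeKey -> child node keys (the parentId -> id adjacency).
function expandOneHop(
  wave: Array<ReachFact>,
  edgeIndex: Map<number, Array<number>>,
): Array<ReachFact> {
  const next: Array<ReachFact> = []
  for (const fact of wave) {
    for (const childKey of edgeIndex.get(fact.nodeKey) ?? []) {
      // Same root scope, one level deeper; dedupe lives in the operator state,
      // so this step stays a pure join-like expansion.
      next.push({ rootKey: fact.rootKey, nodeKey: childKey, depth: fact.depth + 1 })
    }
  }
  return next
}
```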
+ +### Output contract for includes + +Emit tuples keyed by child identity, with payload that includes: + +- `correlationKey` (root/parent scope key for fan-out), +- `nodeKey` (child key), +- `depth`, +- optional `parentNodeKey` (for deterministic tree reconstruction), +- optional stable order token. + +This stays compatible with current includes output routing (`correlationKey` fan-out remains outside graph). + +## Internal algorithm sketch (no global multidimensional time) + +### State + +Per recursive operator instance: + +- `edgeIndex`: parentNodeKey -> children +- `reverseEdgeIndex`: childNodeKey -> parents (for deletes) +- `rootsIndex`: active roots +- `reachable`: map `(rootKey, nodeKey) -> state` + - at minimum: present/not-present, depth + - for robust deletions: support count / witness set +- `frontierQueue`: pending delta tuples for next expansion wave + +### Insert propagation (semi-naive) + +1. Ingest root/edge inserts as delta. +2. Seed `frontierQueue` with new reachable facts. +3. Loop until queue empty: + - pop wave, + - expand one hop via `edgeIndex`, + - apply cycle/dedupe policy, + - emit only net-new tuples, + - enqueue only newly-added tuples for next wave. + +This is standard semi-naive fixed-point iteration inside one operator run. + +### Delete propagation: two viable modes + +#### Mode 1: recompute-affected (simpler, good first cut) + +- On edge/root delete, identify affected roots/subgraph. +- Retract previously emitted tuples for affected scope. +- Recompute fixed point for that affected scope from current base data. + +Tradeoff: +- simpler correctness, +- potentially expensive on large deletions. + +#### Mode 2: support-counts / witnesses (full incremental) + +- Track derivation support per `(root,node)` tuple. +- Inserts increment support and may cross 0 -> positive (emit insert). +- Deletes decrement support and may cross positive -> 0 (emit delete), then cascade. 
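The 0-crossing discipline can be sketched as a small support table keyed by a serialized `(root, node)` pair (illustrative only; real state would live inside the operator):

```ts
type Delta = { key: string; emit: `insert` | `delete` } | null

// Support counts per (root,node) fact; emissions happen only on 0-crossings.
class SupportTable {
  private counts = new Map<string, number>()

  add(key: string): Delta {
    const next = (this.counts.get(key) ?? 0) + 1
    this.counts.set(key, next)
    return next === 1 ? { key, emit: `insert` } : null // crossed 0 -> positive
  }

  remove(key: string): Delta {
    const next = (this.counts.get(key) ?? 0) - 1
    if (next <= 0) {
      this.counts.delete(key)
      return { key, emit: `delete` } // crossed positive -> 0; caller cascades
    }
    this.counts.set(key, next)
    return null
  }
}
```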
+ +Tradeoff: +- best incremental behavior, +- more state and complexity (especially for DAGs with many alternative paths). + +## Cycle, DAG, and depth semantics + +### Cycle policy + +Recommended default: `dedupe-node` by `(rootKey,nodeKey)`. + +- Guarantees termination on finite graphs. +- Produces one materialized node per root, not one row per path. -Decide: -- allowed graph shape (tree only vs DAG), -- cycle behavior (error, truncate, or dedupe by node key), -- ordering/limit semantics (`orderBy/limit` per parent at each depth), -- identity semantics (shared node object across paths vs per-path copy), -- whether recursion requires stable correlation key (likely yes). +Alternative `allow-paths` is much heavier (potential explosion), and should be opt-in. -### Phase 1: Ship depth-limited recursion (Option A) +### Depth handling (the "inject depth per iteration" idea) -- Good for early user feedback and type-system validation. -- Keeps current architecture almost unchanged. -- Enables concrete UX/API iteration (`withQuery` vs dedicated `recursiveInclude(...)` API). +Depth can be treated as the operator's internal iteration coordinate: -### Phase 2: Build shared recursive materializer (Option C) +- `depth=0` at root seed (or `1` at first child hop; pick one and document), +- each expansion increments depth by 1. -- Add a recursive includes IR node that represents a fixed-point/self call. -- Compile one child branch per declaration. -- Extend output-layer state machine to dynamic-depth traversal and impacted-ancestor propagation. -- Preserve existing non-recursive includes behavior as-is. +This supports: -### Phase 3 (optional/long-term): evaluate graph-level operator (Option D) +- optional `maxDepth` stopping criterion (Option A compatibility), +- deterministic breadth-first layering, +- future APIs that expose depth/path metadata. -- If runtime-layer complexity or performance ceilings appear, move recursion core into IVM. 
+Important: with dedupe-by-node, keep the minimal depth seen for each `(root,node)`. -## Open questions to resolve early +## Why this is syntax-compatible with Option A -1. **Cycle policy**: What should happen on `A -> B -> A`? -2. **DAG duplication**: If node `X` is reachable from two parents, share instance or duplicate per path? -3. **Move semantics**: Parent change (`parentId` update) should re-home full subtree incrementally. -4. **Result keying**: Need robust key serialization for correlation values. -5. **Interplay with `toArray`**: re-emit boundaries and batching strategy for deep updates. -6. **Parent-referencing child filters**: align recursion design with parent-filtering includes work. +If we introduce recursive query syntax now, we can compile it in two different ways without API break: -## Practical next step +1. **MVP path**: unroll to `maxDepth` nested includes (Option A). +2. **Future path**: compile to `recursiveFixpoint` operator (Option D). -Build a small RFC/POC on top of this branch with: +Same user syntax, different backend strategy. -- API sketch (including TypeScript inference expectations), -- Phase-1 depth-limited prototype (`maxDepth`), -- benchmark scenarios: +## Integration points in TanStack DB + +### IR / builder + +Add a recursive include IR form (placeholder naming): + +- `RecursiveIncludesSubquery`: + - base child query, + - self reference marker, + - correlation metadata, + - options (`maxDepth`, cycle policy, etc.). + +### Compiler + +When recursive IR is detected: + +- emit one recursive operator branch in `compileQuery`, +- continue returning child rows with correlation metadata, +- keep select placeholder behavior (as done for includes now). + +### Output layer + +Largely unchanged core principle: + +- still fan out by correlation key in `flushIncludesState`, +- recursive operator only changes what child stream arrives, not where fan-out happens. 
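Tying the compiler point back to the syntax-compatibility claim, one entry point can choose between the two backends; a sketch (all names are placeholders):

```ts
interface RecursiveIncludeIR {
  maxDepth?: number
  cyclePolicy: `dedupe-node` | `allow-paths` | `error`
}

type Strategy = { kind: `unroll`; levels: number } | { kind: `fixpoint` }

// Same user syntax, different backend: unroll while only Option A exists,
// emit one recursive branch once the fixpoint operator is available.
function chooseStrategy(ir: RecursiveIncludeIR, fixpointAvailable: boolean): Strategy {
  if (!fixpointAvailable) {
    if (ir.maxDepth === undefined) {
      throw new Error(`recursive include requires maxDepth until the fixpoint operator ships`)
    }
    return { kind: `unroll`, levels: ir.maxDepth }
  }
  return { kind: `fixpoint` }
}
```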
+ +## Concrete phased plan (A now, D in parallel) + +### Phase A1 (MVP) + +- Implement depth-limited recursive syntax with explicit `maxDepth`. +- Compile by unrolling. +- Land tests for: - deep chain, - wide tree, - subtree move, - - frequent leaf insert/delete. + - cycle handling under `maxDepth`. + +### Phase D0 (operator spike, behind flag) + +- Add internal `recursiveFixpoint` operator with: + - inserts + updates, + - delete handling via recompute-affected mode. +- Tree-first semantics (`dedupe-node`, stable keys). +- Benchmark against Option A at moderate depths. + +### Phase D1 (full incremental deletes) + +- Add support counts / witnesses. +- Expand to robust DAG behavior. +- Add stress tests for high churn and subtree re-parenting. + +### Phase D2 (only if needed) + +- Revisit whether global multidimensional time/frontiers are necessary. +- Only escalate if concrete workloads show correctness/performance gaps that local iteration cannot close cleanly. + +## Risks and mitigations + +1. **Delete complexity in DAGs** + - Mitigation: start with recompute-affected mode; gate support-count mode later. + +2. **State growth** + - Mitigation: strict dedupe policy by default; expose safeguards (`maxDepth`, optional per-root limits). + +3. **Non-termination under permissive path semantics** + - Mitigation: default `dedupe-node`; explicit opt-in for path semantics with hard limits. + +4. **Ordering instability across recursive updates** + - Mitigation: define deterministic order contract early (e.g., by depth then key, or explicit `orderBy` semantics per level). + +## Open questions to lock before implementation + +1. Node identity semantics for DAGs: + - one instance per `(root,node)` or per path? +2. Parent-child ordering semantics at each depth. +3. Whether subtree moves must be strongly incremental in v1 of Option D. +4. How much recursion metadata should be exposed (`depth`, `path`, `ancestor`). +5. 
Hard bounds for safe execution (depth, node-count, iteration-count). + +## References used in this exploration -That gives fast signal on ergonomics and correctness before committing to full fixed-point execution. +- Current TanStack DB includes pipeline: + - `packages/db/src/query/compiler/index.ts` + - `packages/db/src/query/live/collection-config-builder.ts` +- `d2ts` iterative machinery (pre-simplification reference): + - `packages/d2ts/src/operators/iterate.ts` + - `packages/d2ts/src/order.ts` +- DBSP paper (arXiv 2203.16684): + - abstract and sections 5-6 discuss recursion, fixed points, and nested time dimensions. From f74fa140986114d018a39fe3602c3ed6eb764115 Mon Sep 17 00:00:00 2001 From: Cursor Agent Date: Sun, 1 Mar 2026 12:07:44 +0000 Subject: [PATCH 3/5] docs: add recursive pipeline sketch and option E analysis Co-authored-by: Sam Willis --- docs/guides/recursive-includes-exploration.md | 224 +++++++++++++++++- 1 file changed, 223 insertions(+), 1 deletion(-) diff --git a/docs/guides/recursive-includes-exploration.md b/docs/guides/recursive-includes-exploration.md index 229d97619..6459ea785 100644 --- a/docs/guides/recursive-includes-exploration.md +++ b/docs/guides/recursive-includes-exploration.md @@ -181,6 +181,83 @@ Emit tuples keyed by child identity, with payload that includes: This stays compatible with current includes output routing (`correlationKey` fan-out remains outside graph). +### Compiled pipeline sketch (Option D) + +Below is a concrete **pseudo-code sketch** of what compilation could emit for a +recursive include, using current `db-ivm`-style streams and operators. 
+ +```ts +// Input collection stream: [nodeKey, nodeRow] +const nodesInput = inputs.cn + +// 1) Parent/root keys from already-filtered parent pipeline +// Shape: [correlationKey, { rootNodeKey }] +const parentKeys = parentPipeline.pipe( + map(([_, parentNsRow]) => { + const root = parentNsRow.pn + return [root.id, { rootNodeKey: root.id }] as const + }), +) + +// 2) One shared adjacency stream for recursion +// Shape: [parentId, { nodeKey, node }] +const edgesByParent = nodesInput.pipe( + map(([nodeKey, node]) => [node.parentId, { nodeKey, node }] as const), +) + +// 3) Seed stream (depth 0 or 1, depending on chosen convention) +// Shape: [correlationKey, RecursiveRow] +const seed = parentKeys.pipe( + map(([correlationKey, { rootNodeKey }]) => [ + correlationKey, + { + nodeKey: rootNodeKey, + parentNodeKey: null, + depth: 0, + }, + ] as const), +) + +// 4) Recursive fixed-point operator (new in Option D) +// Emits only net-new / net-removed recursive rows incrementally. +const recursiveRows = recursiveFixpoint({ + seed, + edgesByParent, + maxDepth, // optional (for syntax compatibility with Option A) + cyclePolicy: `dedupe-node`, // dedupe by (correlationKey,nodeKey) + deletionMode: `recompute-affected`, // first implementation +}) + +// 5) Join recursive rows back to base node rows to project final shape +// Shape: [childNodeKey, [childResult, orderByIndex?, correlationKey]] +const includesChildPipeline = recursiveRows.pipe( + // pseudocode for lookup/join; real implementation can use join/index operator + map(([_corr, rr]) => [rr.nodeKey, rr] as const), + join(nodesInput, `inner`), + map(([childNodeKey, [rr, nodeRow]]) => [ + childNodeKey, + [ + { + ...nodeRow, + __depth: rr.depth, + __parentNodeKey: rr.parentNodeKey, + }, + undefined, // orderBy index slot (kept for compatibility) + rr.correlationKey, + ], + ] as const), +) +``` + +And then this stream plugs into the existing includes output machinery: + +- `pendingChildChanges` accumulation, +- 
`correlationToParentKeys` fan-out, +- `flushIncludesState` materialization into child collections/arrays. + +So Option D changes the **source of child rows**, while preserving the existing +"single graph branch + external fan-out" architecture. + ## Internal algorithm sketch (no global multidimensional time) ### State @@ -292,7 +369,117 @@ Largely unchanged core principle: - still fan out by correlation key in `flushIncludesState`, - recursive operator only changes what child stream arrives, not where fan-out happens. -## Concrete phased plan (A now, D in parallel) +## Option E deep dive: reintroduce global time + frontiers + +Option E means bringing back the "version + frontier" execution model (as in +`d2ts`) into `db-ivm`, then implementing recursion on top of that. + +Why it is worth considering: + +- explicit transaction tracking and ordering, +- stronger global progress semantics, +- cleaner foundation for multiple future iterative operators. + +### E1: global **single-dimensional** time + frontiers + +Use one global version coordinate (transaction epoch), and frontiers to mark +completion of each epoch: + +- input data arrives as `(version, delta)`, +- frontier advance means "no more data < frontier", +- all operators become version-aware again. + +Recursion can still use a local operator loop internally (like Option D), but +its outputs are tagged with the same outer version. + +Sketch: + +```ts +const graph = new D2({ initialFrontier: 0 }) +const nodes = graph.newInput() + +const parentKeys = compileParent(... ) // stream of [corrKey, root] +const recursiveRows = parentKeys.pipe( + recursiveFixpoint({ /* same logic as Option D */ }), +) + +graph.finalize() + +// transaction N +nodes.sendData(42, nodeDelta) +nodes.sendFrontier(43) +graph.run() +``` + +What E1 brings: + +- stable transaction boundaries system-wide, +- better observability/debuggability ("which epoch produced this row"), +- easier consistency rules across multiple inputs. 
+ +Cost of E1: + +- all streams/operators/messages become versioned again, +- frontier correctness needs to be restored across graph execution, +- substantial migration in `db-ivm` and compiler glue. + +### E2: global **multidimensional** time + frontiers (Differential-like) + +Use timestamps like `[txn, iter]` (or equivalent lattice tuples), with +antichain frontiers. + +This supports explicit iterative scopes: + +- entering recursion extends time: `[txn] -> [txn, 0]`, +- feedback increments iteration: `[txn, i] -> [txn, i + 1]`, +- leaving recursion truncates back to `[txn]`. + +Sketch (conceptual): + +```ts +const recursiveRows = parentKeys.pipe( + iterate((loop) => + loop.pipe( + expandOneHop(edgesByParent), + dedupeByNode(), + consolidate(), + ), + ), +) +``` + +What E2 brings: + +- principled semantics for nested iteration and recursion, +- strong alignment with Differential/DBSP theory, +- best long-term substrate for advanced recursive/incremental operators. + +Cost of E2: + +- largest complexity increase (time lattice + antichain logic everywhere), +- high implementation and maintenance burden, +- likely much slower path to user-visible value. 
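For a concrete feel of what E2's timestamps entail, a minimal sketch of the `[txn, iter]` product partial order (function names illustrative): some timestamps are incomparable, which is why frontiers must become antichains rather than single values:

```ts
type Time = [txn: number, iter: number]

// Product partial order: a <= b iff both coordinates are <=.
// Note that [1, 5] and [2, 0] are incomparable.
const lessEqual = (a: Time, b: Time): boolean => a[0] <= b[0] && a[1] <= b[1]

// Least upper bound (join): coordinate-wise max.
const join = (a: Time, b: Time): Time => [Math.max(a[0], b[0]), Math.max(a[1], b[1])]
```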
+ +## Option D vs Option E: what each really brings + +| Dimension | Option D (local recursive operator, versionless graph) | Option E1 (global single time + frontiers) | Option E2 (global multidimensional time) | +| --- | --- | --- | --- | +| Primary benefit | Fastest path to recursive includes | Strong transaction semantics | Most general recursive semantics | +| Scope of change | Mostly one operator + compiler wiring | Whole runtime message model | Whole runtime + time lattice model | +| Transaction tracking | Implicit / external to graph | Explicit and native | Explicit and native | +| Recursion semantics | Strong enough for tree/DAG with careful state | Similar to D unless iterate scopes added | First-class iterative scopes | +| Delivery risk | Low-medium | Medium-high | High | +| Performance overhead | Lowest base overhead | Moderate (version/frontier plumbing) | Highest (timestamp lattice machinery) | +| Future extensibility | Good, but local to recursive op | Better global control | Best theoretical headroom | +| Best use case | Ship recursive includes soon | Need explicit epoch correctness now | Need full Differential-like model | + +### Practical interpretation + +- If the goal is "ship recursive includes for common use cases quickly", **D wins**. +- If transaction-epoch correctness inside IVM is a hard requirement now, **E1 becomes compelling**. +- If we expect many recursive/time-nested operators and need a canonical model, **E2 is architecturally strongest** but expensive. + +## Concrete phased plan (A now, D/E decision gate) ### Phase A1 (MVP) @@ -323,6 +510,29 @@ Largely unchanged core principle: - Revisit whether global multidimensional time/frontiers are necessary. - Only escalate if concrete workloads show correctness/performance gaps that local iteration cannot close cleanly. +### Phase E0 (parallel design spike) + +- Specify minimal version/frontier contract needed for transaction tracking. 
+- Decide whether E1 alone is enough, or if E2 is actually required. +- Prototype cost estimate: + - number of operators touched, + - expected perf/memory delta, + - migration impact on db compiler/live query code. + +### Decision gate: choose D-only vs D+E1 vs E2 + +Evaluate with concrete workloads: + +- high-frequency transactions across multiple inputs, +- recursive subtree churn (insert/delete/move), +- observability/debugging needs by transaction. + +Choose: + +1. **D-only** if correctness/perf targets are met without global time. +2. **D + E1** if transaction tracking and epoch semantics are required system-wide. +3. **E2** only if E1 cannot satisfy recursive/iterative semantics needed by roadmap. + ## Risks and mitigations 1. **Delete complexity in DAGs** @@ -337,6 +547,15 @@ Largely unchanged core principle: 4. **Ordering instability across recursive updates** - Mitigation: define deterministic order contract early (e.g., by depth then key, or explicit `orderBy` semantics per level). +5. **Runtime-wide migration risk for Option E** + - Mitigation: do E0 spike first; quantify exact operator/runtime churn before committing. + +6. **Frontier/liveness bugs if global time returns** + - Mitigation: add invariant checks and dedicated tests for monotonic frontier advancement and quiescence. + +7. **Higher steady-state overhead with versioned messages** + - Mitigation: benchmark E1 against D on representative live-query workloads before deciding. + ## Open questions to lock before implementation 1. Node identity semantics for DAGs: @@ -345,6 +564,9 @@ Largely unchanged core principle: 3. Whether subtree moves must be strongly incremental in v1 of Option D. 4. How much recursion metadata should be exposed (`depth`, `path`, `ancestor`). 5. Hard bounds for safe execution (depth, node-count, iteration-count). +6. If Option E is chosen, should transaction tracking use: + - single-dimensional epochs only, or + - multidimensional `[txn, iter]` timestamps? 
## References used in this exploration From b552f1d8a49da88674995ec9f4856e74396e5bd4 Mon Sep 17 00:00:00 2001 From: Cursor Agent Date: Sun, 1 Mar 2026 12:46:26 +0000 Subject: [PATCH 4/5] docs: add query engine evolution exploration guide Co-authored-by: Sam Willis --- .../query-engine-evolution-exploration.md | 436 ++++++++++++++++++ 1 file changed, 436 insertions(+) create mode 100644 docs/guides/query-engine-evolution-exploration.md diff --git a/docs/guides/query-engine-evolution-exploration.md b/docs/guides/query-engine-evolution-exploration.md new file mode 100644 index 000000000..72663f461 --- /dev/null +++ b/docs/guides/query-engine-evolution-exploration.md @@ -0,0 +1,436 @@ +# Query engine evolution exploration + +Status: draft design note +Branch context: `cursor/recursive-includes-exploration-41e3` +Related: [Recursive includes exploration](./recursive-includes-exploration.md) + +## Goal + +Explore an evolution of the TanStack DB query engine with these requirements: + +1. **On-demand collections emit row version/LSN metadata** so queries can provide stable transactional guarantees. +2. **One mutable query graph for all live queries** (compile + match + amend), instead of one graph per live query collection. +3. **Minimize join index state** by treating it as computed-graph (CG) cache and re-fetching via up-queries when needed. +4. **Route up-queries through operators** so they can be transformed/composed by the query engine. +5. **Introduce single-dimensional or multidimensional version/time** so joins can withhold output until required up-queries are satisfied. + +This note is intentionally architecture-first and code-adjacent. + +--- + +## Current baseline (important context) + +### 1) Query graph lifetime is per live query collection + +- `CollectionConfigBuilder` builds a query IR and compiles a `D2` graph for that live query. +- `compileBasePipeline()` creates `new D2()` and input streams per alias. 
+- Graph/pipeline caches live within that builder instance and are reset on sync cleanup. + +Implication: there is no shared global graph structure across independent live queries today. + +### 2) `loadSubset` is invoked directly from subscriptions and join lazy-loading + +- `CollectionSubscription.requestSnapshot()` / `requestLimitedSnapshot()` call `collection._sync.loadSubset(...)`. +- Lazy join loading in the compiler (`joins.ts`) calls `lazySourceSubscription.requestSnapshot(...)` when join keys are missing. +- This means up-query intent is generated in query code paths, but execution is pushed directly to source sync handlers. + +Implication: up-query planning is not a first-class operator pipeline stage. + +### 3) Core `db-ivm` runtime is versionless + +- `@tanstack/db-ivm` streams carry diffs (`MultiSet`) without explicit timestamps/frontiers. +- `D2.run()` drives operators until quiescence based on pending input queues. +- There is no graph-level frontier contract in the current runtime. + +### 4) Version/tx tracking exists in adapters, not in core query graph semantics + +- Electric adapter has `txid` tracking (`awaitTxId`, `awaitMatch`, `up-to-date`, `subset-end` controls). +- This gives useful consistency behavior for that source, but it is source-specific and not exposed as a general graph contract. + +--- + +## Desired semantics + +The requested direction implies these invariants. + +### A. Stable transactional visibility token per query + +Each query result should have a monotonic "stable through" token (epoch/frontier-like), so consumers can reason about how complete the result is. + +### B. No premature join emission when data is missing + +If a join row depends on missing side data and an up-query was issued, the join must not emit a "final" row for that dependency until the up-query is satisfied for the required version horizon. + +### C. 
Eventual convergence despite sparse version streams + +We may skip intermediate versions in emitted rows, but eventual output must converge to the correct result once enough data has arrived. + +### D. Bounded state with controlled re-fetch + +Join state should be treated as a cache. Evict aggressively where safe, and rely on operator-routed up-queries for replay/fill. + +--- + +## Proposed architecture + +### 1) Versioned row contract for on-demand collections + +Introduce a normalized row-stamp shape that can travel through sync adapters and operators. + +```ts +type RowStamp = { + sourceId: string + lsn?: bigint | number | string + epoch?: number +} +``` + +Possible representation choices: + +- **Non-breaking first step:** put stamp in `ChangeMessage.metadata`. +- **Stronger typing later:** add first-class typed version fields in internal message envelopes. + +Key idea: LSN/source version and graph-level epoch are related but not identical. + +- `lsn` answers "which source commit did this row reflect?" +- `epoch` answers "which global graph progress point did this message enter?" + +### Source frontier signal + +To make gating composable, sources should also expose a monotonic frontier/high-watermark (explicitly or by convention), e.g.: + +```ts +type SourceFrontier = { + sourceId: string + stableLsn?: bigint | number | string + stableEpoch?: number +} +``` + +--- + +### 2) One mutable global query graph + +Introduce a graph manager (conceptual name: `GlobalQueryGraphManager`) that owns one runtime graph and supports attach/detach of logical queries. + +### Attach flow (conceptual) + +1. Normalize query IR into canonical operator fragments. +2. Fingerprint each fragment (operator kind + normalized args + upstream fingerprints). +3. Reuse existing nodes when fingerprint matches; create only missing nodes. +4. Attach query sink to terminal node(s) and increment refcounts. +5. Return current snapshot plus stability token. + +### Detach flow + +1. 
Decrement sink reference count. +2. Garbage-collect unreachable nodes/operators and operator-local caches. +3. Optionally keep warm caches for a short TTL if churn is high. + +### Why this matters + +- Natural sharing of common subplans (especially joins/filters/order nodes). +- Shared backpressure and consistent frontier accounting. +- Foundation for query-level and global-level up-query coalescing. + +--- + +### 3) Up-queries routed through operators + +Move from "subscription directly calls source `loadSubset`" to "operators emit up-query needs and an up-query router executes them." + +### New internal message types (conceptual) + +```ts +type UpQueryNeed = { + needId: string + sourceAlias: string + load: LoadSubsetOptions + requiredEpoch?: number + requiredLsn?: bigint | number | string + requestedByOperatorId: number +} + +type UpQueryAck = { + needId: string + satisfied: boolean + satisfiedEpoch?: number + satisfiedLsn?: bigint | number | string + error?: unknown +} +``` + +### Flow + +1. Join/lookup/index operator detects a hole (missing row/state). +2. Operator emits `UpQueryNeed` into an internal control stream. +3. Up-query router/planner: + - deduplicates needs, + - coalesces by source + compatible predicates, + - rewrites into richer `LoadSubsetOptions` when possible. +4. Sync bridge executes source-specific `loadSubset`. +5. Source changes re-enter graph as normal row deltas with stamps. +6. Router emits `UpQueryAck` and advances obligation state. + +This makes up-queries composable and observable within the graph itself. + +--- + +### 4) Partial-state joins (minimum practical state) + +Treat join indexes as a hierarchy of caches, not durable truth. + +### State tiers + +1. **Obligation state (required):** + - unresolved needs, + - required epoch/lsn horizon, + - pending keys. +2. **Key skeleton state (small):** + - key existence, + - minimal join attributes, + - refcount/last-access metadata. +3. 
**Full row cache (evictable):** + - only hot rows required by active outputs/windows. + +### Eviction strategy + +- Evict cold full rows first. +- Keep small key skeleton and obligations. +- Re-issue up-query when evicted row is needed again. + +### Safety rule + +Eviction is safe only if unresolved dependencies are tracked via obligations so output gating remains correct. + +--- + +### 5) Time/version model options + +#### Option S1: global single-dimensional time + source frontiers + +Representation: + +- Global epoch `e: number` for graph progress ordering. +- Source-local LSNs for provenance. +- Frontier map: `source -> stableEpoch/stableLsn`. + +Pros: + +- Much lower complexity than full multidimensional time. +- Enough to express "do not emit join output until up-query for epoch `e` is satisfied." +- Good fit for immediate transactional tracking goals. + +Cons: + +- Less expressive for nested iterative operators/recursive fixed points. +- May require conservative gating in complex feedback cases. + +#### Option S2: multidimensional time + antichain frontiers + +Representation: + +- Version vectors (for example `[epoch, iter]`, potentially more dimensions). +- Antichain frontiers as in Differential-style progress tracking. + +Pros: + +- Strongest formal model for recursion/feedback and concurrent iterative subcomputations. +- More precise progress and less conservative gating in advanced plans. + +Cons: + +- Significant complexity tax across streams/operators/runtime APIs. +- Larger cognitive and implementation overhead for debugging/tooling. + +#### Practical recommendation + +For this evolution, **S1 is the recommended first target**: + +- satisfies transactional up-query gating goals, +- keeps runtime changes tractable, +- remains compatible with later expansion to S2 if recursion/frontier precision demands it. 
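The S1 gating rule ("do not emit join output until the up-query for epoch `e` is satisfied") can be made concrete with a small sketch. All names here are illustrative, not part of the current codebase; the key property is that a query's stability token is the minimum stable epoch across the sources it reads.

```ts
// Hypothetical S1 frontier shape (not repo code).
type Frontier = { sourceId: string; stableEpoch: number }

// The graph can only vouch for progress that ALL inputs have reached,
// so the query-level stable token is the minimum across sources.
function stableThrough(frontiers: Frontier[]): number {
  if (frontiers.length === 0) return 0
  return Math.min(...frontiers.map((f) => f.stableEpoch))
}

// Gating: a join candidate that required epoch `e` may emit only once
// every contributing source has advanced at least to `e`.
function canEmit(requiredEpoch: number, frontiers: Frontier[]): boolean {
  return stableThrough(frontiers) >= requiredEpoch
}
```

The min-over-sources rule is also what makes S1 conservative: one slow source holds back the whole query's stable token, which is the kind of coarse gating that S2's finer-grained frontiers relax.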
+
+---
+
+## Compiled pipeline sketch (operator-routed up-query + gating)
+
+Pseudo-code showing the desired shape:
+
+```ts
+// Source streams emit row deltas + stamps.
+const users = sourceInput(`users`) // [key, row, stamp]
+const orders = sourceInput(`orders`) // [key, row, stamp]
+
+const usersById = users.pipe(
+  indexBy(([, row]) => row.id, { evictable: true }),
+)
+
+// Join operator can emit:
+// - joined rows when right side is present
+// - UpQueryNeed when right side is missing
+const joinResult = orders.pipe(
+  lookupJoinWithUpquery({
+    rightIndex: usersById,
+    rightKey: (orderRow) => orderRow.userId,
+    makeNeed: (orderRow, stamp) => ({
+      sourceAlias: `users`,
+      load: { where: eq(ref(`id`), val(orderRow.userId)), limit: 1 },
+      requiredEpoch: stamp.epoch,
+    }),
+  }),
+)
+
+const needs = joinResult.needs
+const candidates = joinResult.rows
+
+const upqueryAcks = needs.pipe(
+  coalesceNeeds(),
+  routeToSyncLoadSubset(), // executes via source sync adapters
+)
+
+const stableRows = candidates.pipe(
+  gateByObligations({
+    acks: upqueryAcks,
+    sourceFrontiers: sourceFrontierStream,
+    canEmit: (candidate, obligationState) =>
+      obligationState.isSatisfied(candidate.requiredNeeds, candidate.requiredEpoch),
+  }),
+)
+
+const output = stableRows.pipe(projectFinalShape())
+```
+
+Important property: the query graph itself carries both data and control obligations.
+
+---
+
+## Query output contract (stable transactional guarantees)
+
+Expose query-level stability metadata alongside rows:
+
+```ts
+type QueryStability = {
+  stableEpoch?: number
+  sourceFrontiers: Record<string, SourceFrontier>
+  pendingObligations: number
+}
+```
+
+Interpretation:
+
+- Rows are guaranteed consistent through `stableEpoch` / frontiers.
+- If `pendingObligations > 0`, output may still be incomplete due to outstanding up-queries.
+- As obligations resolve and frontiers advance, snapshots converge.
+
+This matches the intended "may miss versions now, eventually answer correctly" model.
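The `pendingObligations` counter implies some minimal bookkeeping in the output gate. A hypothetical sketch of that bookkeeping (names are illustrative, not repo code):

```ts
// Tracks outstanding UpQueryNeed ids so a gate can report
// `pendingObligations` and decide when candidate rows become stable.
class ObligationTracker {
  private pending = new Set<string>()

  // Called when an operator emits an UpQueryNeed.
  open(needId: string): void {
    this.pending.add(needId)
  }

  // Called when the router emits the matching UpQueryAck.
  ack(needId: string): void {
    this.pending.delete(needId)
  }

  get pendingObligations(): number {
    return this.pending.size
  }

  // A candidate row is satisfiable once none of the needs it depends on
  // are still outstanding.
  isSatisfied(requiredNeeds: string[]): boolean {
    return requiredNeeds.every((id) => !this.pending.has(id))
  }
}
```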
+ +--- + +## Mapping to Noria concepts + +Noria-inspired concept mapping: + +- **Partial materialization / holes** -> partial-state join with explicit obligations. +- **Upqueries** -> operator-emitted `UpQueryNeed` + router + ack stream. +- **Replay/backfill paths** -> source reload via `loadSubset` through graph control plane. +- **Consistency progress** -> frontier/stability tokens at query outputs. + +This preserves the core spirit (minimum resident state + on-demand replay) while fitting TanStack DB's live query model. + +--- + +## Likely implementation touchpoints in this repo + +If this direction is implemented incrementally, the likely first touchpoints are: + +- `packages/db/src/query/live/collection-config-builder.ts` + - transition from per-live-query graph ownership toward global graph manager integration. +- `packages/db/src/query/compiler/index.ts` and `packages/db/src/query/compiler/joins.ts` + - emit/operator plans for up-query control streams and obligation-gated joins. +- `packages/db/src/collection/subscription.ts` + - migrate direct snapshot-triggered up-query calls toward routed control-plane hooks. +- `packages/db/src/collection/sync.ts` and `packages/db/src/types.ts` + - extend `loadSubset` contract for explicit acknowledgements/frontier metadata. +- `packages/db-ivm/src/*` + - add internal message shape support for stamps/frontiers and control streams. +- `packages/query-db-collection/src/query.ts` and adapter packages (`electric-db-collection`, etc.) + - provide source frontier/version metadata and ack semantics from concrete sync implementations. + +--- + +## Phased rollout plan + +### Phase 0: instrumentation and invariants + +- Add metrics for: + - join index memory, + - loadSubset volume and latency, + - duplicate up-query ratio, + - time-to-stable for live queries. +- Add invariant checks around monotonic stability token progression. 
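The monotonicity invariant from Phase 0 can be enforced with a tiny guard. A sketch under assumed names (the real check would live wherever stability tokens are published):

```ts
// Asserts that a query's stability token never moves backward.
class MonotonicEpochCheck {
  private last = -Infinity

  observe(epoch: number): void {
    if (epoch < this.last) {
      throw new Error(`stability token regressed: ${epoch} < ${this.last}`)
    }
    this.last = epoch
  }
}
```

Repeating the same epoch is allowed (progress may stall); only regression is an invariant violation.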
+ +### Phase 1: operator-routed up-query control plane (no global epoch yet) + +- Introduce `UpQueryNeed`/`UpQueryAck` internal streams. +- Route existing lazy join loading through router operator instead of direct calls. +- Keep current runtime mostly intact; gate only by ack completion. + +### Phase 2: global mutable query graph manager + +- Implement canonical node matching and sink attach/detach. +- Move per-query compilation into "compile-and-merge" against global graph. +- Add reference-counted node lifecycle and cleanup. + +### Phase 3: single-dimensional time/frontier (Option S1) + +- Add epoch stamping and source frontier propagation. +- Upgrade obligation gate from "ack only" to "ack + frontier >= required epoch". +- Expose query stability token to consumers. + +### Phase 4: partial-state join eviction + +- Introduce tiered join caches (obligation + key skeleton + evictable full rows). +- Add adaptive eviction policy and anti-thrash controls. + +### Phase 5 (optional): multidimensional time (Option S2) + +- Re-evaluate only if recursion/feedback precision requires it. +- Keep S1 as default for most plans. + +--- + +## Tests needed for confidence + +1. **No premature join rows** + - Join with missing side row must not emit final row until up-query resolves. +2. **Monotonic stability** + - `stableEpoch`/frontier tokens must never move backward. +3. **Eventual convergence** + - Delayed up-query responses eventually produce the correct snapshot. +4. **Eviction safety** + - Evict/reload cycles preserve correctness under concurrent writes. +5. **Shared graph correctness** + - Query attach/detach does not leak state or cross-contaminate outputs. + +--- + +## Open questions + +1. Should `LoadSubsetFn` return an explicit ack payload (not only `Promise`) for stronger obligation accounting? +2. Do we expose stability metadata in `useLiveQuery` APIs directly, or via debug/internal API first? +3. 
How should cross-source transactional guarantees be defined when sources provide incomparable LSN domains? +4. What are the minimal frontier semantics required for non-Electric sources? +5. Which operators should become up-query-capable first (join, order/limit, recursive operator)? + +--- + +## Recommended direction + +Build toward: + +1. **operator-routed up-queries**, +2. **single global mutable graph**, and +3. **single-dimensional epoch/frontier semantics (S1)** as the default transaction-tracking layer. + +This reaches the requested guarantees with a manageable complexity step, while leaving a clear path to multidimensional time (S2) if and when recursion/frontier precision truly requires it. From 394e0af7325a5d32a56674708818d495f0d71ee6 Mon Sep 17 00:00:00 2001 From: Cursor Agent Date: Sun, 1 Mar 2026 13:07:48 +0000 Subject: [PATCH 5/5] docs: add recursive-focused time model decision analysis Co-authored-by: Sam Willis --- .../query-engine-evolution-exploration.md | 63 ++++++++++++++----- 1 file changed, 49 insertions(+), 14 deletions(-) diff --git a/docs/guides/query-engine-evolution-exploration.md b/docs/guides/query-engine-evolution-exploration.md index 72663f461..209619222 100644 --- a/docs/guides/query-engine-evolution-exploration.md +++ b/docs/guides/query-engine-evolution-exploration.md @@ -243,13 +243,36 @@ Cons: - Significant complexity tax across streams/operators/runtime APIs. - Larger cognitive and implementation overhead for debugging/tooling. +#### Critical assessment for recursive includes and aggregates + +When we evaluate S1 vs S2 specifically for recursive query workloads (includes that recurse, per-level aggregates, and multi-level includes), the trade-off is less about "can we make it work?" and more about "where does complexity live?" 
+ +| Dimension | S1 (single dimension) | S2 (multidimensional) | +|---|---|---| +| Recursive fixed-point progress | Requires operator-local iteration bookkeeping layered on top of global epoch | Native representation of outer txn + inner iter progress | +| Recursive aggregates with retractions/deletes | Correctness often needs conservative barriers or localized recompute | Delta + frontier semantics are explicit, reducing ad-hoc recompute paths | +| Multi-level include composition | Can become conservative/global when one branch is slow | Supports finer-grained partial progress across branches/subtrees | +| Up-query gating under out-of-order arrivals | Feasible but tends toward custom gate logic per operator | Unified obligation/frontier reasoning across operators | +| Future expressive operators (topK/limits within recursion, advanced feedback) | Higher risk of semantic corner cases and bespoke fixes | Better long-term foundation for expressive recursive plans | + +Critical observation: + +- **S1 minimizes early runtime complexity but shifts complexity into operator-specific logic over time.** +- **S2 increases early runtime complexity but centralizes semantics, which usually lowers total complexity for expressive recursive evolution.** + #### Practical recommendation -For this evolution, **S1 is the recommended first target**: +For this evolution, with a goal of expressive recursive queries, **S2 should be the default target**: -- satisfies transactional up-query gating goals, -- keeps runtime changes tractable, -- remains compatible with later expansion to S2 if recursion/frontier precision demands it. +- it gives the cleanest semantic model for recursive includes, aggregates, and multi-level composition, +- it avoids paying a migration tax later when S1 abstractions start to leak, +- it provides clearer correctness invariants for up-query gating and eventual convergence. + +Adopt S2 with implementation guardrails: + +1. 
Keep the external API simple (query-level stable token/frontier summary), even if internal time is multidimensional. +2. Scope V1 to minimal required dimensions (`[txn, iter]`), while keeping internal types extensible. +3. Implement only the first recursive-capable operators initially (join/lookup + recursive/aggregate path), then expand. --- @@ -382,21 +405,28 @@ If this direction is implemented incrementally, the likely first touchpoints are - Move per-query compilation into "compile-and-merge" against global graph. - Add reference-counted node lifecycle and cleanup. -### Phase 3: single-dimensional time/frontier (Option S1) +### Phase 3: multidimensional time/frontier core (Option S2) + +- Add version vectors (initially `[txn, iter]`) and antichain frontier plumbing in runtime/operator messages. +- Upgrade obligation gate from "ack only" to multidimensional frontier-aware satisfaction checks. +- Expose a simplified query stability token/frontier summary to consumers. + +### Phase 4: recursive includes and aggregates on S2 -- Add epoch stamping and source frontier propagation. -- Upgrade obligation gate from "ack only" to "ack + frontier >= required epoch". -- Expose query stability token to consumers. +- Implement recursive include operator path against S2 frontier semantics. +- Implement recursive aggregate correctness tests (insert/update/delete/retract scenarios). +- Validate multi-level include behavior with mixed loaded/unloaded branches. -### Phase 4: partial-state join eviction +### Phase 5: partial-state join eviction - Introduce tiered join caches (obligation + key skeleton + evictable full rows). - Add adaptive eviction policy and anti-thrash controls. +- Ensure eviction/reload correctness under recursive pipelines. -### Phase 5 (optional): multidimensional time (Option S2) +### Phase 6 (optional): dimension expansion beyond `[txn, iter]` -- Re-evaluate only if recursion/feedback precision requires it. -- Keep S1 as default for most plans. 
+- Add extra dimensions only when required by concrete operators/use-cases. +- Keep dimensionality minimal by default to control complexity. --- @@ -412,6 +442,10 @@ If this direction is implemented incrementally, the likely first touchpoints are - Evict/reload cycles preserve correctness under concurrent writes. 5. **Shared graph correctness** - Query attach/detach does not leak state or cross-contaminate outputs. +6. **Recursive aggregate correctness** + - Aggregates over recursive includes remain correct under inserts, updates, and retractions. +7. **Multi-level include progress isolation** + - Slow/deep branches do not unnecessarily block stable emission of unrelated branches. --- @@ -422,6 +456,7 @@ If this direction is implemented incrementally, the likely first touchpoints are 3. How should cross-source transactional guarantees be defined when sources provide incomparable LSN domains? 4. What are the minimal frontier semantics required for non-Electric sources? 5. Which operators should become up-query-capable first (join, order/limit, recursive operator)? +6. Under what concrete conditions do we need dimensions beyond `[txn, iter]`? --- @@ -431,6 +466,6 @@ Build toward: 1. **operator-routed up-queries**, 2. **single global mutable graph**, and -3. **single-dimensional epoch/frontier semantics (S1)** as the default transaction-tracking layer. +3. **multidimensional version/frontier semantics (S2)** as the internal default for recursive correctness. -This reaches the requested guarantees with a manageable complexity step, while leaving a clear path to multidimensional time (S2) if and when recursion/frontier precision truly requires it. +Given the goal of iterating quickly on expressive recursive queries (includes with aggregates and multi-level includes), taking on S2 complexity early is likely the lower total-cost path.
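As a closing illustration of what committing to S2 entails, here is a minimal sketch of antichain-frontier completeness over `[txn, iter]` timestamps, in the Differential style. Names are illustrative; the obligation gate in Phase 3 would ask a completeness question like this instead of comparing a single epoch.

```ts
// A timestamp in the minimal S2 model: [txn, iter].
type Ts = [number, number]

// Product partial order: a <= b when both coordinates are <=.
const leq = (a: Ts, b: Ts): boolean => a[0] <= b[0] && a[1] <= b[1]

// A frontier is an antichain (mutually incomparable timestamps).
// Timestamp `t` is complete -- no further updates can arrive at `t` --
// once NO frontier element is <= t.
function isComplete(t: Ts, frontier: Ts[]): boolean {
  return !frontier.some((f) => leq(f, t))
}
```

For example, with frontier `[[2, 0], [1, 2]]`, timestamp `[1, 1]` is complete (neither frontier element is below it), while `[1, 2]` and `[3, 5]` may still receive updates.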