@@ -533,6 +533,57 @@ Wire the new scheduler into `ColumnWiseDatasetBuilder`.
533533 out-of-order row group completion.
534534- ** Checkpoint file naming and format** : parquet files use the same naming scheme and schema.
535535
536+ ## Relation to PR #269
537+
538+ PR #269 ("feat: add execution graph builder plan with reference implementation") is a
539+ companion design by @johnnygreco that we reviewed before finalising this plan. It proposes
540+ a static ` ExecutionGraph ` with typed node IDs (` CellNodeId ` , ` BarrierNodeId ` ), an
541+ ` ExecutionTraits ` flag enum, and a ` CompletionTracker ` . It intentionally stops at the
542+ graph/tracker layer; this plan covers the full stack on top of that foundation.
543+
544+ ### What we adopted
545+
546+ - ** Dependency source** : derive the graph from ` required_columns ` on existing configs,
547+ extended with a side-effect mapping for columns like ` summary__trace ` . No config schema
548+ changes in either approach.
549+ - ** Trait inference from properties, not class names** : ` GraphBuilder._infer_traits() `
550+ inspects ` can_generate_from_scratch ` and ` get_generation_strategy() ` rather than
551+ matching class names. We apply the same principle, keeping plugin generators compatible.
552+ - ** Lightweight completion tracking** : a ` dict[str, set[int]] ` mapping column → completed
553+ rows, rather than materialising O(C × R) cell-level state. Our ` CompletionTracker `
554+ follows the same design.
555+ - ** Statefulness as a separate concern from execution strategy** : PR #269 separates
556+ execution traits (how a generator runs) from per-instance concurrency safety. We
557+ formalise this with the ` is_stateful ` property.
558+
559+ ### What we changed, and why
560+
561+ ** Row-group tasks instead of cell-level nodes.** PR #269 models every ` (row, column) ` pair
562+ as a virtual ` CellNodeId ` . Full-column generators become ` BARRIER ` nodes — a synthetic
563+ node that must complete before any output cells are ready. This faithfully models
564+ dependencies but creates a problem the PR itself flags as an open issue: a validation
565+ column anywhere in the pipeline blocks all checkpointing until the entire dataset
566+ completes, because no row is "done" until every column, including the barrier, finishes.
567+
568+ We scope full-column tasks to a ** row group** (the existing ` buffer_size ` batch). The
569+ effective barrier is just the FULL_COLUMN task waiting for all rows * in that group* — not
570+ the whole dataset. Checkpoints happen as each row group completes, so a failure mid-run
571+ loses at most one batch.
572+
573+ ** ` ROW_STREAMABLE ` trait omitted.** PR #269 introduces ` is_row_streamable ` so full-column
574+ generators that process rows independently (e.g., ` ExpressionColumnGenerator ` ) can be
575+ scheduled row-by-row, recovering some pipelining within a barrier. In our row-group model
576+ this is unnecessary: even a full-column generator only blocks one batch, preserving
577+ checkpoint cadence without subdividing tasks further. Expression columns run in
578+ microseconds and are never the scheduling bottleneck. We note this as a potential
579+ follow-up if profiling shows otherwise.
580+
581+ ** Scheduler and concurrency layers added.** PR #269 deliberately stops at the graph and
582+ tracker. Steps 1–3 of this plan (dependency map, completion tracker, task model) are
583+ directly informed by PR #269 and we treat it as the reference design for that layer. The
584+ remaining steps — scheduler, concurrency controls, retry/salvage, buffer manager, and
585+ builder integration — extend that foundation to a deployable implementation.
586+
536587## Notes
537588
538589### Out of scope for this PR
0 commit comments