Skip to content

Commit 7765b7a

Browse files
committed
document relation to PR #269 and fix scheduler diagram
- add "Relation to PR #269" section explaining what we adopted (dependency source, trait inference, completion tracker design, statefulness separation) and what we changed (row-group tasks instead of cell-level nodes, ROW_STREAMABLE omitted) - fix scheduler main loop diagram: add async_max_concurrent_row_groups admission step, pre-batch failure path (skip row group + release slot), and loop back to ADMIT after row group completion
1 parent 979e417 commit 7765b7a

2 files changed

Lines changed: 60 additions & 5 deletions

File tree

plans/346/async-generators-and-task-queue.md

Lines changed: 51 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -533,6 +533,57 @@ Wire the new scheduler into `ColumnWiseDatasetBuilder`.
533533
out-of-order row group completion.
534534
- **Checkpoint file naming and format**: parquet files use the same naming scheme and schema.
535535

536+
## Relation to PR #269
537+
538+
PR #269 ("feat: add execution graph builder plan with reference implementation") is a
539+
companion design by @johnnygreco that we reviewed before finalising this plan. It proposes
540+
a static `ExecutionGraph` with typed node IDs (`CellNodeId`, `BarrierNodeId`), an
541+
`ExecutionTraits` flag enum, and a `CompletionTracker`. It intentionally stops at the
542+
graph/tracker layer; this plan covers the full stack on top of that foundation.
543+
544+
### What we adopted
545+
546+
- **Dependency source**: derive the graph from `required_columns` on existing configs,
547+
extended with a side-effect mapping for columns like `summary__trace`. No config schema
548+
changes in either approach.
549+
- **Trait inference from properties, not class names**: `GraphBuilder._infer_traits()`
550+
inspects `can_generate_from_scratch` and `get_generation_strategy()` rather than
551+
matching class names. We apply the same principle, keeping plugin generators compatible.
552+
- **Lightweight completion tracking**: a `dict[str, set[int]]` mapping column → completed
553+
rows, rather than materialising O(C × R) cell-level state. Our `CompletionTracker`
554+
follows the same design.
555+
- **Statefulness as a separate concern from execution strategy**: PR #269 separates
556+
execution traits (how a generator runs) from per-instance concurrency safety. We
557+
formalise this with the `is_stateful` property.
558+
559+
### What we changed, and why
560+
561+
**Row-group tasks instead of cell-level nodes.** PR #269 models every `(row, column)` pair
562+
as a virtual `CellNodeId`. Full-column generators become `BARRIER` nodes — a synthetic
563+
node that must complete before any output cells are ready. This faithfully models
564+
dependencies but creates a problem the PR itself flags as an open issue: a validation
565+
column anywhere in the pipeline blocks all checkpointing until the entire dataset
566+
completes, because no row is "done" until every column, including the barrier, finishes.
567+
568+
We scope full-column tasks to a **row group** (the existing `buffer_size` batch). The
569+
effective barrier is just the FULL_COLUMN task waiting for all rows *in that group* — not
570+
the whole dataset. Checkpoints happen as each row group completes, so a failure mid-run
571+
loses at most one batch.
572+
573+
**`ROW_STREAMABLE` trait omitted.** PR #269 introduces `is_row_streamable` so full-column
574+
generators that process rows independently (e.g., `ExpressionColumnGenerator`) can be
575+
scheduled row-by-row, recovering some pipelining within a barrier. In our row-group model
576+
this is unnecessary: even a full-column generator only blocks one batch, preserving
577+
checkpoint cadence without subdividing tasks further. Expression columns run in
578+
microseconds and are never the scheduling bottleneck. We note this as a potential
579+
follow-up if profiling shows otherwise.
580+
581+
**Scheduler and concurrency layers added.** PR #269 deliberately stops at the graph and
582+
tracker. Steps 1–3 of this plan (dependency map, completion tracker, task model) are
583+
directly informed by PR #269 and we treat it as the reference design for that layer. The
584+
remaining steps — scheduler, concurrency controls, retry/salvage, buffer manager, and
585+
builder integration — extend that foundation to a deployable implementation.
586+
536587
## Notes
537588

538589
### Out of scope for this PR

plans/346/diagrams.md

Lines changed: 9 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -33,14 +33,18 @@ The overall orchestration flow from start to row group checkpoint.
3333

3434
```mermaid
3535
flowchart TD
36-
START[Start] --> SEED[Dispatch from_scratch tasks]
36+
START[Start] --> ADMIT[Admit row group\nacquire async_max_concurrent_row_groups slot]
37+
ADMIT --> SEED[Dispatch from_scratch tasks]
3738
SEED --> SEED_CHECK{Stateful generator?}
3839
SEED_CHECK -->|Yes| SER[Serialize per-instance\nrow group N before N+1]
39-
SEED_CHECK -->|No| PAR[Dispatch all row groups\nconcurrently]
40+
SEED_CHECK -->|No| PAR[Dispatch concurrently\nwithin admitted set]
4041
SER --> PRE
4142
PAR --> PRE
4243
43-
PRE[Pre-batch barrier\nrun processors, reset tracker] --> LOOP
44+
PRE[Pre-batch barrier\nrun processors, reset tracker] --> PRE_OK{Processor\nsucceeded?}
45+
PRE_OK -->|No| SKIP[Skip row group\nrelease semaphore slot]
46+
SKIP --> DONE
47+
PRE_OK -->|Yes| LOOP
4448
4549
LOOP[Query tracker:\nget_ready_tasks] --> READY{Tasks ready?}
4650
@@ -56,14 +60,14 @@ flowchart TD
5660
DEFERRED -->|No, or budget exhausted| RG_CHECK{Row group\ncomplete?}
5761
5862
RG_CHECK -->|Yes| POST[Post-batch processors]
59-
POST --> CP[Checkpoint to parquet\nfree memory]
63+
POST --> CP[Checkpoint to parquet\nfree memory\nrelease semaphore slot]
6064
CP --> DONE{All row groups\ndone?}
6165
6266
RG_CHECK -->|No| WAIT[Wait for in-flight\ntasks to complete]
6367
WAIT --> LOOP
6468
6569
DONE -->|Yes| FIN[Done]
66-
DONE -->|No| LOOP
70+
DONE -->|No| ADMIT
6771
```
6872

6973
## 3. Dependency Resolution Example

0 commit comments

Comments
 (0)