Add two composite CallableModels for expressing the structural shape of a workflow.
Sequence holds an ordered list of child CallableModels and declares that they must execute one after another. This is useful when ordering matters because of external side effects: for example, an ETL stage that must finish writing before the next stage reads, or a migration that must complete before a backfill.
Independent holds a list of child CallableModels and declares that they have no ordering relationship to one another. Whether they actually execute concurrently, in parallel across processes, distributed across workers, or simply one after another is left to the evaluator.
Data dependencies between steps are expressed the existing ccflow way: by composing models so that one contains another. These two primitives only declare ordering constraints, and can be freely nested (a Sequence of Independents of Sequences, etc.) to build arbitrary DAGs. The resulting graph is itself a CallableModel.
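A minimal sketch of how the nesting might look. The classes here are plain dataclasses standing in for the proposed primitives, not ccflow's actual CallableModel API; `Step`, `record`, and the step names are hypothetical and exist only for illustration. Both composites are evaluated in list order, which is one valid schedule for Independent.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


# Hypothetical stand-ins: plain dataclasses, not ccflow's CallableModel.
@dataclass
class Step:
    name: str
    fn: Callable[[], object]

    def __call__(self):
        return self.fn()


@dataclass
class Sequence:
    children: List["Node"]

    def __call__(self):
        # Children must run one after another, in list order.
        return [child() for child in self.children]


@dataclass
class Independent:
    children: List["Node"]

    def __call__(self):
        # No ordering is declared; list order is simply one valid choice.
        return [child() for child in self.children]


Node = Union[Step, Sequence, Independent]

log = []  # records the order in which leaf steps actually ran


def record(name):
    """Build a leaf step that logs its own name when called."""
    def fn():
        log.append(name)
        return name
    return Step(name, fn)


# A Sequence containing an Independent of Sequences.
graph = Sequence([
    record("migrate"),
    Independent([
        Sequence([record("extract_a"), record("load_a")]),
        Sequence([record("extract_b"), record("load_b")]),
    ]),
    record("report"),
])

result = graph()  # the composite is itself callable, like its children
```

In this sketch the result mirrors the nesting of the graph, which is only one of the result shapes under discussion.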
Execution policy — concurrency, scheduling, distribution, retries — is the evaluator's responsibility. The same Sequence/Independent graph should run unchanged under the existing GraphEvaluator, a future Ray-based evaluator, a future Celery-based evaluator, etc.
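To illustrate the separation between structure and execution policy, here is a sketch in which the same graph is handed to two different evaluators. Everything in it is an assumption for illustration: `Leaf`, `evaluate_serial`, and `evaluate_threaded` are hypothetical names, and neither function reflects how GraphEvaluator or any other ccflow evaluator is implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


# Hypothetical, purely structural nodes: they carry no execution logic.
@dataclass(frozen=True)
class Leaf:
    fn: Callable[[], object]


@dataclass(frozen=True)
class Sequence:
    children: tuple


@dataclass(frozen=True)
class Independent:
    children: tuple


def evaluate_serial(node):
    """Run everything one after another; valid for both composites."""
    if isinstance(node, Leaf):
        return node.fn()
    return [evaluate_serial(child) for child in node.children]


def evaluate_threaded(node, max_workers=4):
    """Run Independent children on a thread pool; keep Sequence ordered."""
    if isinstance(node, Leaf):
        return node.fn()
    if isinstance(node, Sequence):
        return [evaluate_threaded(child, max_workers) for child in node.children]
    # Independent: siblings may overlap in time; map() preserves child order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(
            pool.map(lambda child: evaluate_threaded(child, max_workers), node.children)
        )


graph = Sequence((
    Leaf(lambda: 1),
    Independent((Leaf(lambda: 2), Leaf(lambda: 3))),
    Leaf(lambda: 4),
))

serial_result = evaluate_serial(graph)
threaded_result = evaluate_threaded(graph)
```

The graph definition is untouched between the two calls; only the evaluator differs, which is the property the proposal asks for.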
Open questions
- Naming. Independent captures the structural property accurately but is less immediately recognizable than something like Parallel. Worth a final naming pass before settling.
- Result shape. Should the composite return a list keyed by position, a dict keyed by child name, or both via a configurable option? Names are friendlier for downstream consumers but require unique identifiers per child.
- Failure semantics. For Independent, should a single child failure prevent siblings from being scheduled (fail-fast) or let them complete and report all errors? For Sequence, fail-fast is the obvious default but worth confirming. In either case the actual cancellation/scheduling behavior is the evaluator's call; the composite only declares the policy.
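One way the failure-semantics and result-shape questions could play out, sketched under stated assumptions: the composite carries a hypothetical `fail_fast` flag, results are keyed by child name, and a hypothetical `ChildFailures` exception aggregates errors. None of these names come from ccflow, and the serial `evaluate` function is just a stand-in for whatever evaluator honors the declared policy.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Independent:
    # Assumption: children are keyed by unique names (the dict result shape).
    children: Dict[str, Callable[[], object]]
    # Assumption: the composite only declares the policy; it does not enforce it.
    fail_fast: bool = True


class ChildFailures(Exception):
    """Hypothetical aggregate error carrying per-child errors and partial results."""

    def __init__(self, errors, results):
        super().__init__(f"{len(errors)} child(ren) failed: {sorted(errors)}")
        self.errors = errors
        self.results = results


def evaluate(node):
    """Serial stand-in evaluator that honors the declared failure policy."""
    results, errors = {}, {}
    for name, child in node.children.items():
        try:
            results[name] = child()
        except Exception as exc:
            if node.fail_fast:
                raise  # stop here; later siblings are never started
            errors[name] = exc  # remember the error and keep going
    if errors:
        raise ChildFailures(errors, results)
    return results


def load_prices():
    raise ValueError("source unavailable")


def load_trades():
    return 42


children = {"prices": load_prices, "trades": load_trades}

# Collect-all policy: the failing child does not stop its sibling.
try:
    evaluate(Independent(children, fail_fast=False))
except ChildFailures as failure:
    collected = failure
```

With `fail_fast=True` the same call would re-raise the first `ValueError` and never start `load_trades`. A concurrent evaluator would additionally have to decide whether to cancel siblings that are already running.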