Add two deeper concept pages: Schema as a Workflow Specification + Comparison to Workflow Languages #185
Merged
…mparison to Workflow Languages
Two new pages under Concepts > Data Model that follow from the
Relational Workflow Model overview and address the informed-reader
questions the overview page cannot answer in its scope:
1. Schema as a Workflow Specification
- Names the Relational Workflow Model as DataJoint's major innovation
- Describes the schema as a formal language: grammar (annotated DDL
excerpt for the Scan / AverageFrame / SegmentationParam /
Segmentation pipeline), typed semantics (three-condition existence
rule for a Computed row), the make() contract recording the git
hash of the producing code, the five-operator algebra with
closure, the type system, populate() as the self-healing engine
that brings the world into compliance with the schema, and
machine-readability / export pathways (DOT, Mermaid, YAML, JSON,
W3C PROV, OpenLineage, PROV-O, workflow-language conversion).
- Closes with the schema-as-control-plane framing (parallel to
routing tables in a network control plane).
2. Comparison to Workflow Languages
- Fair, structural comparison against CWL, Snakemake, Nextflow
(file-based workflows) and Airflow, Argo, Prefect, Dagster (task
orchestrators). Adjacent categories (data catalogs, lakehouses)
noted but flagged as solving different problems.
- Side-by-side table across nine concerns (data structure, types,
FK integrity, computation, execution order, provenance, drift
detection, query interface, retry semantics).
- What workflow languages offer, what they omit, DataJoint's
deliberate trade-off (paraphrasing Section 5 of Yatsenko & Nguyen
2026).
- Convertibility: any CWL workflow translates mechanically to a
DataJoint schema and back, with the data-structure layer the
workflow language omits supplied on conversion. GATK WGS pipeline
used as the empirical reference.
- "When to choose what" guidance including the "use both" pattern
(DataJoint inside an Airflow / Argo / Prefect orchestration).
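For readers new to the "algebra with closure" point in item 1: closure means every operator takes relations and returns a relation, so any query result can feed another query. The following is a minimal toy sketch in Python. It assumes a hypothetical `Relation` class over in-memory rows and uses illustrative attribute names (`scan_id`, `depth`, `n_cells`). It shows only three operators (restrict, project, join) and is not DataJoint's query implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Relation:
    """Toy relation (hypothetical): a heading plus a set of rows.

    Each row is stored as a tuple of (attribute, value) pairs sorted by attribute,
    so rows are hashable and duplicate rows collapse automatically.
    """

    heading: tuple
    rows: frozenset

    @classmethod
    def from_dicts(cls, heading, dicts):
        return cls(tuple(heading), frozenset(tuple(sorted(d.items())) for d in dicts))

    def dicts(self):
        return [dict(r) for r in self.rows]

    def restrict(self, predicate: Callable[[dict], bool]) -> "Relation":
        # Restriction: same heading, only the rows that satisfy the predicate.
        return Relation(self.heading, frozenset(r for r in self.rows if predicate(dict(r))))

    def project(self, *attrs: str) -> "Relation":
        # Projection: narrower heading; duplicate rows collapse because rows is a set.
        keep = tuple(a for a in self.heading if a in attrs)
        return Relation(
            keep, frozenset(tuple((a, dict(r)[a]) for a in sorted(keep)) for r in self.rows)
        )

    def join(self, other: "Relation") -> "Relation":
        # Natural join: combine rows that agree on every shared attribute.
        shared = [a for a in self.heading if a in other.heading]
        heading = self.heading + tuple(a for a in other.heading if a not in self.heading)
        out = set()
        for r in self.rows:
            left = dict(r)
            for s in other.rows:
                right = dict(s)
                if all(left[a] == right[a] for a in shared):
                    out.add(tuple(sorted({**left, **right}.items())))
        return Relation(heading, frozenset(out))


# Hypothetical example data for two tables sharing the key attribute scan_id.
scan = Relation.from_dicts(["scan_id", "depth"], [{"scan_id": 1, "depth": 100}, {"scan_id": 2, "depth": 250}])
seg = Relation.from_dicts(["scan_id", "n_cells"], [{"scan_id": 1, "n_cells": 40}, {"scan_id": 2, "n_cells": 55}])

# Every intermediate result is a Relation, so operators chain without special cases.
deep = scan.restrict(lambda r: r["depth"] > 200).join(seg).project("scan_id", "n_cells")
print(deep.dicts())  # [{'n_cells': 55, 'scan_id': 2}]
```

Each method returns a new `Relation`, so chains like `restrict(...).join(...).project(...)` need no special cases. That is the practical payoff of closure.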
Nav: both pages inserted under Concepts > Data Model after Relational
Workflow Model and before Entity Integrity, in mkdocs.yaml.
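The "populate() as the self-healing engine" item reduces to a set difference. Compute every key the upstream tables imply, subtract the keys already computed, and call make() for each missing one. The following is a minimal sketch in plain Python. It assumes in-memory dicts in place of database tables. `key_source`, `make_segmentation`, and the attribute names are hypothetical stand-ins, not the DataJoint API.

```python
from itertools import product

# Hypothetical upstream tables: primary-key tuple -> attributes.
scans = {(1,): {"depth": 100}, (2,): {"depth": 250}}
seg_params = {("fast",): {"threshold": 0.5}, ("strict",): {"threshold": 0.8}}

# Hypothetical Computed-style table: primary-key tuple -> computed attributes.
segmentation = {}


def key_source():
    """Every key the dependencies imply: each scan paired with each parameter set."""
    return {s + p for s, p in product(scans, seg_params)}


def make_segmentation(key):
    """Stand-in for make(): derive one row from the upstream rows it depends on."""
    scan_id, param_name = key
    depth = scans[(scan_id,)]["depth"]
    threshold = seg_params[(param_name,)]["threshold"]
    return {"n_cells": int(depth * threshold)}


def populate():
    """Compute only the missing keys and return how many were computed."""
    missing = key_source() - set(segmentation)
    for key in sorted(missing):
        segmentation[key] = make_segmentation(key)
    return len(missing)


print(populate())  # 4: two scans x two parameter sets
print(populate())  # 0: nothing missing, nothing recomputed
scans[(3,)] = {"depth": 300}  # new upstream data arrives
print(populate())  # 2: only the new scan's keys are computed
```

In this sketch, a rerun after a crash or after new upstream data arrives touches only what is missing, so reruns are safe.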
…ines Cohesion pass after adding Schema as a Workflow Specification and Comparison to Workflow Languages:
- Nav (mkdocs.yaml): move the two new pages to the end of the Data Model group so the progression reads paradigm > components > synthesis > comparison: Relational Workflow Model > Entity Integrity > Normalization > Computation Model > Schema as a Workflow Specification > Comparison to Workflow Languages.
- Concepts index (explanation/index.md): add cards for both new pages.
- FAQ (faq.md): the "Is DataJoint a Workflow Management System?" answer was duplicating the Comparison page; trim it to a two-paragraph pointer to the new page.
- Data Pipelines (data-pipelines.md): the "Comparing Approaches" table was a mini version of the new Comparison page; trim to a short paragraph + pointer.
MilagrosMarin approved these changes (Jun 14, 2026)
dimitri-yatsenko added a commit that referenced this pull request (Jun 14, 2026)
Placeholder for follow-up work after #184 (expand RWM) and #185 (deeper concept pages) merge. Tracker file outlines what to trim, why, and how to pick the work up once both upstream PRs land. No content changes to docs source in this PR. The tracker file is to be deleted in the same commit that applies the trim.
dimitri-yatsenko added a commit that referenced this pull request (Jun 14, 2026)
The developed argument lives on the Comparison to Workflow Languages page (added in #185). The RWM page now mentions the trade-off in one paragraph and links out, preventing drift between two homes for the same argument. Removes the .github/follow-ups/ tracker that scheduled this work.
dimitri-yatsenko added a commit that referenced this pull request (Jun 26, 2026)
…ucture concrete-first (#186)
* WIP tracker: trim "deliberate trade-off" prose from RWM concept page
  Placeholder for follow-up work after #184 (expand RWM) and #185 (deeper concept pages) merge. Tracker file outlines what to trim, why, and how to pick the work up once both upstream PRs land. No content changes to docs source in this PR. The tracker file is to be deleted in the same commit that applies the trim.
* docs(rwm): trim "deliberate trade-off" prose; link to Comparison page
  The developed argument lives on the Comparison to Workflow Languages page (added in #185). The RWM page now mentions the trade-off in one paragraph and links out, preventing drift between two homes for the same argument. Removes the .github/follow-ups/ tracker that scheduled this work.
* docs(rwm): align worked-example diagram with dj.Diagram notation
  Match the conventions from datajoint-python's dj.Diagram (diagram.py:1017-1082):
  - Manual: green rectangle (unchanged)
  - Lookup: plaintext, with no border or fill (was a filled rectangle)
  - Imported: blue stadium-shaped node, the closest Mermaid approximation to dj.Diagram's ellipse
  - Computed: red stadium-shaped node, the same approximation
  Drop the inline tier-name and make() annotations on each node; tier is now conveyed by shape and color alone, as in the real diagrams. A new lead paragraph spells out the convention so the reader can decode the diagram without a separate legend.
* docs(rwm): restructure concrete-first; reframe as added interpretation
  Two structural cleanups on relational-workflow-model.md:
  Concrete-first ordering. Open with a tight paragraph naming the model, then lead with the worked example (diagram + walkthrough). The historical lineage (Codd/Chen/RWM three interpretations) now follows the example, placing DataJoint's contribution in context once the reader has a concrete pipeline to anchor on. The closing side-by-side reading table moves to the end of the page.
  Reframe as interpretation, not departure. Classical relational concepts (tables, rows, foreign keys, normalization, the query algebra) apply unchanged; RWM adds a semantic interpretation on top. Renamed and rewrote two sections to reflect this:
  - "Four shifts from the classical relational model" → "A semantic interpretation, not a departure". Bullets now read additively ("tables also represent workflow steps") rather than contrastively ("not merely categories").
  - "From transactions to transformations" → "Two readings of the same schema". Lead-in clarifies both readings hold simultaneously. Column header changes from "Traditional view" to "Classical reading."
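The tier-to-shape convention in the diagram commit above (Manual: green rectangle; Lookup: plaintext; Imported: blue stadium; Computed: red stadium) is regular enough to generate mechanically. The following is a minimal sketch that emits Mermaid flowchart text. The `to_mermaid` helper, the hex colors, the tier assignments, and the edges for the example pipeline are all assumptions for illustration. None of these values come from dj.Diagram or the docs source.

```python
# Mermaid node syntax: [text] draws a rectangle, ([text]) draws a stadium.
SHAPES = {
    "Manual": "{name}[{name}]",
    "Lookup": "{name}[{name}]",  # rectangle; its classDef removes border and fill
    "Imported": "{name}([{name}])",
    "Computed": "{name}([{name}])",
}

# Illustrative styles; the colors are assumptions, not dj.Diagram's exact values.
STYLES = {
    "Manual": "fill:#c8e6c9,stroke:#2e7d32",
    "Lookup": "fill:none,stroke:none",
    "Imported": "fill:#bbdefb,stroke:#1565c0",
    "Computed": "fill:#ffcdd2,stroke:#c62828",
}


def to_mermaid(tiers, edges):
    """Render {table: tier} plus (parent, child) foreign-key edges as Mermaid text."""
    lines = ["flowchart TD"]
    for tier, style in STYLES.items():
        lines.append(f"    classDef {tier.lower()} {style}")
    for name, tier in tiers.items():
        lines.append("    " + SHAPES[tier].format(name=name))
        lines.append(f"    class {name} {tier.lower()}")
    for parent, child in edges:
        lines.append(f"    {parent} --> {child}")
    return "\n".join(lines)


# Hypothetical tiers and dependencies for the Scan / AverageFrame /
# SegmentationParam / Segmentation example pipeline.
tiers = {
    "Scan": "Manual",
    "SegmentationParam": "Lookup",
    "AverageFrame": "Imported",
    "Segmentation": "Computed",
}
edges = [
    ("Scan", "AverageFrame"),
    ("AverageFrame", "Segmentation"),
    ("SegmentationParam", "Segmentation"),
]
print(to_mermaid(tiers, edges))
```

This sketch approximates Lookup with a borderless, unfilled rectangle so that it can stick to the basic node syntax.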
Context
The Relational Workflow Model concept page (overview / paradigm) and the
component pages under Concepts > Data Model (Entity Integrity,
Normalization, Computation Model) leave two reader needs unmet:
reader asks for the grammar, the typed semantics, the algebra, and the
machine-readable surface, as Hal Stern asked on the June 12 call:
"Python is not a formal spec — is there a grammar? Can it be published
as YAML? Is there an API set for it?"
already know? A fair structural comparison against CWL, Snakemake,
Nextflow, Airflow, Argo, Prefect, and Dagster, plus guidance on when
each fits.
This PR adds two new pages that close those gaps and integrates them
with the existing concept set.
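Hal Stern's question is easiest to picture with the schema held as plain data. Once tables, tiers, and foreign-key dependencies form a data structure, publishing the schema and deriving a valid execution order are both mechanical. The following is a minimal sketch. It assumes a hypothetical dictionary layout, which is not DataJoint's export format. It uses JSON because JSON needs only the standard library; a YAML library would serialize the same structure.

```python
import json
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical, simplified schema description: each table lists its tier and
# the tables its foreign keys reference.
spec = {
    "Scan": {"tier": "Manual", "depends_on": []},
    "SegmentationParam": {"tier": "Lookup", "depends_on": []},
    "AverageFrame": {"tier": "Imported", "depends_on": ["Scan"]},
    "Segmentation": {"tier": "Computed", "depends_on": ["AverageFrame", "SegmentationParam"]},
}

# Execution order follows from the foreign keys: each table comes after
# every table it references.
order = list(TopologicalSorter({t: d["depends_on"] for t, d in spec.items()}).static_order())

# Publishing the specification is serialization of the same structure.
document = json.dumps({"tables": spec, "execution_order": order}, indent=2)
print(order)
```

Any order consistent with the dependencies is valid, so tables with no dependencies between them (here, Scan and SegmentationParam) may appear in either order.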
Changes
New pages
explanation/schema-as-workflow-specification.md (~1,150 words): names the Relational Workflow Model as DataJoint's major innovation and positions the schema as the formal language expressing it
Annotated DDL excerpt (Scan, AverageFrame, SegmentationParam, Segmentation) showing the
--- separator, -> foreign keys, codec types, tier decoration
make() as a typed function, git-hash code provenance per row
populate() brings the world into compliance with the schema
Machine-readability and export pathways: DOT, Mermaid, YAML, JSON, W3C PROV, OpenLineage, PROV-O, workflow-language conversion
observable (parallel to network routing tables)
explanation/comparison-to-workflow-languages.md (~870 words): compares file-based workflows (CWL, Snakemake, Nextflow) and task orchestrators (Airflow, Argo, Prefect,
Dagster), with adjacent categories (data catalogs, lakehouses) noted
but separated
Side-by-side table across nine concerns (data structure, types, FK integrity, computation spec, execution order, provenance, drift
detection, query interface, retry/idempotence)
What workflow languages offer, what they omit, and DataJoint's deliberate trade-off (paraphrased from Yatsenko & Nguyen 2026 Section 5)
Convertibility: any CWL workflow translates mechanically to a DataJoint schema and back; DataJoint adds the data-structure layer
that workflow languages omit; GATK WGS example referenced
"When to choose what" guidance, including the "use both" pattern (DataJoint inside an Airflow / Argo / Prefect orchestration)
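The git-hash provenance recorded per row, listed under the first page, is one plausible basis for the drift-detection concern in the comparison table. Rows produced by older code can be found by comparing hashes. The following is a minimal sketch. It assumes each row carries a `code_hash` field, and a truncated SHA-256 of a source string stands in for the git commit hash. `code_hash`, `stale_rows`, and the row layout are hypothetical.

```python
import hashlib


def code_hash(source: str) -> str:
    """Stand-in for a git commit hash: a short digest of the producing code."""
    return hashlib.sha256(source.encode()).hexdigest()[:12]


V1 = code_hash("def make(key): return depth * 0.5")
V2 = code_hash("def make(key): return depth * 0.6")  # the code has since changed

# Hypothetical Computed rows, each tagged with the hash of the code that made it.
rows = [
    {"scan_id": 1, "n_cells": 50, "code_hash": V1},
    {"scan_id": 2, "n_cells": 150, "code_hash": V2},
]


def stale_rows(rows, current_hash):
    """Drift detection: rows whose recorded code version differs from the current one."""
    return [r for r in rows if r["code_hash"] != current_hash]


print([r["scan_id"] for r in stale_rows(rows, current_hash=V2)])  # [1]
```

Flagged rows can then be deleted and recomputed by a populate-style pass, which fills in exactly the keys that went missing.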
Integration with existing concept set
Nav (mkdocs.yaml): place the two new pages at the end of the Data Model group so the progression reads
paradigm > components > synthesis > comparison:
RWM > Entity Integrity > Normalization > Computation Model >
Schema as a Workflow Specification > Comparison to Workflow Languages.
Concepts index (explanation/index.md): cards added for both new pages.
FAQ (faq.md): the "Is DataJoint a Workflow Management System?" answer overlapped substantively with the new Comparison page; trimmed
it to a two-paragraph pointer.
Data Pipelines (data-pipelines.md): the "Comparing Approaches" table was a mini-version of the new Comparison page; trimmed to a
short paragraph + pointer.
Merge order with PR #184
Both new pages cross-reference the expanded Relational Workflow Model
page from PR #184. Suggested merge order: #184 first, then this PR (#185).
If merged in the opposite order, the new pages still resolve their links
correctly; the cross-references just read against the older, shorter
RWM page until #184 lands.