Expand Relational Workflow Model concept page by dimitri-yatsenko · Pull Request #184 · datajoint/datajoint-docs

dimitri-yatsenko · 2026-06-13T17:43:09Z

Context

The current Relational Workflow Model (RWM) concept page (src/explanation/relational-workflow-model.md) is understated relative to the model's significance. It reads as a brief positioning statement rather than as an entry point that lands the structural argument for a reader who already knows informatics (databases, FK graphs, ER modeling, workflow managers, lakehouses).

This PR expands the page to function as that entry point — the audience pictured is a knowledgeable peer (e.g., an infrastructure architect from pharma R&D evaluating where DataJoint sits in the landscape they already know).

Changes

- Lead with the three-interpretations taxonomy (Codd / Chen / RWM) and the computational substrate framing from the DataJoint 2.0 preprint (Yatsenko & Nguyen, 2026, arXiv:2602.16585).
- Name the surrounding tool categories explicitly and what each is silent on:
  - File-based workflow systems (CWL, Snakemake, Nextflow): fragment provenance across the filesystem
  - Task orchestrators (Airflow, Argo, Prefect): agnostic to data structure
  - Data catalogs (DataHub, Atlan, Marquez): describe data after it lands
  - Lakehouses (Delta, Iceberg, Hudi): treat computation as external
- Add a worked-example pipeline (Mouse → Session → Scan → AverageFrame → Segmentation → Fluorescence, with SegmentationParam as Lookup) rendered as a mermaid diagram with tier-color classes.
- Add a "deliberate trade-off" section that acknowledges the legitimate strengths of decoupled architectures and frames DataJoint's coupling as a chosen trade-off, drawn directly from Section 5 of the preprint.
- Add a "substrate consequences" section that covers:
  - Provenance and lineage as structural properties of the substrate (mapping to W3C PROV / OpenLineage is translation, not reconstruction)
  - The five agent-substrate properties from the preprint: self-describing, safe by default, explicit dependencies, idempotent, observable
- Preserve the existing detailed sections (table tiers, master-part, workflow normalization, entity integrity, query algebra with closure, transactions vs transformations) under a "Beneath the model" header for readers who want the structural detail.
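
For reviewers who want to see what "explicit dependencies" means for the worked example, here is a minimal sketch in plain Python, not DataJoint code. It treats the pipeline as a dependency map and derives an execution order and the set of downstream tables. The names `DEPENDS_ON`, `execution_order`, and `downstream` are hypothetical, and the edge from SegmentationParam to Segmentation is an assumption: the PR states only that SegmentationParam is a Lookup.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each table maps to the tables it depends on.
# The linear chain follows the arrows in the PR description.
# Assumption: SegmentationParam is a parent of Segmentation.
DEPENDS_ON = {
    "Mouse": set(),
    "Session": {"Mouse"},
    "Scan": {"Session"},
    "AverageFrame": {"Scan"},
    "SegmentationParam": set(),
    "Segmentation": {"AverageFrame", "SegmentationParam"},
    "Fluorescence": {"Segmentation"},
}


def execution_order(depends_on):
    """Return one valid order in which every table follows all of its parents."""
    return list(TopologicalSorter(depends_on).static_order())


def downstream(table, depends_on):
    """Return every table that depends on `table`, directly or transitively."""
    found = set()
    frontier = [table]
    while frontier:
        current = frontier.pop()
        for child, parents in depends_on.items():
            if current in parents and child not in found:
                found.add(child)
                frontier.append(child)
    return found


if __name__ == "__main__":
    print(execution_order(DEPENDS_ON))
    print(sorted(downstream("Scan", DEPENDS_ON)))
```

Because the dependencies are declared as data, both the order of computation and the tables affected by a change to an upstream table can be read directly off the structure, with no separate log needed to reconstruct them.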

Net change

+177 / -112 lines; one file.

Sources

- Yatsenko & Nguyen, 2026: DataJoint 2.0 whitepaper (computational substrate, four innovations, substrate properties for agents, deliberate-trade-off discussion)
- Yatsenko et al., 2018: original theoretical formalization (relational workflow model, query algebra)

Notes for reviewers

- The mermaid diagram uses tier-color classDefs. If the docs site's mermaid theme overrides these, we may need to drop colors or adapt to the site theme.
- Cross-references in See also all resolve against current src/explanation/ and src/how-to/ content.

The previous intro understated the model's significance. The expansion positions the RWM for an informatics-knowledgeable reader:

- Lead with the three-interpretations taxonomy (Codd / Chen / RWM) and the computational-substrate framing from the DataJoint 2.0 preprint.
- Name the surrounding tool categories explicitly (CWL/Snakemake/Nextflow, Airflow/Argo/Prefect, DataHub/Atlan/Marquez, Delta/Iceberg/Hudi) and what each is silent on.
- Add a worked example pipeline (Mouse > Session > Scan > AverageFrame > Segmentation > Fluorescence, with SegmentationParam as Lookup) rendered as a mermaid diagram with tier colors.
- Add a "deliberate trade-off" section addressing the legitimate strengths of decoupled architectures and why DataJoint accepts coupling.
- Add a substrate-consequences section: provenance and lineage as structural properties (mapping to W3C PROV / OpenLineage is translation, not reconstruction), and the five agent-substrate properties (self-describing, safe by default, explicit dependencies, idempotent, observable) from the preprint.
- Preserve the existing detailed sections (table tiers, master-part, normalization, entity integrity, query algebra, transactions vs transformations) under a "Beneath the model" header for readers who want the structural detail.

Placeholder for follow-up work after #184 (expand RWM) and #185 (deeper concept pages) merge. Tracker file outlines what to trim, why, and how to pick the work up once both upstream PRs land. No content changes to docs source in this PR. The tracker file is to be deleted in the same commit that applies the trim.

…ucture concrete-first (#186)

* WIP tracker: trim "deliberate trade-off" prose from RWM concept page

  Placeholder for follow-up work after #184 (expand RWM) and #185 (deeper concept pages) merge. Tracker file outlines what to trim, why, and how to pick the work up once both upstream PRs land. No content changes to docs source in this PR. The tracker file is to be deleted in the same commit that applies the trim.

* docs(rwm): trim "deliberate trade-off" prose; link to Comparison page

  The developed argument lives on the Comparison to Workflow Languages page (added in #185). The RWM page now mentions the trade-off in one paragraph and links out, preventing drift between two homes for the same argument. Removes the .github/follow-ups/ tracker that scheduled this work.

* docs(rwm): align worked-example diagram with dj.Diagram notation

  Match the conventions from datajoint-python's dj.Diagram (diagram.py:1017-1082):
  - Manual: green rectangle (unchanged)
  - Lookup: plaintext, no border/fill (was a filled rectangle)
  - Imported: blue stadium-shaped node, the closest Mermaid approximation to dj.Diagram's ellipse
  - Computed: red stadium-shaped node (same approximation)

  Drop the inline tier-name and make() annotations on each node; tier is now conveyed by shape and color alone, as in the real diagrams. A new lead paragraph spells out the convention so the reader can decode the diagram without a separate legend.

* docs(rwm): restructure concrete-first; reframe as added interpretation

  Two structural cleanups on relational-workflow-model.md:

  Concrete-first ordering. Open with a tight paragraph naming the model, then lead with the worked example (diagram + walkthrough). The historical lineage (Codd/Chen/RWM three interpretations) now follows the example, placing DataJoint's contribution in context once the reader has a concrete pipeline to anchor on. The closing side-by-side reading table moves to the end of the page.

  Reframe as interpretation, not departure. Classical relational concepts (tables, rows, foreign keys, normalization, the query algebra) apply unchanged; RWM adds a semantic interpretation on top. Renamed and rewrote two sections to reflect this:
  - "Four shifts from the classical relational model" → "A semantic interpretation, not a departure". Bullets now read additively ("tables also represent workflow steps") rather than contrastively ("not merely categories").
  - "From transactions to transformations" → "Two readings of the same schema". Lead-in clarifies both readings hold simultaneously. Column header changes from "Traditional view" to "Classical reading."
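
The shape-and-color convention described in the dj.Diagram alignment commit can be sketched as a small Python helper that emits Mermaid flowchart text. This is an illustration under stated assumptions, not the diagram source merged in the PR. The tier assigned to each table (other than SegmentationParam as Lookup), the SegmentationParam → Segmentation edge, the hex colors, and the names `TIERS`, `EDGES`, `SHAPES`, `CLASS_DEFS`, and `render_mermaid` are all hypothetical.

```python
# Assumed tier per table; the source states only that SegmentationParam is a Lookup.
TIERS = {
    "Mouse": "manual",
    "Session": "manual",
    "Scan": "imported",
    "AverageFrame": "computed",
    "SegmentationParam": "lookup",
    "Segmentation": "computed",
    "Fluorescence": "computed",
}

# Parent -> child edges; the SegmentationParam edge is an assumption.
EDGES = [
    ("Mouse", "Session"),
    ("Session", "Scan"),
    ("Scan", "AverageFrame"),
    ("AverageFrame", "Segmentation"),
    ("SegmentationParam", "Segmentation"),
    ("Segmentation", "Fluorescence"),
]

# Mermaid node syntax: id[text] is a rectangle, id([text]) is a stadium.
SHAPES = {
    "manual": "{n}[{n}]",
    "lookup": "{n}[{n}]",
    "imported": "{n}([{n}])",
    "computed": "{n}([{n}])",
}

# Illustrative styles only; the exact colors used on the page are not given here.
CLASS_DEFS = {
    "manual": "fill:#cfc,stroke:#060",
    "lookup": "fill:none,stroke:none",  # plaintext: no border, no fill
    "imported": "fill:#ccf,stroke:#006",
    "computed": "fill:#fcc,stroke:#600",
}


def render_mermaid(tiers, edges, class_defs=CLASS_DEFS):
    """Build Mermaid flowchart text with one class per table tier."""
    lines = ["flowchart TD"]
    for name, tier in tiers.items():
        lines.append("    " + SHAPES[tier].format(n=name))
    for parent, child in edges:
        lines.append(f"    {parent} --> {child}")
    for tier, style in class_defs.items():
        lines.append(f"    classDef {tier} {style}")
    for tier in class_defs:
        members = [n for n, t in tiers.items() if t == tier]
        if members:
            lines.append(f"    class {','.join(members)} {tier}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_mermaid(TIERS, EDGES))
```

Encoding tier in node shape as well as color means part of the distinction survives if a site theme overrides the classDef colors, although Manual and Lookup both use the rectangle syntax and differ only in styling.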

dimitri-yatsenko requested review from MilagrosMarin, esutlie and ttngu207 June 13, 2026 17:43

This was referenced Jun 13, 2026

Add two deeper concept pages: Schema as a Workflow Specification + Comparison to Workflow Languages #185

Merged

docs(rwm): trim trade-off prose, align diagram with dj.Diagram, restructure concrete-first #186

Merged

MilagrosMarin approved these changes Jun 14, 2026

dimitri-yatsenko merged commit 0f95b3f into main Jun 14, 2026
2 checks passed

dimitri-yatsenko deleted the expand/relational-workflow-model-intro branch June 14, 2026 15:55
