
Expand Relational Workflow Model concept page #184

Merged
dimitri-yatsenko merged 1 commit into main from expand/relational-workflow-model-intro on Jun 14, 2026

Conversation

@dimitri-yatsenko

Member

Context

The current Relational Workflow Model (RWM) concept page (src/explanation/relational-workflow-model.md) is understated relative to the model's significance. It reads as a brief positioning statement rather than as an entry point that lands the structural argument for a reader who already knows informatics (databases, FK graphs, ER modeling, workflow managers, lakehouses).

This PR expands the page to function as that entry point — the audience pictured is a knowledgeable peer (e.g., an infrastructure architect from pharma R&D evaluating where DataJoint sits in the landscape they already know).

Changes

  • Lead with the three-interpretations taxonomy (Codd / Chen / RWM) and the computational substrate framing from the DataJoint 2.0 preprint (Yatsenko & Nguyen, 2026, arXiv:2602.16585).
  • Name the surrounding tool categories explicitly and what each is silent on:
    • File-based workflow systems (CWL, Snakemake, Nextflow) — fragment provenance across the filesystem
    • Task orchestrators (Airflow, Argo, Prefect) — agnostic to data structure
    • Data catalogs (DataHub, Atlan, Marquez) — describe data after it lands
    • Lakehouses (Delta, Iceberg, Hudi) — treat computation as external
  • Add a worked-example pipeline (Mouse → Session → Scan → AverageFrame → Segmentation → Fluorescence, with SegmentationParam as Lookup) rendered as a mermaid diagram with tier-color classes.
  • Add a "deliberate trade-off" section that acknowledges the legitimate strengths of decoupled architectures and frames DataJoint's coupling as a chosen trade-off — directly drawn from the preprint Section 5.
  • Add a "substrate consequences" section that covers:
    • Provenance and lineage as structural properties of the substrate (mapping to W3C PROV / OpenLineage is translation, not reconstruction)
    • The five agent-substrate properties from the preprint: self-describing, safe by default, explicit dependencies, idempotent, observable
  • Preserve the existing detailed sections (table tiers, master-part, workflow normalization, entity integrity, query algebra with closure, transactions vs transformations) under a "Beneath the model" header for readers who want the structural detail.
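
The worked-example pipeline named in the list above can be read as a dependency graph. The following is a minimal plain-Python sketch of that reading, not DataJoint code: the main chain is taken from this description, while attaching SegmentationParam to Segmentation and the helper names `execution_order` and `upstream` are assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each table maps to the set of tables it depends on (its parents).
# The main chain comes from the PR description; the edge from
# SegmentationParam to Segmentation is an illustrative assumption.
PIPELINE = {
    "Mouse": set(),
    "Session": {"Mouse"},
    "Scan": {"Session"},
    "AverageFrame": {"Scan"},
    "SegmentationParam": set(),
    "Segmentation": {"AverageFrame", "SegmentationParam"},
    "Fluorescence": {"Segmentation"},
}


def execution_order(pipeline):
    """Return one valid order in which every table follows all of its parents."""
    return list(TopologicalSorter(pipeline).static_order())


def upstream(pipeline, table):
    """Return every table that `table` depends on, directly or transitively."""
    seen = set()
    stack = list(pipeline[table])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(pipeline[parent])
    return seen


if __name__ == "__main__":
    print(execution_order(PIPELINE))
    print(sorted(upstream(PIPELINE, "Fluorescence")))
```

In this sketch, `upstream` answers a lineage question by walking the declared dependencies alone, which is the sense in which the description calls provenance a structural property.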

Net change

+177 / -112 lines; one file.

Sources

  • Yatsenko & Nguyen, 2026, arXiv:2602.16585: DataJoint 2.0 preprint (computational substrate, four innovations, substrate properties for agents, deliberate-trade-off discussion)
  • Yatsenko et al., 2018: original theoretical formalization (relational workflow model, query algebra)

Notes for reviewers

  • The mermaid diagram uses tier-color classDefs. If the docs site's mermaid theme overrides these, we may need to drop colors or adapt to the site theme.
  • All cross-references in the "See also" section resolve against current src/explanation/ and src/how-to/ content.

The previous intro understated the model's significance. The expansion
positions the RWM for an informatics-knowledgeable reader:

- Lead with the three-interpretations taxonomy (Codd / Chen / RWM) and the
  computational-substrate framing from the DataJoint 2.0 preprint.
- Name the surrounding tool categories explicitly (CWL/Snakemake/Nextflow,
  Airflow/Argo/Prefect, DataHub/Atlan/Marquez, Delta/Iceberg/Hudi) and what
  each is silent on.
- Add a worked example pipeline (Mouse > Session > Scan > AverageFrame >
  Segmentation > Fluorescence, with SegmentationParam as Lookup) rendered
  as a mermaid diagram with tier colors.
- Add a "deliberate trade-off" section addressing the legitimate strengths
  of decoupled architectures and why DataJoint accepts coupling.
- Add a substrate-consequences section: provenance and lineage as
  structural properties (mapping to W3C PROV / OpenLineage is translation,
  not reconstruction), and the five agent-substrate properties
  (self-describing, safe by default, explicit dependencies, idempotent,
  observable) from the preprint.
- Preserve the existing detailed sections (table tiers, master-part,
  normalization, entity integrity, query algebra, transactions vs
  transformations) under a "Beneath the model" header for readers who want
  the structural detail.
@dimitri-yatsenko dimitri-yatsenko merged commit 0f95b3f into main Jun 14, 2026
2 checks passed
@dimitri-yatsenko dimitri-yatsenko deleted the expand/relational-workflow-model-intro branch June 14, 2026 15:55
dimitri-yatsenko added a commit that referenced this pull request Jun 14, 2026
Placeholder for follow-up work after #184 (expand RWM) and #185 (deeper
concept pages) merge. Tracker file outlines what to trim, why, and how to
pick the work up once both upstream PRs land.

No content changes to docs source in this PR. The tracker file is to be
deleted in the same commit that applies the trim.
dimitri-yatsenko added a commit that referenced this pull request Jun 26, 2026
…ucture concrete-first (#186)

* WIP tracker: trim "deliberate trade-off" prose from RWM concept page

Placeholder for follow-up work after #184 (expand RWM) and #185 (deeper
concept pages) merge. Tracker file outlines what to trim, why, and how to
pick the work up once both upstream PRs land.

No content changes to docs source in this PR. The tracker file is to be
deleted in the same commit that applies the trim.

* docs(rwm): trim "deliberate trade-off" prose; link to Comparison page

The developed argument lives on the Comparison to Workflow Languages
page (added in #185). The RWM page now mentions the trade-off in one
paragraph and links out, preventing drift between two homes for the
same argument.

Removes the .github/follow-ups/ tracker that scheduled this work.

* docs(rwm): align worked-example diagram with dj.Diagram notation

Match the conventions from datajoint-python's dj.Diagram
(diagram.py:1017-1082):

- Manual: green rectangle (unchanged)
- Lookup: plaintext — no border/fill (was a filled rectangle)
- Imported: blue stadium-shaped node — closest Mermaid approximation
  to dj.Diagram's ellipse
- Computed: red stadium-shaped node — same

Drop the inline tier-name and make() annotations on each node; tier
is now conveyed by shape and color alone, as in the real diagrams.
A new lead paragraph spells out the convention so the reader can
decode the diagram without a separate legend.
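
A minimal sketch of how the tier-to-shape convention above could be emitted as Mermaid flowchart text. This is not the page's actual diagram source: the hex colors, the tier assigned to each table other than SegmentationParam, the SegmentationParam edge, and the function name `render_mermaid` are all assumptions for illustration.

```python
# Tier -> (opening, closing) node delimiters in Mermaid flowchart syntax.
SHAPES = {
    "manual": ("[", "]"),      # rectangle
    "lookup": ("[", "]"),      # rectangle, styled below to read as plain text
    "imported": ("([", "])"),  # stadium
    "computed": ("([", "])"),  # stadium
}

# Tier -> classDef style. Color values are assumed, not taken from the docs.
STYLES = {
    "manual": "fill:#9f9,stroke:#070",
    "lookup": "fill:none,stroke:none",
    "imported": "fill:#99f,stroke:#007",
    "computed": "fill:#f99,stroke:#700",
}

# Only SegmentationParam's tier (Lookup) is stated in the PR; the rest are assumed.
TIERS = {
    "Mouse": "manual",
    "Session": "manual",
    "Scan": "imported",
    "AverageFrame": "computed",
    "SegmentationParam": "lookup",
    "Segmentation": "computed",
    "Fluorescence": "computed",
}

EDGES = [
    ("Mouse", "Session"),
    ("Session", "Scan"),
    ("Scan", "AverageFrame"),
    ("AverageFrame", "Segmentation"),
    ("SegmentationParam", "Segmentation"),  # assumed attachment point
    ("Segmentation", "Fluorescence"),
]


def render_mermaid(tiers, edges):
    """Build Mermaid flowchart text with one shape and style class per tier."""
    lines = ["flowchart TD"]
    for name, tier in tiers.items():
        opening, closing = SHAPES[tier]
        lines.append(f"    {name}{opening}{name}{closing}:::{tier}")
    for parent, child in edges:
        lines.append(f"    {parent} --> {child}")
    for tier, style in STYLES.items():
        lines.append(f"    classDef {tier} {style}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_mermaid(TIERS, EDGES))
```

Mermaid has no borderless node shape, so the Lookup tier here is a rectangle whose class removes fill and stroke.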

* docs(rwm): restructure concrete-first; reframe as added interpretation

Two structural cleanups on relational-workflow-model.md:

Concrete-first ordering. Open with a tight paragraph naming the
model, then lead with the worked example (diagram + walkthrough).
The historical lineage (Codd/Chen/RWM three interpretations) now
follows the example, placing DataJoint's contribution in context
once the reader has a concrete pipeline to anchor on. The closing
side-by-side reading table moves to the end of the page.

Reframe as interpretation, not departure. Classical relational
concepts (tables, rows, foreign keys, normalization, the query
algebra) apply unchanged; RWM adds a semantic interpretation on
top. Renamed and rewrote two sections to reflect this:

- "Four shifts from the classical relational model"
  → "A semantic interpretation, not a departure"
  Bullets now read additively ("tables also represent workflow
  steps") rather than contrastively ("not merely categories").

- "From transactions to transformations"
  → "Two readings of the same schema"
  Lead-in clarifies both readings hold simultaneously. Column
  header changes from "Traditional view" to "Classical reading."
