Skip to content

Support cross-table slot lookup in class_derivation slot_derivations #134

@amc-corey-cox

Description

@amc-corey-cox

Summary

When a class_derivation has a top-level populated_from: TableA, all slot_derivation blocks within it can only reference columns from TableA. There is currently no way to pull a value from a different source table (e.g., a demographics table) into a slot derivation that lives under a class derivation targeting a different table.

This is a common need in tabular data harmonization where related variables live in separate tables — for example, linking an age_at_observation from a demographics table to measurement observations loaded from a different data table.

Current behavior

populated_from at the slot level resolves within the same source class/table as the parent class_derivation. The existing cross-class support (#101, resolve_fk_path in fk_utils.py) handles dot-notation FK traversal (e.g., org_id.name) but requires a foreign key relationship defined in the schema. It does not support arbitrary cross-table lookups where the join key must be specified in the transformation spec.

Desired behavior

Within a class_derivation block with populated_from: TableA, a slot_derivation should be able to reference columns from TableB — for example:

class_derivations:
  MeasurementObservation:
    populated_from: lab_results
    slot_derivations:
      analyte_value:
        populated_from: result_value
      age_at_observation:
        populated_from: demographics.age_at_exam  # cross-table lookup
        # join key: shared participant_id

The exact syntax and join-key specification mechanism need design work. Options include:

  • Extending dot-notation (OtherClass.slot) with an implicit or explicit join key
  • A new attribute on slot_derivation specifying the foreign table and join column
  • Making cross-table references available in expr: blocks via bindings

Use cases

  • Longitudinal study data: Age variables live in a demographics/visit table but need to be attached to measurement observations from separate data tables
  • Reference data joins: Coded values in one table need to be enriched with labels from a reference/lookup table
  • Multi-table harmonization: Any scenario where the target schema combines attributes that originate from different source tables

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions