
populated_from_descendants: iterate source instances regardless of containment path #236

@amc-corey-cox


Motivation

Some transformations need to lift instances of a given source class out of a deep tree into a flat target collection, regardless of where in the source hierarchy they appear. Today there is no syntax for this: populated_from is scoped to the immediate parent slot's range, so authors must either enumerate every containment path or pre-flatten the source in Python before invoking linkml-map.

Concrete driver: schema-automator's EML importer (linkml/schema-automator#208). EML's <enumeratedDomain> elements appear nested under dataset.dataTable[*].attributeList.attribute[*].measurementScale.nominal.nonNumericDomain (and similarly under ordinal). Building a flat enums map at the target schema's root requires walking all those nested locations. The same pattern will recur for future schema-as-source importers (XSD, JSON-Schema).

Proposed direction

Add a derivation key that matches every descendant instance of a given source class, wherever it sits in the tree, rather than only the slot-scoped immediate children:

class_derivations:
  EnumDefinitions:
    populated_from_descendants: EnumeratedDomain   # all instances under tree_root
    target_class: enum_definition
    slot_derivations: { ... }

Or as a slot-derivation modifier:

slot_derivations:
  enums:
    populated_from: EnumeratedDomain
    descendants_of: $tree_root

Execution / memory

Streaming-compatible: a single pass can materialize matched descendants directly into the target collection without retaining ancestors. No new memory pressure beyond what the target collection itself requires.
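The traversal described above could look roughly like the sketch below. It is only an illustration, not linkml-map code: `iter_descendants` is a hypothetical helper, the source is assumed to be an already-parsed tree of dicts and lists, and instances are assumed to carry a `type` key naming their class (a real implementation would consult the source schema, and a truly streaming version would sit on top of an incremental parser).

```python
from typing import Any, Callable, Iterator


def iter_descendants(node: Any, is_match: Callable[[dict], bool]) -> Iterator[dict]:
    """Yield every matching dict at or below `node`, in document (pre-order) order.

    Hypothetical helper: only pending siblings are kept on the stack,
    never the chain of ancestors.
    """
    stack = [node]
    while stack:
        current = stack.pop()
        if isinstance(current, dict):
            if is_match(current):
                yield current
            # Push children reversed so the first child is visited first.
            stack.extend(reversed(list(current.values())))
        elif isinstance(current, list):
            stack.extend(reversed(current))
        # Scalars (strings, numbers, None) are leaves and are skipped.


# Toy source shaped like the EML nesting described in the motivation.
# The "type" discriminator is an assumption made for this sketch.
source = {
    "dataset": {
        "dataTable": [
            {"attributeList": {"attribute": [
                {"measurementScale": {"nominal": {"nonNumericDomain": {
                    "enumeratedDomain": {"type": "EnumeratedDomain", "codes": ["a", "b"]}}}}},
                {"measurementScale": {"ordinal": {"nonNumericDomain": {
                    "enumeratedDomain": {"type": "EnumeratedDomain", "codes": ["lo", "hi"]}}}}},
            ]}}
        ]
    }
}

enums = list(iter_descendants(source, lambda d: d.get("type") == "EnumeratedDomain"))
```

Because the helper is a generator, the caller can write each match into the target collection as it arrives instead of building an intermediate list.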

For trans-specs that combine descendant iteration with cross-references (#237), the opt-in DuckDB-backed scratch store proposed there serves both: pass 1 collects descendants into a temp table, and pass 2 queries it. This unifies the execution-mode story: linkml-map has a streaming default and a single opt-in "scratch-store, two-pass" mode that unlocks descendant iteration, cross-references, and their combination.
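A rough sketch of the two-pass idea follows. To keep it dependency-free it uses Python's built-in sqlite3 as a stand-in for the DuckDB store mentioned above; the table layout, the `id` and `enum_ref` fields, and the `collect` / `resolve` helpers are all hypothetical and are not taken from #237 or from linkml-map.

```python
import json
import sqlite3
from typing import Iterable, Optional


def collect(con: sqlite3.Connection, descendants: Iterable[dict]) -> None:
    """Pass 1: store each matched descendant in a scratch table, keyed by its id."""
    con.execute("CREATE TABLE descendants (id TEXT PRIMARY KEY, body TEXT)")
    con.executemany(
        "INSERT INTO descendants VALUES (?, ?)",
        [(d["id"], json.dumps(d)) for d in descendants],
    )


def resolve(con: sqlite3.Connection, ref_id: str) -> Optional[dict]:
    """Pass 2: look up a previously collected descendant by id, or None if absent."""
    row = con.execute(
        "SELECT body FROM descendants WHERE id = ?", (ref_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


# sqlite3 in-memory database standing in for the proposed scratch store.
con = sqlite3.connect(":memory:")
collect(con, [
    {"id": "e1", "codes": ["a", "b"]},
    {"id": "e2", "codes": ["lo", "hi"]},
])

# A hypothetical object that refers to a collected descendant by id.
attribute = {"name": "soil_type", "enum_ref": "e2"}
resolved = resolve(con, attribute["enum_ref"])
```

The point of the sketch is the shape of the flow: pass 1 only writes, pass 2 only reads, so the source tree never has to be held in memory between the two.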

Open questions

  • Match semantics. All descendants, or first-match-per-branch? (Lean: all.)
  • Order. Document-order is the natural default for tree-walks; guarantee it or leave implementation-defined?
  • Surface choice. Explicit key (populated_from_descendants) is unambiguous. A path-expression form (populated_from: "**.EnumeratedDomain") is more compact but adds parser surface. Lean toward the explicit key first; paths can follow if real patterns demand it.
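To make the first two questions concrete, here is a small sketch contrasting the two match semantics. The `walk` function, its `first_match_per_branch` flag, and the `type` discriminator are assumptions made for illustration only; neither exists in linkml-map today.

```python
from typing import Any, Iterator


def walk(node: Any, class_name: str, first_match_per_branch: bool = False) -> Iterator[dict]:
    """Yield dicts whose "type" equals class_name, in document order.

    With first_match_per_branch=True the walk stops descending once a
    match is found, so matches nested inside a match are not reported.
    """
    if isinstance(node, dict):
        if node.get("type") == class_name:
            yield node
            if first_match_per_branch:
                return
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return  # scalar leaf
    for child in children:
        yield from walk(child, class_name, first_match_per_branch)


tree = {
    "type": "Root",
    "children": [
        {"type": "Node", "name": "outer",
         "children": [{"type": "Node", "name": "inner"}]},
        {"type": "Node", "name": "sibling"},
    ],
}

all_matches = [n["name"] for n in walk(tree, "Node")]
first_only = [n["name"] for n in walk(tree, "Node", first_match_per_branch=True)]
```

Under the "all" reading the nested instance is included (outer, inner, sibling); under first-match-per-branch it is dropped (outer, sibling). Both variants emit matches in document order because the walk is pre-order and follows dict insertion order.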

References / contrasts

Labels: enhancement
