Motivation
Some transformations need to lift instances of a given source class out of a deep tree into a flat target collection, regardless of where in the source hierarchy they appear. Today there's no syntax for this — populated_from is scoped to the immediate parent slot's range, so authors must either enumerate every containment path or pre-flatten the source in Python before invoking linkml-map.
Concrete driver: schema-automator's EML importer (linkml/schema-automator#208). EML's <enumeratedDomain> elements appear nested under dataset.dataTable[*].attributeList.attribute[*].measurementScale.nominal.nonNumericDomain (and similarly under ordinal). Building a flat enums map at the target schema's root requires walking all those nested locations. The same pattern will recur for future schema-as-source importers (XSD, JSON-Schema).
Proposed direction
Add a derivation key that matches descendants of a given source class rather than slot-scoped immediate children:
    class_derivations:
      EnumDefinitions:
        populated_from_descendants: EnumeratedDomain  # all instances under tree_root
        target_class: enum_definition
        slot_derivations: { ... }
Or as a slot-derivation modifier:
    slot_derivations:
      enums:
        populated_from: EnumeratedDomain
        descendants_of: $tree_root
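To make the intended semantics concrete, here is a minimal Python sketch of what either form would ask the engine to do: walk the whole source tree and collect every instance of the named class, in document order. It is an illustration only, not linkml-map code. The `@type` tag on each node and the `iter_descendants` helper are hypothetical; a real implementation would presumably resolve classes from the source schema rather than from a tag in the data.

```python
from typing import Any, Iterator


def iter_descendants(node: Any, class_name: str) -> Iterator[dict]:
    """Yield every dict tagged with class_name, depth-first in document order.

    Assumption: nodes carry a hypothetical "@type" key naming their class.
    """
    if isinstance(node, dict):
        if node.get("@type") == class_name:
            yield node
        # Keep descending even after a match ("all descendants" semantics).
        for child in node.values():
            yield from iter_descendants(child, class_name)
    elif isinstance(node, list):
        for item in node:
            yield from iter_descendants(item, class_name)


# Simplified stand-in for an EML document already loaded as nested dicts.
source = {
    "@type": "Dataset",
    "dataTable": [{
        "@type": "DataTable",
        "attributeList": {"attribute": [
            {"@type": "Attribute", "name": "sex",
             "measurementScale": {"nominal": {"nonNumericDomain": {
                 "enumeratedDomain": {"@type": "EnumeratedDomain",
                                      "codes": ["M", "F"]}}}}},
            {"@type": "Attribute", "name": "grade",
             "measurementScale": {"ordinal": {"nonNumericDomain": {
                 "enumeratedDomain": {"@type": "EnumeratedDomain",
                                      "codes": ["low", "high"]}}}}},
        ]},
    }],
}

# Both the nominal and the ordinal branch are found without naming either path.
enums = [d["codes"] for d in iter_descendants(source, "EnumeratedDomain")]
```

The caller never spells out a containment path, which is the point of the proposal: the nominal and ordinal branches are picked up by the same walk.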
Execution / memory
Streaming-compatible: a single pass can materialize matched descendants directly into the target collection without retaining ancestors. No new memory pressure beyond what the target collection itself requires.
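A rough sketch of the single-pass idea, using the standard library's `xml.etree.ElementTree.iterparse` on a much-simplified EML-like fragment. This is not how linkml-map reads data today; the `stream_matches` helper and the sample document are assumptions made for illustration, and the sketch assumes matched elements do not nest inside one another.

```python
import io
import xml.etree.ElementTree as ET

# Much-simplified, hypothetical EML-like fragment.
EML = b"""<dataset><dataTable><attributeList>
<attribute><measurementScale><nominal><nonNumericDomain>
<enumeratedDomain>
  <codeDefinition><code>M</code></codeDefinition>
  <codeDefinition><code>F</code></codeDefinition>
</enumeratedDomain>
</nonNumericDomain></nominal></measurementScale></attribute>
<attribute><measurementScale><ordinal><nonNumericDomain>
<enumeratedDomain>
  <codeDefinition><code>low</code></codeDefinition>
  <codeDefinition><code>high</code></codeDefinition>
</enumeratedDomain>
</nonNumericDomain></ordinal></measurementScale></attribute>
</attributeList></dataTable></dataset>"""


def stream_matches(source, tag):
    """Yield the code list of each <tag> element as soon as it closes."""
    open_in_match = 0  # number of currently open elements at or below a match
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag == tag or open_in_match:
                open_in_match += 1
            continue
        if open_in_match:
            open_in_match -= 1
            if open_in_match == 0:
                # The outermost matched element just closed: emit, then empty it.
                yield [code.text for code in elem.iter("code")]
                elem.clear()
        else:
            # Outside any match: empty the element so its subtree can be freed.
            elem.clear()


target_collection = list(stream_matches(io.BytesIO(EML), "enumeratedDomain"))
```

Elements are emptied as they close, so what stays in memory is the target collection plus the empty shells of still-open ancestors. Because sibling matches close in the order they open, the output here follows document order.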
For trans-specs that combine descendant iteration with cross-references (#237), the same opt-in DuckDB-backed scratch store proposed there serves both — collect descendants into a temp table in pass 1, query in pass 2. This unifies the execution-mode story: linkml-map has a streaming default and a single opt-in "scratch-store, two-pass" mode that unlocks descendant iteration, cross-references, and their combination.
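The two-pass shape can be sketched as follows. To keep the example dependency-free it uses the standard library's `sqlite3` as a stand-in for the DuckDB-backed store, so it shows the pattern rather than the proposed backend. The table layout, the `id` key, and the `resolve` helper are all hypothetical; the actual cross-reference design is whatever #237 settles on.

```python
import json
import sqlite3

# Descendants as they would arrive from a tree walk (hypothetical shape).
matched = [
    {"id": "sex_enum", "codes": ["M", "F"]},
    {"id": "grade_enum", "codes": ["low", "high"]},
]

# sqlite3 stands in for the DuckDB scratch store here.
con = sqlite3.connect(":memory:")
con.execute("CREATE TEMP TABLE descendants (id TEXT PRIMARY KEY, body TEXT)")

# Pass 1: spill each matched descendant into the scratch table.
con.executemany(
    "INSERT INTO descendants VALUES (?, ?)",
    ((d["id"], json.dumps(d)) for d in matched),
)


def resolve(ref):
    """Pass 2: look up a descendant by key instead of re-walking the tree."""
    row = con.execute(
        "SELECT body FROM descendants WHERE id = ?", (ref,)
    ).fetchone()
    return json.loads(row[0]) if row else None


resolved = resolve("grade_enum")
```

Only pass 2 needs random access, which is why the scratch store can stay opt-in: a spec that merely flattens descendants never touches it.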
Open questions
- Match semantics. All descendants, or first-match-per-branch? (Lean: all.)
- Order. Document-order is the natural default for tree-walks; guarantee it or leave implementation-defined?
- Surface choice. Explicit key (populated_from_descendants) is unambiguous. A path-expression form (populated_from: "**.EnumeratedDomain") is more compact but adds parser surface. Lean toward the explicit key first; paths can follow if real patterns demand it.
References / contrasts
- sources (enumerated source-class list; no tree walk)
- PivotOperation (tabular reshape, orthogonal)