Motivation
When two derived artifacts must share a name to form a reference (typical case: a slot_definition.range pointing at an enum_definition.name, where both were derived from the same source instance), there's no mechanism today to declare that pairing. slot() (133a9e2) addresses intra-class binding but doesn't reach across derivations.
Authors today work around this by writing the same naming expression in both derivations and trusting them to stay consistent. That's fragile and makes the cross-reference invisible to the planner: it can't detect that the two artifacts must agree, can't validate the round-trip, can't optimize execution.
Concrete driver: schema-automator's EML importer (linkml/schema-automator#208), where an EML attribute with an enumeratedDomain derives both a slot (in the parent class's attributes) and an enum (in the schema-level enums map), and the slot's range must equal the enum's name. Same pattern for future XSD/JSON-Schema importers.
Proposed direction
Allow a derivation to publish a named binding scoped to its source instance, and another derivation to consume it:
The binding key (enum_name_for) plus the source-instance identifier (self) form a lookup; the runtime guarantees the consumer sees what the producer published.
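The lookup described above can be sketched in plain Python. This is a minimal illustration rather than the proposed runtime API: the class `BindingStore`, its methods, and the value `SoilTypeEnum` are hypothetical names invented for this example; only `enum_name_for` and the `Attribute_42` identifier come from this issue.

```python
from typing import Dict, Hashable, Tuple


class BindingStore:
    """Hypothetical in-memory lookup keyed by (binding_key, source_id)."""

    def __init__(self) -> None:
        self._values: Dict[Tuple[str, Hashable], str] = {}

    def publish(self, binding_key: str, source_id: Hashable, value: str) -> None:
        # Producer side: record the value chosen for this source instance.
        self._values[(binding_key, source_id)] = value

    def ref(self, binding_key: str, source_id: Hashable) -> str:
        # Consumer side: fail loudly instead of returning a silent null.
        try:
            return self._values[(binding_key, source_id)]
        except KeyError:
            raise LookupError(f"no binding {binding_key}={source_id}") from None


store = BindingStore()
# The enum derivation publishes the name it generated for Attribute_42.
store.publish("enum_name_for", "Attribute_42", "SoilTypeEnum")
# The slot derivation consumes it as its range.
slot_range = store.ref("enum_name_for", "Attribute_42")
```

The point of the pairing is that the naming expression is evaluated once, by the producer, and the consumer only ever reads the result.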
Execution model — opt-in scratch store
Cross-references require persistence between derivations, which conflicts with streaming and with the deliberate earlier decision to remove implicit memoization from the runtime. The proposal is opt-in two-pass execution backed by a DuckDB scratch store:
Pass 1: stream the source. Producer derivations write bindings into a DuckDB temp table (source_id, binding_key, value). Artifacts that don't depend on cross-refs are emitted normally.
Pass 2: consumer derivations resolve ref() calls against the scratch table and emit dependent artifacts.
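The two passes can be sketched as follows. To keep the example runnable with only the standard library, it uses `sqlite3` as a stand-in for the DuckDB scratch store; the table layout (source_id, binding_key, value) follows the description above, while the source records, the naming rule, and the `run_two_pass` function are assumptions made for illustration.

```python
import sqlite3

# Hypothetical source records: (source_id, attribute_name, has_enumerated_domain)
SOURCE = [
    ("Attribute_41", "depth", False),
    ("Attribute_42", "soil type", True),
]


def run_two_pass(source):
    conn = sqlite3.connect(":memory:")  # stand-in for the DuckDB scratch store
    conn.execute(
        "CREATE TEMP TABLE bindings ("
        " source_id TEXT, binding_key TEXT, value TEXT,"
        " PRIMARY KEY (source_id, binding_key))"
    )
    emitted = []

    # Pass 1: stream the source; producers write bindings, and artifacts
    # that do not depend on cross-refs are emitted immediately.
    for source_id, name, has_enum in source:
        if has_enum:
            enum_name = name.title().replace(" ", "") + "Enum"  # assumed naming rule
            conn.execute(
                "INSERT INTO bindings VALUES (?, ?, ?)",
                (source_id, "enum_name_for", enum_name),
            )
            emitted.append(("enum", enum_name))
        else:
            emitted.append(("slot", name, "string"))

    # Pass 2: consumers resolve their references against the scratch table.
    for source_id, name, has_enum in source:
        if not has_enum:
            continue
        row = conn.execute(
            "SELECT value FROM bindings WHERE source_id = ? AND binding_key = ?",
            (source_id, "enum_name_for"),
        ).fetchone()
        if row is None:
            raise LookupError(f"no binding enum_name_for={source_id}")
        emitted.append(("slot", name, row[0]))

    conn.close()
    return emitted


artifacts = run_two_pass(SOURCE)
```

Because every insert finishes before the first select, the slot for `soil type` always sees the enum name written in pass 1.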
Properties:
Opt-in per trans-spec. Specs that declare no publishes/ref pairs stay single-pass and streamable. The memory cost is paid only by specs that ask for it.
Deterministic by construction. Read-before-write can't happen — all writes in pass 1 complete before any reads in pass 2. Missing-binding errors surface as clear diagnostics ("no binding enum_name_for=Attribute_42") rather than silent nulls.
Optimizable. When the dependency graph is tractable — producer and consumer in the same derivation, or unambiguously ordered — the planner can choose single-pass topo-sort instead. Two-pass is the safe default.
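The planner choice described under "Optimizable" might look like the sketch below, which uses the standard-library `graphlib` module. The `plan` function and the derivation names are hypothetical, and treating a dependency cycle as the "not unambiguously ordered" case is an assumption of this example, not something the proposal specifies.

```python
from graphlib import CycleError, TopologicalSorter


def plan(dependencies):
    """Pick an execution strategy.

    `dependencies` maps each derivation name to the set of derivations
    whose published bindings it consumes.
    """
    try:
        # static_order() yields producers before their consumers.
        order = list(TopologicalSorter(dependencies).static_order())
    except CycleError:
        # No unambiguous order exists: fall back to the safe default.
        return ("two-pass", None)
    return ("single-pass", order)


deps = {"slot_derivation": {"enum_derivation"}, "enum_derivation": set()}
strategy, order = plan(deps)

cyclic = {"a": {"b"}, "b": {"a"}}
fallback_strategy, fallback_order = plan(cyclic)
```

With the acyclic graph the enum derivation is scheduled first, so the slot derivation can read its binding in the same pass.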
Reversibility
Cross-references make structural pairing explicit and therefore reversible by the inverse engine: given a slot with range: foo and an enum with name: foo, the inverse derivation reconstructs the source Attribute. Reversibility of the binding key expression depends on its own invertibility (e.g., slugify is one-way and breaks round-trips; equality / FK-style joins are fine). This inherits linkml-map's existing reversibility-where-lossless principle without adding new rules.
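A small sketch of why a slugify-style key expression breaks round-trips while an identity (equality-style) key does not. The `slugify` below is a hypothetical stand-in written for this example, not the function any particular library provides, and the sample names are invented.

```python
import re


def slugify(name: str) -> str:
    """Hypothetical lossy key expression: lowercase, non-alphanumerics to '_'."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def is_injective(fn, inputs) -> bool:
    """True if distinct inputs always map to distinct outputs on this sample."""
    outputs = [fn(x) for x in inputs]
    return len(set(outputs)) == len(set(inputs))


names = ["Soil Type", "soil-type", "SoilType"]

slug_ok = is_injective(slugify, names)            # two names collapse to one slug
identity_ok = is_injective(lambda n: n, names)    # identity keeps them distinct
```

When two source names collapse to the same key, the inverse derivation cannot tell which one to reconstruct, which is the one-way behaviour described above.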
Open questions
Source-key scope. Restrict bindings to source-instance identity (must be hashable, drawn from identifier slots), or allow arbitrary computed keys? Lean toward source-instance identity first — keeps the scratch table small and the inverse mechanical.
Multiple producers, one consumer. Should a binding key be allowed to be published by more than one derivation? Probably error unless explicitly declared multi-valued.
Surface syntax. publishes / ref is the strawman; alternatives (e.g., an explicit bindings: section at the spec level) might be cleaner.
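One possible answer to the multiple-producers question, sketched under the "error unless explicitly declared multi-valued" lean. Everything here (the `Bindings` class, the `multi_valued_keys` parameter, `DuplicateBindingError`, the `synonym_for` key, and the derivation names) is hypothetical and only illustrates the behaviour being discussed.

```python
from collections import defaultdict


class DuplicateBindingError(ValueError):
    """Raised when a single-valued binding is published twice."""


class Bindings:
    def __init__(self, multi_valued_keys=()):
        self._multi = set(multi_valued_keys)
        self._values = defaultdict(list)  # (binding_key, source_id) -> [(producer, value)]

    def publish(self, binding_key, source_id, value, producer):
        entries = self._values[(binding_key, source_id)]
        if entries and binding_key not in self._multi:
            first_producer = entries[0][0]
            raise DuplicateBindingError(
                f"binding {binding_key}={source_id} already published by "
                f"{first_producer}; rejected second producer {producer}"
            )
        entries.append((producer, value))

    def values(self, binding_key, source_id):
        # .get() avoids creating an empty entry for unknown keys.
        return [v for _, v in self._values.get((binding_key, source_id), [])]


bindings = Bindings(multi_valued_keys={"synonym_for"})
bindings.publish("enum_name_for", "Attribute_42", "SoilTypeEnum", producer="enum_derivation")
bindings.publish("synonym_for", "Attribute_42", "soil", producer="derivation_a")
bindings.publish("synonym_for", "Attribute_42", "ground", producer="derivation_b")

try:
    bindings.publish("enum_name_for", "Attribute_42", "OtherEnum", producer="other_derivation")
    duplicate_rejected = False
except DuplicateBindingError:
    duplicate_rejected = True
```

Recording the producer alongside each value keeps the diagnostic actionable: the error names both derivations that tried to publish the same key.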
References / contrasts
slot() (133a9e2) — intra-class binding; doesn't reach across derivations
dictionary_key / cast_collection_as: MultiValuedDict — in-collection keying, different problem