Refactor deprecation scanning to pre-normalize; allow list-form populated_from on PV#250
Refactor deprecation scanning to pre-normalize; allow list-form populated_from on PV#250amc-corey-cox wants to merge 8 commits into
Conversation
Extends populated_from on PermissibleValueDerivation to accept a list of source values, completing the deprecation of `sources` started in #194 (issue #193). Scope is limited to PV derivations; class/slot/enum list support is separate work. - Schema: populated_from is multivalued; scalar input is wrapped at load. - Runtime: deprecated `sources` is migrated into `populated_from` so the enum-mapping loop has a single field to consult. `sources` is left in place after migration so the validator's deprecation warning still fires. - Validator: setting both `populated_from` and `sources` on the same PV now produces severity=error (validate-spec gates on it). - Architecture: the migration runs only at runtime entry points, so the validator sees un-migrated data and can distinguish "user wrote both" from "we migrated sources". - Updated schema_mapper, inverter, and the markdown template to handle populated_from as a list. Inverter raises NonInvertibleSpecificationError when a PV deriv has multiple source values.
There was a problem hiding this comment.
Pull request overview
This PR completes the deprecation path for PermissibleValueDerivation.sources by making PermissibleValueDerivation.populated_from multivalued (with scalar-to-list coercion), adding validator errors for sources+populated_from conflicts, and updating runtime/inference/compiler code to handle list semantics.
Changes:
- Update the transformer model so PV-level
populated_fromis multivalued, and coerce scalar PVpopulated_fromto a one-element list during normalization. - Add validator logic to error when both
sourcesandpopulated_fromare set on the same PV derivation; runtime migratessources→populated_fromwhenpopulated_fromis empty. - Update enum mapping, schema derivation, inversion, and markdown compilation to treat PV
populated_fromas a list; add tests for list/scalar/migration/conflict behavior.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_validator.py | Adds structural-validation tests for PV populated_from list/scalar acceptance, sources-only deprecation behavior, and both-set conflict error. |
| tests/test_transformer/test_multi_enum.py | Adds runtime tests for list-form PV mapping, scalar normalization, and sources→populated_from migration behavior. |
| src/linkml_map/validator.py | Adds check_pv_field_conflicts() and wires it into validate_spec() before deprecation/semantic checks. |
| src/linkml_map/transformer/transformer.py | Adds PV-derivation iteration, scalar-to-list coercion for PV populated_from, and runtime migration from PV sources. |
| src/linkml_map/transformer/object_transformer.py | Updates enum value mapping to use list-form populated_from membership checks. |
| src/linkml_map/inference/schema_mapper.py | Treats PV sources as populated_from (preferred) or sources for derived enum PV enumeration. |
| src/linkml_map/inference/inverter.py | Rejects inversion when PV populated_from has multiple sources; updates inverted PV derivations to list-form populated_from. |
| src/linkml_map/datamodel/transformer_model.yaml | Declares PV populated_from as multivalued and updates deprecation text for sources. |
| src/linkml_map/datamodel/transformer_model.py | Regenerates the Pydantic model to reflect PV populated_from: list[str] and updated deprecation metadata. |
| src/linkml_map/compiler/templates/markdown.j2 | Renders PV populated_from lists as comma-joined strings. |
…tion All deprecation scanning now runs in one pre-normalize pass inside _normalize_spec_dict, so every spec-loading entry point gets it. PV migration clears sources; runtime asserts the invariant. _warn_deprecated_fields shim removed; pre-flight reads scan messages from the Transformer. object_derivations deprecation gains validate-spec surface. Also regenerated docs/schema/PermissibleValueDerivation.md (CI fix).
|
Scope expansion: architectural refactor of deprecation scanning Both Copilot threads and offline discussion flagged that the original implementation put the PV Commit 6317029 reworks this:
PR title updated. Original PR body retained for history. |
- _iter_derivation_dicts now recognizes compact-key list form so the
deprecation scan catches `permissible_value_derivations: [{name: {...}}]`.
- _coerce_pv_populated_from_to_list strips explicit-null populated_from
(also handles [None] post-ReferenceValidator) so pydantic uses the
default-factory.
- markdown.j2 defends with `or []` so a programmatic None doesn't crash
the template.
…dict Three phases now: SHAPE (recursive compact-key + dict↔list everywhere), SCAN (on canonical shape), MIGRATE (object_derivations flatten, populated_from inheritance, PV scalar→list, PV None-strip, PV sources→populated_from). Field semantics no longer entangled with structural normalization, so _iter_derivation_dicts can assume canonical shape.
xfail(strict=True) on the correct post-fix behavior. Fails loudly with XPASS when upstream lands the ReferenceValidator fix, signaling that the PV-level branch in _pre_shape_expand_compact_keys can be removed.
Removed tests/test_upstream_canaries.py — its xfail(strict=True) only flips on the "accept and normalize" path; if upstream picks "reject loudly" instead, the test errors out and never signals usefully. Updated _pre_shape_expand_compact_keys docstring to explain compact-key list is a linkml-map convention, not LinkML-canonical, and that the pre-pass makes us independent of any upstream resolution.
Upstream's proposed fix hoists unambiguous compact-key list items as
the pk (errors on genuinely ambiguous cases). Our test case is
unambiguous — key is not a slot of PermissibleValueDerivation, value is
a dict — so when the fix lands, RV will produce {'red': {...}} and the
xfail(strict=True) flips to a clean signal to remove the workaround.
- check_deprecated_fields docstring: reflect actual SHAPE→SCAN→MIGRATE ordering (scan now runs after RV, not before). - _coerce_pv_populated_from_to_list: only strip explicit None / all-None lists; empty strings stay (legitimate PV value). - _migrate_pv_sources_to_populated_from: wrap scalar string instead of exploding into characters. - transform_enum: replace `assert not pv_deriv.sources` with explicit SpecificationError raise — invariant check stays enforced under `-O`. - _iter_values_or_list docstring: relax to match actual use (pre- and post-SHAPE callers).
Closes the deprecation of
sourcesonPermissibleValueDerivationstarted in #194 (issue #193). Users who needed multi-source-to-one-target PV mapping had to keep using deprecatedsources: [...]; now they can writepopulated_from: [light_red, dark_red].Scope: PermissibleValueDerivation only. Class/Slot/Enum
populated_fromlist semantics are entangled with broader dispatch design and can follow on their own schedule.Summary
populated_fromon PV derivations is now multivalued. Scalar input (populated_from: 'x') is wrapped to a one-element list at load time.sources(deprecated) is migrated intopopulated_fromat runtime entry points whenpopulated_fromis empty.sourcesitself is left in place so the validator's existing deprecation warning still fires.populated_fromandsourceson the same PV is now a validator error.validate-specexits 1;map-datapre-flight surfaces it informationally per existing pre-flight behavior.populated_fromwins at runtime when both are set.Architecture note
Transformer._normalize_spec_dictdoes structural normalization only (shared with the validator). The new_migrate_pv_sources_to_populated_fromis called separately from the three runtime entry points (create_transformer_specificationand the twoload_*methods). The validator therefore sees the un-migrated dict and can cleanly distinguish "user wrote both" from "we migrated sources".Spillover fixes
The schema-mapper and inverter were treating
populated_fromas a scalar string and now handle the list. The markdown template joins the list with,so output readsSIBLING_OFrather than['SIBLING_OF']. The inverter raisesNonInvertibleSpecificationErrorwhen a PV has multiple source values (the compliance suite already marks list-form mappings as non-invertible).Asked-from context
Surfaced from dm-bip: a user wrote
populated_from: '1'(PV-level, value form) and needed to add'M'as a second source value mapping to the same target PV. The old answer was "use deprecatedsources"; now it's "use listpopulated_from".Test plan
uv run pytest— 835 passed, 4 skippeduv run ruff check .anduv run ruff format .— cleantests/test_validator.py(list form, scalar form, sources-only deprecation, both-set conflict error)tests/test_transformer/test_multi_enum.py(list-form maps multiple sources, scalar still works, sources migration preserved)test_deprecations.py::test_sources_on_pv_emits_deprecation_warning) still passes