Design: Support multi-source to single-target harmonization rules.
Background
- Some harmonizations require combining multiple source data elements into a single target element.
- Example: one-hot encoded source columns -> single enum target (e.g., REDCap coding).
- Other examples: split fields (first/last) to a canonical name, separate date/time to a combined timestamp, or multiple flags to a consolidated category.
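
As a rough illustration of the one-hot case, the sketch below collapses several source columns into one enum value. The column names, the enum values, and the helper one_hot_to_enum are all hypothetical and not part of the existing codebase. The sketch also assumes a set flag is encoded as 1 or "1".

```python
from typing import Mapping, Optional

# Hypothetical one-hot source columns mapped to hypothetical enum values.
COLUMN_TO_ENUM: dict[str, str] = {
    "smoker___never": "NEVER",
    "smoker___former": "FORMER",
    "smoker___current": "CURRENT",
}


def one_hot_to_enum(
    row: Mapping[str, object], column_to_enum: Mapping[str, str]
) -> Optional[str]:
    """Collapse one-hot columns into a single enum value.

    Returns None when no column is set and raises ValueError when more
    than one is set, since a single-valued target cannot represent that.
    """
    # Assumption: a flag counts as set when its value is 1 or "1".
    active = [
        enum
        for column, enum in column_to_enum.items()
        if str(row.get(column, "0")).strip() == "1"
    ]
    if len(active) > 1:
        raise ValueError(f"multiple one-hot columns set: {active}")
    return active[0] if active else None
```

Raising on multiple set flags is only one possible choice; the design could instead define a priority order or a dedicated "multiple" category.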
Goals
- Define a rule model that can reference multiple source elements.
- Specify serialization schema changes and backward compatibility expectations.
- Clarify how rule lookup and application should work in the pipeline (e.g., list-of-sources mapping).
- Define how replay logging should represent multi-source transformations.
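
One possible shape for the rule model and its serialization is sketched below. The class name HarmonizationRule, the keys "source", "sources", "target", and "transform", and the lookup_key property are assumptions made for illustration; the actual schema and the RuleRegistry interface may differ. The sketch assumes existing rules are serialized with a single "source" key and loads them as a one-element tuple.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class HarmonizationRule:
    """Hypothetical rule model; field names are illustrative only."""

    sources: tuple[str, ...]  # ordered, since order can matter (e.g. first/last name)
    target: str
    transform: str  # name of the transformation to apply

    @classmethod
    def from_dict(cls, data: Mapping[str, Any]) -> "HarmonizationRule":
        # Accept the assumed legacy form {"source": "x"} so that rules
        # serialized before this change still load.
        if "sources" in data:
            sources = tuple(data["sources"])
        elif "source" in data:
            sources = (data["source"],)
        else:
            raise ValueError("rule must define 'source' or 'sources'")
        if not sources:
            raise ValueError("rule must reference at least one source")
        return cls(sources=sources, target=data["target"], transform=data["transform"])

    def to_dict(self) -> dict[str, Any]:
        # Always write the list form; readers of the old form would need updating.
        return {
            "sources": list(self.sources),
            "target": self.target,
            "transform": self.transform,
        }

    @property
    def lookup_key(self) -> tuple[tuple[str, ...], str]:
        # A hashable key a registry could index on (list-of-sources mapping).
        return (self.sources, self.target)
```

Treating a single-source rule as the one-element case keeps one code path in the pipeline, at the cost of changing the serialized form written by to_dict.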
Open questions
- Should a new rule type be introduced (e.g., "multi_source"), or should the existing rule schema be extended?
- How should source ordering and missing-value handling be expressed?
- How should one-hot -> enum mapping be supported explicitly (e.g., reduce + mapping)?
- How should validation work when sources are missing?
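
To make the ordering and missing-value questions concrete, here is a minimal sketch of a per-rule missing-value policy. MissingPolicy, its three options, and collect_sources are hypothetical names, and the sketch assumes a missing source appears as an absent key or a None value.

```python
from enum import Enum
from typing import Mapping, Optional, Sequence


class MissingPolicy(Enum):
    ERROR = "error"          # fail validation when any source is missing
    SKIP = "skip"            # do not apply the rule; no target value is produced
    PASS_NONE = "pass_none"  # pass None through and let the transform decide


def collect_sources(
    row: Mapping[str, object],
    sources: Sequence[str],
    policy: MissingPolicy,
) -> Optional[list[object]]:
    """Return source values in the rule's declared order, or None to skip."""
    # Assumption: an absent key and an explicit None both count as missing.
    missing = [name for name in sources if row.get(name) is None]
    if missing:
        if policy is MissingPolicy.ERROR:
            raise KeyError(f"missing source values: {missing}")
        if policy is MissingPolicy.SKIP:
            return None
    # Values follow the order of `sources`, not the order of keys in `row`.
    return [row.get(name) for name in sources]
```

Whether the policy belongs on the rule, on each source, or on the transform is one of the points the design would need to settle.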
Deliverables (design only)
- Proposed schema with examples
- Changes needed in RuleRegistry lookup and harmonize pipeline
- Replay log implications
- Test strategy