[FEATURE] Use a business value for deduplication in satellites macro #384

@MajorDaxx

Description

What additional value does this feature bring to the project?

When source data arrives with multiple rows per ldts, the current macro deduplicates incorrectly. Introducing a configurable “deduplication column” (e.g., business_event_time) would simplify user code and produce correct Satellite records without complex custom staging logic.

Is your feature request related to a problem? Please describe.

Below is a mock of the source data I am trying to load.
Each ldts is loaded separately.

The source data

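| row_num | parent_hk | ldts | business_event_time | business_hd (excluding business_event_time) |
|---|---|---|---|---|
| 1 | A | a_1 | 1 | abc |
| 2 | A | a_1 | 2 | hxw |
| 3 | A | a_1 | 3 | abc |
| 4 | A | a_2 | 4 | abc |
| 5 | A | a_2 | 5 | abc |
| 6 | A | a_3 | 6 | abc |
| 7 | A | a_3 | 7 | hxw |
| 8 | A | a_3 | 8 | dfg |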

Loaded with the current macro, the result would look something like this, which is not correct.

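| row_num | parent_hk | ldts | business_event_time | business_hd (excluding business_event_time) |
|---|---|---|---|---|
| 1 | A | a_1 | 1 | abc |
| 2 | A | a_1 | 2 | hxw |
| 4 | A | a_2 | 4 | abc |
| 6 | A | a_3 | 6 | abc |
| 7 | A | a_3 | 7 | hxw |
| 8 | A | a_3 | 8 | dfg |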

The correct, expected Satellite would look like this.

| row_num | parent_hk | ldts | business_event_time | business_hd (excluding business_event_time) |
|---|---|---|---|---|
| 1 | A | a_1 | 1 | abc |
| 2 | A | a_1 | 2 | hxw |
| 3 | A | a_1 | 3 | abc |
| 7 | A | a_3 | 7 | hxw |
| 8 | A | a_3 | 8 | dfg |

Describe the solution you'd like

This is probably a niche case, but supporting it would significantly reduce code complexity for this type of source data, and it requires only a minor change to the code base.
By introducing a deduplication column and using it instead of the ldts, the desired behaviour could be achieved.
What still needs to be determined is whether this is "the correct way" from a methodology point of view.
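
To make the intended behaviour concrete, here is a minimal Python sketch of the deduplication rule I have in mind. It is not the macro's actual SQL. The names `Row`, `load_batch` and `load_all` are made up for illustration, and it assumes loads are processed in ldts order and that a row is kept only when its business_hd differs from the previous row of the same parent_hk, ordered by business_event_time.

```python
from typing import NamedTuple


class Row(NamedTuple):
    row_num: int
    parent_hk: str
    ldts: str
    business_event_time: int
    business_hd: str


# Mock source data from the tables above.
SOURCE = [
    Row(1, "A", "a_1", 1, "abc"),
    Row(2, "A", "a_1", 2, "hxw"),
    Row(3, "A", "a_1", 3, "abc"),
    Row(4, "A", "a_2", 4, "abc"),
    Row(5, "A", "a_2", 5, "abc"),
    Row(6, "A", "a_3", 6, "abc"),
    Row(7, "A", "a_3", 7, "hxw"),
    Row(8, "A", "a_3", 8, "dfg"),
]


def load_batch(satellite, batch):
    """Append one load to the satellite, deduplicating on business_event_time."""
    # Latest hashdiff per parent_hk already present in the satellite.
    # Rows are appended in event order, so the last one seen is the latest.
    latest = {row.parent_hk: row.business_hd for row in satellite}
    # Order the incoming rows by the deduplication column instead of ldts.
    for row in sorted(batch, key=lambda r: (r.parent_hk, r.business_event_time)):
        if latest.get(row.parent_hk) != row.business_hd:
            satellite.append(row)
            latest[row.parent_hk] = row.business_hd
    return satellite


def load_all(rows):
    """Simulate loading each ldts separately, in ldts order."""
    satellite = []
    for ldts in sorted({r.ldts for r in rows}):
        load_batch(satellite, [r for r in rows if r.ldts == ldts])
    return satellite


print([r.row_num for r in load_all(SOURCE)])  # [1, 2, 3, 7, 8]
```

Run against the mock data, this keeps rows 1, 2, 3, 7 and 8, which matches the expected Satellite above. Rows 4, 5 and 6 are dropped because their business_hd equals the latest one already stored for the same parent_hk.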

Describe alternatives you've considered

Alternatively, the same logic could be placed in staging or in a custom layer between staging and raw_vault.

Additional context
The line in source can be found here


Labels: stale, wontfix
