What additional value does this feature bring to the project?
When source data arrives with multiple rows per ldts, the current macro deduplicates incorrectly. Introducing a configurable “deduplication column” (e.g., business_event_time) would simplify user code and produce correct Satellite records without complex custom staging logic.
Is your feature request related to a problem? Please describe.
Below is a mock of the source data I am trying to load.
Each ldts is loaded separately.
The source data
| row_num | parent_hk | ldts | business_event_time | business_hd (excluding business_event_time) |
|---|---|---|---|---|
| 1 | A | a_1 | 1 | abc |
| 2 | A | a_1 | 2 | hxw |
| 3 | A | a_1 | 3 | abc |
| 4 | A | a_2 | 4 | abc |
| 5 | A | a_2 | 5 | abc |
| 6 | A | a_3 | 6 | abc |
| 7 | A | a_3 | 7 | hxw |
| 8 | A | a_3 | 8 | dfg |
With the current macro, the result looks something like this, which is not correct:
| row_num | parent_hk | ldts | business_event_time | business_hd (excluding business_event_time) |
|---|---|---|---|---|
| 1 | A | a_1 | 1 | abc |
| 2 | A | a_1 | 2 | hxw |
| 4 | A | a_2 | 4 | abc |
| 6 | A | a_3 | 6 | abc |
| 7 | A | a_3 | 7 | hxw |
| 8 | A | a_3 | 8 | dfg |
The correct, expected Satellite would look like this:
| row_num | parent_hk | ldts | business_event_time | business_hd (excluding business_event_time) |
|---|---|---|---|---|
| 1 | A | a_1 | 1 | abc |
| 2 | A | a_1 | 2 | hxw |
| 3 | A | a_1 | 3 | abc |
| 7 | A | a_3 | 7 | hxw |
| 8 | A | a_3 | 8 | dfg |
Describe the solution you'd like
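This is probably a smaller use case, but it would significantly reduce code complexity for this type of source data, and it should only require a minor change to the code base.
By introducing a deduplication column and using it instead of the ldts, the desired behaviour could be achieved: each row is compared with the previous row for the same parent_hk, ordered by the deduplication column, and is only inserted if its business_hd differs.
What still needs to be determined is whether this is "the correct way" from a methodology point of view.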
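
To make the intended behaviour concrete, here is a minimal Python sketch of the comparison logic, run against the mock data above. It is not the macro's implementation. The function name `rows_to_insert` and the `dedup_col` parameter are hypothetical and only illustrate the idea of ordering by a configurable column instead of the ldts.

```python
from operator import itemgetter

COLUMNS = ("row_num", "parent_hk", "ldts", "business_event_time", "business_hd")

# Mock source data from the tables above.
SOURCE = [dict(zip(COLUMNS, r)) for r in [
    (1, "A", "a_1", 1, "abc"),
    (2, "A", "a_1", 2, "hxw"),
    (3, "A", "a_1", 3, "abc"),
    (4, "A", "a_2", 4, "abc"),
    (5, "A", "a_2", 5, "abc"),
    (6, "A", "a_3", 6, "abc"),
    (7, "A", "a_3", 7, "hxw"),
    (8, "A", "a_3", 8, "dfg"),
]]


def rows_to_insert(satellite, batch, dedup_col="business_event_time"):
    """Hypothetical helper: keep only the batch rows whose hashdiff differs
    from the preceding row of the same parent_hk, ordered by dedup_col."""
    # Latest known hashdiff per parent key already in the satellite.
    latest = {}
    for row in sorted(satellite, key=itemgetter(dedup_col)):
        latest[row["parent_hk"]] = row["business_hd"]

    inserts = []
    for row in sorted(batch, key=itemgetter("parent_hk", dedup_col)):
        if latest.get(row["parent_hk"]) != row["business_hd"]:
            inserts.append(row)
        # The current row becomes the comparison point for the next one,
        # whether or not it was inserted.
        latest[row["parent_hk"]] = row["business_hd"]
    return inserts


# Each ldts is loaded separately, as described in the issue.
satellite = []
for ldts in ("a_1", "a_2", "a_3"):
    batch = [r for r in SOURCE if r["ldts"] == ldts]
    satellite.extend(rows_to_insert(satellite, batch))

loaded_row_nums = [r["row_num"] for r in satellite]
print(loaded_row_nums)  # [1, 2, 3, 7, 8]
```

Under these assumptions, the sketch reproduces the expected Satellite (rows 1, 2, 3, 7 and 8). Rows 4, 5 and 6 are skipped because their business_hd equals that of the row immediately before them in business_event_time order.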
Describe alternatives you've considered
Alternatively, the same logic could be placed in staging or in a custom layer between staging and raw_vault.
Additional context
The line in source can be found here