280 - Making Payload in Satellites and Reference Satellites optional#458
280 - Making Payload in Satellites and Reference Satellites optional#458polat-deniz wants to merge 6 commits into
Conversation
|
dbt test combined result: ❌ DetailsRESULTS for Synapse: RESULTS for Postgres: RESULTS for BigQuery: RESULTS for Redshift: RESULTS for Snowflake: RESULTS for Exasol: RESULTS for Fabric: RESULTS for Oracle: RESULTS for Databricks: RESULTS for SQL Server: RESULTS for Trino: Link to workflow summary: https://github.com/ScalefreeCOM/datavault4dbt-ci-cd/actions/runs/26023273407 |
|
dbt test combined result: ❌ DetailsRESULTS for Synapse: RESULTS for Postgres: RESULTS for BigQuery: RESULTS for Redshift: RESULTS for Snowflake: RESULTS for Exasol: RESULTS for Fabric: RESULTS for Oracle: RESULTS for Databricks: RESULTS for SQL Server: RESULTS for Trino: Link to workflow summary: https://github.com/ScalefreeCOM/datavault4dbt-ci-cd/actions/runs/26043144400 |
tkirschke
left a comment
There was a problem hiding this comment.
Hi @polat-deniz , habe dann nicht alle Adapter durch geschaut, aber paar grundsätzliche Änderungswünsche, die entsprechend auf alle Adapter angewendet werden sollten :)
| WHERE {{ src_ldts }} != {{ datavault4dbt.string_to_timestamp(timestamp_format, end_of_all_times) }} | ||
| ) | ||
| {%- if is_incremental() %} | ||
| WHERE {{ src_ldts }} != {{ datavault4dbt.string_to_timestamp(timestamp_format, end_of_all_times) }} |
There was a problem hiding this comment.
@polat-deniz das ist unnötig so - in der alten Variante existiert die WHERE != end_of_all_times nur, um die HWM nicht zu gefährden (ohne den filter wäre max(ldts) immer end_of_all_times).
In deiner Änderung würde der Filter auch greifen, wenn incremental=true und disable_hwm=true, und würde potentiell den ghost record rausfiltern (dieser wäre eigentlich eh nach initial load drin, aber so eine where würde ich vermeiden). Damit der Ghost Record nicht erneut geinserted wird, reicht die klassische deduplizierung des Satellites: error_key existiert schon im sat, hashdiff hat sich nicht geändert, kein Insert
Also mMn bitte diese Änderung bei allen Satellites wieder rausnehmen
| ), | ||
| {%- endif %} | ||
|
|
||
| {%- set last_cte = 'source_data' -%} |
There was a problem hiding this comment.
@polat-deniz das wird hier gesetzt, was erstmal eine gute Idee ist, aber dann müsste es auch in Zeile 97 genutzt werden, sonst ist es unnötig definiert hier, da es in 104 wieder überschrieben wurde.
Wird ähnlich für alle anderen Satellites sein.
| MAX({{ src_ldts }}) FROM {{ this }} | ||
| WHERE {{ src_ldts }} != {{ datavault4dbt.string_to_timestamp(timestamp_format, end_of_all_times) }} | ||
| ) | ||
| AND {{ src_ldts }} != {{ datavault4dbt.string_to_timestamp(timestamp_format, end_of_all_times) }} |
|
|
||
| {%- set ref_key = datavault4dbt.escape_column_names(ref_key) -%} | ||
| {%- if has_hashdiff -%} | ||
| {%- set ns.src_hashdiff = datavault4dbt.escape_column_names(ns.src_hashdiff) -%} |
There was a problem hiding this comment.
indent den inneren Teil der If
| WHERE {{ src_ldts }} != {{ datavault4dbt.string_to_timestamp(timestamp_format, end_of_all_times) }} | ||
| ) | ||
| {%- if is_incremental() %} | ||
| WHERE {{ src_ldts }} != {{ datavault4dbt.string_to_timestamp(timestamp_format, end_of_all_times) }} |
| {%- set src_rsrc = datavault4dbt.escape_column_names(src_rsrc) -%} | ||
| {%- set parent_hashkey = datavault4dbt.escape_column_names(parent_hashkey) -%} | ||
| {%- if has_hashdiff -%} | ||
| {%- set src_hashdiff = datavault4dbt.escape_column_names(src_hashdiff) -%} |
| MAX({{ src_ldts }}) FROM {{ this }} | ||
| WHERE {{ src_ldts }} != {{ datavault4dbt.string_to_timestamp(timestamp_format, end_of_all_times) }} | ||
| ) | ||
| AND {{ src_ldts }} != {{ datavault4dbt.string_to_timestamp(timestamp_format, end_of_all_times) }} |
|
dbt test combined result: ❌ DetailsRESULTS for Synapse: RESULTS for Postgres: RESULTS for BigQuery: RESULTS for Redshift: RESULTS for Snowflake: RESULTS for Exasol: RESULTS for Fabric: RESULTS for Oracle: RESULTS for Databricks: RESULTS for SQL Server: RESULTS for Trino: Link to workflow summary: https://github.com/ScalefreeCOM/datavault4dbt-ci-cd/actions/runs/26281641264 |
84c26d2 to
eec45ff
Compare
|
Hey @tkirschke, I addressed the requested changes and applied them across the affected adapters:
|
|
dbt test combined result: ❌ DetailsRESULTS for Synapse: RESULTS for Postgres: RESULTS for BigQuery: RESULTS for Redshift: RESULTS for Snowflake: RESULTS for Exasol: RESULTS for Fabric: RESULTS for Oracle: RESULTS for Databricks: RESULTS for SQL Server: RESULTS for Trino: Link to workflow summary: https://github.com/ScalefreeCOM/datavault4dbt-ci-cd/actions/runs/26458084442 |
Description
feat: optional
src_payloadfor Satellites and Reference SatellitesMakes
src_payload(as well assrc_hashdiff) optional across all satellite macros (sat_v0,sat_v1,ref_sat_v0,ref_sat_v1) for all 9 supported database adapters. This introduces two new ways to use satellites alongside the existing classic approach.Three ways to use satellites
src_payloadsrc_hashdiffValidation: if
src_payloadhas more than one column andsrc_hashdiffis not set, dbt will throw a clear compiler error.What changed
Top-level dispatchers
macros/tables/sat_v0.sql,sat_v1.sql,ref_sat_v0.sql,ref_sat_v1.sqlsrc_payloadandsrc_hashdiff/hashdiffare no longer required parametersDialect implementations
9 adapters × 4 macro types = 36 files
SELECTlists and duplicate checks when it is not providedBackwards compatibility
All existing models that already provide
src_payloadandsrc_hashdiffcontinue to work without any changes.Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Tested with branch 280-sat-payload-optional of datavault4dbt and branch 280-sat-payload-optional of datavault4dbt-automatic-tests, to make sure that all special cases were covered. Was tested in the CI/CD.