| unique_key | Unique key used for identifying rows between source and target | List of strings or string |
| valid_from_name | The name of the `valid_from` column to create in the target table. Default: `valid_from` | string |
| valid_to_name | The name of the `valid_to` column to create in the target table. Default: `valid_to` | string |
| invalidate_hard_deletes | If set to `true`, when a record is missing from the source table it will be marked as invalid. Default: `false` | bool |
| batch_size | The maximum number of intervals that can be evaluated in a single backfill task. If this is `None`, all intervals will be processed as part of a single task. See [Processing Source Table with Historical Data](#processing-source-table-with-historical-data) for more info on this use case. Default: `None` | int |

!!! tip "Important"
@@ -1273,10 +1274,66 @@ This is the most accurate representation of the menu based on the source data pr
| columns | The names of the columns to check for changes. Use `*` to check all columns. | List of strings or string |
| execution_time_as_valid_from | By default, when the model is first loaded, `valid_from` is set to `1970-01-01 00:00:00`, and rows added later get the `execution_time` of the pipeline run that added them. Setting this to `true` always uses `execution_time`. Default: `false` | bool |
| updated_at_name | If sourcing from a table that includes a timestamp to use as `valid_from`, set this property to that column. See [Processing Source Table with Historical Data](#processing-source-table-with-historical-data) for more info on this use case. Default: `None` | string |

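As a rough illustration of how these options might be combined, the sketch below sets several of them on one model. The model name `db.menu_history`, the source `db.menu_source`, and the columns `name` and `price` are hypothetical, and the sketch assumes the `valid_from_name`, `valid_to_name`, and `invalidate_hard_deletes` options listed earlier apply to `SCD_TYPE_2_BY_COLUMN`.

```sql
MODEL (
  name db.menu_history,                   -- hypothetical model name
  kind SCD_TYPE_2_BY_COLUMN (
    unique_key id,
    columns [name, price],                -- hypothetical columns to check for changes
    execution_time_as_valid_from true,    -- always use execution_time for valid_from
    valid_from_name effective_from,       -- assumed to apply to this kind
    valid_to_name effective_to,           -- assumed to apply to this kind
    invalidate_hard_deletes true          -- assumed to apply to this kind
  )
);

SELECT
  id,
  name,
  price
FROM
  db.menu_source                          -- hypothetical source table
```

With these settings the target table would carry `effective_from` and `effective_to` in place of the default `valid_from` and `valid_to` columns.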
### Processing Source Table with Historical Data
The most common use case for SCD Type 2 is creating history for a table that does not already have it.
In the restaurant menu example, the menu only tells you what is offered right now, but you want to know what was offered over time.
In this case, the default `batch_size` of `None` is the best option.

Another use case is processing a source table that already contains history.
A common example is a "daily snapshot" table, created by a source system that takes a snapshot of the data at the end of each day.
If your source table has historical records, as a daily snapshot table does, set `batch_size` to `1` to process each interval (each day with a `@daily` cron) in sequential order.
That way the historical records are properly captured in the SCD Type 2 table.
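For contrast with the daily snapshot case, the first case above (a source without history) could be sketched as follows. This is a minimal sketch: the names `db.menu_history` and `db.menu_source` and the columns `name` and `price` are hypothetical. It leaves `batch_size` and `updated_at_name` unset so that both keep their default of `None`.

```sql
MODEL (
  name db.menu_history,          -- hypothetical model name
  kind SCD_TYPE_2_BY_COLUMN (
    unique_key id,
    columns [name, price]        -- hypothetical columns to check for changes
    -- batch_size is omitted: the default None processes all intervals in one task
  )
);

SELECT
  id,
  name,
  price
FROM
  db.menu_source                 -- hypothetical table holding only the current menu
```

Because the source holds only the current state, there is no per-day history to replay, so there is nothing to gain from processing intervals one at a time.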
#### Example - Source from Daily Snapshot Table

```sql linenums="1"
MODEL (
  name db.table,
  kind SCD_TYPE_2_BY_COLUMN (
    unique_key id,
    columns [some_value],
    updated_at_name ds,
    batch_size 1
  ),
  start '2025-01-01',
  cron '@daily'
);

SELECT
  id,
  some_value,
  ds
FROM
  source_table
WHERE
  ds between @start_ds and @end_ds
```

This processes each day of the source table in sequential order (when there is more than one day to process), checking the `some_value` column for changes. When it changes, `valid_from` is set to the value of the `ds` column, except for the first value, which gets `1970-01-01 00:00:00`.
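Once the model has been backfilled, the history can be read back with ordinary queries. The sketch below assumes the default `valid_from` and `valid_to` column names and assumes that the current version of each row is the one whose `valid_to` is `NULL`. The date `2025-01-15` is only an illustrative value.

```sql
-- Current version of each row (assumes open records have a NULL valid_to)
SELECT
  id,
  some_value,
  valid_from
FROM
  db.table
WHERE
  valid_to IS NULL;

-- Version of each row that was valid at a given point in time
SELECT
  id,
  some_value
FROM
  db.table
WHERE
  valid_from <= CAST('2025-01-15' AS TIMESTAMP)
  AND (valid_to > CAST('2025-01-15' AS TIMESTAMP) OR valid_to IS NULL);
```

If `valid_from_name` or `valid_to_name` were overridden on the model, the queries would need to use those names instead.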