Improve document on catchup feature #5239
**Example scenario:**
- Your materialized view has been collecting data for 30 days
- You run `dbt run --full-refresh` with `catchup: False`
- **Result**: All 30 days of historical data will be permanently lost
- The target table will start fresh and only capture new data going forward

**Before using `catchup: False` during full refresh, ensure:**
- Downstream consumers (dashboards, reports, applications) can handle the data gap
- You have backups if the historical data might be needed later
- Stakeholders who rely on this data are aware of the data loss

**Common use cases for `catchup: False`:**
- Development and testing environments where historical data isn't critical
- Heavy transformations that would be too resource-intensive to backfill
- Intentional fresh start with new schema or logic where old data should be discarded
:::

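The backup precaution in the checklist above could be handled directly in ClickHouse before running the full refresh. The following is only a minimal sketch: the database and table names (`analytics.events_mv_target`, `analytics.events_mv_target_backup`) are hypothetical placeholders, not names taken from the documentation.

```sql
-- Hypothetical names: replace with the real target table of the materialized view.
-- Create an empty copy with the same structure and engine as the target table.
CREATE TABLE analytics.events_mv_target_backup AS analytics.events_mv_target;

-- Copy the existing rows so they survive a full refresh with catchup: False.
INSERT INTO analytics.events_mv_target_backup
SELECT * FROM analytics.events_mv_target;
```

For large tables, the copy itself can be expensive, so it may be worth restricting the `SELECT` to the period that is actually needed.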
**Behavior summary:**

| Operation | `catchup: True` (default) | `catchup: False` |
|-----------|---------------------------|------------------|
| Initial deployment (`dbt run`) | Target table backfilled with historical data | Target table created empty |
| Full refresh (`dbt run --full-refresh`) | Target table rebuilt and backfilled | Target table recreated empty, **existing data lost** |
| Normal operation | Materialized view captures new inserts | Materialized view captures new inserts |
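
For context, the setting summarized in the table is applied in the model's `config` block. The sketch below assumes a dbt-clickhouse materialized view model; the source name (`raw`, `events`), the columns, and the engine and ordering key are hypothetical and only illustrate where `catchup` goes.

```sql
-- models/events_daily_mv.sql (hypothetical model name)
{{ config(
    materialized='materialized_view',
    engine='MergeTree()',
    order_by='(event_date)',
    catchup=False
) }}

-- With catchup=False the target table is created empty and only rows
-- inserted into the source after deployment are captured.
SELECT
    toDate(event_time) AS event_date,
    count() AS events
FROM {{ source('raw', 'events') }}
GROUP BY event_date
```

Omitting `catchup` keeps the default (`True`), in which case the target table is backfilled on initial deployment and on full refresh.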
I think this part of the docs is pretty redundant: it just repeats the first part in other words.
For example, the summary could be used at the top in place of the first list. The example scenario could be simplified to something like "be aware that `--full-refresh` drops the values in the current table".
Would you simplify this?
Sure. I found the example useful, but it may indeed be too much. I can simplify it.
Done @koletzilla
Summary
Update the documentation of the dbt `catchup` feature to align with the changes made in this PR, reported in this issue.