Summary
After publishing a sandbox build for QA, the next question is always "what changed vs prod?". Today this is answered with hand-written SQL. Add a dbts diff command that compares sandbox tables against staging (or live) and prints a delta table.
Why
- Direct continuation of the QA workflow that
dbts publish opened up.
- Reviewers can answer "is this row-count drop expected?" without leaving the CLI.
Sketch
- For each table in the build set (or every table in the sandbox if no selectors): row count in sandbox vs row count in the comparison target.
- Optional
--hash flag: include a checksum/hash on a configurable column subset for quick "are the same rows there?" sanity.
- Output: Rich table with
model | sandbox rows | <target> rows | delta | % columns. Flag empty-on-one-side rows in red.
- CLI shape:
dbts diff [selectors...] [--against staging|live] [--hash].
Where it'd live
- New module
src/dbts/diff.py, mirroring the structure of src/dbts/freshness.py.
- Reuse
_dbt_ls.py for model enumeration (already shared by plan and freshness).
- Reuse
snowflake.connect / run_sql.
Effort
Medium — a few days. Most complexity is in the SQL (single UNION-ALL query per pair, or one query per table) and edge cases (table missing on one side, view vs table, etc.).
Tier
Tier 1 — recommended (closes the QA workflow loop).
Summary
After publishing a sandbox build for QA, the next question is always "what changed vs prod?". Today this is answered with hand-written SQL. Add a
dbts diffcommand that compares sandbox tables againststaging(orlive) and prints a delta table.Why
dbts publishopened up.Sketch
--hashflag: include a checksum/hash on a configurable column subset for quick "are the same rows there?" sanity.model | sandbox rows | <target> rows | delta | %columns. Flag empty-on-one-side rows in red.dbts diff [selectors...] [--against staging|live] [--hash].Where it'd live
src/dbts/diff.py, mirroring the structure ofsrc/dbts/freshness.py._dbt_ls.pyfor model enumeration (already shared byplanandfreshness).snowflake.connect/run_sql.Effort
Medium — a few days. Most complexity is in the SQL (single UNION-ALL query per pair, or one query per table) and edge cases (table missing on one side, view vs table, etc.).
Tier
Tier 1 — recommended (closes the QA workflow loop).