
feat: dbts diff — sandbox vs staging/live data delta #4

@luiul


Summary

After publishing a sandbox build for QA, the next question is always "what changed vs. prod?" Today that question is answered with hand-written SQL. Add a dbts diff command that compares sandbox tables against staging (or live) and prints a per-table delta.

Why

  • Direct continuation of the QA workflow that dbts publish opened up.
  • Reviewers can answer "is this row-count drop expected?" without leaving the CLI.

Sketch

  • For each table in the build set (or every table in the sandbox if no selectors): row count in sandbox vs row count in the comparison target.
  • Optional --hash flag: include a checksum/hash on a configurable column subset for quick "are the same rows there?" sanity.
  • Output: Rich table with the columns model | sandbox rows | <target> rows | delta | %. Rows where one side is empty are flagged in red.
  • CLI shape: dbts diff [selectors...] [--against staging|live] [--hash].
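A minimal sketch of the row shape and CLI surface described above. It is an illustration under stated assumptions, not the intended implementation: `DiffRow` and `build_parser` are hypothetical names, argparse stands in for whatever CLI framework dbts actually uses, and defaulting `--against` to staging is a guess based on the wording of this issue.

```python
from __future__ import annotations

import argparse
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiffRow:
    """One output row: a model and its row count on each side (hypothetical type)."""

    model: str
    sandbox_rows: Optional[int]  # None means the table does not exist in the sandbox
    target_rows: Optional[int]   # None means the table does not exist in the target

    @property
    def delta(self) -> Optional[int]:
        # Sandbox minus target; undefined when either side is missing.
        if self.sandbox_rows is None or self.target_rows is None:
            return None
        return self.sandbox_rows - self.target_rows

    @property
    def pct(self) -> Optional[float]:
        # Percent change relative to the target; undefined for a missing or empty target.
        if self.delta is None or not self.target_rows:
            return None
        return 100.0 * self.delta / self.target_rows

    @property
    def empty_on_one_side(self) -> bool:
        # True when exactly one side has rows (a missing table counts as empty here).
        sandbox_has_rows = bool(self.sandbox_rows)
        target_has_rows = bool(self.target_rows)
        return sandbox_has_rows != target_has_rows


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the proposed shape: dbts diff [selectors...] [--against staging|live] [--hash]
    parser = argparse.ArgumentParser(prog="dbts diff")
    parser.add_argument("selectors", nargs="*", help="dbt selectors; empty means every sandbox table")
    parser.add_argument("--against", choices=["staging", "live"], default="staging")  # default is an assumption
    parser.add_argument("--hash", action="store_true", help="also compare a checksum over a column subset")
    return parser
```

The `empty_on_one_side` property is what a renderer could use to decide which rows to print in red.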

Where it'd live

  • New module src/dbts/diff.py, mirroring the structure of src/dbts/freshness.py.
  • Reuse _dbt_ls.py for model enumeration (already shared by plan and freshness).
  • Reuse snowflake.connect / run_sql.

Effort

Medium: a few days. Most of the complexity is in the SQL (a single UNION ALL query per sandbox/target pair, or one query per table) and in the edge cases (table missing on one side, view vs. table, etc.).
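To make the "single UNION ALL query per pair" option concrete, here is a rough sketch of building that query and folding its result rows into per-model counts. The function names, the example table names, and the `(model, side, row_count)` row shape are all assumptions for illustration; how the SQL is executed through `run_sql` is deliberately left out because its signature is not given here.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Plain unquoted identifiers only, up to database.schema.table (a simplification).
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*(\.[A-Za-z_][A-Za-z0-9_$]*){0,2}$")


def count_pair_sql(model: str, sandbox_table: str, target_table: str) -> str:
    """Build one query returning a row count for each side of a table pair."""
    for name in (sandbox_table, target_table):
        if not _IDENT.match(name):
            raise ValueError(f"unsupported table identifier: {name!r}")
    label = model.replace("'", "''")  # escape single quotes in the string literal
    return (
        f"SELECT '{label}' AS model, 'sandbox' AS side, COUNT(*) AS row_count FROM {sandbox_table}\n"
        "UNION ALL\n"
        f"SELECT '{label}' AS model, 'target' AS side, COUNT(*) AS row_count FROM {target_table}"
    )


def pivot_counts(
    rows: Iterable[Tuple[str, str, int]]
) -> Dict[str, Dict[str, Optional[int]]]:
    """Fold (model, side, row_count) rows into {model: {"sandbox": n, "target": m}}."""
    out: Dict[str, Dict[str, Optional[int]]] = {}
    for model, side, count in rows:
        # A side that never appears in the results stays None.
        out.setdefault(model, {"sandbox": None, "target": None})[side] = count
    return out
```

One consequence of this design: if either table does not exist, the whole UNION ALL statement fails, so the "table missing on one side" case would need an existence check beforehand or a fallback to one query per table.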

Tier

Tier 1 — recommended (closes the QA workflow loop).


Labels: enhancement
