Commit 18376fe

blarghmatey and Copilot committed
feat(ol-dbt-cli): add SQL/YAML linting and column-level impact analysis CLI
Adds a new `ol-dbt` CLI package at `src/ol_dbt_cli/` with two commands:

## `ol-dbt validate`

Structural consistency checks for dbt models:

1. **YAML/SQL sync**: warns when columns declared in a model's YAML schema are absent from its compiled SELECT output, or vice-versa.
2. **Missing upstream refs**: flags `ref()` or `source()` calls that resolve to models/sources not present in the project.
3. **Upstream column existence**: validates that columns referenced from an upstream model actually exist in that model's output.
4. **Duplicate column aliases**: detects accidental repeated alias names within a single model.
5. **SELECT \***: INFO-level notice for unresolved star expansions; `--warn-select-star` promotes these to warnings.

Options:

- `--model/-m` (repeatable): filter to one or more model names, comma-separated lists, or directory paths under `models/`
- `--errors-only`: suppress INFO/WARN output, show only errors
- `--warn-select-star`: treat unresolved SELECT * as warnings
- `--compiled-dir`: point at `target/compiled/` for Jinja-heavy models
- `--output json`: machine-readable output

## `ol-dbt impact`

Column-level downstream impact analysis driven by git diffs:

- Reads `git diff [--cached] [base]` to find changed/removed SQL column aliases
- Traverses the manifest or parsed SQL to find all downstream models that reference each changed column
- Reports broken references, risky usages, and models needing review

## Jinja pre-processing

Robust regex-based Jinja stripping so sqlglot can parse the raw `.sql` files without a running dbt environment:

- `{{ ref() }}` / `{{ source() }}` → stable SQL identifiers with reverse maps
- `{{ var() }}` and bare `{{ variable }}` → unquoted placeholders (avoids the `''__jinja__''` doubled-quote bug when templates surround with SQL quotes)
- Block-level macro calls alone on a line → SQL comment
- Broken column expressions where a macro splits a SQL column boundary (e.g. `{{ array_join('partial SQL', "path") }}`) → collapsed to `__jinja__ as alias` via `_BROKEN_COL_RE` cleanup pass

Result: **588/588** dbt models in this project parse successfully.

## YAML registry improvements

- Parses both `models:` and `sources:` blocks for SELECT * resolution
- Rescues model definitions accidentally nested inside another model's `columns:` list (YAML authoring error in `_stg_mitlearn_models.yml`)
- Filters out nested-model entries from the parent model's column list

## Tests

85 pytest tests covering all commands, lib modules, and edge cases.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
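The Jinja pre-processing described in the commit message can be illustrated with a minimal sketch. The pre-processing module itself is not part of this excerpt, so everything below is an assumption: the function name `strip_jinja`, the regex names, and the `__ref__` / `__source__` placeholder scheme are hypothetical. It covers only the `ref()` / `source()` and generic-expression cases, not the block-level macro or broken-column passes.

```python
from __future__ import annotations

import re

# Hypothetical patterns for illustration; the commit's real regexes are not shown here.
_REF_RE = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")
_SOURCE_RE = re.compile(
    r"\{\{\s*source\(\s*['\"]([^'\"]+)['\"]\s*,\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}"
)
_EXPR_RE = re.compile(r"\{\{.*?\}\}", re.DOTALL)


def strip_jinja(sql: str) -> tuple[str, dict[str, str]]:
    """Replace Jinja expressions with placeholders a SQL parser can read.

    Returns the rewritten SQL plus a reverse map from each placeholder
    identifier back to the ref/source target it stands for.
    """
    reverse: dict[str, str] = {}

    def _ref(match: re.Match[str]) -> str:
        ident = f"__ref__{match.group(1)}"  # assumed naming scheme
        reverse[ident] = match.group(1)
        return ident

    def _source(match: re.Match[str]) -> str:
        ident = f"__source__{match.group(1)}__{match.group(2)}"  # assumed naming scheme
        reverse[ident] = f"{match.group(1)}.{match.group(2)}"
        return ident

    sql = _REF_RE.sub(_ref, sql)
    sql = _SOURCE_RE.sub(_source, sql)
    # Remaining expressions become an UNQUOTED placeholder, so a template that
    # is already wrapped in SQL quotes ends up with exactly one pair of quotes.
    sql = _EXPR_RE.sub("__jinja__", sql)
    return sql, reverse


raw_sql = """select id, '{{ var("env") }}' as env, {{ var("row_limit") }} as lim
from {{ ref('stg_users') }} u
join {{ source('raw', 'events') }} e on e.user_id = u.id"""

clean_sql, reverse_map = strip_jinja(raw_sql)
```

The reverse map is what would let a later pass translate a placeholder table name found in the parsed SQL back into the dbt model or source it came from.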
1 parent a1a24f7 commit 18376fe

18 files changed

Lines changed: 3331 additions & 1 deletion

pyproject.toml

Lines changed: 3 additions & 1 deletion
@@ -15,13 +15,15 @@ dependencies = [
     "cyclopts~=4.8.0",
     "dbt-duckdb>=1.10.0",
     "dbt-trino>=1.10.0",
+    "rich>=13.0",
+    "sqlglot>=25.0",
     "trino[external-authentication-token-cache]>=0.336.0",
     "universal-pathlib ~=0.3.1",
 ]
 repository = "https://github.com/mitodl/ol-data-platform"

 [tool.uv.workspace]
-members = ["packages/*"]
+members = ["packages/*", "src/ol_dbt_cli"]

 [tool.uv]
 package = false # This is a workspace root, not a package

src/ol_dbt_cli/ol_dbt_cli/__init__.py

Whitespace-only changes.

src/ol_dbt_cli/ol_dbt_cli/cli.py

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+"""Main CLI entry point for ol-dbt tooling."""
+
+import sys
+
+import cyclopts
+
+from ol_dbt_cli.commands.impact import impact
+from ol_dbt_cli.commands.validate import validate
+
+app = cyclopts.App(
+    name="ol-dbt",
+    help="""
+    ol-dbt: dbt project analysis and validation CLI
+
+    Tools for the engineering team to reason about in-progress dbt changes,
+    validate model quality, and understand column-level impact.
+
+    Common workflows:
+
+    1. Check what breaks before opening a PR:
+       $ ol-dbt impact --changed-only
+
+    2. Validate model SQL/YAML consistency:
+       $ ol-dbt validate --changed-only
+
+    3. Full validation of all models:
+       $ ol-dbt validate
+
+    4. JSON output for CI pipelines:
+       $ ol-dbt impact --format json
+       $ ol-dbt validate --format json
+
+    Use --help on any command for detailed usage.
+    """,
+    version="0.1.0",
+)
+
+app.command(impact, name="impact")
+app.command(validate, name="validate")
+
+
+def main() -> None:
+    """CLI entry point."""
+    try:
+        app()
+    except KeyboardInterrupt:
+        print("\n\nOperation cancelled by user.", file=sys.stderr)
+        sys.exit(130)
+    except Exception as e:
+        print(f"\nError: {e}", file=sys.stderr)
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()

src/ol_dbt_cli/ol_dbt_cli/commands/__init__.py

Whitespace-only changes.
