43 changes: 31 additions & 12 deletions .claude/agents/docs-dev.md
@@ -1,16 +1,17 @@
 ---
 name: docs-dev
-description: Owns the dbdocs package — the click CLI plus the extract/site pipeline that turns dbt artifacts into a self-contained single-page-app (SPA). Use for CLI commands (generate/serve/deploy), the artifact extractors (nodes/erd/graph/column-lineage), the data-dict assembly + base64 injection, the bundled vanilla-JS SPA, and the pytest suite. Scope is `dbdocs/` and `tests/`.
+description: Owns the dbdocs package — the click CLI plus the extract/site pipeline that turns dbt artifacts into a single-page-app (SPA) with an external gzip data payload. Use for CLI commands (generate/serve/deploy), the artifact extractors (nodes/erd/graph/column-lineage/health), the data-dict assembly + external payload, the bundled vanilla-JS SPA, and the pytest suite. Scope is `dbdocs/` and `tests/`.
 tools: Read, Edit, Write, Glob, Grep, Bash
 model: sonnet
 memory: project
 ---

 You own the `dbdocs/` package. It reads dbt artifacts (`manifest.json` /
 `catalog.json`) via the `dbterd` Python API, derives one project data dict, and
-builds a **self-contained single-page-app** — a single `site/index.html` with all
-data base64-injected as `window.dbdocsData`, plus vendored JS assets. dbdocs is
-an **alternative dbt docs site = dbt docs + ERD + column-level lineage**. It is a
+builds a **single-page-app**: a small `site/index.html` plus an **external**
+`site/dbdocs-data.json.gz` that a hand-written vanilla-JS SPA fetches and
+decompresses client-side (the data is never inlined into the HTML). dbdocs is an
+**alternative dbt docs site = dbt docs + ERD + column-level lineage**. It is a
 **doc generator**, not a dbt or dbterd reimplementation. There is no mkdocs,
 mkdocs-material, mike, or Jinja2 templating — those are gone.
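
Editor's sketch of the external-payload idea described above: a data dict serialized to JSON and gzip-compressed into a separate file, instead of being inlined into the HTML. This is not the dbdocs builder; `encode_payload` and `decode_payload` are hypothetical names, and the decode half only stands in for whatever the SPA does in the browser.

```python
import gzip
import json


def encode_payload(data: dict) -> bytes:
    """Serialize the data dict compactly and gzip it (hypothetical helper)."""
    raw = json.dumps(data, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)


def decode_payload(blob: bytes) -> dict:
    """Inverse of encode_payload; mirrors the client-side decompress step."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


# Illustrative data only; the real dict's shape is not shown in this diff.
example = {"nodes": [{"name": "orders"}], "erd": {"nodes": [], "edges": []}}
blob = encode_payload(example)
restored = decode_payload(blob)
```

Keeping the payload in its own `.gz` file lets the HTML shell stay small while the data is fetched on demand.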

@@ -19,16 +20,22 @@ mkdocs-material, mike, or Jinja2 templating — those are gone.
 - `dbdocs/cli/main.py` — the click command group and subcommands
 (`generate`, `serve`, `deploy`).
 - `dbdocs/extract/` — derive doc data from artifacts: `nodes` (models/sources/
-seeds/snapshots → display records + nav tree), `erd` (Mermaid ERDs via
-dbterd), `graph` (the node-level DAG), `column_lineage` +
-`_sqlglot_lineage` (column-level lineage via sqlglot).
-- `dbdocs/site/` — `builder` (assemble the data dict + write the site),
-`inject` (base64 `window.dbdocsData`), `deploy` (hand-rolled versioning), and
-the `bundle/` SPA (`index.html` + `assets/app.js` + `assets/style.css` +
-`assets/vendor/`).
+seeds/snapshots → display records + nav tree), `erd` + `erd_json` (structured
+ERD `{nodes, edges}` via a dbterd `json` target adapter — not Mermaid text; the
+SPA renders it with React Flow), `graph` (the node-level DAG), `column_lineage`
++ `_sqlglot_lineage` (column-level lineage via sqlglot), and the `health/`
+sub-package (the always-built Health Check section from `run_results.json`).
+- `dbdocs/site/` — `builder` (assemble the one data dict + write the site),
+`inject` (`strip_marker` removes the `<!-- DBDOCS_DATA -->` placeholder — the
+data is external, not inlined), `deploy` (hand-rolled versioning), and the
+`bundle/` SPA (`index.html` + `assets/{css,js,vendor,graph}/`; `js/` is the
+3-tier `data → service → ui` ES modules, `graph/` the committed React Flow
+bundle).
 - `dbdocs/core/` — `config` (`DbDocsConfig` from `dbdocs.yml`), `artifacts`
 (artifact loading), `exceptions`, and the colored `log` singleton.
-- `pytest` coverage at 100%.
+- `pytest` coverage at 100% (`tests/`), **plus** the Playwright E2E specs at
+`frontend/test/e2e/spa.spec.ts` that cover the rendered SPA — extend them
+whenever you change bundle behavior (this is your one window outside `dbdocs/`).

 ## Non-responsibilities

@@ -46,13 +53,25 @@ mkdocs-material, mike, or Jinja2 templating — those are gone.
 4. Run `uv run pytest --cov=dbdocs --cov-report=term-missing`.
 5. Ensure coverage is 100%. Add tests before reporting done. Only `# pragma: no
 cover` lines that are genuinely untestable I/O boundaries, and say so.
+6. **If the change touches the rendered SPA** — the bundle (`site/bundle/**`:
+`index.html`, `assets/{css,js}/`) or the React Flow graph (`frontend/**`) —
+pytest does **not** exercise it; the Playwright E2E suite is the only thing
+that does. Run `task frontend:e2e` (Node + a real demo build; one-time
+`task frontend:e2e:install` for the browser). The E2E suite is **independent
+of the 100% coverage gate** — green pytest coverage says nothing about the
+SPA. Add/extend a spec in `frontend/test/e2e/spa.spec.ts` for new
+user-visible behavior, and for a graph-source change also
+`task frontend:build` to refresh the committed `assets/graph/` bundle.

 ## Conventions

 - Follow the user's global Python rules: no relative imports, all imports at the
 top of the file, one class per file (exception: multiple exception classes may
 share one file), no nested functions/classes.
 - Use specific exception types, never bare `except:` or `except Exception`.
+- Keep comments sparse and present-tense — add one only when the code isn't
+self-evident, describing what it does now, never historically (no "now / no
+longer / used to / as before" changelog framing; git holds the history).
 - Keep presentation in the SPA assets; keep the Python the thin glue that loads
 artifacts and assembles the data dict.
 - Follow DRY in tests — share fixtures via `tests/conftest.py`.
106 changes: 106 additions & 0 deletions .github/scripts/check_dpe_rules.py
@@ -0,0 +1,106 @@
+"""Compare the published dbt-project-evaluator rule set against dbdocs's rules.
+
+Scrapes each DPE rules category page for its rule heading anchors, derives the
+anchors dbdocs already implements from the live ``DIMENSION_RULES`` registry (via
+the same ``docs_url`` builder the findings use), and prints any DPE rule dbdocs
+hasn't implemented yet. The watcher workflow turns a non-empty result into a
+``feat:`` issue.
+
+Output is GitHub-Actions friendly: ``has_missing`` and a ``missing<<DPE_EOF`` heredoc
+block are appended to ``$GITHUB_OUTPUT`` when set; a plain report always goes to stdout.
+A completed comparison exits 0 ("nothing missing" is healthy); a scrape error raises.
+"""
+
+import os
+import re
+import sys
+import urllib.request
+
+from dbdocs.extract.health.rules.base import docs_url
+from dbdocs.extract.health.rules.registry import DIMENSION_RULES
+
+# The six DPE rules category pages, each diffed against the rules dbdocs registers
+# for that dimension. Rule headings are the page's ``<h2>`` anchors (see below).
+DPE_CATEGORIES = ("modeling", "testing", "documentation", "structure", "performance", "governance")
+_BASE = "https://dbt-labs.github.io/dbt-project-evaluator/latest/rules"
+
+# mkdocs-material renders each rule as an ``<h2 id="<slug>">`` heading (the page
+# title is the ``<h1>``, any sub-detail is ``<h3>+``); matching only ``<h2>`` keeps
+# non-rule headings out, so a new section heading can't masquerade as a rule.
+_RULE_HEADING_ID = re.compile(r'<h2[^>]*\bid="([^"]+)"', re.IGNORECASE)
+
+# A handful of ``<h2>`` slugs that aren't rules (page-structure headings).
+_NON_RULE_ANCHORS = set(DPE_CATEGORIES) | {
+    "rules",
+    "overview",
+    "exceptions",
+    "customization",
+}
+
+
+def _fetch(url: str) -> str:
+    request = urllib.request.Request(url, headers={"User-Agent": "dbdocs-dpe-watcher"})
+    with urllib.request.urlopen(request, timeout=30) as response:  # noqa: S310 - fixed HTTPS host
+        return response.read().decode("utf-8", "replace")
+
+
+def published_anchors(category: str) -> "set[str]":
+    """The rule heading anchors published on a DPE category page.
+
+    Raises ``ValueError`` if the page yields no rule headings — a structural change
+    upstream — so the run fails loudly rather than reporting every rule as missing.
+    """
+    html = _fetch(f"{_BASE}/{category}/")
+    anchors = {a for a in _RULE_HEADING_ID.findall(html) if a not in _NON_RULE_ANCHORS}
+    if not anchors:
+        raise ValueError(
+            f"No rule headings found on the DPE {category} page — page structure changed?"
+        )
+    return anchors
+
+
+def implemented_anchors(category: str) -> "set[str]":
+    """The DPE anchors dbdocs implements for *category* (via the rules' docs_url)."""
+    out = set()
+    for rule in DIMENSION_RULES.get(category, []):
+        url = docs_url(category, rule.__name__)
+        out.add(url.rsplit("#", 1)[-1])
+    return out
+
+
+def find_missing() -> "dict[str, list[str]]":
+    """Map each category to the DPE rule anchors dbdocs hasn't implemented yet."""
+    missing = {}
+    for category in DPE_CATEGORIES:
+        gap = sorted(published_anchors(category) - implemented_anchors(category))
+        if gap:
+            missing[category] = gap
+    return missing
+
+
+def render(missing: "dict[str, list[str]]") -> str:
+    lines = []
+    for category in DPE_CATEGORIES:
+        for anchor in missing.get(category, []):
+            url = f"{_BASE}/{category}/#{anchor}"
+            lines.append(f"- **{category}** — `{anchor}` ([docs]({url}))")
+    return "\n".join(lines)
+
+
+def main() -> int:
+    missing = find_missing()
+    report = render(missing)
+    github_output = os.environ.get("GITHUB_OUTPUT")
+    if github_output:
+        with open(github_output, "a", encoding="utf-8") as handle:
+            handle.write(f"has_missing={'true' if missing else 'false'}\n")
+            handle.write(f"missing<<DPE_EOF\n{report}\nDPE_EOF\n")
+    if missing:
+        print("DPE rules not yet implemented in dbdocs:\n" + report)
+    else:
+        print("dbdocs implements every published dbt-project-evaluator rule.")
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
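
Editor's sketch of the scrape-and-diff step in the script above, run offline against a made-up page fragment. The regex is the one from the script; `SAMPLE_HTML`, `rule_anchors`, `missing_rules`, and the anchor names are illustrative assumptions, not real DPE page content.

```python
import re

# Same pattern as the watcher script: capture the id of every <h2> heading.
RULE_HEADING_ID = re.compile(r'<h2[^>]*\bid="([^"]+)"', re.IGNORECASE)

# Hypothetical fragment shaped like a rendered docs page (not fetched from DPE).
SAMPLE_HTML = """
<h1 id="modeling">Modeling</h1>
<h2 id="direct-join-to-source">Direct Join to Source</h2>
<h3 id="example">Example</h3>
<H2 class="x" id="model-fanout">Model Fanout</H2>
<h2 id="overview">Overview</h2>
"""

# Page-structure slugs to ignore (a reduced, illustrative set).
NON_RULE_ANCHORS = {"modeling", "overview"}


def rule_anchors(html: str) -> set[str]:
    """Collect <h2> ids, dropping known non-rule headings."""
    return {a for a in RULE_HEADING_ID.findall(html) if a not in NON_RULE_ANCHORS}


def missing_rules(published: set[str], implemented: set[str]) -> list[str]:
    """Published anchors with no implemented counterpart, sorted for stable output."""
    return sorted(published - implemented)


published = rule_anchors(SAMPLE_HTML)
gap = missing_rules(published, {"model-fanout"})
```

Only `<h2>` headings are captured, so the `<h1>` title and the `<h3>` sub-heading in the sample never reach the comparison.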
65 changes: 65 additions & 0 deletions .github/workflows/dpe-rules-watch.yml
@@ -0,0 +1,65 @@
+name: Watch dbt-project-evaluator rules
+
+# Scrapes the published dbt-project-evaluator rule set weekly and opens a feature
+# issue when DPE ships a rule dbdocs hasn't implemented yet. Health rules track
+# DPE one-to-one, so a new DPE rule is a standing to-do.
+on:
+  schedule:
+    - cron: "0 6 * * 1" # Mondays 06:00 UTC
+  workflow_dispatch:
+
+permissions:
+  contents: read
+  issues: write
+
+jobs:
+  watch:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+
+      - name: Set up Python
+        run: uv python install 3.12
+
+      - name: Sync environment
+        run: uv sync
+
+      - name: Compare DPE rules against dbdocs
+        id: check
+        run: uv run python .github/scripts/check_dpe_rules.py
+
+      - name: Open / update a feature issue for new rules
+        if: steps.check.outputs.has_missing == 'true'
+        env:
+          GH_TOKEN: ${{ github.token }}
+          MISSING: ${{ steps.check.outputs.missing }}
+        run: |
+          set -euo pipefail
+          TITLE="feat: implement newly published dbt-project-evaluator rule(s)"
+          BODY=$(cat <<EOF
+          The weekly dbt-project-evaluator rules watcher found rule(s) published by
+          DPE that dbdocs doesn't implement yet. dbdocs health rules track DPE
+          one-to-one, so each of these wants a new rule under the matching
+          \`dbdocs/extract/health/rules/dimensions/\` module (plus a valid
+          \`docs_url\` anchor and a unit test).
+
+          ## Missing rules
+
+          ${MISSING}
+
+          ---
+          _Filed automatically by \`.github/workflows/dpe-rules-watch.yml\`. The watcher
+          never closes this issue; close it by hand once the rules are implemented._
+          EOF
+          )
+          # Dedupe on the stable title: update the open issue if one exists, else create.
+          EXISTING=$(gh issue list --state open --search "in:title \"$TITLE\"" --json number --jq '.[0].number // empty')
+          if [ -n "$EXISTING" ]; then
+            gh issue edit "$EXISTING" --body "$BODY"
+            echo "Updated existing issue #$EXISTING."
+          else
+            gh issue create --title "$TITLE" --body "$BODY" --label enhancement --label triage
+          fi
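
Editor's sketch of how the script's two writes to `$GITHUB_OUTPUT` become the `has_missing` and `missing` step outputs the workflow reads. `write_outputs` mirrors the two `handle.write` calls from the script; `parse_outputs` is a toy reader written for this illustration and is not how GitHub Actions itself parses the file.

```python
import os
import tempfile


def write_outputs(path: str, report: str, has_missing: bool) -> None:
    """Append a single-line output and a multi-line (heredoc-style) output."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(f"has_missing={'true' if has_missing else 'false'}\n")
        handle.write(f"missing<<DPE_EOF\n{report}\nDPE_EOF\n")


def parse_outputs(text: str) -> dict[str, str]:
    """Toy parser for the name=value and name<<DELIMITER forms (illustration only)."""
    outputs: dict[str, str] = {}
    lines = iter(text.splitlines())
    for line in lines:
        if "<<" in line and "=" not in line.split("<<", 1)[0]:
            name, delimiter = line.split("<<", 1)
            body = []
            for inner in lines:  # consume until the closing delimiter line
                if inner == delimiter:
                    break
                body.append(inner)
            outputs[name] = "\n".join(body)
        elif "=" in line:
            name, value = line.split("=", 1)
            outputs[name] = value
    return outputs


def roundtrip(report: str, has_missing: bool) -> dict[str, str]:
    """Write to a throwaway file standing in for $GITHUB_OUTPUT, then read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "github_output.txt")
        write_outputs(path, report, has_missing)
        with open(path, encoding="utf-8") as handle:
            return parse_outputs(handle.read())


sample_report = "- **modeling**: `model-fanout`\n- **testing**: `test-coverage`"
result = roundtrip(sample_report, True)
```

A report line that happened to equal the delimiter would end the block early, which is presumably why the script uses the distinctive `DPE_EOF` rather than a bare `EOF`.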
2 changes: 2 additions & 0 deletions .gitignore
@@ -154,6 +154,8 @@ frontend/coverage/
 frontend/*.tsbuildinfo
 frontend/test-results/
 frontend/playwright-report/
+test-results/
+playwright-report/
 .eslintcache

 # npm / yarn / pnpm logs & debug output
5 changes: 5 additions & 0 deletions CLAUDE.md
@@ -128,6 +128,11 @@ block (e.g. `algo`) controlling ERD relationship detection.
 - Specific exception types in `try/except` — never bare `except:` /
 `except Exception`.
 - No backward-compat shims unless explicitly asked.
+- Comments are sparse and present-tense. Add one only when the code isn't
+self-evident, and have it describe the code as it stands — never historically.
+No changelog narration: drop "now / no longer / used to / as before / instead
+of the old" framing (that's what git is for). Applies to Python, the bundle JS,
+and tests (incl. test names).
 - DRY in tests — share fixtures via `tests/conftest.py`.
 - The SPA (vanilla JS under `site/bundle/`) owns presentation; the Python only
 assembles the data dict. The shell is native ES modules in 3 tiers under
2 changes: 1 addition & 1 deletion README.md
@@ -31,7 +31,7 @@ dbt's built-in docs stop short of telling you *which upstream column fed this do
 - **Fail-soft** — an unparseable model is skipped, not fatal.
 - **Project Health Check** across the six [dbt-project-evaluator](https://dbt-labs.github.io/dbt-project-evaluator/) dimensions.
 - **Versioned deploys** with a built-in version switcher, no plugins.
-- **Catalog navigation + client-side search**, no backend.
+- **Full-text search** across names, columns, descriptions, tags, and SQL, entirely client-side, no backend.
 - **Dark / light theme.**

 ## Install
19 changes: 12 additions & 7 deletions dbdocs.yml.example
@@ -82,18 +82,23 @@ default_version: latest
 # model_fanout: 3 # > N direct model children is flagged
 # too_many_joins: 7 # >= N upstream dependencies is flagged
 # chained_view_dependencies: 4 # >= N-deep view/ephemeral chain is flagged
+# documentation_coverage: 100 # < N% of models documented is flagged
 #
-# # Disable individual rules by name. The full built-in set:
-# # testing: test_coverage, missing_primary_key_tests
-# # modeling: direct_join_to_source, duplicate_sources, model_fanout,
-# # multiple_sources_joined, rejoining_of_upstream_concepts,
+# # Disable individual rules by name. The full built-in set (one-to-one with
+# # the dbt-project-evaluator rules):
+# # testing: test_coverage, missing_primary_key_tests,
+# # missing_source_freshness
+# # modeling: direct_join_to_source,
+# # downstream_models_dependent_on_source, duplicate_sources,
+# # hard_coded_references, model_fanout, multiple_sources_joined,
+# # rejoining_of_upstream_concepts,
 # # root_models, source_fanout, staging_dependent_on_staging,
 # # staging_dependent_on_marts_or_intermediate, unused_sources,
 # # too_many_joins
-# # documentation: undocumented_models, undocumented_sources,
-# # undocumented_source_tables
+# # documentation: documentation_coverage, undocumented_models,
+# # undocumented_sources, undocumented_source_tables
 # # structure: model_naming_conventions, model_directories,
-# # source_directories
+# # source_directories, test_directories
 # # performance: chained_view_dependencies, exposure_parents_materializations
 # # governance: public_models_without_contracts, undocumented_public_models,
 # # exposures_dependent_on_private_models
22 changes: 22 additions & 0 deletions dbdocs/extract/health/dimensions.py
@@ -50,6 +50,13 @@ def __init__(self, manifest: "Any | None", thresholds: "dict | None" = None) ->
         self.models = [n for uid, n in self._nodes.items() if uid.startswith("model.")]
         self.sources = list(self._sources.values())
         self.exposures = list(self._exposures.values())
+        # Singular tests are custom-SQL test nodes (no test_metadata); generic
+        # tests (unique/not_null/…) carry test_metadata and are excluded.
+        self.singular_tests = [
+            n
+            for uid, n in self._nodes.items()
+            if uid.startswith("test.") and getattr(n, "test_metadata", None) is None
+        ]

         # Rule thresholds: per-run overrides layered over the DPE defaults.
         self._thresholds = {**DEFAULT_THRESHOLDS, **(thresholds or {})}
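
Editor's sketch of the two idioms in this hunk: filtering singular tests by the absence of `test_metadata`, and layering per-run threshold overrides over defaults with dict unpacking. The `DEFAULT_THRESHOLDS` values, the node ids, and both helper names below are illustrative assumptions; the real defaults are not shown in this diff.

```python
from types import SimpleNamespace

# Hypothetical defaults, for illustration only.
DEFAULT_THRESHOLDS = {"model_fanout": 3, "too_many_joins": 7}


def layer_thresholds(overrides):
    """Later keys win, so overrides replace defaults and None means 'no overrides'."""
    return {**DEFAULT_THRESHOLDS, **(overrides or {})}


def singular_tests(nodes):
    """Test nodes without test_metadata, i.e. custom-SQL (singular) tests."""
    return [
        n
        for uid, n in nodes.items()
        if uid.startswith("test.") and getattr(n, "test_metadata", None) is None
    ]


# Stand-ins for manifest nodes; only the attributes the filter reads are modeled.
nodes = {
    "model.proj.orders": SimpleNamespace(name="orders"),
    "test.proj.unique_orders_id": SimpleNamespace(
        name="unique_orders_id", test_metadata=SimpleNamespace(name="unique")
    ),
    "test.proj.assert_positive_total": SimpleNamespace(
        name="assert_positive_total", test_metadata=None
    ),
    "test.proj.assert_no_orphans": SimpleNamespace(name="assert_no_orphans"),
}

singular_names = [n.name for n in singular_tests(nodes)]
merged = layer_thresholds({"model_fanout": 5})
```

Using `getattr(..., None)` treats a missing attribute and an explicit `None` the same way, so both count as singular tests.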
@@ -147,6 +154,21 @@ def access(model: Any) -> str:
         access = access or getattr(model, "access", None)
         return str(access or "protected").lower()

+    @staticmethod
+    def has_source_freshness(source: Any) -> bool:
+        """Whether a source has a freshness check: a ``loaded_at_field`` plus a
+        ``warn_after``/``error_after`` threshold count."""
+        if not str(getattr(source, "loaded_at_field", "") or "").strip():
+            return False
+        freshness = getattr(source, "freshness", None)
+        if freshness is None:
+            return False
+        for bound in ("warn_after", "error_after"):
+            period = getattr(freshness, bound, None)
+            if period is not None and getattr(period, "count", None) is not None:
+                return True
+        return False
+
     @staticmethod
     def contract_enforced(model: Any) -> bool:
         """Whether the model has an enforced contract (``contract.enforced``)."""
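
Editor's sketch exercising the `has_source_freshness` logic added in this hunk, copied into a standalone function and run against `SimpleNamespace` stand-ins. The sample sources and their values are hypothetical; only the attribute names come from the diff.

```python
from types import SimpleNamespace
from typing import Any


def has_source_freshness(source: Any) -> bool:
    """Standalone copy of the static method's logic, for illustration."""
    if not str(getattr(source, "loaded_at_field", "") or "").strip():
        return False
    freshness = getattr(source, "freshness", None)
    if freshness is None:
        return False
    for bound in ("warn_after", "error_after"):
        period = getattr(freshness, bound, None)
        if period is not None and getattr(period, "count", None) is not None:
            return True
    return False


# A loaded_at_field plus one bound with a count: counts as a freshness check.
checked = SimpleNamespace(
    loaded_at_field="updated_at",
    freshness=SimpleNamespace(
        warn_after=SimpleNamespace(count=12, period="hour"),
        error_after=SimpleNamespace(count=None, period=None),
    ),
)
# Whitespace-only loaded_at_field: rejected before freshness is inspected.
blank_field = SimpleNamespace(loaded_at_field="  ", freshness=checked.freshness)
# Bounds exist but carry no count: not a real check.
empty_bounds = SimpleNamespace(
    loaded_at_field="updated_at",
    freshness=SimpleNamespace(
        warn_after=SimpleNamespace(count=None, period=None),
        error_after=None,
    ),
)
# No freshness attribute at all.
no_freshness = SimpleNamespace(loaded_at_field="updated_at")

results = [
    has_source_freshness(s)
    for s in (checked, blank_field, empty_bounds, no_freshness)
]
```

The method requires both halves: a timestamp column to read and at least one threshold with a count, so a freshness block left at empty defaults is reported the same as no block.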