datnguye · Jun 13, 2026
diff --git a/.claude/agents/docs-dev.md b/.claude/agents/docs-dev.md
@@ -1,16 +1,17 @@
 ---
 name: docs-dev
-description: Owns the dbdocs package — the click CLI plus the extract/site pipeline that turns dbt artifacts into a self-contained single-page-app (SPA). Use for CLI commands (generate/serve/deploy), the artifact extractors (nodes/erd/graph/column-lineage), the data-dict assembly + base64 injection, the bundled vanilla-JS SPA, and the pytest suite. Scope is `dbdocs/` and `tests/`.
+description: Owns the dbdocs package — the click CLI plus the extract/site pipeline that turns dbt artifacts into a single-page-app (SPA) with an external gzip data payload. Use for CLI commands (generate/serve/deploy), the artifact extractors (nodes/erd/graph/column-lineage/health), the data-dict assembly + external payload, the bundled vanilla-JS SPA, and the pytest suite. Scope is `dbdocs/` and `tests/`.
 tools: Read, Edit, Write, Glob, Grep, Bash
 model: sonnet
 memory: project
 ---
 
 You own the `dbdocs/` package. It reads dbt artifacts (`manifest.json` /
 `catalog.json`) via the `dbterd` Python API, derives one project data dict, and
-builds a **self-contained single-page-app** — a single `site/index.html` with all
-data base64-injected as `window.dbdocsData`, plus vendored JS assets. dbdocs is
-an **alternative dbt docs site = dbt docs + ERD + column-level lineage**. It is a
+builds a **single-page-app**: a small `site/index.html` plus an **external**
+`site/dbdocs-data.json.gz` that a hand-written vanilla-JS SPA fetches and
+decompresses client-side (the data is never inlined into the HTML). dbdocs is an
+**alternative dbt docs site = dbt docs + ERD + column-level lineage**. It is a
 **doc generator**, not a dbt or dbterd reimplementation. There is no mkdocs,
 mkdocs-material, mike, or Jinja2 templating — those are gone.
 
@@ -19,16 +20,22 @@ mkdocs-material, mike, or Jinja2 templating — those are gone.
 - `dbdocs/cli/main.py` — the click command group and subcommands
   (`generate`, `serve`, `deploy`).
 - `dbdocs/extract/` — derive doc data from artifacts: `nodes` (models/sources/
-  seeds/snapshots → display records + nav tree), `erd` (Mermaid ERDs via
-  dbterd), `graph` (the node-level DAG), `column_lineage` +
-  `_sqlglot_lineage` (column-level lineage via sqlglot).
-- `dbdocs/site/` — `builder` (assemble the data dict + write the site),
-  `inject` (base64 `window.dbdocsData`), `deploy` (hand-rolled versioning), and
-  the `bundle/` SPA (`index.html` + `assets/app.js` + `assets/style.css` +
-  `assets/vendor/`).
+  seeds/snapshots → display records + nav tree), `erd` + `erd_json` (structured
+  ERD `{nodes, edges}` via a dbterd `json` target adapter — not Mermaid text; the
+  SPA renders it with React Flow), `graph` (the node-level DAG), `column_lineage`
+  + `_sqlglot_lineage` (column-level lineage via sqlglot), and the `health/`
+  sub-package (the always-built Health Check section from `run_results.json`).
+- `dbdocs/site/` — `builder` (assemble the one data dict + write the site),
+  `inject` (`strip_marker` removes the `<!-- DBDOCS_DATA -->` placeholder — the
+  data is external, not inlined), `deploy` (hand-rolled versioning), and the
+  `bundle/` SPA (`index.html` + `assets/{css,js,vendor,graph}/`; `js/` is the
+  3-tier `data → service → ui` ES modules, `graph/` the committed React Flow
+  bundle).
 - `dbdocs/core/` — `config` (`DbDocsConfig` from `dbdocs.yml`), `artifacts`
   (artifact loading), `exceptions`, and the colored `log` singleton.
-- `pytest` coverage at 100%.
+- `pytest` coverage at 100% (`tests/`), **plus** the Playwright E2E specs at
+  `frontend/test/e2e/spa.spec.ts` that cover the rendered SPA — extend them
+  whenever you change bundle behavior (this is your one window outside `dbdocs/`).
 
 ## Non-responsibilities
 
@@ -46,13 +53,25 @@ mkdocs-material, mike, or Jinja2 templating — those are gone.
 4. Run `uv run pytest --cov=dbdocs --cov-report=term-missing`.
 5. Ensure coverage is 100%. Add tests before reporting done. Only `# pragma: no
    cover` lines that are genuinely untestable I/O boundaries, and say so.
+6. **If the change touches the rendered SPA** — the bundle (`site/bundle/**`:
+   `index.html`, `assets/{css,js}/`) or the React Flow graph (`frontend/**`) —
+   pytest does **not** exercise it; the Playwright E2E suite is the only thing
+   that does. Run `task frontend:e2e` (Node + a real demo build; one-time
+   `task frontend:e2e:install` for the browser). The E2E suite is **independent
+   of the 100% coverage gate** — green pytest coverage says nothing about the
+   SPA. Add/extend a spec in `frontend/test/e2e/spa.spec.ts` for new
+   user-visible behavior, and for a graph-source change also run
+   `task frontend:build` to refresh the committed `assets/graph/` bundle.
 
 ## Conventions
 
 - Follow the user's global Python rules: no relative imports, all imports at the
   top of the file, one class per file (exception: multiple exception classes may
   share one file), no nested functions/classes.
 - Use specific exception types, never bare `except:` or `except Exception`.
+- Keep comments sparse and present-tense — add one only when the code isn't
+  self-evident, describing what it does now, never historically (no "now / no
+  longer / used to / as before" changelog framing; git holds the history).
 - Keep presentation in the SPA assets; keep the Python the thin glue that loads
   artifacts and assembles the data dict.
 - Follow DRY in tests — share fixtures via `tests/conftest.py`.
diff --git a/.github/scripts/check_dpe_rules.py b/.github/scripts/check_dpe_rules.py
@@ -0,0 +1,106 @@
+"""Compare the published dbt-project-evaluator rule set against dbdocs's rules.
+
+Scrapes each DPE rules category page for its rule heading anchors, derives the
+anchors dbdocs already implements from the live ``DIMENSION_RULES`` registry (via
+the same ``docs_url`` builder the findings use), and prints any DPE rule dbdocs
+hasn't implemented yet. The watcher workflow turns a non-empty result into a
+``feat:`` issue.
+
+Output is GitHub-Actions friendly: a ``missing<<DPE_EOF`` heredoc block appended
+to ``$GITHUB_OUTPUT`` when set, plus a plain report on stdout. Exit code is 0 on
+any completed run ("nothing missing" is healthy); a scrape failure raises.
+"""
+
+import os
+import re
+import sys
+import urllib.request
+
+from dbdocs.extract.health.rules.base import docs_url
+from dbdocs.extract.health.rules.registry import DIMENSION_RULES
+
+# The six DPE rules category pages, each diffed against the rules dbdocs registers
+# for that dimension. Rule headings are the page's ``<h2>`` anchors (see below).
+DPE_CATEGORIES = ("modeling", "testing", "documentation", "structure", "performance", "governance")
+_BASE = "https://dbt-labs.github.io/dbt-project-evaluator/latest/rules"
+
+# mkdocs-material renders each rule as an ``<h2 id="<slug>">`` heading (the page
+# title is the ``<h1>``, any sub-detail is ``<h3>+``); matching only ``<h2>`` keeps
+# non-rule headings out, so a new section heading can't masquerade as a rule.
+_RULE_HEADING_ID = re.compile(r'<h2[^>]*\bid="([^"]+)"', re.IGNORECASE)
+
+# A handful of ``<h2>`` slugs that aren't a rule (page-structure headings).
+_NON_RULE_ANCHORS = set(DPE_CATEGORIES) | {
+    "rules",
+    "overview",
+    "exceptions",
+    "customization",
+}
+
+
+def _fetch(url: str) -> str:
+    request = urllib.request.Request(url, headers={"User-Agent": "dbdocs-dpe-watcher"})
+    with urllib.request.urlopen(request, timeout=30) as response:  # noqa: S310 - fixed HTTPS host
+        return response.read().decode("utf-8", "replace")
+
+
+def published_anchors(category: str) -> "set[str]":
+    """The rule heading anchors published on a DPE category page.
+
+    Raises ``ValueError`` if the page yields no rule headings — a structural change
+    upstream — so the run fails loudly rather than reporting every rule as missing.
+    """
+    html = _fetch(f"{_BASE}/{category}/")
+    anchors = {a for a in _RULE_HEADING_ID.findall(html) if a not in _NON_RULE_ANCHORS}
+    if not anchors:
+        raise ValueError(
+            f"No rule headings found on the DPE {category} page — page structure changed?"
+        )
+    return anchors
+
+
+def implemented_anchors(category: str) -> "set[str]":
+    """The DPE anchors dbdocs implements for *category* (via the rules' docs_url)."""
+    out = set()
+    for rule in DIMENSION_RULES.get(category, []):
+        url = docs_url(category, rule.__name__)
+        out.add(url.rsplit("#", 1)[-1])
+    return out
+
+
+def find_missing() -> "dict[str, list[str]]":
+    """Map each category to the DPE rule anchors dbdocs hasn't implemented yet."""
+    missing = {}
+    for category in DPE_CATEGORIES:
+        gap = sorted(published_anchors(category) - implemented_anchors(category))
+        if gap:
+            missing[category] = gap
+    return missing
+
+
+def render(missing: "dict[str, list[str]]") -> str:
+    lines = []
+    for category in DPE_CATEGORIES:
+        for anchor in missing.get(category, []):
+            url = f"{_BASE}/{category}/#{anchor}"
+            lines.append(f"- **{category}** — `{anchor}` ([docs]({url}))")
+    return "\n".join(lines)
+
+
+def main() -> int:
+    missing = find_missing()
+    report = render(missing)
+    github_output = os.environ.get("GITHUB_OUTPUT")
+    if github_output:
+        with open(github_output, "a", encoding="utf-8") as handle:
+            handle.write(f"has_missing={'true' if missing else 'false'}\n")
+            handle.write(f"missing<<DPE_EOF\n{report}\nDPE_EOF\n")
+    if missing:
+        print("DPE rules not yet implemented in dbdocs:\n" + report)
+    else:
+        print("dbdocs implements every published dbt-project-evaluator rule.")
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
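
The anchor comparison in `check_dpe_rules.py` can be tried offline. The sketch below reuses the script's `<h2 id="...">` pattern, but the HTML fragment, the trimmed non-rule set, and the `missing_anchors` helper are illustrative assumptions, not part of the PR; the real script fetches each category page and reads the implemented anchors from `DIMENSION_RULES`.

```python
import re

# Same pattern as the script: capture the id of every <h2> heading.
RULE_HEADING_ID = re.compile(r'<h2[^>]*\bid="([^"]+)"', re.IGNORECASE)

# Hypothetical page fragment, for illustration only.
SAMPLE_HTML = """
<h1 id="modeling">Modeling</h1>
<h2 id="direct-join-to-source">Direct Join to Source</h2>
<h3 id="example">Example</h3>
<h2 id="model-fanout">Model Fanout</h2>
<H2 class="x" id="overview">Overview</H2>
"""

# Trimmed stand-in for the script's _NON_RULE_ANCHORS.
NON_RULE_ANCHORS = {"modeling", "overview"}


def published_anchors(html: str) -> set[str]:
    """Rule anchors on a page: <h2> ids minus page-structure headings."""
    return {a for a in RULE_HEADING_ID.findall(html) if a not in NON_RULE_ANCHORS}


def missing_anchors(html: str, implemented: set[str]) -> list[str]:
    """Published anchors that are absent from the implemented set, sorted."""
    return sorted(published_anchors(html) - implemented)


print(missing_anchors(SAMPLE_HTML, {"direct-join-to-source"}))
```

Matching only `<h2>` is what keeps the `<h1>` page title and `<h3>` sub-headings out of the result; the set difference then leaves just the rules that have no implemented counterpart.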
diff --git a/.github/workflows/dpe-rules-watch.yml b/.github/workflows/dpe-rules-watch.yml
@@ -0,0 +1,65 @@
+name: Watch dbt-project-evaluator rules
+
+# Scrapes the published dbt-project-evaluator rule set weekly and opens a feature
+# issue when DPE ships a rule dbdocs hasn't implemented yet. Health rules track
+# DPE one-to-one, so a new DPE rule is a standing to-do.
+on:
+  schedule:
+    - cron: "0 6 * * 1" # Mondays 06:00 UTC
+  workflow_dispatch:
+
+permissions:
+  contents: read
+  issues: write
+
+jobs:
+  watch:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+
+      - name: Set up Python
+        run: uv python install 3.12
+
+      - name: Sync environment
+        run: uv sync
+
+      - name: Compare DPE rules against dbdocs
+        id: check
+        run: uv run python .github/scripts/check_dpe_rules.py
+
+      - name: Open / update a feature issue for new rules
+        if: steps.check.outputs.has_missing == 'true'
+        env:
+          GH_TOKEN: ${{ github.token }}
+          MISSING: ${{ steps.check.outputs.missing }}
+        run: |
+          set -euo pipefail
+          TITLE="feat: implement newly published dbt-project-evaluator rule(s)"
+          BODY=$(cat <<EOF
+          The weekly dbt-project-evaluator rules watcher found rule(s) published by
+          DPE that dbdocs doesn't implement yet. dbdocs health rules track DPE
+          one-to-one, so each of these wants a new rule under the matching
+          \`dbdocs/extract/health/rules/dimensions/\` module (plus a valid
+          \`docs_url\` anchor and a unit test).
+
+          ## Missing rules
+
+          ${MISSING}
+
+          ---
+          _Filed automatically by \`.github/workflows/dpe-rules-watch.yml\`. The
+          watcher never closes it; close it by hand once the rules are implemented._
+          EOF
+          )
+          # Dedupe on the stable title: update the open issue if one exists, else create.
+          EXISTING=$(gh issue list --state open --search "in:title \"$TITLE\"" --json number --jq '.[0].number // empty')
+          if [ -n "$EXISTING" ]; then
+            gh issue edit "$EXISTING" --body "$BODY"
+            echo "Updated existing issue #$EXISTING."
+          else
+            gh issue create --title "$TITLE" --body "$BODY" --label enhancement --label triage
+          fi
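
The `has_missing` and `missing` step outputs consumed above come from the file named by `$GITHUB_OUTPUT`, which accepts `name=value` lines and `name<<DELIMITER` heredoc blocks for multi-line values. The sketch below writes both forms the way the script does and reads them back. `parse_outputs` is a toy reader written for this illustration, not the Actions runner's parser, and the report text is made up.

```python
import os
import tempfile


def write_outputs(path: str, report: str) -> None:
    """Append a flag and a multi-line value in the $GITHUB_OUTPUT file format."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(f"has_missing={'true' if report else 'false'}\n")
        handle.write(f"missing<<DPE_EOF\n{report}\nDPE_EOF\n")


def parse_outputs(text: str) -> dict[str, str]:
    """Toy reader for the two forms above (illustration only)."""
    outputs = {}
    lines = iter(text.splitlines())
    for line in lines:
        if "<<" in line:
            name, delimiter = line.split("<<", 1)
            body = []
            # Consume lines until the closing delimiter.
            for inner in lines:
                if inner == delimiter:
                    break
                body.append(inner)
            outputs[name] = "\n".join(body)
        elif "=" in line:
            name, value = line.split("=", 1)
            outputs[name] = value
    return outputs


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "github_output")
    write_outputs(path, "- testing: rule-a\n- modeling: rule-b")
    with open(path, encoding="utf-8") as handle:
        print(parse_outputs(handle.read()))
```

The heredoc form is what lets the multi-line Markdown list survive as a single `missing` output; a plain `missing=...` line would be cut at the first newline.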
diff --git a/.gitignore b/.gitignore
@@ -154,6 +154,8 @@ frontend/coverage/
 frontend/*.tsbuildinfo
 frontend/test-results/
 frontend/playwright-report/
+test-results/
+playwright-report/
 .eslintcache
 
 # npm / yarn / pnpm logs & debug output

diff --git a/CLAUDE.md b/CLAUDE.md
@@ -128,6 +128,11 @@ block (e.g. `algo`) controlling ERD relationship detection.
 - Specific exception types in `try/except` — never bare `except:` /
   `except Exception`.
 - No backward-compat shims unless explicitly asked.
+- Comments are sparse and present-tense. Add one only when the code isn't
+  self-evident, and have it describe the code as it stands — never historically.
+  No changelog narration: drop "now / no longer / used to / as before / instead
+  of the old" framing (that's what git is for). Applies to Python, the bundle JS,
+  and tests (incl. test names).
 - DRY in tests — share fixtures via `tests/conftest.py`.
 - The SPA (vanilla JS under `site/bundle/`) owns presentation; the Python only
   assembles the data dict. The shell is native ES modules in 3 tiers under

diff --git a/README.md b/README.md
@@ -31,7 +31,7 @@ dbt's built-in docs stop short of telling you *which upstream column fed this do
 - **Fail-soft** — an unparseable model is skipped, not fatal.
 - **Project Health Check** across the six [dbt-project-evaluator](https://dbt-labs.github.io/dbt-project-evaluator/) dimensions.
 - **Versioned deploys** with a built-in version switcher, no plugins.
-- **Catalog navigation + client-side search**, no backend.
+- **Client-side full-text search** across names, columns, descriptions, tags, and SQL, no backend.
 - **Dark / light theme.**
 
 ## Install

diff --git a/dbdocs.yml.example b/dbdocs.yml.example
@@ -82,18 +82,23 @@ default_version: latest
 #     model_fanout: 3              # > N direct model children is flagged
 #     too_many_joins: 7            # >= N upstream dependencies is flagged
 #     chained_view_dependencies: 4 # >= N-deep view/ephemeral chain is flagged
+#     documentation_coverage: 100  # < N% of models documented is flagged
 #
-#   # Disable individual rules by name. The full built-in set:
-#   #   testing:       test_coverage, missing_primary_key_tests
-#   #   modeling:      direct_join_to_source, duplicate_sources, model_fanout,
-#   #                  multiple_sources_joined, rejoining_of_upstream_concepts,
+#   # Disable individual rules by name. The full built-in set (one-to-one with
+#   # the dbt-project-evaluator rules):
+#   #   testing:       test_coverage, missing_primary_key_tests,
+#   #                  missing_source_freshness
+#   #   modeling:      direct_join_to_source,
+#   #                  downstream_models_dependent_on_source, duplicate_sources,
+#   #                  hard_coded_references, model_fanout, multiple_sources_joined,
+#   #                  rejoining_of_upstream_concepts,
 #   #                  root_models, source_fanout, staging_dependent_on_staging,
 #   #                  staging_dependent_on_marts_or_intermediate, unused_sources,
 #   #                  too_many_joins
-#   #   documentation: undocumented_models, undocumented_sources,
-#   #                  undocumented_source_tables
+#   #   documentation: documentation_coverage, undocumented_models,
+#   #                  undocumented_sources, undocumented_source_tables
 #   #   structure:     model_naming_conventions, model_directories,
-#   #                  source_directories
+#   #                  source_directories, test_directories
 #   #   performance:   chained_view_dependencies, exposure_parents_materializations
 #   #   governance:    public_models_without_contracts, undocumented_public_models,
 #   #                  exposures_dependent_on_private_models

diff --git a/dbdocs/extract/health/dimensions.py b/dbdocs/extract/health/dimensions.py
@@ -50,6 +50,13 @@ def __init__(self, manifest: "Any | None", thresholds: "dict | None" = None) ->
         self.models = [n for uid, n in self._nodes.items() if uid.startswith("model.")]
         self.sources = list(self._sources.values())
         self.exposures = list(self._exposures.values())
+        # Singular tests are custom-SQL test nodes (no test_metadata); generic
+        # tests (unique/not_null/…) carry test_metadata and are excluded.
+        self.singular_tests = [
+            n
+            for uid, n in self._nodes.items()
+            if uid.startswith("test.") and getattr(n, "test_metadata", None) is None
+        ]
 
         # Rule thresholds: per-run overrides layered over the DPE defaults.
         self._thresholds = {**DEFAULT_THRESHOLDS, **(thresholds or {})}
@@ -147,6 +154,21 @@ def access(model: Any) -> str:
         access = access or getattr(model, "access", None)
         return str(access or "protected").lower()
 
+    @staticmethod
+    def has_source_freshness(source: Any) -> bool:
+        """Whether a source has a freshness check: a ``loaded_at_field`` plus a
+        ``warn_after``/``error_after`` threshold count."""
+        if not str(getattr(source, "loaded_at_field", "") or "").strip():
+            return False
+        freshness = getattr(source, "freshness", None)
+        if freshness is None:
+            return False
+        for bound in ("warn_after", "error_after"):
+            period = getattr(freshness, bound, None)
+            if period is not None and getattr(period, "count", None) is not None:
+                return True
+        return False
+
     @staticmethod
     def contract_enforced(model: Any) -> bool:
         """Whether the model has an enforced contract (``contract.enforced``)."""