
feat: full-text search + complete dpe #3

Merged
datnguye merged 1 commit into main from feat/search-and-orphan-rule on Jun 13, 2026

@il-dat il-dat commented Jun 13, 2026

Collaborator

What & why

Two capabilities in one change.

1. Full-text search overhaul (SPA). The bundled SPA's search becomes a real full-text search instead of a name/column-only lookup. A query now searches the whole indexed surface of every node: name, columns and column descriptions, tags, description, warehouse relation, package, called macros, and the raw and compiled SQL. Each non-name hit carries a match-reason snippet (mkdocs-material style), that is, a labelled chip (Column, SQL, Tag, …) plus an excerpt with the matched terms highlighted, so you can see why a result is there. Two inline operators (type:<resource_type> and label:/name:) narrow results without leaving the box. The dropdown is fully keyboard-operable (↑/↓ move through results, Enter follows, Esc closes) and wired as an ARIA combobox/listbox with a live region for screen readers.

2. Complete dbt-project-evaluator (DPE) health-rule parity. dbdocs health rules now implement all 29 published DPE rules across the six dimensions:

  • Removed the non-DPE orphan_models rule — it had no DPE counterpart, so its docs_url linked to a dead anchor.
  • Added the 5 missing DPE rules, each with a valid docs anchor and unit tests:
    • downstream_models_dependent_on_source, hard_coded_references (modeling)
    • missing_source_freshness (testing)
    • documentation_coverage (documentation)
    • test_directories (structure)
  • hard_coded_references parses each model's raw SQL with sqlglot (ref()/source() jinja rewritten to numbered sentinels; CTEs excluded via scope analysis; fail-soft on unparseable SQL — one model never sinks the pass).
  • Fixed 3 dead docs anchors via a new name→anchor override map in base.py (the rule name doesn't always kebab to the DPE heading): too_many_joins, staging_dependent_on_marts_or_intermediate, and staging_dependent_on_staging.
  • Added a weekly watcher (.github/workflows/dpe-rules-watch.yml + .github/scripts/check_dpe_rules.py) that scrapes the DPE rules pages, diffs them against the live rule registry, and opens (or updates) a single feat: issue when DPE publishes a rule dbdocs doesn't have yet. The watcher is what surfaced the third dead anchor.
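
The sentinel step described for hard_coded_references can be sketched as follows. This is only an illustration under assumptions: the regex, the `__dbt_<kind>_<n>__` sentinel format, and both function names are hypothetical and not taken from the PR. In the real rule the table names and CTE names would come from the sqlglot parse and scope analysis; here they are passed in by hand so the sketch stays stdlib-only.

```python
import re

# Hypothetical pattern: matches {{ ref('x') }}, {{ ref('pkg', 'x') }}
# and {{ source('a', 'b') }} calls in raw model SQL.
JINJA_CALL = re.compile(r"\{\{\s*(ref|source)\s*\((.*?)\)\s*\}\}", re.DOTALL)


def rewrite_to_sentinels(raw_sql: str) -> tuple[str, dict[str, str]]:
    """Swap each ref()/source() call for a numbered placeholder identifier.

    Returns the rewritten SQL (now free of jinja, so a SQL parser can read it)
    and a map from each sentinel back to the jinja call it replaced.
    """
    sentinels: dict[str, str] = {}

    def _swap(match: re.Match) -> str:
        # Assumed sentinel format; the PR only says the sentinels are numbered.
        name = f"__dbt_{match.group(1)}_{len(sentinels)}__"
        sentinels[name] = match.group(0)
        return name

    return JINJA_CALL.sub(_swap, raw_sql), sentinels


def looks_hard_coded(table: str, sentinels: dict[str, str], cte_names: set[str]) -> bool:
    """A table reference is suspicious if it is neither a sentinel nor a CTE."""
    return table not in sentinels and table not in cte_names


raw = (
    "with o as (select * from {{ ref('stg_orders') }}) "
    "select * from o join raw.payments p on p.order_id = o.id"
)
sql, sentinels = rewrite_to_sentinels(raw)
```

With this input, `raw.payments` would be flagged, while the CTE `o` and the sentinel standing in for `ref('stg_orders')` would not. A fail-soft caller would wrap the parse of `sql` in a try/except and skip the model on error rather than abort the whole pass.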

Type of change

  • 🐞 Bug fix (non-breaking change that fixes an issue)
  • ✨ Feature (non-breaking change that adds capability)
  • 💥 Breaking change (existing behavior changes)
  • 🧹 Refactor / internal (no user-facing change)
  • 📖 Docs only

Note: removing orphan_models is technically a behavior change to the Health section, but it never matched a DPE rule and only ever produced a finding with a broken docs link.

Area

  • CLI (generate / serve / deploy)
  • extract/ (nodes / erd / graph / column_lineage / health)
  • site/ (data dict / builder / 3-tier SPA / deploy)
  • frontend/ (React Flow graph bundle)
  • Config (dbdocs.yml) / packaging / CI / docs

How to test

task generate && task serve   # then use the search box:
#   count_food_items   → a result with a "Column" snippet (matched a column, not a name)
#   type:seed          → lists every seed, nothing else
#   type:model orders  → only models matching "orders"
#   label:stg          → names containing "stg", ignoring SQL-body mentions
#   type:bogus         → "No matches." cue (not a silent empty dropdown)
#   ↑ / ↓ / Enter / Esc → keyboard nav of the dropdown

task test            # 100% coverage (331 tests), incl. the new DPE rules + watcher logic
task frontend:e2e    # the new Playwright search specs (Node + a real demo build)

# Verify full DPE parity + that every rule's docs anchor resolves (hits the live DPE pages):
uv run python .github/scripts/check_dpe_rules.py
#   → "dbdocs implements every published dbt-project-evaluator rule."
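
The search-box queries listed above rely on the inline operators narrowing the result set before free-text matching. A rough sketch of that behaviour follows. The shipped logic lives in service.js, so this Python version is only an illustration: the node field names, the `Query` class, and the all-terms-must-match rule are assumptions, not details confirmed by the PR.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Query:
    resource_type: Optional[str] = None                    # from type:<resource_type>
    name_terms: list[str] = field(default_factory=list)    # from label: / name:
    text_terms: list[str] = field(default_factory=list)    # everything else


def parse_query(raw: str) -> Query:
    """Split a search string into operator filters and free-text terms."""
    query = Query()
    for token in raw.lower().split():
        key, sep, value = token.partition(":")
        if sep and value and key == "type":
            query.resource_type = value
        elif sep and value and key in ("label", "name"):
            query.name_terms.append(value)
        else:
            query.text_terms.append(token)
    return query


def matches(node: dict, query: Query) -> bool:
    """Apply operator filters first, then require every free-text term."""
    if query.resource_type and node.get("resource_type") != query.resource_type:
        return False
    name = node.get("name", "").lower()
    if any(term not in name for term in query.name_terms):
        return False
    # Hypothetical field names; the real data dict may differ.
    haystack = " ".join(
        [name, node.get("description", ""), " ".join(node.get("columns", [])),
         node.get("raw_sql", "")]
    ).lower()
    return all(term in haystack for term in query.text_terms)


nodes = [
    {"name": "stg_orders", "resource_type": "model", "raw_sql": "select 1"},
    {"name": "orders", "resource_type": "model", "raw_sql": "select * from stg_orders"},
    {"name": "raw_customers", "resource_type": "seed"},
]


def search(raw: str) -> list[str]:
    query = parse_query(raw)
    return [node["name"] for node in nodes if matches(node, query)]
```

Under these assumptions, `search("label:stg")` returns only `stg_orders`, even though `orders` mentions `stg_orders` in its SQL body, and `search("type:bogus")` returns an empty list, which is the case the UI reports as "No matches."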

Checklist

  • task lint passes (ruff format --check + ruff check).
  • task test passes at 100% coverage (331 tests, 100.00% total).
  • I followed the load-bearing patterns in .claude/design_patterns.md. Extended the health-rule registry (Always-built Health-Check pattern: one module per dimension, register_rule/docs_url/finding public API) and the 3-tier data→service→ui SPA seam (service.js DOM-free, ui.js the only DOM toucher). No pattern added/removed — the doc describes the rule package generically, so no edit needed.
  • Data-dict / SPA changes keep producer ↔ consumer in sync and don't re-inline the payload. Search reads existing data-dict fields only — no Python builder change, payload stays the external gzip; new logic lives in service.js (pure) + ui.js (DOM). Health findings keep the same {rule, node, node_type, message, docs_url} shape.
  • Graph-UI changes rebuilt the committed bundle — n/a: no frontend/src/** change; only frontend/test/e2e/spa.spec.ts was touched.
  • New files under dbdocs/site/bundle/** (n/a): no new bundle files (bundle assets edited in place; the dbdocs/site/bundle/**/* glob already covers them). New files are under .github/ + tests/, not shipped in the wheel.
  • Docs / dbdocs.yml.example updated. dbdocs.yml.example + docs/dbdocs-demo.yml list all 29 rules + the new documentation_coverage threshold; the search guide (docs/nav/guide/search.md) is wired into mkdocs.yml; README/docs/index.md feature line refreshed.
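
To make the finding shape and the name→anchor override concrete, here is a minimal sketch. The function names `docs_url` and `finding` follow the public API mentioned in the checklist, but their signatures here are guesses. The base URL is a placeholder and the override value is illustrative, not copied from base.py.

```python
# Placeholder base URL, not the real DPE docs location.
DOCS_BASE = "https://example.invalid/dpe/rules"

# Rules whose DPE heading does not match the kebab-cased rule name.
# The anchor value below is illustrative only.
ANCHOR_OVERRIDES = {
    "too_many_joins": "models-with-too-many-joins",
}


def docs_url(rule: str, dimension: str) -> str:
    """Build the docs link, preferring an explicit anchor override."""
    anchor = ANCHOR_OVERRIDES.get(rule, rule.replace("_", "-"))
    return f"{DOCS_BASE}/{dimension}/#{anchor}"


def finding(rule: str, dimension: str, node: str, node_type: str, message: str) -> dict:
    """Return a finding in the {rule, node, node_type, message, docs_url} shape."""
    return {
        "rule": rule,
        "node": node,
        "node_type": node_type,
        "message": message,
        "docs_url": docs_url(rule, dimension),
    }
```

Falling back to the kebab-cased name keeps the common case free of configuration, so only the rules whose headings diverge need an entry in the map.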

Screenshots / notes

  • Site change — a before/after of the search dropdown belongs here, showing the new match-reason snippet (labelled chip + highlighted excerpt).
  • Search behavior is verified by the new Playwright specs in frontend/test/e2e/spa.spec.ts, which run in the independent CI e2e job (outside the coverage gate, since pytest doesn't exercise the SPA JS).
  • The DPE watcher runs weekly (Mondays 06:00 UTC) and on manual dispatch; it dedupes on a stable issue title so it updates rather than spams.
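
The watcher's diff-and-dedupe behaviour can be outlined as below. This is a sketch under assumptions: the issue title, function names, and return values are hypothetical, and the real check_dpe_rules.py scrapes the DPE pages and talks to GitHub, both of which are left out here.

```python
# Hypothetical stable title; the real watcher's wording is not shown in the PR.
ISSUE_TITLE = "feat: implement newly published dbt-project-evaluator rules"


def missing_rules(published: set[str], implemented: set[str]) -> list[str]:
    """Rules published by DPE that the local registry does not have yet."""
    return sorted(published - implemented)


def plan_issue_action(missing: list[str], open_issue_titles: list[str]) -> str:
    """Decide what to do with the tracking issue.

    Matching on one stable title means a repeat run updates the existing
    issue instead of opening a new one every week.
    """
    if not missing:
        return "none"
    return "update" if ISSUE_TITLE in open_issue_titles else "create"


published = {"too_many_joins", "test_directories", "brand_new_rule"}
implemented = {"too_many_joins", "test_directories"}
gap = missing_rules(published, implemented)
```

Here `gap` is `["brand_new_rule"]`, and the planned action is "create" on the first run and "update" once an issue with the stable title is already open.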

@il-dat il-dat force-pushed the feat/search-and-orphan-rule branch from 7006d8a to cfd89b7 on June 13, 2026 14:26
@il-dat il-dat changed the title from "feat: full-text search + orphan_models health rule" to "feat: full-text search + complete dbt-project-evaluator health rule parity" on Jun 13, 2026
@datnguye datnguye changed the title from "feat: full-text search + complete dbt-project-evaluator health rule parity" to "feat: full-text search + complete dpe" on Jun 13, 2026
@il-dat il-dat force-pushed the feat/search-and-orphan-rule branch from cfd89b7 to b715a62 on June 13, 2026 14:34
@datnguye datnguye merged commit 4264570 into main Jun 13, 2026
14 checks passed
@datnguye datnguye deleted the feat/search-and-orphan-rule branch June 13, 2026 14:38

2 participants