Merged
11 changes: 6 additions & 5 deletions .claude/agents/docs-dev.md
@@ -20,11 +20,12 @@ mkdocs-material, mike, or Jinja2 templating — those are gone.
- `dbdocs/cli/main.py` — the click command group and subcommands
(`generate`, `serve`, `deploy`).
- `dbdocs/extract/` — derive doc data from artifacts: `nodes` (models/sources/
seeds/snapshots → display records + nav tree), `erd` + `erd_json` (structured
ERD `{nodes, edges}` via a dbterd `json` target adapter — not Mermaid text; the
SPA renders it with React Flow), `graph` (the node-level DAG), `column_lineage`
+ `_sqlglot_lineage` (column-level lineage via sqlglot), and the `health/`
sub-package (the always-built Health Check section from `run_results.json`).
seeds/snapshots → display records + nav tree), `erd` (structured ERD
`{nodes, edges}` via dbterd's built-in `json` target ≥ 1.28.0 — not Mermaid
text; the SPA renders it with React Flow), `graph` (the node-level DAG),
`column_lineage` + `_sqlglot_lineage` (column-level lineage via sqlglot), and
the `health/` sub-package (the always-built Health Check section from
`run_results.json`).
- `dbdocs/site/` — `builder` (assemble the one data dict + write the site),
`inject` (`strip_marker` removes the `<!-- DBDOCS_DATA -->` placeholder — the
data is external, not inlined), `deploy` (hand-rolled versioning), and the
62 changes: 57 additions & 5 deletions .claude/design_patterns.md
@@ -32,21 +32,24 @@ authoritative; grep it.
- [Windowed graph rendering](#windowed-graph-rendering)
- [Theory](#theory-7)
- [Example](#example-7)
- [Bundled SPA directory resolution](#bundled-spa-directory-resolution)
- [ERD from dbterd's built-in json target](#erd-from-dbterds-built-in-json-target)
- [Theory](#theory-8)
- [Example](#example-8)
- [Versioned deploy without mike](#versioned-deploy-without-mike)
- [Bundled SPA directory resolution](#bundled-spa-directory-resolution)
- [Theory](#theory-9)
- [Example](#example-9)
- [Click group entrypoint](#click-group-entrypoint)
- [Versioned deploy without mike](#versioned-deploy-without-mike)
- [Theory](#theory-10)
- [Example](#example-10)
- [Singleton colored logger](#singleton-colored-logger)
- [Click group entrypoint](#click-group-entrypoint)
- [Theory](#theory-11)
- [Example](#example-11)
- [Always-built artifact-derived data-dict section (Health Check)](#always-built-artifact-derived-data-dict-section-health-check)
- [Singleton colored logger](#singleton-colored-logger)
- [Theory](#theory-12)
- [Example](#example-12)
- [Always-built artifact-derived data-dict section (Health Check)](#always-built-artifact-derived-data-dict-section-health-check)
- [Theory](#theory-13)
- [Example](#example-13)

## Pipeline-stage package layout

@@ -354,6 +357,55 @@ const dagKeep = useMemo(() => {
- `frontend/src/components/GraphApp.tsx` — `MAX_UNFOCUSED_DAG_NODES`, `dagKeep`, `erdNodeEmpty`
- `dbdocs/site/builder.py` / `dbdocs/extract/erd.py` — `erd_algo` (metadata)

## ERD from dbterd's built-in json target

### Theory

The SPA renders its ERD with React Flow, which needs structured node/edge data
— not the diagram *text* dbterd's other targets emit. dbterd ≥ 1.28 ships a
**built-in, schema-validated `json` target** that emits `{nodes, edges,
metadata}`; `build_erd(target="json")` forces it, and `build_erd_data` maps that
into the SPA's `{nodes, edges}`. Do **not** reintroduce a custom
`@register_target("json")` adapter — dbterd owns this contract now.
`build_erd_data` patches two dbterd quirks (re-verify both after any dbterd bump
with `task frontend:e2e`):

1. **Short-name edge ids.** With `entity_name_format` configured, dbterd emits
   edge `from_id`/`to_id` as the *formatted entity name* (e.g. `orders`), not the
   full unique_id (e.g. `model.jaffle_shop.orders`). `_resolve_edge_id` resolves
   those back through a `name_to_id` map built from each node's `name`. Names
   shared by more than one node are left out of the map, so a collision cannot
   silently resolve to the wrong node. An id already in `node_ids` passes through
   untouched (the no-`entity_name_format` case), and a name that cannot be
   resolved is also passed through unchanged.
2. **Missing FK flags.** Some algos (e.g. `model_contract`) don't set
   `is_foreign_key` on node columns even when those columns appear in FK edges.
   `_backfill_fk_flags` sets `is_foreign_key=True` on any column named in an
   edge's `from_columns` (the FK/child side), matching column names
   case-insensitively. Nodes are indexed by id and each edge's FK column names
   are held in a set, so every lookup is O(1).
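
The two patches above can be sketched on toy data. This is a minimal illustration, not the code in `dbdocs/extract/erd.py`: the helper names `resolve_edge_id` and `backfill_fk_flags` are simplified stand-ins, and the sample nodes, ids, and column names are invented for the example.

```python
# Hypothetical stand-ins for the two patches; sample data is invented.
def resolve_edge_id(raw: str, node_ids: set[str], name_to_id: dict[str, str]) -> str:
    """Keep raw if it is already a node id, else try the short-name map."""
    if raw in node_ids:
        return raw
    return name_to_id.get(raw, raw)  # unknown names fall through unchanged


def backfill_fk_flags(nodes: list[dict], edges: list[dict]) -> None:
    """Flag columns on the child (target) node that an edge lists as FK columns."""
    nodes_by_id = {n["id"]: n for n in nodes}
    for edge in edges:
        child = nodes_by_id.get(edge["target"])
        if child is None:
            continue
        fk_cols = {c.lower() for c in edge.get("from_columns", [])}
        for col in child["columns"]:
            if col["name"].lower() in fk_cols:  # case-insensitive match
                col["is_foreign_key"] = True


nodes = [
    {
        "id": "model.jaffle_shop.orders",
        "name": "orders",
        "columns": [
            {"name": "order_id", "is_foreign_key": False},
            {"name": "Customer_ID", "is_foreign_key": False},
        ],
    },
    {
        "id": "model.jaffle_shop.customers",
        "name": "customers",
        "columns": [{"name": "customer_id", "is_foreign_key": False}],
    },
]
node_ids = {n["id"] for n in nodes}
name_to_id = {n["name"]: n["id"] for n in nodes}

# Quirk 1: a short name resolves to the full id; a full id passes through.
resolved_short = resolve_edge_id("orders", node_ids, name_to_id)
resolved_full = resolve_edge_id("model.jaffle_shop.customers", node_ids, name_to_id)

# Quirk 2: the child column named in from_columns gets its FK flag set.
edges = [{"source": resolved_full, "target": resolved_short, "from_columns": ["customer_id"]}]
backfill_fk_flags(nodes, edges)
fk_flags = {c["name"]: c["is_foreign_key"] for c in nodes[0]["columns"]}
```

After the call, only `Customer_ID` on the child node is flagged; the parent's `customer_id` column is left alone because the back-fill only touches the edge's `target` node.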

**SPA edge direction:** `source` = the referenced/parent side (dbterd `to_id`),
`target` = the FK/child side (dbterd `from_id`). The graph bundle's per-column
connector handles (`buildErdFlow` in `frontend/src/lib/data.ts`) resolve each
handle against whichever endpoint actually owns the named column (`owned(...)`),
so a join whose FK/PK columns differ in name still lands on the right rows.
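
As a rough illustration of the direction convention, the sketch below flips a dbterd-style edge into the SPA shape. The function name `to_spa_edge` and the sample edge are hypothetical, and the ids are assumed to be full unique_ids already (no short-name resolution is shown here).

```python
# Hypothetical helper: map a dbterd-style edge onto the SPA's source/target.
def to_spa_edge(edge: dict, index: int) -> dict:
    return {
        "id": edge.get("id") or f"e{index}",      # fall back to a positional id
        "source": edge.get("to_id") or "",        # referenced / parent side
        "target": edge.get("from_id") or "",      # FK / child side
        "from_columns": edge.get("from_columns") or [],
        "to_columns": edge.get("to_columns") or [],
        "label": edge.get("label") or "",
        "type": edge.get("cardinality") or "",
    }


# Invented sample: orders.customer_id references customers.id.
sample = {
    "from_id": "model.jaffle_shop.orders",
    "to_id": "model.jaffle_shop.customers",
    "from_columns": ["customer_id"],
    "to_columns": ["id"],
}
spa_edge = to_spa_edge(sample, 0)
```

Note that `from_columns` keeps its dbterd meaning (columns on the FK/child side), so in SPA terms it describes the `target` node, while `to_columns` describes the `source` node.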

### Example

```python
# dbdocs/extract/erd.py — official {nodes, edges}; resolve short names + back-fill FK flags
payload = json.loads(erd.get_erd())
raw_nodes = payload.get("nodes", [])
nodes = [_build_node(n) for n in raw_nodes]
node_ids = {n["id"] for n in nodes}
name_to_id = {n.get("name"): n["id"] for n in raw_nodes if n.get("name")}
edges = [_build_edge(e, i, node_ids, name_to_id) for i, e in enumerate(payload.get("edges", []))]
_backfill_fk_flags(nodes, edges)
return {"nodes": nodes, "edges": edges}
```

- `dbdocs/extract/erd.py` — `def build_erd` (forces `target="json"`), `def build_erd_data`, `def _build_node`, `def _build_edge`, `def _resolve_edge_id`, `def _backfill_fk_flags`
- `frontend/src/lib/data.ts` — `buildErdFlow` (consumes `source`/`target`/`from_columns`/`to_columns`/`is_foreign_key`; `owned()` picks the handle column each endpoint owns)
- `pyproject.toml` — `dbterd>=1.28` (the built-in `json` target floor)
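
The excerpt above builds `name_to_id` directly from each node's `name`. The `build_erd_data` change in this same diff additionally drops names shared by more than one node. A minimal sketch of that idea, assuming a hypothetical helper `unambiguous_name_map`, `collections.Counter` in place of the hand-rolled count, and invented sample ids:

```python
from collections import Counter


# Hypothetical helper: only names that identify exactly one node are kept.
def unambiguous_name_map(raw_nodes: list[dict]) -> dict[str, str]:
    counts = Counter(n["name"] for n in raw_nodes if n.get("name"))
    return {
        n["name"]: n["id"]
        for n in raw_nodes
        if n.get("name") and counts[n["name"]] == 1
    }


# Invented sample: two packages both define a model named "orders".
raw_nodes = [
    {"id": "model.shop.orders", "name": "orders"},
    {"id": "model.finance.orders", "name": "orders"},
    {"id": "model.shop.customers", "name": "customers"},
    {"id": "model.shop.unnamed"},  # no name: skipped entirely
]
name_to_id = unambiguous_name_map(raw_nodes)
```

With this map, an edge that refers to the ambiguous short name `orders` is not resolved to either node and keeps its raw value, which is safer than picking one of the two at random.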

## Bundled SPA directory resolution

### Theory
7 changes: 3 additions & 4 deletions .claude/skills/spa-site/SKILL.md
@@ -95,10 +95,9 @@ SPA loads (via `data.js`, which fetches `dbdocs-data.json.gz` and exposes it on
entities with their columns (`is_primary_key` / `is_foreign_key` flags),
`edges` are foreign-key relationships — all keyed by dbt `unique_id`. Built by
`dbdocs/extract/erd.py` (`build_erd` / `build_erd_data`), which runs dbterd's
`json` target — a custom `@register_target("json")` adapter in
`dbdocs/extract/erd_json.py` that emits `{tables, relationships}`. The React
Flow bundle derives all three graph surfaces (full DAG, global ERD, per-node
ERD) from `lineage` + `erd`.
built-in `json` target (dbterd ≥ 1.28.0; emits `{nodes, edges, metadata}`).
The React Flow bundle derives all three graph surfaces (full DAG, global ERD,
per-node ERD) from `lineage` + `erd`.

## Payload (external gzip — `dbdocs/site/inject.py` + `builder.generate`)

153 changes: 96 additions & 57 deletions dbdocs/extract/erd.py
@@ -1,19 +1,21 @@
"""Structured ERD data via dbterd's ``json`` target.

dbterd's built-in targets emit diagram text; the SPA renders its ERD with React
Flow, which needs structured node/edge data. We register a ``json`` target
(:mod:`dbdocs.extract.erd_json`) and parse its ``{tables, relationships}`` output
into the SPA's ``{nodes, edges}`` — entities with columns (PK/FK flags) and
foreign-key edges between them, all keyed by dbt unique_id.
"""Structured ERD data via dbterd's official ``json`` target.

dbterd 1.28.0 ships a built-in, schema-validated ``json`` target that emits
``{nodes, edges, metadata}``. Each node carries ``id`` (the dbt unique_id),
``name``, ``schema_name``, ``database``, ``resource_type``, and ``columns``
(with ``data_type``, ``is_primary_key``, ``is_foreign_key``). Each edge carries
``id``, ``from_id`` (FK/child side), ``to_id`` (referenced/parent side),
``from_columns``, ``to_columns``, ``label``, and ``cardinality``.

``build_erd_data`` maps that shape into the SPA's ``{nodes, edges}`` — the
React Flow bundle reads ``nodes`` (entities + column flags) and ``edges``
(FK relationships, ``source``/``target`` keyed by dbt unique_id).
"""

import json

from dbterd.api import DbtErd, default

# Importing the module registers the "json" target with dbterd's PluginRegistry.
from dbdocs.extract import erd_json # noqa: F401


def erd_algo(dbterd_options: "dict | None" = None) -> str:
"""The dbterd algorithm that detected the ERD relationships.
@@ -46,66 +48,103 @@ def build_erd(dbterd_options: "dict | None" = None, artifacts_dir: "str | None"


def build_erd_data(erd: DbtErd) -> dict:
"""Parse the json target into ``{"nodes": [...], "edges": [...]}``.

Nodes are entities (with columns, ``is_primary_key``/``is_foreign_key`` flags
and the resolved dbt unique_id); edges are foreign-key relationships between
them. dbterd's relationships reference tables by *name*, so we map those back
to unique_ids via each table's ``node_name``.
"""Parse dbterd's official ``{nodes, edges}`` payload into the SPA shape.

dbterd's ``json`` target emits nodes keyed by dbt unique_id (``id`` field).
When ``entity_name_format`` is configured, dbterd emits edge ``from_id`` /
``to_id`` as the formatted entity name (e.g. ``orders``) rather than the
full unique_id (e.g. ``model.jaffle_shop.orders``). A ``name_to_id`` map
resolves those short names back to the node id before building edges, so the
SPA's ``source``/``target`` always reference a valid node ``id``.

Some dbterd algos (e.g. ``model_contract``) do not set ``is_foreign_key``
on node columns even when those columns appear in FK edges. After building
edges, any column named in an edge's ``from_columns`` (the FK/child side)
has its ``is_foreign_key`` flag back-filled to ``True`` on the target node.

SPA edge direction: ``source`` is the referenced (parent) side, ``target``
is the FK (child) side — matching dbterd's ``to_id`` and ``from_id``
respectively.
"""
    payload = json.loads(erd.get_erd())
    tables = payload.get("tables", [])
    relationships = payload.get("relationships", [])

    # table name (as dbterd refers to it in relationships) → dbt unique_id.
    name_to_id = {t["name"]: (t.get("node_name") or t["name"]) for t in tables}

    edges, fk_columns = _build_edges(relationships, name_to_id)
    nodes = [_build_node(t, fk_columns.get(t.get("node_name") or t["name"], set())) for t in tables]
    raw_nodes = payload.get("nodes", [])
    raw_edges = payload.get("edges", [])
    nodes = [_build_node(n) for n in raw_nodes]
    node_ids = {n["id"] for n in nodes}
    # Count occurrences first; ambiguous names (more than one node) are excluded
    # so a collision can't silently resolve to the wrong node.
    name_counts: dict[str, int] = {}
    for n in raw_nodes:
        nm = n.get("name")
        if nm:
            name_counts[nm] = name_counts.get(nm, 0) + 1
    name_to_id = {
        n.get("name"): n["id"]
        for n in raw_nodes
        if n.get("name") and name_counts[n.get("name")] == 1
    }
    edges = [_build_edge(e, i, node_ids, name_to_id) for i, e in enumerate(raw_edges)]
    _backfill_fk_flags(nodes, edges)
    return {"nodes": nodes, "edges": edges}


def _build_edges(relationships: list, name_to_id: dict) -> "tuple[list, dict]":
    """Map relationships → edges and collect each node's FK column names."""
    edges = []
    fk_columns: dict = {}
    for index, rel in enumerate(relationships):
        parent_name, child_name = rel["table_map"]
        parent_cols, child_cols = rel["column_map"]
        source = name_to_id.get(parent_name, parent_name)
        target = name_to_id.get(child_name, child_name)
        # The child side holds the foreign key columns.
        fk_columns.setdefault(target, set()).update(child_cols)
        edges.append(
            {
                "id": rel.get("name") or f"e{index}",
                "source": source,
                "target": target,
                "from_columns": list(parent_cols),
                "to_columns": list(child_cols),
                "label": rel.get("relationship_label"),
                "type": rel.get("type", ""),
            }
        )
    return edges, fk_columns

def _backfill_fk_flags(nodes: "list[dict]", edges: "list[dict]") -> None:
    """Set is_foreign_key=True on columns named in each edge's from_columns.

def _build_node(table: dict, fk_cols: set) -> dict:
    node_id = table.get("node_name") or table["name"]
    Keyed by node id so the lookup is O(1) per column per edge.
    """
    nodes_by_id = {n["id"]: n for n in nodes}
    for edge in edges:
        target_node = nodes_by_id.get(edge["target"])
        if target_node is None:
            continue
        fk_cols = {c.lower() for c in edge.get("from_columns", [])}
        if not fk_cols:
            continue
        for col in target_node["columns"]:
            if col["name"].lower() in fk_cols:
                col["is_foreign_key"] = True


def _build_node(node: dict) -> dict:
    return {
        "id": node_id,
        "label": table["name"],
        "database": table.get("database") or "",
        "schema": table.get("schema") or "",
        "resource_type": table.get("resource_type") or "model",
        "id": node["id"],
        "label": node.get("name") or "",
        "database": node.get("database") or "",
        "schema": node.get("schema_name") or "",
        "resource_type": node.get("resource_type") or "model",
        "columns": [
            {
                "name": c["name"],
                "type": c.get("data_type") or "",
                "description": c.get("description") or "",
                "is_primary_key": bool(c.get("is_primary_key")),
                "is_foreign_key": c["name"] in fk_cols,
                "is_foreign_key": bool(c.get("is_foreign_key")),
            }
            for c in table.get("columns", [])
            for c in node.get("columns", [])
        ],
    }


def _resolve_edge_id(raw: str, node_ids: "set[str]", name_to_id: "dict[str, str]") -> str:
    # If raw is already a valid node id, keep it (the no-entity_name_format case).
    # Otherwise resolve it through the name→id map built from unambiguous node
    # names; a name that is not in the map falls through unchanged.
    if raw in node_ids:
        return raw
    return name_to_id.get(raw, raw)


def _build_edge(edge: dict, index: int, node_ids: "set[str]", name_to_id: "dict[str, str]") -> dict:
    # from_id is the FK/child side; to_id is the referenced/parent side.
    # SPA convention: source = parent (to_id), target = child (from_id).
    raw_from = edge.get("from_id") or ""
    raw_to = edge.get("to_id") or ""
    return {
        "id": edge.get("id") or f"e{index}",
        "source": _resolve_edge_id(raw_to, node_ids, name_to_id),
        "target": _resolve_edge_id(raw_from, node_ids, name_to_id),
        "from_columns": edge.get("from_columns") or [],
        "to_columns": edge.get("to_columns") or [],
        "label": edge.get("label") or "",
        "type": edge.get("cardinality") or "",
    }
80 changes: 0 additions & 80 deletions dbdocs/extract/erd_json.py

This file was deleted.
