104 changes: 104 additions & 0 deletions .claude/skills/metadata-exposure-enrichment/SKILL.md
@@ -0,0 +1,104 @@
---
name: metadata-exposure-enrichment
description: Enrich dbt exposure definitions by querying Metabase directly via MCP. Discovers dashboard cards, maps table and column references to dbt models, audits the existing _exposures.yml for gaps, and writes back a fully enriched exposure. Triggers include "enrich exposure", "exposure enrichment", "document dashboard", "update exposures", "what does the dashboard use".
---

# Exposure Enrichment

Discovers what a Metabase dashboard actually contains and writes that context back into dbt exposures. Works with the Metabase and Postgres MCP servers alone; no metadata platform is required.

## How It Works

1. **Parse `$ARGUMENTS`** -- Dashboard name, dashboard ID (e.g. `2`), or `all`. If empty, default to `all`. Extract any optional `--dry-run` flag (report only, no file write).
2. **Check file exists** -- Read `dbt/models/marts/_exposures.yml`. If missing, stop and output:
> ERROR: `dbt/models/marts/_exposures.yml` not found. Create a barebones version first with `name`, `type`, `url`, and `depends_on` fields, then re-run this skill.
3. **Discover via Metabase MCP** -- Execute sequentially:
a. `metabase-list-dashboards` -- confirm the target dashboard exists and get its ID
b. `metabase-get-dashboard` with the dashboard ID -- extract the `dashcards` array to get all `card_id` values
c. For each `card_id`: call `metabase-get-question` -- collect card name, display type, and `dataset_query` (MBQL `source-table` or native SQL)
d. `metabase-get-database-metadata` for the database -- map internal Metabase table IDs to real table names in the `marketing` schema
e. `metabase-get-current-user` -- capture email for the exposure `owner` field
4. **Cross-reference to dbt** -- For each card's source table, determine whether it maps to a dbt mart model or a raw source:
- Read `dbt/models/marts/` SQL files and `_marts.yml` to confirm mart models
- Read `dbt/models/staging/_sources.yml` to identify raw source tables
- Mart models become `ref('model_name')` in `depends_on`
- Raw source tables become `source('marketing_raw', 'table_name')` in `depends_on`
- Flag any table that doesn't map to either
5. **Audit the existing exposure** -- Read `_exposures.yml` and check each exposure:
- Is `description` present and non-empty?
- Is `owner` present with name and email?
- Is `maturity` set?
   - Does `depends_on` include ALL models and sources mapped in step 4?
- Are card-level details documented in the description?
- Are key columns documented?
6. **Report** -- Print a structured summary before offering any write:
- Dashboard: name, URL, total card count
- Card inventory: ID, name, display type, source table, columns used (aggregation + breakout)
- Audit gaps: what `_exposures.yml` is missing vs what was discovered
- End with: `GAPS: N fields missing | Cards: N discovered | Models: N mapped`
7. **Offer to enrich** -- Propose the enriched YAML and confirm with user before writing. If `--dry-run`, print proposed content only. On confirmation:
- Write to `dbt/models/marts/_exposures.yml`
- Report what changed (before vs after)
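
The discovery loop in step 3 can be sketched in Python. This is a minimal sketch, assuming the payload shape returned by `metabase-get-dashboard` (a top-level `dashcards` list whose entries carry a `card_id`); treat the field names as assumptions if your Metabase version differs.

```python
# Sketch of step 3b: collect card IDs from a metabase-get-dashboard
# payload. The "dashcards"/"card_id" field names follow Metabase's
# dashboard API and are assumptions here, not a verified contract.
def extract_card_ids(dashboard: dict) -> list[int]:
    ids = []
    for dashcard in dashboard.get("dashcards", []):
        card_id = dashcard.get("card_id")
        if card_id is not None:  # text and heading tiles carry no card
            ids.append(card_id)
    return ids
```

Each returned ID then feeds the `metabase-get-question` call in step 3c.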

## Description Format

Plain text with bracketed headers, following the same convention as `_marts.yml` descriptions; both dbt YAML and OpenMetadata render plain text.

**Exposure description**:
`[Business Purpose]` what decisions the dashboard drives and who uses it.
`[Cards]` list each card: name, chart type, what it measures.
`[Key Columns]` columns surfaced in the dashboard (aggregation columns + breakout dimensions).
`[Data Sources]` which dbt models and sources feed the dashboard, and how.
`[Known Issues / Caveats]` date range defaults, missing channels, filter behavior, archived cards.

## Reference

**Target dashboard**: "Agentic Data Modeling Demo" (ID=2), URL `http://localhost:3000/dashboard/2`

**Known card inventory on dashboard 2**:
- Card 40: ROAS (smartscalar) -- avg(roas) from campaign_performance, grouped by date
- Card 41: CR% (smartscalar) -- avg(conversion_rate) from campaign_performance, grouped by date
- Card 42: Target Revenue (progress) -- sum(total_revenue) from campaign_performance, grouped by date
- Card 43: Daily Spend by Channel (bar) -- sum(spend) from campaigns_daily, grouped by date + channel
- Card 44: Desktop Per Channel (pie) -- sum(desktop_sessions) from campaign_performance, grouped by channel
- Card 45: Mobile Per Channel (pie) -- sum(mobile_sessions) from campaign_performance, grouped by channel

**Standalone card NOT on dashboard 2** (do not include):
- Card 38: ROAS (table, native SQL) -- archived, not on any active dashboard

**Metabase table to dbt model map**:
- `campaign_performance` -> mart model, use `ref('campaign_performance')`
- `campaigns_daily` -> raw source table staged as `stg_campaigns_daily`, use `source('marketing_raw', 'campaigns_daily')`
- `daily_summary` -> mart model, not directly queried by any card but is a rollup of campaign_performance

**dbt mart models**: `campaign_performance`, `daily_summary`, `user_journey`, `channel_attribution`
**Files**: `dbt/models/marts/_exposures.yml`, `dbt/models/marts/_marts.yml`
**Sources**: `dbt/models/staging/_sources.yml` (source name: `marketing_raw`, schema: `marketing`)
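
The table-to-dbt mapping above (step 4) can be sketched as a small helper. The paths, the `marketing_raw` source name, and the regex scan over `_sources.yml` mirror this repo's layout but are assumptions, not a definitive implementation.

```python
# Hypothetical sketch of step 4: map Metabase source tables to dbt
# depends_on entries by scanning the mart SQL files and sources YAML.
from pathlib import Path
import re

def build_depends_on(tables, marts_dir="dbt/models/marts",
                     sources_yml="dbt/models/staging/_sources.yml",
                     source_name="marketing_raw"):
    # Mart model names are the .sql file stems in the marts directory
    mart_models = {p.stem for p in Path(marts_dir).glob("*.sql")}
    # Crude YAML scan: every "- name: x" entry counts as a source table
    text = Path(sources_yml).read_text() if Path(sources_yml).exists() else ""
    source_tables = set(re.findall(r"-\s*name:\s*(\w+)", text))
    depends_on, unmapped = [], []
    for table in tables:
        if table in mart_models:
            depends_on.append(f"ref('{table}')")
        elif table in source_tables:
            depends_on.append(f"source('{source_name}', '{table}')")
        else:
            unmapped.append(table)  # flag tables that map to neither
    return depends_on, unmapped
```

Anything landing in `unmapped` is surfaced in the step 6 report rather than written into `_exposures.yml`.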

## Output Format

```
## Exposure Enrichment: {dashboard_name} (ID={id})

### Dashboard Discovery
- Cards on dashboard: {count}
- Source tables: {table} ({n} cards), ...
- dbt mapping: {table} -> {ref or source}

### Card Inventory
| Card | Name | Type | Source Table | Columns Used |
|------|------|------|--------------|--------------|
| ... | ... | ... | ... | ... |

### Audit: _exposures.yml gaps
- description: {PRESENT | MISSING}
- owner: {PRESENT | MISSING}
- maturity: {PRESENT | MISSING}
- depends_on: {complete | missing: list}
- card documentation: {PRESENT | MISSING}

### Proposed enrichment
{full enriched YAML}

GAPS: {n} fields missing | Cards: {n} discovered | Models: {n} mapped
```
8 changes: 8 additions & 0 deletions .mcp.json
@@ -23,6 +23,14 @@
"env": {
"AUTH_HEADER": "Bearer <YOUR_OPENMETADATA_JWT_TOKEN>"
}
},
"metabase": {
"command": "npx",
"args": ["-y", "@getnao/metabase-mcp-server@latest"],
"env": {
"METABASE_URL": "http://localhost:3000",
"METABASE_API_KEY": "<YOUR_METABASE_API_KEY>"
}
}
}
}
27 changes: 27 additions & 0 deletions QUICKSTART.md
@@ -11,6 +11,7 @@ A step by step guide on how to get started with this project.
OPENMETADATA_JWT_TOKEN=your_jwt_token_here
METABASE_USERNAME=your_metabase_username
METABASE_PASSWORD=your_metabase_password
METABASE_API_KEY=your_metabase_api_key_here
```

2. Run the docker container:
@@ -91,6 +92,23 @@ Replace `<YOUR_OPENMETADATA_JWT_TOKEN>` in `.mcp.json` with the token you generated

> **Docs**: [OpenMetadata MCP Reference](https://docs.open-metadata.org/v1.10.x/how-to-guides/mcp/reference)

### Metabase MCP

The `/metadata-exposure-enrichment` skill queries Metabase dashboards directly via the [nao-metabase-mcp-server](https://github.com/getnao/nao-mcp-servers). It is already configured in `.mcp.json`; you only need to supply an API key.

**Generate an API key:**
1. Go to `http://localhost:3000/admin/settings/authentication/api-keys`
2. Create a new API key
3. Add it to your `.env`:

```bash
METABASE_API_KEY=your_api_key_here
```

Replace `<YOUR_METABASE_API_KEY>` in `.mcp.json` with your key.

> **Docs**: [nao-metabase-mcp-server](https://github.com/getnao/nao-mcp-servers)

### Verify `.mcp.json`

Your `.mcp.json` should look like this (already included in the repo):
@@ -121,6 +139,14 @@ Your `.mcp.json` should look like this (already included in the repo):
"env": {
"AUTH_HEADER": "Bearer <YOUR_OPENMETADATA_JWT_TOKEN>"
}
},
"metabase": {
"command": "npx",
"args": ["-y", "@getnao/metabase-mcp-server@latest"],
"env": {
"METABASE_URL": "http://localhost:3000",
"METABASE_API_KEY": "<YOUR_METABASE_API_KEY>"
}
}
}
}
@@ -136,6 +162,7 @@ Then use the MCP servers to ask questions such as:
- "Who owns the Agentic Data Modeling Demo dashboard?"
- "Is `user_journey` ready to be consumed by an AI agent?"
- "Create a business glossary from our dbt models"
- "Enrich the exposure for the Agentic Data Modeling Demo dashboard"

---

17 changes: 11 additions & 6 deletions README.md
@@ -25,12 +25,13 @@ This project connects Claude to the data stack through **two MCP servers**, giving
|---|---|---|
| **OpenMetadata MCP** | Metadata catalog — lineage, search, glossaries, entity details | `search_metadata`, `get_entity_lineage`, `get_entity_details`, `create_glossary_term` |
| **PostgreSQL MCP** | Direct database access — query data, profile columns, validate models | `execute_sql`, `list_tables`, `list_table_stats` |
| **Metabase MCP** | Direct dashboard access — discover cards, questions, database metadata | `metabase-list-dashboards`, `metabase-get-dashboard`, `metabase-get-question` |

The **PostgreSQL MCP** uses [Google GenAI Toolbox](https://github.com/googleapis/genai-toolbox) (pre-downloaded binary in `bin/toolbox`) to give Claude direct SQL access to the local PostgreSQL instance. This enables data profiling, edge case discovery, and validation queries — capabilities used heavily by the AI Readiness skill.

The **OpenMetadata MCP** connects to the OpenMetadata server's native MCP endpoint, providing metadata search, lineage tracing, and glossary management through natural language.

Both servers are configured in `.mcp.json` at the project root, with permissions managed in `.claude/settings.local.json`.
All three servers are configured in `.mcp.json` at the project root, with permissions managed in `.claude/settings.local.json`.

### What this enables

@@ -43,7 +44,7 @@ Both servers are configured in `.mcp.json` at the project root, with permissions

## 🛠️ Claude Code Skills

The project includes three custom **Claude Code skills** (in `.claude/skills/`) that encode repeatable data engineering workflows as slash commands. These skills combine the OpenMetadata and PostgreSQL MCP tools with local file analysis to automate common tasks:
The project includes four custom **Claude Code skills** (in `.claude/skills/`) that encode repeatable data engineering workflows as slash commands. These skills combine the OpenMetadata and PostgreSQL MCP tools with local file analysis to automate common tasks:

### `/metadata-impact-analysis`
Analyze downstream impact before making schema changes. Traces lineage through dbt models and dashboards to identify what breaks if a column is renamed, dropped, or its type changes.
@@ -54,6 +55,9 @@ Audit and enrich dbt mart models for AI consumption. Checks schema quality, quer
### `/metadata-glossary`
Manage an OpenMetadata glossary derived from dbt models. Parses dbt YAML for column names and descriptions, groups them into business categories, and creates/syncs glossary terms via OpenMetadata.

### `/metadata-exposure-enrichment`
Enrich dbt exposure definitions by querying Metabase directly via MCP. Discovers dashboard cards, maps table and column references to dbt models, audits the existing `_exposures.yml` for gaps, and writes back a fully enriched exposure.

## 📚 Documentation

This project includes comprehensive documentation to help you get started:
@@ -90,8 +94,8 @@ This setup enables a complete data analytics workflow where:
2. dbt transforms and models the data locally
3. Metabase provides interactive dashboards
4. OpenMetadata centralizes metadata from all components via **YAML-based ingestion** (not UI), providing unified lineage and metadata views
5. Claude connects via two MCP servers (OpenMetadata + PostgreSQL) for metadata exploration and direct data access
6. Custom skills (`/metadata-impact-analysis`, `/metadata-ai-readiness`, `/metadata-glossary`) automate repeatable data engineering workflows
5. Claude connects via three MCP servers (OpenMetadata + PostgreSQL + Metabase) for metadata exploration, direct data access, and dashboard discovery
6. Custom skills (`/metadata-impact-analysis`, `/metadata-ai-readiness`, `/metadata-glossary`, `/metadata-exposure-enrichment`) automate repeatable data engineering workflows

**Key Feature:** All OpenMetadata ingestion is configured through YAML files, enabling Infrastructure as Code (IaC) practices. Ingestion runs on-demand using Docker Compose profiles, giving you control over when metadata is synchronized. While OpenMetadata provides a UI for configuration, this project uses YAML files for version control, automation, and reproducibility.

@@ -101,8 +105,9 @@ This setup enables a complete data analytics workflow where:
│ └── skills/ # Custom Claude Code skills
│ ├── metadata-impact-analysis/
│ ├── metadata-ai-readiness/
│ └── metadata-glossary/
├── .mcp.json # MCP server definitions (Postgres + OpenMetadata)
│ ├── metadata-glossary/
│ └── metadata-exposure-enrichment/
├── .mcp.json # MCP server definitions (Postgres + OpenMetadata + Metabase)
├── bin/
│ └── toolbox # Google GenAI Toolbox binary (Postgres MCP)
├── dbt/ # dbt project
37 changes: 37 additions & 0 deletions dbt/models/marts/_exposures.yml
@@ -0,0 +1,37 @@
version: 2

exposures:
- name: agentic_data_modeling_demo
type: dashboard
maturity: low
url: http://localhost:3000/dashboard/2
description: >
[Business Purpose] Marketing performance dashboard used to monitor campaign ROI,
conversion efficiency, revenue targets, and channel-level spend and device breakdown.
Supports daily decision-making on budget allocation and channel optimization.

[Cards] 6 cards:
(1) ROAS -- smartscalar showing average return on ad spend over time.
(2) CR% -- smartscalar showing average conversion rate over time.
(3) Target Revenue -- progress bar tracking cumulative revenue against a 100k goal.
(4) Daily Spend by Channel -- stacked bar chart of daily spend broken down by marketing channel.
(5) Desktop Per Channel -- pie chart of total desktop sessions by channel.
(6) Mobile Per Channel -- pie chart of total mobile sessions by channel.

[Key Columns] roas, conversion_rate, total_revenue, spend, desktop_sessions,
mobile_sessions, date, channel.

[Data Sources] campaign_performance mart (cards 1-3, 5-6) and campaigns_daily
raw source (card 4). daily_summary is an indirect dependency as a rollup of
campaign_performance.

[Known Issues / Caveats] Dashboard date filter defaults to past 7 days.
Card 43 (Daily Spend) queries the raw campaigns_daily source table directly
rather than a mart model.
owner:
name: Alejandro Aboy
email: aboyalejandro@gmail.com
depends_on:
- ref('campaign_performance')
- ref('daily_summary')
- source('marketing_raw', 'campaigns_daily')