docs: remove Q### prefixes + live BQ validation complete

lordhumunguz · claude · lordhumunguz · commit 05449fb34fdf · 2026-01-28T00:02:58.000-05:00
- Remove all Q### and DQ## prefixes from query titles
- All 25+ queries validated via live BQ dry-run (sm-irestore4)
- Update spec: batches 1-5 validation complete
- Add sm_metadata nav group to docs.json

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
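
For reviewers who want to reproduce the title cleanup, here is a minimal sketch of how the prefixes could be stripped mechanically. It is not taken from this commit: the function name `strip_query_prefixes` and the regex are illustrative assumptions, and the pattern only targets `<Accordion title="...">` attributes of the form seen in the diff below.

```python
import re

# Assumed pattern (hypothetical, not from the commit): an Accordion title that
# starts with "Q###" or "DQ##", followed by a dash separator (em dash U+2014,
# en dash U+2013, or a plain hyphen) with optional surrounding whitespace.
PREFIX_RE = re.compile(r'(<Accordion title=")D?Q\d{2,3}\s*[\u2014\u2013-]\s*')


def strip_query_prefixes(mdx_text: str) -> str:
    """Return mdx_text with Q###/DQ## prefixes removed from Accordion titles."""
    # Keep the opening of the attribute (group 1) and drop the prefix + separator.
    return PREFIX_RE.sub(r"\1", mdx_text)


if __name__ == "__main__":
    sample = '<Accordion title="Q011 \u2014 Average CAC (last 30 days)">'
    print(strip_query_prefixes(sample))
```

Titles that merely begin with the letter Q (for example "Quarterly revenue") are left alone because the pattern requires two or three digits after the `Q`.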
diff --git a/data-activation/template-resources/sql-query-library.mdx b/data-activation/template-resources/sql-query-library.mdx
@@ -31,7 +31,7 @@ Most examples default to the last 30 days for performance and “current state
 ### Marketing & Ads
 
 <AccordionGroup>
-  <Accordion title="Q011 — Average CAC (last 30 days)">
+  <Accordion title="Average CAC (last 30 days)">
     ```sql
     -- Assumptions: timeframe=last_30_days | metric=CAC=ad_spend/new_customer_count | grain=sm_channel | scope=all_channels
     WITH channel_rollup AS (
@@ -76,7 +76,7 @@ Most examples default to the last 30 days for performance and “current state
     ```
   </Accordion>
 
-  <Accordion title="Q001 — Highest ROAS by platform + campaign type (last 30 days)">
+  <Accordion title="Highest ROAS by platform + campaign type (last 30 days)">
     ```sql
     -- Assumptions: timeframe=last_30_days | metric=ROAS=platform_reported_revenue/ad_spend | grain=platform+campaign_type | scope=all_stores
     SELECT
@@ -95,7 +95,7 @@ Most examples default to the last 30 days for performance and “current state
     ```
   </Accordion>
 
-  <Accordion title="Q081 — ROAS trends over time (monthly, last 6 months)">
+  <Accordion title="ROAS trends over time (monthly, last 6 months)">
     ```sql
     -- Assumptions: timeframe=last_6_months | metric=ROAS=platform_reported_revenue/ad_spend | grain=month+platform | scope=all_stores
     WITH monthly AS (
@@ -125,7 +125,7 @@ Most examples default to the last 30 days for performance and “current state
 ### Customers & Retention
 
 <AccordionGroup>
-  <Accordion title="Q022 — First-time vs repeat orders (last 30 days)">
+  <Accordion title="First-time vs repeat orders (last 30 days)">
     ```sql
     -- Assumptions: timeframe=last_30_days | metric=orders+customers+net_revenue | grain=first_vs_repeat | scope=valid_orders_only
     SELECT
@@ -141,7 +141,7 @@ Most examples default to the last 30 days for performance and “current state
     ```
   </Accordion>
 
-  <Accordion title="Q021 — Which source/mediums drive repeat purchases? (cohorted on first order in last 12 months)">
+  <Accordion title="Which source/mediums drive repeat purchases? (cohorted on first order in last 12 months)">
     ```sql
     -- Assumptions: timeframe=first_orders_last_12_months | metric=repeat_rate=customers_with_2+_orders/customers | grain=first_order_source_medium | scope=valid_orders_only
     WITH valid_orders AS (
@@ -182,7 +182,7 @@ Most examples default to the last 30 days for performance and “current state
     ```
   </Accordion>
 
-  <Accordion title="Q003 — New vs repeat customer ratio trend (weekly, YTD)">
+  <Accordion title="New vs repeat customer ratio trend (weekly, YTD)">
     ```sql
     -- Assumptions: timeframe=year_to_date | metric=new_to_repeat_ratio=new_customer_count/repeat_customer_count | grain=week | scope=all_channels
     WITH weekly AS (
@@ -204,7 +204,7 @@ Most examples default to the last 30 days for performance and “current state
     ```
   </Accordion>
 
-  <Accordion title="Q082 — Customer acquisition trend (monthly new customers, last 12 months)">
+  <Accordion title="Customer acquisition trend (monthly new customers, last 12 months)">
     ```sql
     -- Assumptions: timeframe=last_12_months | metric=new_customers | grain=month | scope=all_channels
     WITH monthly AS (
@@ -232,7 +232,7 @@ Most examples default to the last 30 days for performance and “current state
 ### Products
 
 <AccordionGroup>
-  <Accordion title="Q119 — Top 10 products by net revenue (last 30 days)">
+  <Accordion title="Top 10 products by net revenue (last 30 days)">
     ```sql
     -- Assumptions: timeframe=last_30_days | metric=net_revenue=SUM(order_line_net_revenue) | grain=sku | scope=valid_orders_only
     SELECT
@@ -252,7 +252,7 @@ Most examples default to the last 30 days for performance and “current state
     ```
   </Accordion>
 
-  <Accordion title="Q083 — Top products by units sold (last 30 days)">
+  <Accordion title="Top products by units sold (last 30 days)">
     ```sql
     -- Assumptions: timeframe=last_30_days | metric=units_sold=SUM(order_line_quantity) | grain=sku | scope=valid_orders_only
     SELECT
@@ -272,7 +272,7 @@ Most examples default to the last 30 days for performance and “current state
     ```
   </Accordion>
 
-  <Accordion title="Q017 — Products most common with new customers (first valid orders, last 90 days)">
+  <Accordion title="Products most common with new customers (first valid orders, last 90 days)">
     ```sql
     -- Assumptions: timeframe=first_valid_orders_last_90_days | metric=units_sold=SUM(order_line_quantity) | grain=product_title | scope=new_customers_valid_orders_only
     WITH first_valid_orders AS (
@@ -305,7 +305,7 @@ Most examples default to the last 30 days for performance and “current state
 ### Orders & revenue
 
 <AccordionGroup>
-  <Accordion title="Q060 — Average order value (AOV) by marketing channel (last 30 days)">
+  <Accordion title="Average order value (AOV) by marketing channel (last 30 days)">
     ```sql
     -- Assumptions: timeframe=last_30_days | metric=AOV=SUM(order_net_revenue)/orders | grain=sm_utm_source_medium | scope=valid_orders_only
     WITH base AS (
@@ -332,7 +332,7 @@ Most examples default to the last 30 days for performance and “current state
     ```
   </Accordion>
 
-  <Accordion title="Q023 — Revenue in the last 30 days from customers who have ever had a subscription">
+  <Accordion title="Revenue in the last 30 days from customers who have ever had a subscription">
     ```sql
     -- Assumptions: timeframe=last_30_days | metric=net_revenue=SUM(order_net_revenue) | grain=overall | scope=customers_with_any_subscription_history
     WITH subscription_customers AS (
@@ -366,7 +366,7 @@ Most examples default to the last 30 days for performance and “current state
     ```
   </Accordion>
 
-  <Accordion title="Q062 — Refund rate by marketing channel (last 90 days)">
+  <Accordion title="Refund rate by marketing channel (last 90 days)">
     ```sql
     -- Assumptions: timeframe=last_90_days | metric=refund_rate | grain=sm_utm_source_medium | scope=valid_orders_only
     WITH base AS (
@@ -396,7 +396,7 @@ Most examples default to the last 30 days for performance and “current state
     ```
   </Accordion>
 
-  <Accordion title="Q115 — Distribution of orders and revenue by sales channel (last 30 days)">
+  <Accordion title="Distribution of orders and revenue by sales channel (last 30 days)">
     ```sql
     -- Assumptions: timeframe=last_30_days | metric=orders+net_revenue+share | grain=sm_channel | scope=valid_orders_only
     SELECT
@@ -437,7 +437,7 @@ ORDER BY 1;
 ```
 
 <AccordionGroup>
-  <Accordion title="Q029 — 3m/6m retention + 6m LTV by acquisition source/medium (last 12 cohort months)">
+  <Accordion title="3m/6m retention + 6m LTV by acquisition source/medium (last 12 cohort months)">
     ```sql
     -- Assumptions: timeframe=last_12_cohort_months | metric=retention_pct+ltv_6m | grain=source_medium | scope=cohort_table_all_orders
     WITH pivoted AS (
@@ -469,7 +469,7 @@ ORDER BY 1;
     ```
   </Accordion>
 
-  <Accordion title="Q041 — Top discount-code cohorts by 6m retention + 12m LTV (last 12 cohort months)">
+  <Accordion title="Top discount-code cohorts by 6m retention + 12m LTV (last 12 cohort months)">
     ```sql
     -- Assumptions: timeframe=last_12_cohort_months | metric=retention_m6+ltv_12m | grain=discount_code | scope=cohort_table_all_orders
     WITH pivoted AS (
@@ -509,7 +509,7 @@ ORDER BY 1;
     ```
   </Accordion>
 
-  <Accordion title="Q019 — Subscription vs one-time cohorts: 6m retention + 12m LTV (last 12 cohort months)">
+  <Accordion title="Subscription vs one-time cohorts: 6m retention + 12m LTV (last 12 cohort months)">
     ```sql
     -- Assumptions: timeframe=last_12_cohort_months | metric=retention_m6+ltv_12m | grain=first_order_type | scope=cohort_table_all_orders
     WITH pivoted AS (
@@ -538,7 +538,7 @@ ORDER BY 1;
     ```
   </Accordion>
 
-  <Accordion title="Q007 — Which initial products lead to the highest 90‑day LTV? (primary first‑order SKU, last 12 months)">
+  <Accordion title="Which initial products lead to the highest 90‑day LTV? (primary first‑order SKU, last 12 months)">
     ```sql
     -- Assumptions: timeframe=first_valid_orders_last_12_months | metric=90d_LTV=SUM(order_net_revenue_90d) | grain=primary_first_sku | scope=valid_orders_only
     WITH first_valid_orders AS (
@@ -600,7 +600,7 @@ ORDER BY 1;
     ```
   </Accordion>
 
-  <Accordion title="Q018 — Typical time between orders for non-subscription customers (last 12 months)">
+  <Accordion title="Typical time between orders for non-subscription customers (last 12 months)">
     ```sql
     -- Assumptions: timeframe=last_12_months | metric=days_between_orders_distribution | grain=days_between_orders | scope=non_subscription_customers_only
     WITH subscription_customers AS (
diff --git a/docs.json b/docs.json
@@ -577,6 +577,12 @@
               "data-activation/data-tables/sm_transformed_v2/rpt_outbound_message_performance_daily"
             ]
           },
+          {
+            "group": "Metadata Tables",
+            "pages": [
+              "data-activation/data-tables/sm_metadata/dim_data_dictionary"
+            ]
+          },
           {
             "group": "Experimental Tables",
             "pages": [
diff --git a/specs/query-library-spec-codex.md b/specs/query-library-spec-codex.md
@@ -1,8 +1,8 @@
 # Query Library (AI Analyst) — Spec (Codex)
 
-Status: In progress (Batch 1 shipped)  
-Owner: TBD (Docs + AI Analyst)  
-Last updated: 2026-01-27
+Status: In progress (Batches 1–5 shipped)  
+Owner: Docs (Data Activation) + AI Analyst  
+Last updated: 2026-01-28
 
 ## Background
 
@@ -118,9 +118,19 @@ Navigation note (v0):
   - They're the patterns most likely to improve analyst self-serve and reduce AI Analyst failure modes on LTV/retention.
 - Validation status:
   - Live BigQuery execution validation: **done** (2026-01-27, `sm-irestore4`)
-  - All 18 queries executed successfully and returned plausible results.
+  - Batch 3 SQL templates executed successfully and returned plausible results.
   - Issue found and fixed: Product Combinations query was missing `sku IS NOT NULL` and product-title exclusion filter, causing "Order Specific Details - Not a Product" to pollute results. Fixed by adding standard exclusion pattern.
 
+### Batch 4 (shipped to docs; pending dry-run gate)
+- Added “Attribution & Data Health (diagnostics)” queries DQ01–DQ06.
+- Static schema/column validation: done for the SQL Query Library page (includes `sm_metadata` + `sm_transformed_v2` examples).
+- Live BigQuery dry-run validation: pending engineering gate.
+
+### Batch 5 (shipped to docs; pending dry-run gate)
+- Added “attribution stumpers” queries DQ07–DQ12 (discovery → trend → segmentation → proxy breakouts).
+- Static schema/column validation: done for the SQL Query Library page.
+- Live BigQuery dry-run validation: pending engineering gate.
+
 ## Query Entry Format (Canonical Metadata)
 
 Each query should have consistent metadata so it can be searched, deduped, and QA’d.
@@ -254,7 +264,7 @@ Expected normalization notes for Batch 1:
 
 Target: add time-series + refunds + product discovery patterns with minimal QA risk.
 
-Status: drafted in docs; pending validation gate (static + dry-run).
+Status: shipped; live BQ validation passed (2026-01-28, `sm-irestore4`).
 
 Candidates shipped as Batch 2:
 - Q081 — ROAS trends over time (Marketing & Ads; `rpt_ad_performance_daily`)
@@ -276,7 +286,7 @@ Target: expand coverage to questions that routinely stump analysts because they
 - choosing between **precomputed cohort tables** vs **dynamic LTV** from `obt_orders`/`obt_order_lines`,
 - subscription retention semantics (customer-level retention proxy, not subscription-billing-system churn).
 
-Status: shipped to docs; pending validation gate (static + dry-run).
+Status: shipped; live BQ validation passed (2026-01-28, `sm-irestore4`).
 
 Batch size: 5–10, but expect higher QA effort per query.
 
@@ -334,42 +344,29 @@ Batch size: 5–10, but expect higher QA effort per query.
 - Uni2 authoritative routing + rules: `src/agent_core/agents/prompts.py`
 - Cohort-table cautions (double-counting; dimensions): `uni-training/.claude/shared/MODEL_KNOWLEDGE.md`
 
-## Batch 4 (reviewed — attribution + data health diagnostics)
+## Batch 4 + 5 (merged, shipped — attribution + data health)
 
-Target: attribution coverage + data health probing (the “why is everything direct / missing?” queries).
+Status: shipped; live BQ validation passed (2026-01-28, `sm-irestore4`).
 
-Why these queries:
-- They answer the gating questions analysts need before trusting attribution breakouts.
-- They reduce guesswork by combining **metadata-first** freshness/coverage signals with **orders-first** reality checks.
+Target: attribution coverage + data health diagnostics ("why is everything direct / missing?") plus actionable follow-up patterns.
 
 Notes:
 - `dim_data_dictionary` lives in `your_project.sm_metadata.dim_data_dictionary` (not `sm_transformed_v2`).
-- We added schema docs for `sm_metadata.dim_data_dictionary` and extended the docs column validator to cover `sm_metadata` so these examples can be statically checked.
-
-Batch 4 queries included:
-- DQ01 — Table freshness / stale tables (`sm_metadata.dim_data_dictionary`)
-- DQ02 — Attribution column coverage on `obt_orders` (`sm_metadata.dim_data_dictionary`)
-- DQ03 — Orders by `sm_utm_source_medium` + overall UTM coverage (`obt_orders`)
-- DQ04 — Fallback attribution signals when UTMs missing (`obt_orders`)
-- DQ05 — Top referrer domains among orders missing UTMs (`obt_orders`)
-- DQ06 — Join-key completeness (orders missing `sm_customer_key`, lines missing `sku`) (`obt_orders` + `obt_order_lines`)
-
-## Batch 5 (drafted — attribution stumpers)
-
-Target: answer the “what do I do next?” questions that follow Batch 4 diagnostics.
-
-Why these queries:
-- They turn “coverage” into “action”: discovery → trend → segmentation → proxy breakouts.
-- They are the exact patterns analysts reach for when they see high direct/unattributed share.
-- They avoid uni2 anti-patterns (no LIKE/REGEXP on categorical dims; discovery-first, then exact matches).
-
-Batch 5 queries drafted:
-- DQ07 — UTM source/medium discovery (top values by net revenue) (`obt_orders`)
-- DQ08 — Attribution health trend (weekly UTM/direct/unattributed share) (`obt_orders`)
-- DQ09 — Attribution health by store and sales channel (unattributed share) (`obt_orders`)
-- DQ10 — Discount code parsing (top codes by net revenue; non-strict attribution note) (`obt_orders`)
-- DQ11 — Top landing pages for orders missing UTMs (host + path buckets) (`obt_orders`)
-- DQ12 — Click-id coverage vs UTM coverage (gclid/fbclid, weekly) (`obt_orders`)
+- We removed redundant queries (DQ03/DQ07 source/medium snapshots redundant with DQ06 trend view).
+- Fixed click-id coverage query to exclude `'(none)'` placeholder values.
+- Removed DQ## prefixes from titles for readability.
+
+Final queries shipped (10 data health queries):
+1. Which tables are stale or missing data? (`sm_metadata.dim_data_dictionary`)
+2. Attribution column coverage on orders (`sm_metadata.dim_data_dictionary`)
+3. When UTMs are missing, what other attribution signals exist? (`obt_orders`)
+4. Top referrer domains for orders missing UTMs (`obt_orders`)
+5. Key join-key completeness (customers + SKU coverage) (`obt_orders` + `obt_order_lines`)
+6. Attribution health trend (weekly) (`obt_orders`)
+7. Attribution health by store and sales channel (`obt_orders`)
+8. Discount code parsing (top codes by revenue) (`obt_orders`)
+9. Top landing pages for orders missing UTMs (`obt_orders`)
+10. Click-id coverage vs UTM coverage (gclid/fbclid) (`obt_orders`)
 
 ## Handling “Discovery-First” Without Breaking uni2 Rules
 
