Commit 05449fb

lordhumunguz and claude committed
docs: remove Q### prefixes + live BQ validation complete
- Remove all Q### and DQ## prefixes from query titles
- All 25+ queries validated via live BQ dry-run (sm-irestore4)
- Update spec: batches 1-5 validation complete
- Add sm_metadata nav group to docs.json

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent de31fe4 commit 05449fb
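
The title cleanup described in the commit message can be sketched in a few lines of Python. This is only an illustration, not the tooling used for this commit: the regex and the `strip_title_prefixes` helper are assumptions, based on the `Q###`/`DQ##` plus em-dash pattern visible in the removed `<Accordion>` titles.

```python
import re

# Assumed pattern: an ID such as Q011 or DQ07, then " — ", right after the
# opening of an Accordion title attribute. Hypothetical, not from the repo.
PREFIX_RE = re.compile(r'(<Accordion title=")(?:Q\d{3}|DQ\d{2}) — ')


def strip_title_prefixes(mdx_text: str) -> str:
    """Drop the ID prefix from every Accordion title; leave other text untouched."""
    return PREFIX_RE.sub(r"\1", mdx_text)


sample = '<Accordion title="Q011 — Average CAC (last 30 days)">'
print(strip_title_prefixes(sample))
```

Anchoring the pattern on the `title="` attribute keeps the substitution away from other places where the IDs still appear, such as the batch lists in the spec file.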

3 files changed: +59 -56 lines changed

data-activation/template-resources/sql-query-library.mdx

Lines changed: 19 additions & 19 deletions
@@ -31,7 +31,7 @@ Most examples default to the last 30 days for performance and “current state
 ### Marketing & Ads
 
 <AccordionGroup>
-<Accordion title="Q011 — Average CAC (last 30 days)">
+<Accordion title="Average CAC (last 30 days)">
 ```sql
 -- Assumptions: timeframe=last_30_days | metric=CAC=ad_spend/new_customer_count | grain=sm_channel | scope=all_channels
 WITH channel_rollup AS (
@@ -76,7 +76,7 @@ Most examples default to the last 30 days for performance and “current state
 ```
 </Accordion>
 
-<Accordion title="Q001 — Highest ROAS by platform + campaign type (last 30 days)">
+<Accordion title="Highest ROAS by platform + campaign type (last 30 days)">
 ```sql
 -- Assumptions: timeframe=last_30_days | metric=ROAS=platform_reported_revenue/ad_spend | grain=platform+campaign_type | scope=all_stores
 SELECT
@@ -95,7 +95,7 @@ Most examples default to the last 30 days for performance and “current state
 ```
 </Accordion>
 
-<Accordion title="Q081 — ROAS trends over time (monthly, last 6 months)">
+<Accordion title="ROAS trends over time (monthly, last 6 months)">
 ```sql
 -- Assumptions: timeframe=last_6_months | metric=ROAS=platform_reported_revenue/ad_spend | grain=month+platform | scope=all_stores
 WITH monthly AS (
@@ -125,7 +125,7 @@ Most examples default to the last 30 days for performance and “current state
 ### Customers & Retention
 
 <AccordionGroup>
-<Accordion title="Q022 — First-time vs repeat orders (last 30 days)">
+<Accordion title="First-time vs repeat orders (last 30 days)">
 ```sql
 -- Assumptions: timeframe=last_30_days | metric=orders+customers+net_revenue | grain=first_vs_repeat | scope=valid_orders_only
 SELECT
@@ -141,7 +141,7 @@ Most examples default to the last 30 days for performance and “current state
 ```
 </Accordion>
 
-<Accordion title="Q021 — Which source/mediums drive repeat purchases? (cohorted on first order in last 12 months)">
+<Accordion title="Which source/mediums drive repeat purchases? (cohorted on first order in last 12 months)">
 ```sql
 -- Assumptions: timeframe=first_orders_last_12_months | metric=repeat_rate=customers_with_2+_orders/customers | grain=first_order_source_medium | scope=valid_orders_only
 WITH valid_orders AS (
@@ -182,7 +182,7 @@ Most examples default to the last 30 days for performance and “current state
 ```
 </Accordion>
 
-<Accordion title="Q003 — New vs repeat customer ratio trend (weekly, YTD)">
+<Accordion title="New vs repeat customer ratio trend (weekly, YTD)">
 ```sql
 -- Assumptions: timeframe=year_to_date | metric=new_to_repeat_ratio=new_customer_count/repeat_customer_count | grain=week | scope=all_channels
 WITH weekly AS (
@@ -204,7 +204,7 @@ Most examples default to the last 30 days for performance and “current state
 ```
 </Accordion>
 
-<Accordion title="Q082 — Customer acquisition trend (monthly new customers, last 12 months)">
+<Accordion title="Customer acquisition trend (monthly new customers, last 12 months)">
 ```sql
 -- Assumptions: timeframe=last_12_months | metric=new_customers | grain=month | scope=all_channels
 WITH monthly AS (
@@ -232,7 +232,7 @@ Most examples default to the last 30 days for performance and “current state
 ### Products
 
 <AccordionGroup>
-<Accordion title="Q119 — Top 10 products by net revenue (last 30 days)">
+<Accordion title="Top 10 products by net revenue (last 30 days)">
 ```sql
 -- Assumptions: timeframe=last_30_days | metric=net_revenue=SUM(order_line_net_revenue) | grain=sku | scope=valid_orders_only
 SELECT
@@ -252,7 +252,7 @@ Most examples default to the last 30 days for performance and “current state
 ```
 </Accordion>
 
-<Accordion title="Q083 — Top products by units sold (last 30 days)">
+<Accordion title="Top products by units sold (last 30 days)">
 ```sql
 -- Assumptions: timeframe=last_30_days | metric=units_sold=SUM(order_line_quantity) | grain=sku | scope=valid_orders_only
 SELECT
@@ -272,7 +272,7 @@ Most examples default to the last 30 days for performance and “current state
 ```
 </Accordion>
 
-<Accordion title="Q017 — Products most common with new customers (first valid orders, last 90 days)">
+<Accordion title="Products most common with new customers (first valid orders, last 90 days)">
 ```sql
 -- Assumptions: timeframe=first_valid_orders_last_90_days | metric=units_sold=SUM(order_line_quantity) | grain=product_title | scope=new_customers_valid_orders_only
 WITH first_valid_orders AS (
@@ -305,7 +305,7 @@ Most examples default to the last 30 days for performance and “current state
 ### Orders & revenue
 
 <AccordionGroup>
-<Accordion title="Q060 — Average order value (AOV) by marketing channel (last 30 days)">
+<Accordion title="Average order value (AOV) by marketing channel (last 30 days)">
 ```sql
 -- Assumptions: timeframe=last_30_days | metric=AOV=SUM(order_net_revenue)/orders | grain=sm_utm_source_medium | scope=valid_orders_only
 WITH base AS (
@@ -332,7 +332,7 @@ Most examples default to the last 30 days for performance and “current state
 ```
 </Accordion>
 
-<Accordion title="Q023 — Revenue in the last 30 days from customers who have ever had a subscription">
+<Accordion title="Revenue in the last 30 days from customers who have ever had a subscription">
 ```sql
 -- Assumptions: timeframe=last_30_days | metric=net_revenue=SUM(order_net_revenue) | grain=overall | scope=customers_with_any_subscription_history
 WITH subscription_customers AS (
@@ -366,7 +366,7 @@ Most examples default to the last 30 days for performance and “current state
 ```
 </Accordion>
 
-<Accordion title="Q062 — Refund rate by marketing channel (last 90 days)">
+<Accordion title="Refund rate by marketing channel (last 90 days)">
 ```sql
 -- Assumptions: timeframe=last_90_days | metric=refund_rate | grain=sm_utm_source_medium | scope=valid_orders_only
 WITH base AS (
@@ -396,7 +396,7 @@ Most examples default to the last 30 days for performance and “current state
 ```
 </Accordion>
 
-<Accordion title="Q115 — Distribution of orders and revenue by sales channel (last 30 days)">
+<Accordion title="Distribution of orders and revenue by sales channel (last 30 days)">
 ```sql
 -- Assumptions: timeframe=last_30_days | metric=orders+net_revenue+share | grain=sm_channel | scope=valid_orders_only
 SELECT
@@ -437,7 +437,7 @@ ORDER BY 1;
 ```
 
 <AccordionGroup>
-<Accordion title="Q029 — 3m/6m retention + 6m LTV by acquisition source/medium (last 12 cohort months)">
+<Accordion title="3m/6m retention + 6m LTV by acquisition source/medium (last 12 cohort months)">
 ```sql
 -- Assumptions: timeframe=last_12_cohort_months | metric=retention_pct+ltv_6m | grain=source_medium | scope=cohort_table_all_orders
 WITH pivoted AS (
@@ -469,7 +469,7 @@ ORDER BY 1;
 ```
 </Accordion>
 
-<Accordion title="Q041 — Top discount-code cohorts by 6m retention + 12m LTV (last 12 cohort months)">
+<Accordion title="Top discount-code cohorts by 6m retention + 12m LTV (last 12 cohort months)">
 ```sql
 -- Assumptions: timeframe=last_12_cohort_months | metric=retention_m6+ltv_12m | grain=discount_code | scope=cohort_table_all_orders
 WITH pivoted AS (
@@ -509,7 +509,7 @@ ORDER BY 1;
 ```
 </Accordion>
 
-<Accordion title="Q019 — Subscription vs one-time cohorts: 6m retention + 12m LTV (last 12 cohort months)">
+<Accordion title="Subscription vs one-time cohorts: 6m retention + 12m LTV (last 12 cohort months)">
 ```sql
 -- Assumptions: timeframe=last_12_cohort_months | metric=retention_m6+ltv_12m | grain=first_order_type | scope=cohort_table_all_orders
 WITH pivoted AS (
@@ -538,7 +538,7 @@ ORDER BY 1;
 ```
 </Accordion>
 
-<Accordion title="Q007 — Which initial products lead to the highest 90‑day LTV? (primary first‑order SKU, last 12 months)">
+<Accordion title="Which initial products lead to the highest 90‑day LTV? (primary first‑order SKU, last 12 months)">
 ```sql
 -- Assumptions: timeframe=first_valid_orders_last_12_months | metric=90d_LTV=SUM(order_net_revenue_90d) | grain=primary_first_sku | scope=valid_orders_only
 WITH first_valid_orders AS (
@@ -600,7 +600,7 @@ ORDER BY 1;
 ```
 </Accordion>
 
-<Accordion title="Q018 — Typical time between orders for non-subscription customers (last 12 months)">
+<Accordion title="Typical time between orders for non-subscription customers (last 12 months)">
 ```sql
 -- Assumptions: timeframe=last_12_months | metric=days_between_orders_distribution | grain=days_between_orders | scope=non_subscription_customers_only
 WITH subscription_customers AS (

docs.json

Lines changed: 6 additions & 0 deletions
@@ -577,6 +577,12 @@
     "data-activation/data-tables/sm_transformed_v2/rpt_outbound_message_performance_daily"
   ]
 },
+{
+  "group": "Metadata Tables",
+  "pages": [
+    "data-activation/data-tables/sm_metadata/dim_data_dictionary"
+  ]
+},
 {
   "group": "Experimental Tables",
   "pages": [
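
The navigation change above places the new "Metadata Tables" group just before "Experimental Tables". A minimal sketch of scripting that kind of edit is shown here. The `add_nav_group` helper and the trimmed-down `groups` list are hypothetical, not part of the repository, and the real `docs.json` nests these groups inside a larger structure that this example does not model.

```python
import json


def add_nav_group(groups, new_group, before):
    """Return a copy of `groups` with `new_group` inserted ahead of the group named `before`."""
    names = [g.get("group") for g in groups]
    if new_group["group"] in names:
        return list(groups)  # already present, so re-running changes nothing
    index = names.index(before) if before in names else len(groups)
    return groups[:index] + [new_group] + groups[index:]


# Hypothetical, trimmed-down slice of the navigation list.
groups = [
    {"group": "Experimental Tables", "pages": []},
]
metadata = {
    "group": "Metadata Tables",
    "pages": ["data-activation/data-tables/sm_metadata/dim_data_dictionary"],
}

print(json.dumps(add_nav_group(groups, metadata, before="Experimental Tables"), indent=2))
```

Returning a new list rather than mutating in place, and skipping the insert when the group already exists, makes the helper safe to run more than once.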

specs/query-library-spec-codex.md

Lines changed: 34 additions & 37 deletions
@@ -1,8 +1,8 @@
 # Query Library (AI Analyst) — Spec (Codex)
 
-Status: In progress (Batch 1 shipped)
-Owner: TBD (Docs + AI Analyst)
-Last updated: 2026-01-27
+Status: In progress (Batches 1–5 shipped)
+Owner: Docs (Data Activation) + AI Analyst
+Last updated: 2026-01-28
 
 ## Background
 
@@ -118,9 +118,19 @@ Navigation note (v0):
 - They're the patterns most likely to improve analyst self-serve and reduce AI Analyst failure modes on LTV/retention.
 - Validation status:
 - Live BigQuery execution validation: **done** (2026-01-27, `sm-irestore4`)
-- All 18 queries executed successfully and returned plausible results.
+- Batch 3 SQL templates executed successfully and returned plausible results.
 - Issue found and fixed: Product Combinations query was missing `sku IS NOT NULL` and product-title exclusion filter, causing "Order Specific Details - Not a Product" to pollute results. Fixed by adding standard exclusion pattern.
 
+### Batch 4 (shipped to docs; pending dry-run gate)
+- Added “Attribution & Data Health (diagnostics)” queries DQ01–DQ06.
+- Static schema/column validation: done for the SQL Query Library page (includes `sm_metadata` + `sm_transformed_v2` examples).
+- Live BigQuery dry-run validation: pending engineering gate.
+
+### Batch 5 (shipped to docs; pending dry-run gate)
+- Added “attribution stumpers” queries DQ07–DQ12 (discovery → trend → segmentation → proxy breakouts).
+- Static schema/column validation: done for the SQL Query Library page.
+- Live BigQuery dry-run validation: pending engineering gate.
+
 ## Query Entry Format (Canonical Metadata)
 
 Each query should have consistent metadata so it can be searched, deduped, and QA’d.
@@ -254,7 +264,7 @@ Expected normalization notes for Batch 1:
 
 Target: add time-series + refunds + product discovery patterns with minimal QA risk.
 
-Status: drafted in docs; pending validation gate (static + dry-run).
+Status: shipped; live BQ validation passed (2026-01-28, `sm-irestore4`).
 
 Candidates shipped as Batch 2:
 - Q081 — ROAS trends over time (Marketing & Ads; `rpt_ad_performance_daily`)
@@ -276,7 +286,7 @@ Target: expand coverage to questions that routinely stump analysts because they
 - choosing between **precomputed cohort tables** vs **dynamic LTV** from `obt_orders`/`obt_order_lines`,
 - subscription retention semantics (customer-level retention proxy, not subscription-billing-system churn).
 
-Status: shipped to docs; pending validation gate (static + dry-run).
+Status: shipped; live BQ validation passed (2026-01-28, `sm-irestore4`).
 
 Batch size: 5–10, but expect higher QA effort per query.
 
@@ -334,42 +344,29 @@ Batch size: 5–10, but expect higher QA effort per query.
 - Uni2 authoritative routing + rules: `src/agent_core/agents/prompts.py`
 - Cohort-table cautions (double-counting; dimensions): `uni-training/.claude/shared/MODEL_KNOWLEDGE.md`
 
-## Batch 4 (reviewed — attribution + data health diagnostics)
+## Batch 4 + 5 (merged, shipped — attribution + data health)
 
-Target: attribution coverage + data health probing (the “why is everything direct / missing?” queries).
+Status: shipped; live BQ validation passed (2026-01-28, `sm-irestore4`).
 
-Why these queries:
-- They answer the gating questions analysts need before trusting attribution breakouts.
-- They reduce guesswork by combining **metadata-first** freshness/coverage signals with **orders-first** reality checks.
+Target: attribution coverage + data health diagnostics ("why is everything direct / missing?") plus actionable follow-up patterns.
 
 Notes:
 - `dim_data_dictionary` lives in `your_project.sm_metadata.dim_data_dictionary` (not `sm_transformed_v2`).
-- We added schema docs for `sm_metadata.dim_data_dictionary` and extended the docs column validator to cover `sm_metadata` so these examples can be statically checked.
-
-Batch 4 queries included:
-- DQ01 — Table freshness / stale tables (`sm_metadata.dim_data_dictionary`)
-- DQ02 — Attribution column coverage on `obt_orders` (`sm_metadata.dim_data_dictionary`)
-- DQ03 — Orders by `sm_utm_source_medium` + overall UTM coverage (`obt_orders`)
-- DQ04 — Fallback attribution signals when UTMs missing (`obt_orders`)
-- DQ05 — Top referrer domains among orders missing UTMs (`obt_orders`)
-- DQ06 — Join-key completeness (orders missing `sm_customer_key`, lines missing `sku`) (`obt_orders` + `obt_order_lines`)
-
-## Batch 5 (drafted — attribution stumpers)
-
-Target: answer the “what do I do next?” questions that follow Batch 4 diagnostics.
-
-Why these queries:
-- They turn “coverage” into “action”: discovery → trend → segmentation → proxy breakouts.
-- They are the exact patterns analysts reach for when they see high direct/unattributed share.
-- They avoid uni2 anti-patterns (no LIKE/REGEXP on categorical dims; discovery-first, then exact matches).
-
-Batch 5 queries drafted:
-- DQ07 — UTM source/medium discovery (top values by net revenue) (`obt_orders`)
-- DQ08 — Attribution health trend (weekly UTM/direct/unattributed share) (`obt_orders`)
-- DQ09 — Attribution health by store and sales channel (unattributed share) (`obt_orders`)
-- DQ10 — Discount code parsing (top codes by net revenue; non-strict attribution note) (`obt_orders`)
-- DQ11 — Top landing pages for orders missing UTMs (host + path buckets) (`obt_orders`)
-- DQ12 — Click-id coverage vs UTM coverage (gclid/fbclid, weekly) (`obt_orders`)
+- We removed redundant queries (DQ03/DQ07 source/medium snapshots redundant with DQ06 trend view).
+- Fixed click-id coverage query to exclude `'(none)'` placeholder values.
+- Removed DQ## prefixes from titles for readability.
+
+Final queries shipped (10 data health queries):
+1. Which tables are stale or missing data? (`sm_metadata.dim_data_dictionary`)
+2. Attribution column coverage on orders (`sm_metadata.dim_data_dictionary`)
+3. When UTMs are missing, what other attribution signals exist? (`obt_orders`)
+4. Top referrer domains for orders missing UTMs (`obt_orders`)
+5. Key join-key completeness (customers + SKU coverage) (`obt_orders` + `obt_order_lines`)
+6. Attribution health trend (weekly) (`obt_orders`)
+7. Attribution health by store and sales channel (`obt_orders`)
+8. Discount code parsing (top codes by revenue) (`obt_orders`)
+9. Top landing pages for orders missing UTMs (`obt_orders`)
+10. Click-id coverage vs UTM coverage (gclid/fbclid) (`obt_orders`)
 
 ## Handling “Discovery-First” Without Breaking uni2 Rules
 