Jaffle Shop Expanded

A medium-sized dbt project (~1,000 models) built on top of dbt-labs/jaffle-shop. Designed as a realistic mock data warehouse for testing dbt tools, PR review workflows, and data platform capabilities.

Quick Start

# Install dependencies
uv run --with dbt-duckdb dbt deps --profiles-dir .

# Load seed data and build everything
uv run --with dbt-duckdb dbt seed --profiles-dir . --target duckdb --vars 'load_source_data: true'
uv run --with dbt-duckdb dbt build --full-refresh --profiles-dir . --target duckdb --vars 'load_source_data: true'

# Generate and serve docs
uv run --with dbt-duckdb dbt docs generate --profiles-dir . --target duckdb
uv run --with dbt-duckdb dbt docs serve --profiles-dir .

Project Stats

Resource	Count
SQL models	1,058
Schema tests	~800
Seeds (CSV)	58
Macros	22
Snapshots	2
Analyses	3
Data tests	3
Exposures	6
Groups	7
Custom generic tests	4
Docs blocks	11
Source tables	58

Business Domains

The project models a coffee chain with five business domains, each with staging, intermediate, and mart layers:

Domain	Description	Sources
Core (ecom)	Orders, customers, products, stores, supplies	6 tables
Finance	Invoices, refunds, payments, gift cards, budgets, expenses	10 tables
Supply Chain	Suppliers, purchase orders, inventory, warehouses, waste	10 tables
Marketing	Campaigns, coupons, loyalty program, email, social, referrals	10 tables
HR & Operations	Employees, shifts, payroll, training, equipment, maintenance	12 tables
Product & Menu	Recipes, ingredients, menu items, nutrition, pricing, reviews	10 tables

Model Architecture

seeds/ (58 CSVs)
  jaffle-data/        Original ecom data (935 customers, 62K orders)
  finance-data/       10 seed files
  supply-chain-data/  10 seed files
  marketing-data/     10 seed files
  hr-ops-data/        12 seed files
  product-data/       10 seed files

models/
  staging/            88 models (6 domain subdirectories + derived)
  intermediate/       194 models (13 subdirectories)
  marts/              714 models (28 subdirectories)
  utilities/          4 models (date spine, fiscal periods, calendars)

Mart Subdirectories

Directory	Prefix	Models	Description
`finance/`	`fct_`, `dim_`, `rpt_`	29	Core finance facts, dims, reports
`supply_chain/`	`fct_`, `dim_`, `rpt_`	29	Supply chain analytics
`marketing/`	`fct_`, `dim_`, `rpt_`	29	Marketing and loyalty
`hr_ops/`	`fct_`, `dim_`, `rpt_`	29	HR and operations
`product/`	`fct_`, `dim_`, `rpt_`	29	Product and menu
`cross_domain/`	`dim_`, `rpt_`, `int_`	21	Customer 360, store economics
`metrics/`	`met_`	18	Pre-aggregated time-series metrics
`scoring/`	`scr_`	6	Entity health/risk scores (0-100)
`executive/`	`exec_`	6	C-level dashboards
`cohorts/`	`coh_`	8	Cohort retention analysis
`funnels/`	`fnl_`	6	Conversion funnels
`comparisons/`	`cmp_`	8	Period/entity comparisons
`ml_features/`	`ml_`	6	ML feature store tables
`kpis/`	`kpi_`	35	Standalone KPI models
`trends/`	`trend_`	40	Trend analysis with moving averages
`rankings/`	`rank_`	30	Entity league tables
`alerts/`	`alert_`	30	Threshold-based monitoring
`distributions/`	`dist_`	25	Percentile/histogram models
`summaries/`	`sum_`	30	Pre-computed aggregates
`geo/`	`geo_`	25	Geographic/location analysis
`role_views/`	`view_`	30	Persona-specific views (CFO, COO, etc.)
`incremental/`	`inc_`	15	Incremental materializations
`reverse_etl/`	`rev_etl_`	10	Reverse ETL staging
`wide_tables/`	`wide_`	20	Denormalized BI tables
`mega_wide/`	`mega_wide_`	3	80+ column master tables
`advanced_sql/`	`adv_`	30	Advanced SQL patterns
`narrow/`	`narrow_`	25	Single-column/metric models
`analytics/`	`rpt_`	30	Cross-domain analytics
`data_quality/`	`dq_`	8	Data quality checks
`*_advanced/`	`fin_`, `sc_`, etc.	120	Deep domain analytics
`period_comparison/`	`poc_`	35	Period-over-period

Column Variety

Models range from 1 column (narrow metrics) to 80+ columns (mega-wide master tables):

Column Range	Models
1-3 columns	~60
4-8 columns	~520
9-15 columns	~410
16-30 columns	~55
31-80+ columns	~13

Advanced SQL Patterns

The advanced_sql/ directory demonstrates 30 SQL techniques:

Recursive CTEs (org hierarchy, referral trees)
GROUPING SETS / CUBE / ROLLUP
Window frame tricks (excluding current row, cumulative with reset)
Gap-and-island detection
Correlated subqueries
Array operations
NOT EXISTS patterns
Self-joins for graph/network analysis
Complex CASE business rule engines

Reusable Macros

Macro	Description
`rolling_average`	Configurable rolling window average
`safe_divide`	Division with zero/null safety
`growth_rate`	Period-over-period growth percentage
`bucket_values`	Numeric bucketing into labeled ranges
`percentile_score`	Ntile-based percentile scoring
`flag_outlier`	Standard deviation-based outlier detection
`weighted_score`	Weighted composite scores
`classify_trend`	Trend direction classification
`pivot_column`	Row-to-column pivoting
`unpivot_columns`	Column-to-row melting
`running_total`	Cumulative sum
`deduplicate`	Row-number deduplication
`surrogate_key_hash`	MD5 hash key generation
`day_of_week_number`	Cross-database day-of-week extraction

dbt Features Used

Models (view, table, incremental)
Seeds with schema routing
Sources with freshness checks
Schema tests (unique, not_null, accepted_values, relationships)
Custom generic tests (positive_value, not_in_future, valid_percentage, referential_integrity)
Data tests (custom SQL assertions)
Snapshots (SCD Type 2)
Analyses (ad-hoc compiled queries)
Exposures (dashboards, apps, ML models)
Groups (team ownership)
Docs blocks (business concept documentation)
Meta properties (PII flags, ownership)
Model contracts (enforced column types)
Tags (6 categories, 55+ unique tags)
Adapter dispatch (DuckDB + Snowflake compatible)
Semantic models and metrics (from original jaffle-shop)
Unit tests (from original jaffle-shop)

Database Support

Target	Status	Profile
DuckDB	Full build passes (1918/1918)	`--target duckdb` (default)
DuckDB (Recce base)	Same file, `base` schema	`--target duckdb-base`
DuckDB (Recce current)	Same file, `current` schema	`--target duckdb-current`
Snowflake	SQL compatible, needs credentials	`--target snowflake`

Recce Setup

Recce compares two dbt environments to catch data impact during PR review. This project includes two DuckDB targets (duckdb-base and duckdb-current) that share the same database file but use separate schemas.

# 1. Build the base environment (e.g., from the main branch)
uv run --with dbt-duckdb dbt seed --profiles-dir . --target duckdb-base --target-path target-base --vars 'load_source_data: true'
uv run --with dbt-duckdb dbt build --full-refresh --profiles-dir . --target duckdb-base --target-path target-base --vars 'load_source_data: true'
uv run --with dbt-duckdb dbt docs generate --profiles-dir . --target duckdb-base --target-path target-base

# 2. Switch to your feature branch, then build the current environment
uv run --with dbt-duckdb dbt seed --profiles-dir . --target duckdb-current --vars 'load_source_data: true'
uv run --with dbt-duckdb dbt build --full-refresh --profiles-dir . --target duckdb-current --vars 'load_source_data: true'
uv run --with dbt-duckdb dbt docs generate --profiles-dir . --target duckdb-current

# 3. Start Recce server
uv run --with recce recce server --target-base-path target-base

Pre-built base artifacts (manifest.json, catalog.json) are included in target-base/ so you can skip step 1 if you just want to explore.

Intentional SQL Mistakes

20 subtle logic errors are planted across mart models for testing PR review tools. These are syntactically valid but produce incorrect results:

Wrong join types (INNER vs LEFT)
Off-by-one date filters
Wrong aggregation (COUNT vs COUNT DISTINCT)
Double-counting from fan-out joins
Hardcoded values that go stale
Wrong column in calculations

The answer key is available for validation.

Project Structure

jaffle-shop-expand/
├── analyses/              3 ad-hoc analysis queries
├── data-tests/            3 custom data test assertions
├── dbt_project.yml        Project config with tags
├── macros/                22 reusable macros
│   └── tests/             4 custom generic tests
├── models/
│   ├── _groups.yml        Team ownership groups
│   ├── docs.md            Business concept documentation
│   ├── staging/           88 staging models (6 domains + derived)
│   ├── intermediate/      194 intermediate models
│   ├── marts/             714 mart models (28 subdirectories)
│   └── utilities/         4 utility models
├── packages.yml           dbt_utils, dbt_date, dbt_audit_helper
├── profiles.yml           DuckDB (default) + Snowflake targets
├── seeds/                 58 CSV seed files (6 directories)
└── snapshots/             2 snapshot definitions

Origin

Extended from dbt-labs/jaffle-shop (v3.0.0). The original 14 models and 6 seeds are preserved; all extensions are additive.

Name		Name	Last commit message	Last commit date
Latest commit History 173 Commits
analyses		analyses
data-tests		data-tests
macros		macros
models		models
seeds		seeds
snapshots		snapshots
.gitignore		.gitignore
.sqlfluff		.sqlfluff
.sqlfluffignore		.sqlfluffignore
README.md		README.md
dbt_project.yml		dbt_project.yml
package-lock.yml		package-lock.yml
packages.yml		packages.yml
profiles.yml		profiles.yml
pyproject.toml		pyproject.toml
requirements.in		requirements.in
requirements.txt		requirements.txt
target-base		target-base
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Jaffle Shop Expanded

Quick Start

Project Stats

Business Domains

Model Architecture

Mart Subdirectories

Tags

Column Variety

Advanced SQL Patterns

Reusable Macros

dbt Features Used

Database Support

Recce Setup

Intentional SQL Mistakes

Project Structure

Origin

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Jaffle Shop Expanded

Quick Start

Project Stats

Business Domains

Model Architecture

Mart Subdirectories

Tags

Column Variety

Advanced SQL Patterns

Reusable Macros

dbt Features Used

Database Support

Recce Setup

Intentional SQL Mistakes

Project Structure

Origin

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages