Fully Local Data Transformation with dbt and DuckDB

A demonstration of local data transformation and reverse ETL pipelines using DuckDB and dbt with the dbt-duckdb adapter. This project showcases how to build a complete data pipeline that processes Dutch railway services and geographic data into a structured data mart.

📖 Based on: Fully Local Data Transformation with dbt and DuckDB - DuckDB Blog Post

🚀 Features

  • Fully Local Processing: No cloud dependencies - all data transformation happens locally
  • Spatial Data Integration: Combines railway data with geographic information using DuckDB's spatial extensions
  • Modern Data Stack: Uses dbt for transformation
  • Multiple Export Options: Export to files (CSV, JSON, Parquet) or directly to PostgreSQL
  • Dimensional Modeling: Implements a star schema with fact and dimension tables
  • Incremental Processing: Supports both full refresh and incremental data loads
  • Data Visualization: Includes Metabase for ad hoc analysis

📊 Data Model

The project creates a dimensional model with:

  • dim_nl_provinces - Dutch provinces dimension table
  • dim_nl_municipalities - Dutch municipalities dimension table (linked to provinces)
  • dim_nl_train_stations - Train stations dimension table (linked to municipalities)
  • fact_services - Train services fact table (linked to stations)
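
To make the star schema concrete, a typical analysis walks the fact table back through its dimension chain. A minimal sketch, assuming surrogate-key column names (station_sk appears in the incremental configuration below; the other keys are illustrative):

-- Count train services per province by joining fact to dimensions
select
    p.province_name,
    count(*) as total_services
from fact_services f
join dim_nl_train_stations s on f.station_sk = s.station_sk
join dim_nl_municipalities m on s.municipality_sk = m.municipality_sk
join dim_nl_provinces p on m.province_sk = p.province_sk
group by p.province_name
order by total_services desc;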

🛠 Tech Stack

  • DuckDB - In-process analytical database
  • dbt - Data transformation framework
  • dbt-duckdb - DuckDB adapter for dbt
  • Metabase - Business intelligence tool
  • Spatial Extensions - For geographic data processing
  • PostgreSQL - Optional target for reverse ETL

📋 Prerequisites

Make sure you have:

  • uv: Python package and project manager (instructions)
  • Docker or another container runtime (Colima works well on macOS)
  • DBeaver or pgAdmin (optional): a database client for inspecting the PostgreSQL target

🚀 Quick Start

1. Install Dependencies

make install

2. Run the Pipeline

In one terminal, spin up Postgres and Metabase:

# Open a terminal on the root folder of the project
make docker-up
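
Under the hood, the docker directory defines the compose stack. A minimal sketch of what make docker-up brings up, assuming illustrative image tags, ports, and credential variables:

# docker/docker-compose.yml (illustrative)
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: ${POSTGRES_DB}
    ports:
      - "5432:5432"
  metabase:
    image: metabase/metabase
    ports:
      - "3000:3000"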

Then, in the same terminal:

# Export environment variables in the shell so dbt can use them
set -o allexport && source .env && set +o allexport
# Change directory
cd railway_analytics
# Run all models
dbt run

# Run with tests
dbt run && dbt test

# Run specific model groups
dbt run --select "+transformation"
dbt run --select "+reverse_etl"
dbt run --select "+exports "

📁 Project Structure

.
├── docker
├── example.env
├── Makefile
├── pyproject.toml
├── railway_analytics
│   ├── analyses
│   ├── data
│   │   ├── exports
│   │   └── railway_analytics.duckdb
│   ├── dbt_project.yml
│   ├── macros
│   ├── models
│   │   ├── exports
│   │   ├── reverse_etl
│   │   ├── sources.yml
│   │   └── transformation
│   ├── package-lock.yml
│   ├── packages.yml
│   ├── profiles.yml
│   ├── README.md
│   ├── seeds
│   ├── snapshots
│   └── tests
├── README.md
└── uv.lock

🔧 Configuration

Data Sources

The project uses multiple data sources:

  • External DuckDB Database: Railway services data hosted on Cloudflare
  • Remote GeoJSON: Dutch provinces data from cartomap
  • Local GeoJSON: Dutch municipalities data included in the project
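
With dbt-duckdb, remote files like these can be declared directly in sources.yml using an external_location, so models can select from them like any other source. A minimal sketch with illustrative source, table, and URL names (the real definitions live in models/sources.yml):

# Illustrative source entry; st_read comes from the spatial extension
version: 2
sources:
  - name: cartomap
    tables:
      - name: nl_provinces
        meta:
          external_location: "st_read('https://cartomap.github.io/nl/wgs84/provincie_2023.geojson')"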

Materialization Options

  • table: Full refresh replacement
  • incremental: Append or delete+insert strategies
  • external: Export to external files (CSV, JSON, Parquet)
  • view: Creates database views

Example Model Configuration

-- Full refresh table
{{ config(materialized='table') }}

-- Incremental model
{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key=['service_date', 'station_sk']
) }}

-- External file export
{{ config(
    materialized='external',
    location='data/exports/output',
    options={
        "partition_by": "year, month",
        "overwrite": True
    }
) }}
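
In the model body, an incremental configuration is typically paired with an is_incremental() filter so that only new rows are processed on subsequent runs. A minimal sketch, assuming an illustrative staging model name:

-- Incremental model body; stg_services is an assumed upstream model
{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key=['service_date', 'station_sk']
) }}

select *
from {{ ref('stg_services') }}
{% if is_incremental() %}
-- on incremental runs, only pick up dates newer than what is already loaded
where service_date > (select max(service_date) from {{ this }})
{% endif %}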

🐘 PostgreSQL Integration

To enable reverse ETL to PostgreSQL:

  1. Add the PostgreSQL extension to your profile:
extensions:
  - spatial
  - httpfs
  - postgres
  2. Configure the database attachment:
attach:
  - path: "postgresql://user:password@localhost:5432/dbname"
    type: postgres
    alias: internal_db
  3. Run the reverse ETL models:
dbt run --select "+reverse_etl"
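
Models under models/reverse_etl can then write into the attached database by setting the database config to the attachment alias. A minimal sketch (the file name and selected model are assumptions):

-- models/reverse_etl/publish_stations.sql (illustrative)
-- materializes a dbt model as a table inside the attached Postgres database
{{ config(materialized='table', database='internal_db') }}

select * from {{ ref('dim_nl_train_stations') }}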

🧪 Testing

The project includes comprehensive data quality tests:

# Run all tests
dbt test

# Run tests for specific models
dbt test --select dim_nl_provinces
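
These are standard dbt generic tests declared alongside the models in YAML. A minimal sketch with illustrative column names:

# Illustrative schema tests for the provinces dimension
version: 2
models:
  - name: dim_nl_provinces
    columns:
      - name: province_sk
        tests:
          - unique
          - not_null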

📈 Performance

Performance characteristics on a MacBook Pro (12GB RAM):

  • Total execution time: 40-45 seconds
  • Data processed: 400MB+ from external sources
  • Models: 10 transformation models
  • Tests: 20 data quality tests
  • Processing: ~30 seconds for main transformations
  • PostgreSQL export: ~4 seconds

🎯 Use Cases

This project demonstrates:

  • Local data lake/warehouse creation
  • Spatial data processing and analysis
  • Modern data transformation patterns
  • Multi-format data export capabilities
  • Reverse ETL to operational databases
  • Data quality testing and validation
