This dbt (Data Build Tool) project transforms raw, operational Airbnb data into a clean, tested, and documented analytics-ready dataset. It was developed as the culmination of the dbt Fundamentals course, demonstrating core competencies in modern data transformation practices.
The primary goal of this project is to model raw data from an Airbnb-style source into a structured star schema, enabling powerful analysis on hosts, listings, and guest reviews. The pipeline implements slowly changing dimensions (SCDs) for host data and efficiently handles high-volume review data with incremental modeling.
Stephen Stephan — Data Engineer & Analytics Practitioner
Jacksonville, FL | Open to remote, hybrid, or local roles
Focused on building reliable, well-documented data pipelines that serve clear business needs.
DBTLEARN is more than a technical exercise—it’s a demonstration of how modern data transformation tools like dbt can bring software engineering best practices to analytics. Every model is built for clarity, every transformation is documented, and every decision is made with the future maintainer in mind.
This project follows a medallion architecture, structuring the transformations into distinct layers for clarity and reliability.
Model Layers:
-
src(Ephemeral Models): Lightly transforms raw source data from theAIRBNB.RAWschema. This includes column renaming, basic type casting, and filtering.src_hosts: Creates a cleansed view of host data.src_listings: Creates a cleansed view of listing data, preparing theprice_strfor conversion to a numeric type.src_reviews: Creates a cleansed view of review data.
-
dim(Dimension Models): Builds reusable dimension tables using views and tables.dim_hosts_cleansed(View): A cleansed dimension of hosts, implementing business logic to handle nullhost_names.dim_listings_cleansed(View): A cleansed dimension of listings, enforcing business rules forminimum_nightsand parsing thepricefield.dim_listings_w_hosts(Table): An enriched dimension table that joins listings with host information (likeis_superhost), creating a one-stop-shop for listing attributes.
-
fct(Fact Model): Builds an incremental fact table for event-based data.fct_reviews(Incremental Table): A fact table for reviews. Implements complex incremental logic to efficiently process only new data, using a surrogate key generated from natural keys for robust unique identification.
-
mart(Data Mart): A business-specific dataset answering a particular analytical question.mart_fullmoon_reviews(Table): An example analytical mart that enriches review data with information from a seed file (seed_full_moon_dates) to analyze if review sentiment correlates with full moons.
Data Lineage: The transformation flow follows this path: raw_hosts → src_hosts → dim_hosts_cleansed -─┐ ├─> dim_listings_w_hosts raw_listings → src_listings → dim_listings_cleansed ─┘ │ raw_reviews → src_reviews → fct_reviews → mart_fullmoon_reviews │ seed_full_moon_dates ────────┘
Key architectural choices made for performance, clarity, and maintainability:
srcmodels are ephemeral to reduce warehouse clutter and avoid building unnecessary tables for simple transformations.dim_listings_w_hostsis materialized as a table to support frequent downstream joins and benefit from caching for better performance.- Incremental logic in
fct_reviewsuses conditional logic withstart_dateandend_datevariables to allow for targeted, scoped refreshes of data, optimizing run times and cost. - Full moon enrichment is offset by one day (
DATEADD(DAY, 1, fm.full_moon_date)) to reflect the hypothesis that sentiment spikes may occur after the event itself.
- Data Transformation: dbt Core 1.9.0
- Data Warehouse: Snowflake
- Source Data:
AIRBNB.RAWschema (RAW_HOSTS, RAW_LISTINGS, RAW_REVIEWS) - Development: SQL, Jinja, YAML
This project leverages the power of dbt's package ecosystem:
dbt_utils: Used for reliable surrogate key generation (generate_surrogate_key) to ensure unique identifiers for fact records.dbt_expectations(Suggested): This package provides declarative data tests, enhancing maintainability and clarity for data quality checks.dbt_date(Suggested): Useful for date spine generation and advanced date logic, complementing time-based analysis.
dbtlearn/ ├── 📂 models/ │ ├── 📂 staging/ # Source models │ │ ├── src_hosts.sql │ │ ├── src_listings.sql │ │ ├── src_reviews.sql │ │ └── sources.yml # Defines the raw source tables │ ├── 📂 dim/ # Dimension models │ │ ├── dim_hosts_cleansed.sql │ │ ├── dim_listings_cleansed.sql │ │ └── dim_listings_w_hosts.sql │ ├── 📂 fct/ # Fact models │ │ └── fct_reviews.sql │ └── 📂 mart/ # Data marts │ └── mart_fullmoon_reviews.sql ├── 📂 seeds/ # Static data files │ └── full_moon_dates.csv ├── 📂 snapshots/ # Slowly Changing Dimensions (SCDs) │ └── scd_raw_hosts.sql ├── 📂 macros/ # Custom Jinja macros ├── 📂 tests/ # Custom data tests ├── 📂 analysis/ # Ad-hoc analysis queries ├── dbt_project.yml # Main configuration file └── packages.yml # Defines dbt utility packages
- Access to a Snowflake data warehouse with the
AIRBNB.RAWschema populated. - dbt Core installed and configured on your machine.
- A configured
~/.dbt/profiles.ymlfile pointing to your Snowflake dev environment.
- Clone this repository:
git clone <your-repo-url> cd dbtlearn
- Install dbt dependencies (includes
dbt_utils):dbt deps
- Validate your connection:
dbt debug
- Run all models:
dbt run dbt run --select dim/* dbt test dbt docs generate dbt docs serve Navigate to http://localhost:8080 to explore complete documentation and data lineage.
📜 Certification This project was completed as part of the requirements for the official dbt Fundamentals Certification.
Note: This project uses a simulated Airbnb dataset for educational purposes.