
Data Warehouse and Analytics Project

Welcome to the Data Warehouse and Analytics Project repository! 🚀

This project demonstrates a comprehensive data warehousing and analytics solution, from building a data warehouse to generating actionable insights. Designed as a portfolio project, it highlights industry best practices in data engineering and analytics.


πŸ—οΈ Data Architecture

The data architecture for this project follows Medallion Architecture with three layers: Bronze, Silver, and Gold.

┌─────────────────────────────────────────────────────────────────┐
│              SOURCE SYSTEMS                                     │
│            (ERP & CRM CSV Files)                                │
└────────────────────┬────────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────────┐
│              BRONZE LAYER (Raw Data)                            │
│  • Direct ingestion from CSV files                              │
│  • Minimal transformation                                       │
│  • Data lineage tracking                                        │
└────────────────────┬────────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────────┐
│              SILVER LAYER (Cleansed Data)                       │
│  • Data quality checks                                          │
│  • Standardization & normalization                              │
│  • Deduplication & validation                                   │
│  • Business rule application                                    │
└────────────────────┬────────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────────┐
│              GOLD LAYER (Analytics Ready)                       │
│  • Star schema dimensional model                                │
│  • Fact tables (Sales, Orders)                                  │
│  • Dimension tables (Customers, Products, Time)                 │
│  • Optimized for reporting & analytics                          │
└────────────────────┬────────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────────┐
│         REPORTING & ANALYTICS LAYER                             │
│  • SQL-based reports                                            │
│  • Business dashboards                                          │
│  • KPIs & metrics                                               │
│  • Decision support                                             │
└─────────────────────────────────────────────────────────────────┘

Layer Details:

Bronze Layer: Stores raw data as-is from the source systems. Data is ingested from CSV files directly into a SQL Server database without any transformation.

Silver Layer: This layer includes data cleansing, standardization, normalization, and validation processes to prepare data for analysis. Data quality issues are resolved here.

Gold Layer: Houses business-ready data modeled into a star schema required for reporting and analytics. Contains fact tables (sales transactions) and dimension tables (customers, products, dates, etc.).


📖 Project Overview

This project involves:

  • Data Architecture: Designing a Modern Data Warehouse Using Medallion Architecture (Bronze, Silver, and Gold layers)
  • ETL Pipelines: Extracting, transforming, and loading data from source systems into the warehouse
  • Data Modeling: Developing fact and dimension tables optimized for analytical queries
  • Analytics & Reporting: Creating SQL-based reports and dashboards for actionable insights

🎯 Career & Skills Showcase

This repository is an excellent resource for professionals and students looking to showcase expertise in:

  • SQL Development
  • Data Architecture
  • Data Engineering
  • ETL Pipeline Development
  • Data Modeling
  • Data Analytics
  • Business Intelligence

πŸ› οΈ Important Links & Tools

Everything is FREE! Here are the resources you need:

| Resource | Link | Description |
|---|---|---|
| Datasets | datasets/ folder | Project datasets (CSV files from ERP & CRM systems) |
| SQL Server Express | Download | Lightweight SQL Server for development |
| SQL Server Management Studio (SSMS) | Download | GUI for managing databases |
| GitHub | Create Account | Version control & collaboration |
| Draw.io | Visit | Design architecture & data flow diagrams |
| Notion | Project Template | Project planning & documentation |
| Visual Studio Code | Download | Code editor for SQL & documentation |

🚀 Project Requirements

Building the Data Warehouse (Data Engineering)

Objective

Develop a modern data warehouse using SQL Server to consolidate sales data from multiple sources, enabling comprehensive analytical reporting and informed decision-making.

Specifications

  • Data Sources: Import data from two source systems (ERP and CRM) provided as CSV files
  • Data Quality: Cleanse and resolve data quality issues prior to analysis
  • Integration: Combine both sources into a single, user-friendly data model designed for analytical queries
  • Scope: Focus on the latest dataset only; historization of data is not required
  • Documentation: Provide clear documentation of the data model to support both business stakeholders and analytics teams
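As a rough illustration of the Integration requirement, the two customer sources could be combined with a join on a shared business key. This is only a sketch: the `silver.crm_customers` and `silver.erp_customers` tables, their columns, and the assumption that both systems share `customer_id` are hypothetical and may not match the actual datasets.

```sql
-- Hypothetical tables and columns, for illustration only.
-- Assumes CRM is the primary customer source and ERP adds the country.
SELECT
    crm.customer_id,
    crm.customer_name,
    COALESCE(erp.country, 'n/a') AS country   -- placeholder when ERP has no match
FROM silver.crm_customers AS crm
LEFT JOIN silver.erp_customers AS erp
    ON erp.customer_id = crm.customer_id;
```

A `LEFT JOIN` keeps every CRM customer even when the ERP system has no matching record, which makes the gaps visible instead of silently dropping rows.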

Deliverables

  • ✅ SQL scripts for ETL processes (Bronze → Silver → Gold)
  • ✅ Documented data models and schema
  • ✅ Data quality validation checks
  • ✅ Data catalog with field descriptions
  • ✅ Performance-optimized queries

BI Analytics & Reporting (Data Analysis)

Objective

Develop SQL-based analytics and reports to deliver detailed insights into customer behavior, product performance, and sales trends. These insights empower stakeholders with key business metrics, enabling strategic decision-making.

Analytics Focus Areas

  • Customer Behavior: Purchase patterns, customer segmentation, lifetime value
  • Product Performance: Sales by product, inventory turnover, product profitability
  • Sales Trends: Revenue trends, seasonal patterns, sales forecasting insights
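To make the Customer Behavior focus area concrete, here is a minimal segmentation sketch. It assumes hypothetical Gold tables `gold.fact_sales` and `gold.dim_customers` joined on `customer_key`; the segment names and spending thresholds are arbitrary examples, not values from the project.

```sql
-- Hypothetical Gold tables and columns; thresholds are arbitrary examples.
WITH customer_spend AS (
    SELECT
        c.customer_key,
        c.customer_name,
        SUM(f.sales_amount)            AS lifetime_value,
        COUNT(DISTINCT f.order_number) AS total_orders
    FROM gold.fact_sales AS f
    JOIN gold.dim_customers AS c
        ON c.customer_key = f.customer_key
    GROUP BY c.customer_key, c.customer_name
)
SELECT
    customer_key,
    customer_name,
    lifetime_value,
    total_orders,
    CASE
        WHEN lifetime_value >= 5000 THEN 'VIP'
        WHEN lifetime_value >= 1000 THEN 'Regular'
        ELSE 'New'
    END AS customer_segment
FROM customer_spend
ORDER BY lifetime_value DESC;
```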

Deliverables

  • ✅ SQL-based analytical queries
  • ✅ Key performance indicators (KPIs)
  • ✅ Business reports
  • ✅ Dashboard-ready datasets
  • ✅ Executive summary insights

For detailed requirements, refer to docs/requirements.md


📂 Repository Structure

sql-data-warehouse-project/
│
├── datasets/                           # Raw datasets used for the project
│   ├── erp_customers.csv               # Customer data from ERP system
│   ├── erp_products.csv                # Product catalog from ERP system
│   ├── erp_sales.csv                   # Sales transactions from ERP system
│   ├── crm_customers.csv               # Customer data from CRM system
│   └── crm_orders.csv                  # Order data from CRM system
│
├── docs/                               # Project documentation and architecture
│   ├── etl.drawio                      # ETL techniques and methodologies diagram
│   ├── data_architecture.drawio        # Project's overall architecture diagram
│   ├── data_catalog.md                 # Catalog of datasets with field descriptions
│   ├── data_flow.drawio                # Data flow diagram showing process flow
│   ├── data_models.drawio              # Data models diagram (star schema)
│   ├── naming-conventions.md           # Naming guidelines for tables, columns, files
│   └── requirements.md                 # Detailed project requirements
│
├── scripts/                            # SQL scripts for ETL and transformations
│   ├── bronze/                         # Scripts for raw data extraction & loading
│   │   ├── 01_create_bronze_tables.sql
│   │   └── 02_load_bronze_data.sql
│   │
│   ├── silver/                         # Scripts for data cleaning & transformation
│   │   ├── 01_create_silver_tables.sql
│   │   ├── 02_data_quality_checks.sql
│   │   └── 03_transform_silver_data.sql
│   │
│   └── gold/                           # Scripts for creating analytical models
│       ├── 01_create_gold_tables.sql
│       ├── 02_create_fact_tables.sql
│       ├── 03_create_dimension_tables.sql
│       └── 04_create_indexes.sql
│
├── tests/                              # Test scripts and quality validation
│   ├── data_quality_tests.sql          # Data quality validation scripts
│   ├── completeness_tests.sql          # Completeness checks
│   └── accuracy_tests.sql              # Accuracy validation tests
│
├── README.md                           # Project overview and setup instructions
├── LICENSE                             # MIT License information
├── .gitignore                          # Git ignore file
└── requirements.txt                    # Python/Project dependencies

Directory Details:

datasets/ - Contains all source data in CSV format from ERP and CRM systems. These files are used for initial data ingestion into the Bronze layer.

docs/ - Complete project documentation including architecture diagrams, data catalogs, data flow diagrams, naming conventions, and detailed requirements documentation.

scripts/ - Organized SQL scripts in three subdirectories mirroring the Medallion Architecture:

  • Bronze: Raw data extraction and loading
  • Silver: Data cleansing, standardization, and validation
  • Gold: Fact and dimension table creation for analytics

tests/ - Quality assurance scripts to validate data integrity, completeness, and accuracy at each layer.


⚡ Key Features

✨ Medallion Architecture Implementation - Professional three-layer data architecture pattern following industry best practices

🔄 End-to-End ETL Pipeline - Complete Extract, Transform, Load processes from source systems to analytics-ready data

📊 Star Schema Data Model - Optimized dimensional modeling for fast analytical queries

🧹 Data Quality Framework - Comprehensive validation and cleansing procedures

📈 Analytics Ready - Pre-built queries and datasets optimized for reporting and dashboards

📚 Complete Documentation - Detailed guides, diagrams, and specifications for easy understanding and maintenance


🚀 Quick Start Guide

Prerequisites

  • SQL Server 2019 or later (Express edition is fine)
  • SQL Server Management Studio (SSMS)
  • Git for version control
  • CSV datasets (included in repository)

Setup Steps

Step 1: Clone the Repository

git clone https://github.com/Yogesh-F1/sql-data-warehouse-project.git
cd sql-data-warehouse-project

Step 2: Create Database

CREATE DATABASE DataWarehouse;
GO
USE DataWarehouse;
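
One common way to keep the three layers apart inside the database is one schema per layer. The sketch below assumes the schema names `bronze`, `silver`, and `gold`; the repository's own scripts may organize the layers differently.

```sql
-- Assumption: one schema per Medallion layer; the names are illustrative.
USE DataWarehouse;
GO
-- CREATE SCHEMA must be the first statement in its batch, hence the GO separators.
CREATE SCHEMA bronze;
GO
CREATE SCHEMA silver;
GO
CREATE SCHEMA gold;
GO
```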

Step 3: Execute Bronze Layer Scripts

-- Run scripts in order
-- scripts/bronze/01_create_bronze_tables.sql
-- scripts/bronze/02_load_bronze_data.sql
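
For orientation, a Bronze load in SQL Server typically looks something like the sketch below: create a table that mirrors the CSV, then empty and bulk-load it. The table name, columns, and file path are hypothetical placeholders, and the sketch assumes a `bronze` schema already exists and that the CSV has a header row; the actual scripts in `scripts/bronze/` may differ.

```sql
-- Hypothetical table; the columns are placeholders, not the repository's actual schema.
CREATE TABLE bronze.erp_customers (
    customer_id   NVARCHAR(50),
    customer_name NVARCHAR(100),
    country       NVARCHAR(50),
    created_date  DATE
);
GO

-- Full reload: empty the table, then bulk-load the CSV as-is.
TRUNCATE TABLE bronze.erp_customers;

BULK INSERT bronze.erp_customers
FROM 'C:\path\to\sql-data-warehouse-project\datasets\erp_customers.csv'  -- adjust to your local clone
WITH (
    FIRSTROW = 2,            -- skip the header row (assumes the CSV has one)
    FIELDTERMINATOR = ',',   -- comma-separated values
    TABLOCK                  -- take a table lock for the duration of the load
);
```

Truncating before each load fits the project scope, which focuses on the latest dataset and does not require historization.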

Step 4: Execute Silver Layer Scripts

-- Run scripts in order
-- scripts/silver/01_create_silver_tables.sql
-- scripts/silver/02_data_quality_checks.sql
-- scripts/silver/03_transform_silver_data.sql
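
A Silver transformation usually combines trimming, standardization, and deduplication in a single insert. The following is a minimal sketch under stated assumptions: the `bronze.erp_customers` and `silver.erp_customers` tables, their columns, the `created_date` column used to pick the most recent record, and the country mappings are all hypothetical.

```sql
-- Hypothetical tables, columns, and mappings, for illustration only.
INSERT INTO silver.erp_customers (customer_id, customer_name, country)
SELECT
    customer_id,
    TRIM(customer_name) AS customer_name,          -- remove stray whitespace
    CASE                                           -- standardize country values
        WHEN UPPER(TRIM(country)) IN ('US', 'USA') THEN 'United States'
        WHEN country IS NULL OR TRIM(country) = '' THEN 'n/a'
        ELSE TRIM(country)
    END AS country
FROM (
    SELECT
        *,
        -- Rank duplicates per customer, most recent record first
        ROW_NUMBER() OVER (
            PARTITION BY customer_id
            ORDER BY created_date DESC
        ) AS rn
    FROM bronze.erp_customers
    WHERE customer_id IS NOT NULL
) AS ranked
WHERE rn = 1;                                      -- keep one row per customer
```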

Step 5: Execute Gold Layer Scripts

-- Run scripts in order
-- scripts/gold/01_create_gold_tables.sql
-- scripts/gold/02_create_fact_tables.sql
-- scripts/gold/03_create_dimension_tables.sql
-- scripts/gold/04_create_indexes.sql
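
To show what a star schema in the Gold layer can look like, here is a small sketch with one dimension, one fact table, and an index on the join column. All table and column names are hypothetical. The dimension is created first in this sketch only because the fact table's foreign key references it.

```sql
-- Hypothetical dimension table with a surrogate key.
CREATE TABLE gold.dim_customers (
    customer_key  INT IDENTITY(1,1) PRIMARY KEY,  -- surrogate key
    customer_id   NVARCHAR(50) NOT NULL,          -- business key from the source
    customer_name NVARCHAR(100),
    country       NVARCHAR(50)
);

-- Hypothetical fact table referencing the dimension.
CREATE TABLE gold.fact_sales (
    order_number NVARCHAR(50) NOT NULL,
    customer_key INT NOT NULL
        REFERENCES gold.dim_customers (customer_key),
    order_date   DATE,
    quantity     INT,
    sales_amount DECIMAL(18, 2)
);

-- Index the join column used by most analytical queries.
CREATE NONCLUSTERED INDEX ix_fact_sales_customer_key
    ON gold.fact_sales (customer_key);
```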

Step 6: Run Tests

-- Execute test scripts to validate data integrity
-- tests/data_quality_tests.sql
-- tests/completeness_tests.sql
-- tests/accuracy_tests.sql
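
Data quality tests are often written so that a passing check returns no rows. The sketch below shows two such checks against a hypothetical `silver.erp_customers` table; the table and column names are assumptions, not the contents of the repository's test scripts.

```sql
-- Expectation for each check: no rows returned.

-- 1. Duplicate or NULL business keys
SELECT customer_id, COUNT(*) AS occurrences
FROM silver.erp_customers
GROUP BY customer_id
HAVING COUNT(*) > 1 OR customer_id IS NULL;

-- 2. Unwanted leading or trailing spaces
--    DATALENGTH is used because SQL Server string comparison ignores trailing spaces.
SELECT customer_name
FROM silver.erp_customers
WHERE DATALENGTH(customer_name) <> DATALENGTH(TRIM(customer_name));
```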

Step 7: Query Analytics

Use the Gold layer tables for reporting and analytics!
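
As a starting point, a monthly revenue trend can be computed straight from a fact table. This sketch assumes a hypothetical `gold.fact_sales` table with `order_date`, `order_number`, and `sales_amount` columns; adjust the names to the actual Gold model.

```sql
-- Hypothetical fact table and columns, for illustration only.
SELECT
    DATEFROMPARTS(YEAR(f.order_date), MONTH(f.order_date), 1) AS order_month,
    COUNT(DISTINCT f.order_number)                            AS total_orders,
    SUM(f.sales_amount)                                       AS total_revenue
FROM gold.fact_sales AS f
WHERE f.order_date IS NOT NULL
GROUP BY DATEFROMPARTS(YEAR(f.order_date), MONTH(f.order_date), 1)
ORDER BY order_month;
```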


πŸ›‘οΈ License

This project is licensed under the MIT License. You are free to use, modify, and share this project with proper attribution.

Permissions:

  • ✅ Commercial use
  • ✅ Modification
  • ✅ Distribution
  • ✅ Private use

Conditions:

  • 📋 License and copyright notice must be included

See LICENSE file for full details.


🌟 About Me

Hi there! I'm Yogesh, a working professional with a desire to learn, share knowledge, and make working with data enjoyable, engaging, and accessible to everyone!

💡 Interests:

  • Building scalable data solutions
  • Learning through practical, real-world projects
  • Exploring emerging data technologies

🤝 Let's Connect:


🧾 Credits & License

Attribution: Created by Data with Baraa, Baraa Khatib Salkini (YouTube: Data with Baraa). Please retain the credit line on the cover and footer when sharing publicly. 🙏

Suggested license: Consider a permissive license for sharing and reuse (e.g., MIT or CC BY 4.0). Add a LICENSE file to the repository to clarify reuse terms.


Last Updated: 2026-06-10
Repository: sql-data-warehouse-project
Status: ✅ Active & Maintained


Happy Learning! Feel free to star ⭐ this repository if you found it helpful!
