Welcome to the Data Warehouse and Analytics Project repository! π
This project demonstrates a comprehensive data warehousing and analytics solution, from building a data warehouse to generating actionable insights. Designed as a portfolio project, it highlights industry best practices in data engineering and analytics.
The data architecture for this project follows Medallion Architecture with three layers: Bronze, Silver, and Gold.
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β SOURCE SYSTEMS β
β (ERP & CRM CSV Files) β
ββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β BRONZE LAYER (Raw Data) β
β β’ Direct ingestion from CSV files β
β β’ Minimal transformation β
β β’ Data lineage tracking β
ββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β SILVER LAYER (Cleansed Data) β
β β’ Data quality checks β
β β’ Standardization & normalization β
β β’ Deduplication & validation β
β β’ Business rule application β
ββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β GOLD LAYER (Analytics Ready) β
β β’ Star schema dimensional model β
β β’ Fact tables (Sales, Orders) β
β β’ Dimension tables (Customers, Products, Time) β
β β’ Optimized for reporting & analytics β
ββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β REPORTING & ANALYTICS LAYER β
β β’ SQL-based reports β
β β’ Business dashboards β
β β’ KPIs & metrics β
β β’ Decision support β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Bronze Layer: Stores raw data as-is from source systems. Data is ingested from CSV files directly into SQL Server Database without any transformation.
Silver Layer: This layer includes data cleansing, standardization, normalization, and validation processes to prepare data for analysis. Data quality issues are resolved here.
Gold Layer: Houses business-ready data modeled into a star schema required for reporting and analytics. Contains fact tables (sales transactions) and dimension tables (customers, products, dates, etc.).
This project involves:
- Data Architecture: Designing a Modern Data Warehouse Using Medallion Architecture (Bronze, Silver, and Gold layers)
- ETL Pipelines: Extracting, transforming, and loading data from source systems into the warehouse
- Data Modeling: Developing fact and dimension tables optimized for analytical queries
- Analytics & Reporting: Creating SQL-based reports and dashboards for actionable insights
This repository is an excellent resource for professionals and students looking to showcase expertise in:
- SQL Development
- Data Architecture
- Data Engineering
- ETL Pipeline Development
- Data Modeling
- Data Analytics
- Business Intelligence
Everything is FREE! Here are the resources you need:
| Resource | Link | Description |
|---|---|---|
| Datasets | datasets/ folder |
Project datasets (CSV files from ERP & CRM systems) |
| SQL Server Express | Download | Lightweight SQL Server for development |
| SQL Server Management Studio (SSMS) | Download | GUI for managing databases |
| GitHub | Create Account | Version control & collaboration |
| Draw.io | Visit | Design architecture & data flow diagrams |
| Notion | Project Template | Project planning & documentation |
| Visual Studio Code | Download | Code editor for SQL & documentation |
Develop a modern data warehouse using SQL Server to consolidate sales data from multiple sources, enabling comprehensive analytical reporting and informed decision-making.
- Data Sources: Import data from two source systems (ERP and CRM) provided as CSV files
- Data Quality: Cleanse and resolve data quality issues prior to analysis
- Integration: Combine both sources into a single, user-friendly data model designed for analytical queries
- Scope: Focus on the latest dataset only; historization of data is not required
- Documentation: Provide clear documentation of the data model to support both business stakeholders and analytics teams
- β SQL scripts for ETL processes (Bronze β Silver β Gold)
- β Documented data models and schema
- β Data quality validation checks
- β Data catalog with field descriptions
- β Performance-optimized queries
Develop SQL-based analytics and reports to deliver detailed insights into customer behavior, product performance, and sales trends. These insights empower stakeholders with key business metrics, enabling strategic decision-making.
- Customer Behavior: Purchase patterns, customer segmentation, lifetime value
- Product Performance: Sales by product, inventory turnover, product profitability
- Sales Trends: Revenue trends, seasonal patterns, sales forecasting insights
- β SQL-based analytical queries
- β Key performance indicators (KPIs)
- β Business reports
- β Dashboard-ready datasets
- β Executive summary insights
For detailed requirements, refer to docs/requirements.md
sql-data-warehouse-project/
β
βββ datasets/ # Raw datasets used for the project
β βββ erp_customers.csv # Customer data from ERP system
β βββ erp_products.csv # Product catalog from ERP system
β βββ erp_sales.csv # Sales transactions from ERP system
β βββ crm_customers.csv # Customer data from CRM system
β βββ crm_orders.csv # Order data from CRM system
β
βββ docs/ # Project documentation and architecture
β βββ etl.drawio # ETL techniques and methodologies diagram
β βββ data_architecture.drawio # Project's overall architecture diagram
β βββ data_catalog.md # Catalog of datasets with field descriptions
β βββ data_flow.drawio # Data flow diagram showing process flow
β βββ data_models.drawio # Data models diagram (star schema)
β βββ naming-conventions.md # Naming guidelines for tables, columns, files
β βββ requirements.md # Detailed project requirements
β
βββ scripts/ # SQL scripts for ETL and transformations
β βββ bronze/ # Scripts for raw data extraction & loading
β β βββ 01_create_bronze_tables.sql
β β βββ 02_load_bronze_data.sql
β β
β βββ silver/ # Scripts for data cleaning & transformation
β β βββ 01_create_silver_tables.sql
β β βββ 02_data_quality_checks.sql
β β βββ 03_transform_silver_data.sql
β β
β βββ gold/ # Scripts for creating analytical models
β βββ 01_create_gold_tables.sql
β βββ 02_create_fact_tables.sql
β βββ 03_create_dimension_tables.sql
β βββ 04_create_indexes.sql
β
βββ tests/ # Test scripts and quality validation
β βββ data_quality_tests.sql # Data quality validation scripts
β βββ completeness_tests.sql # Completeness checks
β βββ accuracy_tests.sql # Accuracy validation tests
β
βββ README.md # Project overview and setup instructions
βββ LICENSE # MIT License information
βββ .gitignore # Git ignore file
βββ requirements.txt # Python/Project dependencies
datasets/ - Contains all source data in CSV format from ERP and CRM systems. These files are used for initial data ingestion into the Bronze layer.
docs/ - Complete project documentation including architecture diagrams, data catalogs, data flow diagrams, naming conventions, and detailed requirements documentation.
scripts/ - Organized SQL scripts in three subdirectories mirroring the Medallion Architecture:
- Bronze: Raw data extraction and loading
- Silver: Data cleansing, standardization, and validation
- Gold: Fact and dimension table creation for analytics
tests/ - Quality assurance scripts to validate data integrity, completeness, and accuracy at each layer.
β¨ Medallion Architecture Implementation - Professional three-layer data architecture pattern following industry best practices
π End-to-End ETL Pipeline - Complete Extract, Transform, Load processes from source systems to analytics-ready data
π Star Schema Data Model - Optimized dimensional modeling for fast analytical queries
π§Ή Data Quality Framework - Comprehensive validation and cleansing procedures
π Analytics Ready - Pre-built queries and datasets optimized for reporting and dashboards
π Complete Documentation - Detailed guides, diagrams, and specifications for easy understanding and maintenance
- SQL Server 2019 or later (Express edition is fine)
- SQL Server Management Studio (SSMS)
- Git for version control
- CSV datasets (included in repository)
Step 1: Clone the Repository
git clone https://github.com/Yogesh-F1/sql-data-warehouse-project.git
cd sql-data-warehouse-projectStep 2: Create Database
CREATE DATABASE DataWarehouse;
GO
USE DataWarehouse;Step 3: Execute Bronze Layer Scripts
-- Run scripts in order
-- scripts/bronze/01_create_bronze_tables.sql
-- scripts/bronze/02_load_bronze_data.sqlStep 4: Execute Silver Layer Scripts
-- Run scripts in order
-- scripts/silver/01_create_silver_tables.sql
-- scripts/silver/02_data_quality_checks.sql
-- scripts/silver/03_transform_silver_data.sqlStep 5: Execute Gold Layer Scripts
-- Run scripts in order
-- scripts/gold/01_create_gold_tables.sql
-- scripts/gold/02_create_fact_tables.sql
-- scripts/gold/03_create_dimension_tables.sql
-- scripts/gold/04_create_indexes.sqlStep 6: Run Tests
-- Execute test scripts to validate data integrity
-- tests/data_quality_tests.sql
-- tests/completeness_tests.sqlStep 7: Query Analytics Use the Gold layer tables for reporting and analytics!
This project is licensed under the MIT License. You are free to use, modify, and share this project with proper attribution.
- β Commercial use
- β Modification
- β Distribution
- β Private use
- π License and copyright notice must be included
See LICENSE file for full details.
Hi there! I'm Yogesh, I'm a working professional with a desire to learn and share knowledge and make working with data enjoyable, engaging, and accessible to everyone!
- Building scalable data solutions
- learingn through practical, real-world projects
- Exploring emerging data technologies
- πΌ LinkedIn: [linkedin.com/in/yogesh-hanwate]
- π Portfolio: [yogeshhanwate.com]
Attribution β Created by Data with Baraa β Baraa Khatib Salkini (YouTube: Data with Baraa). Please retain the credit line on the cover and footer when sharing publicly. π
Suggested license β Consider a permissive license for sharing and reuse (e.g., MIT or CC BY 4.0). Add a LICENSE file to the repository to clarify reuse terms.
Last Updated: 2026-06-10
Repository: sql-data-warehouse-project
Status: β
Active & Maintained
Happy Learning! Feel free to star β this repository if you found it helpful!