This project demonstrates the design and implementation of a modern data warehouse using the Medallion Architecture, structured into Bronze, Silver, and Gold layers. The objective is to build a scalable, maintainable, and analytics-ready data platform that transforms raw source data into trusted business insights.
The project follows industry best practices in data engineering, ETL development, and dimensional modeling, making it suitable for real-world enterprise use cases as well as professional portfolios.
The data warehouse is organized into three logical layers, each serving a distinct purpose:

Bronze Layer
- Stores raw data ingested directly from source systems (CSV files, external feeds, etc.)
- Applies minimal to no transformations
- Preserves the original data structure for traceability and auditing
- Serves as the system of record for downstream processing

Silver Layer
- Performs data cleansing, standardization, and validation
- Applies business rules and data quality checks
- Resolves data type issues, duplicates, and inconsistencies
- Prepares structured datasets optimized for analytical modeling

Gold Layer
- Represents the final analytical layer
- Implements a Star Schema design with fact and dimension tables
- Provides aggregated, enriched data optimized for high-performance queries
- Supports reporting, dashboards, and decision-making
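As a minimal sketch, the three layers can be kept separate as dedicated schemas in SQL Server, so that every table name makes its layer explicit (the schema names below mirror the layer names; the database itself is assumed to already exist):

```sql
-- Illustrative layer separation: one schema per medallion layer,
-- e.g. bronze.crm_customers vs. silver.crm_customers.
CREATE SCHEMA bronze;
GO
CREATE SCHEMA silver;
GO
CREATE SCHEMA gold;
GO
```

Schema-per-layer keeps object names self-documenting and lets permissions be granted per layer (e.g. analysts see only `gold`).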
The project includes end-to-end ETL pipelines that:
- Extract data from external source systems
- Load raw data into the Bronze layer
- Transform and cleanse data into the Silver layer
- Model and publish analytics-ready datasets in the Gold layer
ETL logic is implemented using SQL-based transformations, ensuring transparency, maintainability, and performance.
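A hedged sketch of what one such SQL-based step might look like, from Bronze ingest to Silver cleansing; the table and column names (`bronze.crm_customers`, `create_date`, the file path, etc.) are illustrative assumptions, not part of this project's actual scripts:

```sql
-- Hypothetical Bronze ingest: load a source CSV as-is.
BULK INSERT bronze.crm_customers
FROM 'C:\datasets\source_crm\customers.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', TABLOCK);

-- Hypothetical Silver transform: trim, standardize codes,
-- and de-duplicate, keeping only the latest record per customer.
INSERT INTO silver.crm_customers (customer_id, first_name, last_name, gender)
SELECT customer_id,
       TRIM(first_name),
       TRIM(last_name),
       CASE UPPER(TRIM(gender))
            WHEN 'M' THEN 'Male'
            WHEN 'F' THEN 'Female'
            ELSE 'n/a'
       END AS gender
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY create_date DESC) AS rn
    FROM bronze.crm_customers
) AS latest
WHERE rn = 1;
```

The `ROW_NUMBER()` window function is a common way to express "latest record wins" de-duplication entirely in SQL, keeping the transformation logic transparent and version-controllable.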
- Fact tables capture measurable business events
- Dimension tables provide descriptive business context
- Schema design is optimized for analytical workloads and BI tools
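To make the fact/dimension split concrete, here is an illustrative (not project-specific) pair of Gold-layer tables; the surrogate key on the dimension and the foreign key on the fact are the core of the star schema pattern:

```sql
-- Illustrative dimension: descriptive customer context
-- with a surrogate key for joins.
CREATE TABLE gold.dim_customers (
    customer_key INT IDENTITY(1,1) PRIMARY KEY,  -- surrogate key
    customer_id  INT,                            -- source system id
    first_name   NVARCHAR(50),
    last_name    NVARCHAR(50),
    country      NVARCHAR(50)
);

-- Illustrative fact: one row per measurable sales event,
-- linked to its dimensions by surrogate keys.
CREATE TABLE gold.fact_sales (
    order_number NVARCHAR(20),
    customer_key INT REFERENCES gold.dim_customers (customer_key),
    order_date   DATE,
    quantity     INT,
    sales_amount DECIMAL(10,2)
);
```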
- Gold-layer views and tables are directly queryable
- Designed to support:
  - Ad-hoc analysis
  - SQL-based reporting
  - BI dashboards
- Enables fast, reliable access to trusted business metrics
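For example, an ad-hoc query against the Gold layer stays simple because the star schema does the heavy lifting (object names below are the same illustrative ones used above, not the project's actual tables):

```sql
-- Example ad-hoc analysis: total orders and sales per country.
SELECT c.country,
       COUNT(DISTINCT f.order_number) AS total_orders,
       SUM(f.sales_amount)            AS total_sales
FROM gold.fact_sales AS f
JOIN gold.dim_customers AS c
  ON f.customer_key = c.customer_key
GROUP BY c.country
ORDER BY total_sales DESC;
```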
This repository showcases hands-on expertise in:
- SQL Development
- Data Engineering
- ETL Pipeline Design
- Modern Data Architecture
- Medallion Architecture Implementation
- Dimensional Data Modeling
- Analytics & Reporting
This project is an excellent resource for:
- Data Engineers
- Data Architects
- ETL Developers
- SQL Developers
- Data Analysts
- Students building real-world data engineering portfolios
- Datasets: Access to the project dataset (CSV files).
- SQL Server Express: Lightweight server for hosting your SQL database.
- SQL Server Management Studio: GUI for managing and interacting with databases.
- Notion: Get the Project Template from Notion
- Draw.io: Design data architecture, models, flows, and diagrams.
- Git Repository: Set up a GitHub account and repository to manage, version, and collaborate on your code efficiently.
🚀 Project Requirements

Building the Data Warehouse (Data Engineering)

Objective: Develop a modern data warehouse using SQL Server to consolidate sales data, enabling analytical reporting and informed decision-making.
Specifications
- Data Sources: Import data from two source systems (ERP and CRM) provided as CSV files.
- Data Quality: Cleanse and resolve data quality issues prior to analysis.
- Integration: Combine both sources into a single, user-friendly data model designed for analytical queries.
- Scope: Focus on the latest dataset only; historization of data is not required.
- Documentation: Provide clear documentation of the data model to support both business stakeholders and analytics teams.
This implementation reflects real enterprise data warehouse patterns and demonstrates the ability to design, build, and maintain a scalable analytics platform from raw data to business insights.