Welcome to the Data Warehouse and Analytics Project repository! π
This project demonstrates a complete end-to-end data solution β from designing a modern SQL data warehouse using Medallion Architecture to delivering actionable business insights through an interactive Power BI dashboard.
It also includes a comprehensive collection of SQL scripts for data exploration, analytics, and reporting. These scripts cover various analyses such as database exploration, measures and metrics, time-based trends, cumulative analytics, segmentation, and more. This repository contains SQL queries designed to help data analysts and BI professionals quickly explore, segment, and analyze data within a relational database. Each script focuses on a specific analytical theme and demonstrates best practices for SQL queries.
Designed as a portfolio-grade project, it showcases best practices in:
- Data Engineering
- ETL Pipelines
- Dimensional Modeling
- SQL Analytics
- Power BI Visualization
- Business Intelligence
The data architecture for this project follows Medallion Architecture Bronze, Silver, and Gold layers:

- Bronze Layer: Stores raw data as-is from the source systems. Data is ingested from CSV Files into SQL Server Database.
- Silver Layer: This layer includes data cleansing, standardization, and normalization processes to prepare data for analysis.
- Gold Layer: Houses business-ready data modeled into a star schema required for reporting and analytics.
This project involves:
- Data Architecture: Designing a Modern Data Warehouse Using Medallion Architecture Bronze, Silver, and Gold layers.
- ETL Pipelines: Extracting, transforming, and loading data from source systems into the warehouse.
- Data Modeling: Developing fact and dimension tables optimized for analytical queries.
- Analytics & Reporting: Creating SQL-based reports and dashboards for actionable insights.
- Power BI Dashboarding: Transforming Gold Layer data into interactive executive, customer, and product dashboards.
π― This repository is an excellent resource for professionals and students looking to showcase expertise in:
- SQL Development
- Data Architect
- Data Engineering
- ETL Pipeline Developer
- Data Modeling
- Data Analytics
- Power BI Dashboard Development
- DAX
- Business Intelligence
- Data Visualization
Everything is for Free!
- Datasets: Access to the project dataset (csv files).
- SQL Server Express: Lightweight server for hosting your SQL database.
- SQL Server Management Studio (SSMS): GUI for managing and interacting with databases.
- Git Repository: Set up a GitHub account and repository to manage, version, and collaborate on your code efficiently.
- Microsoft PowerBI Desktop: Design interactive dashboards for analysis and insights
- DrawIO: Design data architecture, models, flows, and diagrams.
- Notion: Get the Project Template from Notion
- Notion Project Steps: Access to All Project Phases and Tasks.
Develop a modern data warehouse using SQL Server to consolidate sales data, enabling analytical reporting and informed decision-making.
- Data Sources: Import data from two source systems (ERP and CRM) provided as CSV files.
- Data Quality: Cleanse and resolve data quality issues prior to analysis.
- Integration: Combine both sources into a single, user-friendly data model designed for analytical queries.
- Scope: Focus on the latest dataset only; historization of data is not required.
- Documentation: Provide clear documentation of the data model to support both business stakeholders and analytics teams.
Develop SQL-based analytics to deliver detailed insights into:
- Customer Behavior
- Product Performance
- Sales Trends
These insights empower stakeholders with key business metrics, enabling strategic decision-making.
This project includes an interactive Power BI dashboard built on the Gold Layer star schema to transform warehouse data into business-ready insights.
After building the Bronze, Silver, and Gold layers, this project includes SQL-based EDA to uncover business insights:
- Database structure exploration
- Customer demographics
- Product category analysis
- Sales performance metrics
- Revenue contribution by country/category/customer
- Top & bottom product rankings
- Customer segmentation
data-warehouse-project/
β
βββ datasets/ # Raw datasets used for the project (ERP and CRM data)
β
βββ docs/ # Project documentation and architecture details
β βββ data_architecture.png # Snapshot of Draw.io file showing the project's architecture
β βββ data_catalog.md # Catalog of datasets, including field descriptions and metadata
β βββ data_flow.png # Snapshot of Draw.io file for the data flow diagram
β βββ data_models.drawio # Snapshot of Draw.io file for data models (star schema)
β βββ data_integration.pmg # Snapshot of Draw.io file for data integration
β βββ naming-conventions.md # Consistent naming guidelines for tables, columns, and files
β
βββ scripts/ # SQL scripts for ETL and transformations
β βββ bronze/ # Scripts for extracting and loading raw data
β βββ silver/ # Scripts for cleaning and transforming data
β βββ gold/ # Scripts for creating analytical models
β βββ analysis/ # Script for exploratory data analysis
β βββ reports/ # Scripts for consolidated reports on customer and product key insights
β
βββ tests/ # Test scripts and quality files
β
βββ dashboard/ # Power BI Dashboard for sales, customer & product analysis
βββsales_dashboard.pbix # PowerBI(.pbix) file for dashboard
βββdashboard_screenshots/ # Snapshots of the dashboard pages
βββREADME.md # Dashboard overview
β
βββ README.md # Project overview and instructions
βββ LICENSE # License information for the repository
βββ .gitignore # Files and directories to be ignored by Git
- Identified top-performing customers by lifetime revenue
- Segmented customers into value tiers for retention strategy
- Evaluated product category performance and contribution
- Tracked revenue patterns across multiple years
- Compared product volume vs profitability for strategic planning
This project is licensed under the MIT License. You are free to use, modify, and share this project with proper attribution.
Hi there! I'm Akarsh Kapoor. I'm a curious data enthusiast who loves exploring SQL, analytics, and real-world problem solving. I enjoy turning raw data into useful insights while building projects that make learning data both practical and fun!
Let's stay in touch! Feel free to connect with me :
