Skip to content

ChojoGebutsu/PostgreSQL-Data-Warehouse-Project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

71 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PostgreSQL Data Warehouse Project

This project showcases a complete data warehousing and analytics solution, including building a PostgreSQL data warehouse and generating insights from integrated data sources. It is intended as a portfolio project and demonstrates practical data engineering and analytics skills.

Credits: Project inspired by and based on content from Data with Baraa.


📖 Project Overview

This project includes:

  1. Data Architecture: Building a data warehouse using the Medallion Architecture with Bronze, Silver, and Gold layers.
  2. ETL Pipelines: Extracting, transforming, and loading data from multiple source systems into the warehouse.
  3. Data Modeling: Designing fact and dimension tables for efficient analytical queries.
  4. Data Analysis: Using SQL queries to generate insights.

🛠️ Tools Used

  • Datasets: Access the project datasets (CSV files).
  • PostgreSQL: Open-source relational database for hosting your data warehouse.
  • VS Code: A go-to GUI for PostgreSQL database management and executing SQL queries.
  • Notion Project Steps: Access all project phases and tasks.

📂 Repository Structure

PostgreSQL-Data-Warehouse-Project/
│
├── datasets/ # Raw datasets used for the project (ERP and CRM data)
│
├── docs/ # Project documentation and architecture details
│ ├── ETL_process.png # A diagram created in Draw.io showing ETL techniques and methods used in this project
│ ├── data_architecture.png # A diagram created in Draw.io showing the project's architecture
│ ├── data_catalogue.md # Catalog of datasets, including field descriptions and metadata
│ ├── data_flow.png # A diagram created in Draw.io for the data flow diagram
│ ├── data_integration_model.png # A diagram created in Draw.io that shows how tables are related to each other.
│ ├── gold_layer_data_model.png # A diagram created in Draw.io that shows gold layer data model (star schema)
│ ├── naming-conventions.md # Consistent naming guidelines for tables, columns, and files
│
├── scripts/ # SQL scripts for ETL and transformations
│ ├── bronze/ # Scripts for extracting and loading raw data
│ ├── silver/ # Scripts for cleaning and transforming data
│ ├── gold/ # Scripts for creating analytical models
│
├── tests/ # Test scripts and quality files
│
├── README.md # Project overview and instructions
├── LICENSE # License information for the repository
├── .gitignore # Files and directories to be ignored by Git

🚀 Project Requirements

Building the Data Warehouse

Objective

Build a PostgreSQL data warehouse that combines sales data for reporting and analysis.

Tasks

  • Load data from two CSV sources (ERP and CRM).
  • Clean and handle data quality issues before analysis.
  • Integrate both sources into a single, analytics-friendly data model.
  • Use only the latest dataset (no historical tracking required).
  • Document the data model for business users and analytics teams.

BI: Analytics & Reporting (Data Analysis)

Objective

Develop SQL-based analytics to deliver granular insights into:

  • Customer Behavior
  • Product Performance
  • Sales Trends

These insights empower stakeholders with critical business metrics, enabling strategic decision-making.


🏗️ Data Architecture

The data architecture for this project adheres to the Medallion Architecture, comprising Bronze, Silver, and Gold layers:

  1. Bronze Layer: Stores raw data in its original form from the source systems. Data is ingested from CSV files into the PostgreSQL database.
  2. Silver Layer: Involves data cleansing, standardization, and normalization processes to prepare data for analysis.
  3. Gold Layer: Hosts business-ready data modeled into a star schema for reporting and analytics.

Data Architecture



🛡️ License

This project is licensed under the MIT License. You are free to use, modify, and share this project with proper attribution.

About

Building a Data Warehouse with PostgreSQL, including ETL processes, Data Modeling, and Analytics.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors