Welcome to the Data Warehouse Pipeline with Medallion Architecture repository!

A PostgreSQL-based implementation following modern data engineering best practices

High Level Architecture of the ETL pipeline Flow

📌 Project Overview

This project ingests two flat-file data sources from external systems into a PostgreSQL data warehouse, implementing a Medallion Architecture (Landing → Bronze → Silver → Gold) with:

Automated data quality checks
End-to-end lineage tracking
Documented data models (Conceptual → Logical → Physical)
Business-ready Gold layer
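
To make the layer progression concrete, here is a minimal sketch in plain Python of how rows might move from Landing to Gold. It is only an illustration: the actual pipeline runs in PostgreSQL, and the function names, the `sales.csv` source name, and the `customer_id` / `amount` columns are hypothetical, not taken from this repository.

```python
from collections import defaultdict

# Hypothetical raw rows as they might arrive from a flat file (all strings).
landing_rows = [
    {"customer_id": " 101 ", "amount": "25.50"},
    {"customer_id": "102", "amount": "abc"},   # amount is not numeric
    {"customer_id": "101", "amount": "4.50"},
    {"customer_id": "", "amount": "10.00"},    # business key is missing
]


def to_bronze(rows, source):
    """Bronze: keep raw values untouched and attach lineage metadata."""
    return [dict(row, _source=source, _row_num=i)
            for i, row in enumerate(rows, start=1)]


def to_silver(rows):
    """Silver: trim and cast values; route rows that fail checks to a reject list."""
    valid, rejected = [], []
    for row in rows:
        customer_id = row["customer_id"].strip()
        try:
            amount = float(row["amount"])
        except ValueError:
            rejected.append(row)
            continue
        if not customer_id:
            rejected.append(row)
            continue
        valid.append({
            "customer_id": int(customer_id),
            "amount": amount,
            "_source": row["_source"],
            "_row_num": row["_row_num"],
        })
    return valid, rejected


def to_gold(rows):
    """Gold: aggregate cleaned rows into a business-ready total per customer."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer_id"]] += row["amount"]
    return dict(totals)


bronze = to_bronze(landing_rows, source="sales.csv")
silver, rejected = to_silver(bronze)
gold = to_gold(silver)
```

Because Bronze keeps the original values plus `_source` and `_row_num`, each rejected or aggregated row can still be traced back to the line it came from, which is the idea behind the lineage tracking listed above.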

Medallion Architecture Flow

Built with:

PostgreSQL
Python
Great Expectations

🚀 Next Steps & Roadmap

Pipeline Enhancement

Orchestration Implementation

Set up Airflow to automate end-to-end Silver→Gold layer execution
Configure task dependencies to ensure proper sequencing
Add data quality gates between transformation stages
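
As a rough illustration of the sequencing and gating described above, the sketch below orders tasks by their dependencies and stops at the first failed quality gate. It uses Python's standard-library `graphlib` as a stand-in and is not Airflow code; the task names (`load_silver`, `silver_quality_gate`, `build_gold`) and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run tasks in dependency order and stop at the first failure.

    tasks: task name -> callable returning True (success) or False (failure).
    dependencies: task name -> set of task names that must run first.
    Returns (names executed so far, name of the failed task or None).
    """
    executed = []
    for name in TopologicalSorter(dependencies).static_order():
        executed.append(name)
        if not tasks[name]():
            return executed, name
    return executed, None


# Hypothetical in-memory stand-in for a Silver table.
silver_rows = []


def load_silver():
    silver_rows.extend([{"id": 1}, {"id": 2}, {"id": None}])
    return True


def silver_quality_gate():
    # Gate: every Silver row must have a non-null id before Gold is built.
    return all(row["id"] is not None for row in silver_rows)


def build_gold():
    return True


tasks = {
    "load_silver": load_silver,
    "silver_quality_gate": silver_quality_gate,
    "build_gold": build_gold,
}
dependencies = {
    "load_silver": set(),
    "silver_quality_gate": {"load_silver"},
    "build_gold": {"silver_quality_gate"},
}

executed, failed = run_pipeline(tasks, dependencies)
```

Here the gate fails on the row with a null `id`, so `build_gold` never runs. In an orchestrator the same shape would typically be expressed as a gate task placed between the Silver and Gold tasks.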

Incremental Processing

Implement Change Data Capture (CDC) for efficient updates
Design merge strategies for SCD Type 2 dimensions
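
One possible shape for an SCD Type 2 merge is sketched below in plain Python, assuming each dimension row carries `valid_from`, `valid_to`, and `is_current` columns. These column names, the `scd2_merge` helper, and the sample `customer_id` / `city` data are assumptions for illustration; the project has not yet defined its merge strategy, and a real implementation would likely run as SQL in PostgreSQL.

```python
from datetime import date


def scd2_merge(dimension, incoming, key, tracked, as_of):
    """Apply an SCD Type 2 merge and return the new dimension rows.

    - Unchanged records are left alone.
    - Changed records: the current version is closed (valid_to set,
      is_current False) and a new current version is appended.
    - New keys are appended as current versions.
    The input lists are not modified.
    """
    result = [dict(row) for row in dimension]
    current = {row[key]: row for row in result if row["is_current"]}

    for new in incoming:
        old = current.get(new[key])
        if old is not None and all(old[col] == new[col] for col in tracked):
            continue  # no tracked attribute changed
        if old is not None:
            old["valid_to"] = as_of
            old["is_current"] = False
        version = {
            key: new[key],
            **{col: new[col] for col in tracked},
            "valid_from": as_of,
            "valid_to": None,
            "is_current": True,
        }
        result.append(version)
        current[new[key]] = version
    return result


# Hypothetical customer dimension with one current row.
dimension = [
    {"customer_id": 1, "city": "Berlin",
     "valid_from": date(2024, 1, 1), "valid_to": None, "is_current": True},
]
incoming = [
    {"customer_id": 1, "city": "Munich"},  # changed attribute
    {"customer_id": 2, "city": "Paris"},   # new key
]

merged = scd2_merge(dimension, incoming, key="customer_id",
                    tracked=["city"], as_of=date(2025, 1, 1))
```

After the merge, customer 1 has a closed Berlin version and a current Munich version, and customer 2 has a single current row. Running the same merge again with the same input adds nothing, which is the property an incremental load needs.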

Multi-Source Integration

Phase 1: Add API-based CRM data (REST endpoints)
Phase 2: Stream free, open-source data via Kafka
Phase 3: Automate data pipeline components using dbt

📚 Learning Credits

This project is developed with guidance from:

Architecture: Inspired by Medallion Architecture - Databricks

Tutorials: Special thanks to the Data with Baraa YouTube channel for its excellent videos.

