📌 Project Overview

This project ingests two flat-file data sources from external systems into a PostgreSQL data warehouse. It implements a Medallion Architecture (Landing → Bronze → Silver → Gold) with:

- Automated data quality checks
- End-to-end lineage tracking
- Documented data models (Conceptual → Logical → Physical)
- Business-ready Gold layer
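
As a rough illustration of what an automated data quality check between layers can look like, here is a minimal sketch in plain Python. It is a stand-in for the idea only and does not use the Great Expectations API; the names `CheckResult`, `not_null`, `unique`, and `run_quality_gate`, as well as the sample columns, are hypothetical and not taken from this project.

```python
from dataclasses import dataclass
from typing import Callable

# A row is modelled as a plain dict; the real pipeline reads from PostgreSQL.
Row = dict[str, object]


@dataclass
class CheckResult:
    name: str
    passed: bool
    failed_rows: int


def not_null(column: str) -> Callable[[list[Row]], CheckResult]:
    """Build a check that fails when `column` is missing or None in any row."""
    def check(rows: list[Row]) -> CheckResult:
        failed = sum(1 for row in rows if row.get(column) is None)
        return CheckResult(f"not_null({column})", failed == 0, failed)
    return check


def unique(column: str) -> Callable[[list[Row]], CheckResult]:
    """Build a check that fails when `column` contains duplicate values."""
    def check(rows: list[Row]) -> CheckResult:
        values = [row.get(column) for row in rows]
        failed = len(values) - len(set(values))
        return CheckResult(f"unique({column})", failed == 0, failed)
    return check


def run_quality_gate(rows, checks):
    """Run every check; the batch may be promoted only if all checks pass."""
    results = [check(rows) for check in checks]
    return all(result.passed for result in results), results


if __name__ == "__main__":
    # Hypothetical Bronze batch with one null name and one duplicate key.
    batch = [
        {"customer_id": 1, "name": None},
        {"customer_id": 1, "name": "Lin"},
    ]
    ok, results = run_quality_gate(batch, [not_null("name"), unique("customer_id")])
    print("promote to Silver:", ok)
    for result in results:
        print(result)
```

A gate like this would sit between two layers, for example Bronze and Silver, and block promotion of a batch when any check fails.
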
Built with:
- PostgreSQL
- Python
- Great Expectations
- Orchestration Implementation
  - Set up Airflow to automate end-to-end Silver→Gold layer execution
  - Configure task dependencies to ensure proper sequencing
  - Add data quality gates between transformation stages
- Incremental Processing
  - Implement Change Data Capture (CDC) for efficient updates
  - Design merge strategies for SCD Type 2 dimensions
- Multi-Source Integration
  - Phase 1: Add API-based CRM data (REST endpoints)
  - Phase 2: Stream free, open-source data via Kafka
  - Phase 3: Automate data pipeline components using dbt
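
The SCD Type 2 merge strategy mentioned under Incremental Processing could look roughly like the sketch below. It works on in-memory lists of dicts rather than PostgreSQL tables, and the function name `scd2_merge` and the columns `valid_from`, `valid_to`, and `is_current` are assumptions made for illustration, not the project's actual schema.

```python
from datetime import date


def scd2_merge(dimension, incoming, key, tracked, as_of):
    """Return a new dimension list with SCD Type 2 history applied.

    dimension: existing rows, each with valid_from, valid_to, is_current
    incoming:  latest source rows (one per business key)
    key:       name of the business key column
    tracked:   columns whose changes should create a new version
    as_of:     effective date of this load
    """
    # Copy rows so the caller's dimension is not mutated.
    result = [dict(row) for row in dimension]
    current = {row[key]: row for row in result if row["is_current"]}

    for new in incoming:
        existing = current.get(new[key])
        if existing is not None and all(existing[c] == new[c] for c in tracked):
            continue  # nothing changed, keep the current version as-is

        if existing is not None:
            # Close the old version instead of overwriting it.
            existing["valid_to"] = as_of
            existing["is_current"] = False

        version = {
            key: new[key],
            **{c: new[c] for c in tracked},
            "valid_from": as_of,
            "valid_to": None,
            "is_current": True,
        }
        result.append(version)
        current[new[key]] = version

    return result


if __name__ == "__main__":
    dim = [{"customer_id": 1, "city": "Oslo", "valid_from": date(2024, 1, 1),
            "valid_to": None, "is_current": True}]
    source = [{"customer_id": 1, "city": "Bergen"},
              {"customer_id": 2, "city": "Pune"}]
    for row in scd2_merge(dim, source, "customer_id", ["city"], date(2024, 6, 1)):
        print(row)
```

In a warehouse, the same logic would typically be expressed as an update that closes changed rows followed by an insert of the new versions. Re-running the merge with an unchanged source leaves the dimension as it is, which keeps the load idempotent.
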
📚 Learning Credits

This project was developed with guidance from:

- Architecture: Inspired by the Medallion Architecture (Databricks)
- Tutorials: Special thanks to the Data with Baraa YouTube channel for its videos

