Data Engineering Portfolio

End‑to‑end data pipelines built with modern data engineering tools


🧰 Tech Stack

Python · Airflow · dbt · BigQuery · Docker · Kafka · GCP


👨‍💻 About Me

I’m transitioning into Data Engineering with a strong focus on building real-world, production-style data systems.

Instead of following tutorials, I designed pipelines that reflect how modern companies operate:

  • Batch ETL & ELT workflows
  • Data warehouse modeling
  • Real-time event streaming
  • Cloud-native ingestion
  • Fully Dockerized, reproducible environments

Each project is structured like a real engineering repository.


🏗 Portfolio Projects

Below are the four projects that make up my portfolio.
Each link takes you directly to the project’s README.


1️⃣ Marketing ETL Pipeline (Airflow + BigQuery + dbt)

API ingestion → Raw → Staging → Mart → Dashboard

  • Daily API extraction (Python)
  • BigQuery raw + mart layers
  • dbt transformations
  • Airflow orchestration (Dockerized)

📂 View Project: Marketing ETL Pipeline
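The Raw → Staging → Mart flow above can be illustrated in plain Python. This is a minimal sketch, not the project's code. In the project these layers live in BigQuery and are built with dbt. Here the field names (`campaign_id`, `impressions`, `clicks`, `spend`) and sample rows are hypothetical. Staging casts types and deduplicates on the row grain, and the mart aggregates the result to one dashboard-ready row per day.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw API rows: field names and values are illustrative only.
# Raw data arrives as strings, and a retried API call can produce duplicates.
RAW = [
    {"campaign_id": "c1", "date": "2024-01-01", "impressions": "1000", "clicks": "50", "spend": "25.0"},
    {"campaign_id": "c1", "date": "2024-01-01", "impressions": "1000", "clicks": "50", "spend": "25.0"},
    {"campaign_id": "c2", "date": "2024-01-01", "impressions": "400", "clicks": "8", "spend": "10.0"},
]


def stage(raw_rows):
    """Staging: cast types and keep one row per (campaign_id, date) grain."""
    by_grain = {}
    for r in raw_rows:
        # A later row for the same grain overwrites an earlier one.
        by_grain[(r["campaign_id"], r["date"])] = {
            "campaign_id": r["campaign_id"],
            "date": date.fromisoformat(r["date"]),
            "impressions": int(r["impressions"]),
            "clicks": int(r["clicks"]),
            "spend": float(r["spend"]),
        }
    return list(by_grain.values())


def build_mart(staged_rows):
    """Mart: aggregate to one dashboard-ready row per day."""
    totals = defaultdict(lambda: {"impressions": 0, "clicks": 0, "spend": 0.0})
    for r in staged_rows:
        day = totals[r["date"]]
        day["impressions"] += r["impressions"]
        day["clicks"] += r["clicks"]
        day["spend"] += r["spend"]
    return {d: dict(t) for d, t in sorted(totals.items())}


mart = build_mart(stage(RAW))
print(mart)
# {datetime.date(2024, 1, 1): {'impressions': 1400, 'clicks': 58, 'spend': 35.0}}
```

Deduplicating in staging means a retried extraction does not double-count spend in the mart.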


2️⃣ E‑Commerce Data Warehouse (BigQuery + dbt)

Star schema → Fact tables → Dimensions → Cohort analysis

  • Dimensional modeling
  • dbt staging + marts
  • LTV, retention, and cohort metrics

📂 View Project: E-Commerce Data Warehouse
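Cohort retention groups customers by the month of their first order. It then tracks what share of each cohort orders again in later months. The sketch below shows that calculation in Python over hypothetical order data. In a warehouse project this would more naturally be a dbt SQL model over the orders fact table, so treat it as an illustration of the logic only. The `(customer_id, order_date)` shape is an assumption.

```python
from collections import defaultdict
from datetime import date

# Hypothetical orders as (customer_id, order_date); values are illustrative only.
ORDERS = [
    ("a", date(2024, 1, 5)),
    ("b", date(2024, 1, 20)),
    ("a", date(2024, 2, 10)),
    ("c", date(2024, 2, 3)),
    ("b", date(2024, 3, 1)),
    ("c", date(2024, 3, 15)),
]


def month_index(d):
    """Absolute month number, so month differences are plain subtraction."""
    return d.year * 12 + d.month - 1


def cohort_retention(orders):
    """Return {(cohort_month, months_since_first_order): share of cohort active}."""
    # Each customer's cohort is the month of their first order.
    first = {}
    for cust, d in orders:
        if cust not in first or d < first[cust]:
            first[cust] = d

    cohort_size = defaultdict(int)
    for d in first.values():
        cohort_size[d.strftime("%Y-%m")] += 1

    # Distinct customers active in each (cohort, month offset) cell.
    active = defaultdict(set)
    for cust, d in orders:
        cohort = first[cust].strftime("%Y-%m")
        offset = month_index(d) - month_index(first[cust])
        active[(cohort, offset)].add(cust)

    return {
        cell: len(custs) / cohort_size[cell[0]]
        for cell, custs in sorted(active.items())
    }


for cell, share in cohort_retention(ORDERS).items():
    print(cell, share)
```

Counting distinct customers per cell, not orders, keeps a customer who orders twice in one month from inflating retention.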


3️⃣ Real‑Time Event Pipeline (Kafka + Python + BigQuery)

Simulated events → Kafka producer → Consumer → Warehouse

  • Kafka streaming ingestion
  • Python consumer
  • Near real-time analytics

📂 View Project: Real-Time Event Pipeline
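A streaming consumer typically decodes messages and writes them to the warehouse in small batches rather than one row at a time. The sketch below shows that micro-batching step using only the standard library. The in-memory `MESSAGES` list stands in for a Kafka client's poll loop. `write_batch` is a hypothetical callback where a BigQuery insert would go. The batch size and event fields are assumptions, not the project's settings.

```python
import json

# Simulated Kafka message values (JSON bytes). A real consumer would receive
# these from a Kafka client's poll loop; this list is a stand-in.
MESSAGES = [
    json.dumps({"event_id": i, "type": "page_view", "user": f"u{i % 3}"}).encode()
    for i in range(7)
]


class MicroBatchSink:
    """Buffers decoded events and flushes them in fixed-size batches.

    `write_batch` is a hypothetical callback standing in for a warehouse insert.
    """

    def __init__(self, write_batch, batch_size=3):
        self.write_batch = write_batch
        self.batch_size = batch_size
        self.buffer = []

    def add(self, raw_value: bytes):
        try:
            event = json.loads(raw_value)
        except json.JSONDecodeError:
            # Skip malformed messages; a real pipeline might send them to a dead-letter topic.
            return
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.write_batch(self.buffer)
            self.buffer = []  # start a fresh list so the written batch is not mutated


warehouse_batches = []
sink = MicroBatchSink(warehouse_batches.append, batch_size=3)
for value in MESSAGES:
    sink.add(value)
sink.flush()  # write the final partial batch on shutdown

print([len(b) for b in warehouse_batches])  # [3, 3, 1]
```

Suppose offsets are committed only after `flush()` succeeds. A crash can then replay messages that were already written, which gives at-least-once delivery. Deduplicating on an event ID downstream keeps the warehouse consistent.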


4️⃣ Cloud‑Native Pipeline (GCP Functions + BigQuery)

Serverless ingestion → Cloud Storage → BigQuery

  • Cloud Functions
  • Scheduled ingestion
  • Serverless transformations

📂 View Project: Cloud-Native Pipeline
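One common way to make scheduled serverless ingestion safe to re-run is to give each extract a deterministic Cloud Storage path keyed by date. The sketch below is a small naming helper of that kind. The bucket name, folder layout, and `dt=` partition convention are assumptions, not taken from the project.

```python
from datetime import date


def landing_uri(source: str, run_date: date, bucket: str = "example-landing-bucket") -> str:
    """Build a deterministic gs:// URI for one day's extract of one source.

    The path depends only on (source, run_date), so an upload for the same
    day overwrites the same object instead of adding a duplicate file.
    Bucket name and layout are hypothetical.
    """
    return f"gs://{bucket}/raw/{source}/dt={run_date.isoformat()}/data.json"


print(landing_uri("ads_api", date(2024, 1, 1)))
# gs://example-landing-bucket/raw/ads_api/dt=2024-01-01/data.json
```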


🧠 Skills Demonstrated

🏗 Data Engineering

  • Workflow orchestration (Airflow)
  • ELT pipelines (Python → BigQuery → dbt)
  • Data modeling (staging, marts, star schema)
  • Streaming ingestion (Kafka)
  • Cloud-native design (GCP)

⚙️ Engineering Practices

  • Dockerized development
  • Version control (Git)
  • Dependency pinning
  • Modular code structure
  • Logging & monitoring

🔍 Data Quality

  • dbt tests (unique, not null, relationships)
  • Incremental models
  • Idempotent loads
  • Retry logic in Airflow
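The dbt generic tests named above (`unique`, `not_null`, `relationships`) can be illustrated in plain Python, along with the idea of an idempotent load. This sketch only mirrors what those checks assert. In a dbt project they are declared in YAML and compiled to SQL queries that return failing rows. The helper names, tables, and columns here are hypothetical.

```python
from collections import Counter


def check_unique(rows, col):
    """Like dbt's `unique` test: values of `col` that appear more than once."""
    counts = Counter(r[col] for r in rows)
    return [value for value, n in counts.items() if n > 1]


def check_not_null(rows, col):
    """Like dbt's `not_null` test: rows where `col` is missing or None."""
    return [r for r in rows if r.get(col) is None]


def check_relationships(rows, col, parent_rows, parent_col):
    """Like dbt's `relationships` test: non-null keys with no matching parent row."""
    parent_keys = {p[parent_col] for p in parent_rows}
    return [r[col] for r in rows if r[col] is not None and r[col] not in parent_keys]


def merge_rows(target, incoming, key):
    """Idempotent upsert (MERGE-style): replace rows by key, insert new ones."""
    by_key = {r[key]: r for r in target}
    for r in incoming:
        by_key[r[key]] = r
    return list(by_key.values())


customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},     # no matching customer
    {"order_id": 11, "customer_id": None},  # duplicate key and null foreign key
]

print(check_unique(orders, "order_id"))       # [11]
print(check_not_null(orders, "customer_id"))  # [{'order_id': 11, 'customer_id': None}]
print(check_relationships(orders, "customer_id", customers, "customer_id"))  # [3]

# Loading the same batch twice leaves the table unchanged.
batch = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
once = merge_rows([], batch, "id")
assert merge_rows(once, batch, "id") == once
```

Idempotency is what makes Airflow retries safe. A retried task that reloads the same batch produces the same table instead of duplicate rows.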

📊 Analytics Engineering

  • Metric definitions (CTR, CPC, ROAS, etc.)
  • Dashboard-ready tables
  • Partitioning & clustering
  • Cost optimization in BigQuery
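The metrics named above have standard definitions: CTR = clicks / impressions, CPC = spend / clicks, and ROAS = revenue / spend. The sketch below implements those formulas in Python. It guards against zero denominators the way BigQuery's `SAFE_DIVIDE` does, returning NULL (here `None`) instead of raising an error. The input figures are made up, and the revenue attribution used in the project is not specified here.

```python
from typing import Optional


def safe_divide(numerator: float, denominator: float) -> Optional[float]:
    """Mirror BigQuery's SAFE_DIVIDE: None (NULL) instead of an error on divide-by-zero."""
    return None if denominator == 0 else numerator / denominator


def ctr(clicks: int, impressions: int) -> Optional[float]:
    """Click-through rate: clicks per impression."""
    return safe_divide(clicks, impressions)


def cpc(spend: float, clicks: int) -> Optional[float]:
    """Cost per click: spend divided by clicks."""
    return safe_divide(spend, clicks)


def roas(revenue: float, spend: float) -> Optional[float]:
    """Return on ad spend: attributed revenue divided by spend."""
    return safe_divide(revenue, spend)


# Hypothetical daily totals for one campaign.
print(ctr(50, 2000))       # 0.025
print(cpc(100.0, 50))      # 2.0
print(roas(400.0, 100.0))  # 4.0
print(cpc(100.0, 0))       # None
```

Returning NULL for a zero denominator keeps a single inactive campaign from failing the whole mart build, and dashboards can display it as "no data".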

🚀 How to Use This Portfolio

This repository is the landing page for all my data engineering work.
Each project is fully documented and reproducible: follow its "View Project" link above to open the project's README.


👤 Author

Data & Marketing professional transitioning into Data Engineering.