nibble-stack/data-engineering-portfolio

Data Engineering Portfolio

End‑to‑end data pipelines built with modern data engineering tools


🧰 Tech Stack

Python · Airflow · dbt · BigQuery · Docker · Kafka · GCP


👨‍💻 About Me

I’m transitioning into Data Engineering with a strong focus on building real-world, production-style data systems.

Instead of following tutorials, I designed pipelines that reflect how modern companies operate:

  • Batch ETL & ELT workflows
  • Data warehouse modeling
  • Real-time event streaming
  • Cloud-native ingestion
  • Fully Dockerized, reproducible environments

Each project is structured like a real engineering repository.


🏗 Portfolio Projects

Below are the four projects that make up my portfolio.
Each link takes you directly to the project’s README.


1️⃣ Marketing ETL Pipeline (Airflow + BigQuery + dbt)

API ingestion → Raw → Staging → Mart → Dashboard

  • Daily API extraction (Python)
  • BigQuery raw, staging, and mart layers
  • dbt transformations
  • Airflow orchestration (Dockerized)

📂 View Project: Marketing ETL Pipeline
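
The Raw → Staging → Mart layering can be shown in a minimal plain-Python sketch. This is not the pipeline's actual code. The field names (`campaign_id`, `date`, `impressions`, `clicks`, `spend`) and the dedup rule (keep the last row per campaign and day) are assumptions. In the project, these layers are BigQuery tables built by dbt models rather than Python lists.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw API payload: every value arrives as a string, and one row is duplicated.
RAW = [
    {"campaign_id": "c1", "date": "2024-05-01", "impressions": "1000", "clicks": "20", "spend": "10.50"},
    {"campaign_id": "c1", "date": "2024-05-01", "impressions": "1000", "clicks": "20", "spend": "10.50"},
    {"campaign_id": "c2", "date": "2024-05-01", "impressions": "500", "clicks": "5", "spend": "4.00"},
    {"campaign_id": "c1", "date": "2024-05-02", "impressions": "800", "clicks": "16", "spend": "8.00"},
]


def to_staging(raw_rows):
    """Cast types and deduplicate on (campaign_id, date), keeping the last row seen."""
    latest = {}
    for row in raw_rows:
        key = (row["campaign_id"], date.fromisoformat(row["date"]))
        latest[key] = {
            "campaign_id": row["campaign_id"],
            "date": key[1],
            "impressions": int(row["impressions"]),
            "clicks": int(row["clicks"]),
            "spend": float(row["spend"]),
        }
    return list(latest.values())


def to_mart(staging_rows):
    """Aggregate staging rows into one dashboard-ready row per campaign."""
    totals = defaultdict(lambda: {"impressions": 0, "clicks": 0, "spend": 0.0})
    for row in staging_rows:
        t = totals[row["campaign_id"]]
        t["impressions"] += row["impressions"]
        t["clicks"] += row["clicks"]
        t["spend"] += row["spend"]
    return dict(totals)


staging = to_staging(RAW)
mart = to_mart(staging)
print(len(staging))  # 3
print(mart["c1"])    # {'impressions': 1800, 'clicks': 36, 'spend': 18.5}
```

Because staging is a typed, deduplicated copy of raw, the mart logic never has to handle string parsing or duplicate API rows.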


2️⃣ E‑Commerce Data Warehouse (BigQuery + dbt)

Star schema → Fact tables → Dimensions → Cohort analysis

  • Dimensional modeling
  • dbt staging + marts
  • LTV, retention, and cohort metrics

📂 View Project: E-Commerce Data Warehouse
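
Cohort metrics reduce to one idea. Each customer is assigned to the month of their first order, and retention measures what share of that cohort orders again N months later. The sketch below computes that retention grid in Python on made-up orders. The data and the calendar-month offset definition are assumptions. In the warehouse, this logic would live in a dbt mart model on top of the order fact table.

```python
from collections import defaultdict
from datetime import date

# Hypothetical orders: (customer_id, order_date)
ORDERS = [
    ("u1", date(2024, 1, 5)),
    ("u1", date(2024, 2, 10)),
    ("u2", date(2024, 1, 20)),
    ("u3", date(2024, 2, 3)),
    ("u3", date(2024, 3, 15)),
    ("u2", date(2024, 3, 1)),
]


def month_index(d):
    """Whole months since year 0, so subtracting two indexes gives a month offset."""
    return d.year * 12 + (d.month - 1)


def cohort_retention(orders):
    """Return {cohort_month: {months_since_first_order: retention_rate}}."""
    # Cohort = month of each customer's first order.
    first = {}
    for customer, d in orders:
        m = month_index(d)
        first[customer] = min(first.get(customer, m), m)

    # Distinct customers active in each (cohort, offset) cell.
    active = defaultdict(set)
    for customer, d in orders:
        cohort = first[customer]
        active[(cohort, month_index(d) - cohort)].add(customer)

    cohort_sizes = defaultdict(int)
    for cohort in first.values():
        cohort_sizes[cohort] += 1

    result = defaultdict(dict)
    for (cohort, offset), customers in sorted(active.items()):
        label = f"{cohort // 12}-{cohort % 12 + 1:02d}"
        result[label][offset] = len(customers) / cohort_sizes[cohort]
    return dict(result)


print(cohort_retention(ORDERS))
# {'2024-01': {0: 1.0, 1: 0.5, 2: 0.5}, '2024-02': {0: 1.0, 1: 1.0}}
```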


3️⃣ Real‑Time Event Pipeline (Kafka + Python + BigQuery)

Simulated events → Kafka producer → Consumer → Warehouse

  • Kafka streaming ingestion
  • Python consumer
  • Near real-time analytics

📂 View Project: Real-Time Event Pipeline
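
A common consumer pattern reads events in micro-batches and commits offsets only after the batch has been written to the warehouse. This gives at-least-once delivery. The sketch below shows that loop with an in-memory `FakePartition` standing in for a Kafka partition and a Python list standing in for the BigQuery table. It deliberately avoids a real Kafka client. The batch size and event fields are assumptions.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FakePartition:
    """In-memory stand-in for one Kafka partition: an append-only log plus a committed offset."""
    log: list = field(default_factory=list)
    committed: int = 0

    def poll(self, max_records):
        # Read from the last committed offset, as a restarted consumer would.
        return self.log[self.committed:self.committed + max_records]

    def commit(self, offset):
        self.committed = offset


def consume(partition, warehouse, batch_size=2):
    """Write each micro-batch to the warehouse, then commit its offset (at-least-once)."""
    while True:
        batch = partition.poll(batch_size)
        if not batch:
            break
        rows = [json.loads(msg) for msg in batch]
        warehouse.extend(rows)                               # 1. write the batch first...
        partition.commit(partition.committed + len(batch))  # 2. ...then commit the offset
    return warehouse


# Hypothetical simulated events, serialized as JSON strings the way a producer would send them.
part = FakePartition(log=[json.dumps({"event_id": i, "type": "click"}) for i in range(5)])
table = consume(part, [])
print(len(table), part.committed)  # 5 5
```

If the process crashes between the write and the commit, the same batch is read and written again on restart. For that reason, the warehouse side should deduplicate on `event_id`.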


4️⃣ Cloud‑Native Pipeline (Cloud Functions + BigQuery)

Serverless ingestion → Cloud Storage → BigQuery

  • Cloud Functions
  • Scheduled ingestion
  • Serverless transformations

📂 View Project: Cloud-Native Pipeline
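
One way to structure a scheduled ingestion function is to write each run to a predictable, date-partitioned Cloud Storage path as newline-delimited JSON, a format BigQuery can load directly. The sketch below covers only the pure-Python part of that step. The path layout (`raw/<source>/dt=YYYY-MM-DD/`), the `ingest` helper, the `_ingested_at` column, and the `weather_api` source name are all hypothetical. The actual upload with the Cloud Storage client is left out.

```python
import json
from datetime import datetime, timezone


def object_path(source, run_ts):
    """Hypothetical layout: one folder per source and day, one file per run."""
    return f"raw/{source}/dt={run_ts:%Y-%m-%d}/{run_ts:%H%M%S}.json"


def to_ndjson(records, run_ts):
    """Serialize records as newline-delimited JSON, stamping each with the ingestion time."""
    lines = []
    for rec in records:
        row = dict(rec, _ingested_at=run_ts.isoformat())
        lines.append(json.dumps(row, sort_keys=True))
    return "\n".join(lines) + "\n"


def ingest(source, records, now=None):
    """Body a scheduled function could run; returns (path, payload) instead of uploading."""
    run_ts = now or datetime.now(timezone.utc)
    return object_path(source, run_ts), to_ndjson(records, run_ts)


ts = datetime(2024, 5, 1, 6, 0, 0, tzinfo=timezone.utc)
path, body = ingest("weather_api", [{"city": "Lisbon", "temp_c": 21}], now=ts)
print(path)          # raw/weather_api/dt=2024-05-01/060000.json
print(body, end="")  # {"_ingested_at": "2024-05-01T06:00:00+00:00", "city": "Lisbon", "temp_c": 21}
```

Putting the run date in the object path makes it easy to reload or backfill a single day without touching the others.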


🧠 Skills Demonstrated

🏗 Data Engineering

  • Workflow orchestration (Airflow)
  • ELT pipelines (Python → BigQuery → dbt)
  • Data modeling (staging, marts, star schema)
  • Streaming ingestion (Kafka)
  • Cloud-native design (GCP)

⚙️ Engineering Practices

  • Dockerized development
  • Version control (Git)
  • Dependency pinning
  • Modular code structure
  • Logging & monitoring

🔍 Data Quality

  • dbt tests (unique, not null, relationships)
  • Incremental models
  • Idempotent loads
  • Retry logic in Airflow
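
Each of these checks can be mimicked in a few lines of Python to show exactly what it asserts. In this sketch, `check_unique`, `check_not_null`, and `check_relationships` are hypothetical helpers that mirror dbt's built-in `unique`, `not_null`, and `relationships` tests. `merge_upsert` stands in for a BigQuery `MERGE` keyed on a primary key. The tables are made up.

```python
def check_unique(rows, col):
    """Like dbt's `unique` test: return the values that appear more than once."""
    seen, dupes = set(), set()
    for row in rows:
        if row[col] in seen:
            dupes.add(row[col])
        seen.add(row[col])
    return dupes


def check_not_null(rows, col):
    """Like dbt's `not_null` test: return rows where the column is missing or None."""
    return [row for row in rows if row.get(col) is None]


def check_relationships(child, col, parent, parent_col):
    """Like dbt's `relationships` test: non-null child keys with no matching parent key."""
    parent_keys = {row[parent_col] for row in parent}
    return {row[col] for row in child if row[col] is not None and row[col] not in parent_keys}


def merge_upsert(table, new_rows, key):
    """MERGE-style load: replace rows whose key already exists, insert the rest."""
    by_key = {row[key]: row for row in table}
    for row in new_rows:
        by_key[row[key]] = row
    return list(by_key.values())


customers = [{"customer_id": 1}, {"customer_id": 2}]
batch = [
    {"order_id": 10, "customer_id": 1, "amount": 30.0},
    {"order_id": 11, "customer_id": 2, "amount": 12.5},
]

# Re-running the same load (e.g. after an Airflow retry) leaves the table unchanged.
orders = merge_upsert([], batch, "order_id")
orders = merge_upsert(orders, batch, "order_id")
print(len(orders))                                                           # 2
print(check_unique(orders, "order_id"))                                      # set()
print(check_not_null(orders, "customer_id"))                                 # []
print(check_relationships(orders, "customer_id", customers, "customer_id"))  # set()

# A plain append instead of a merge would fail the `unique` test on the rerun.
print(sorted(check_unique(orders + batch, "order_id")))  # [10, 11]
```

The last line shows why idempotency and the `unique` test go together. A retried append-only load duplicates every row, while a keyed merge does not.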

📊 Analytics Engineering

  • Metric definitions (CTR, CPC, ROAS, etc.)
  • Dashboard-ready tables
  • Partitioning & clustering
  • Cost optimization in BigQuery
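
The metric definitions are simple ratios: CTR = clicks / impressions, CPC = spend / clicks, and ROAS = revenue / spend. The sketch below encodes them in Python with a guard that returns `None` when the denominator is zero. That guard choice and the input numbers are assumptions.

```python
def safe_div(num, den):
    """Return num / den, or None when the denominator is zero (no data rather than a crash)."""
    return num / den if den else None


def marketing_metrics(impressions, clicks, spend, revenue):
    return {
        "ctr": safe_div(clicks, impressions),  # click-through rate
        "cpc": safe_div(spend, clicks),        # cost per click
        "roas": safe_div(revenue, spend),      # return on ad spend
    }


print(marketing_metrics(impressions=2000, clicks=50, spend=25.0, revenue=100.0))
# {'ctr': 0.025, 'cpc': 0.5, 'roas': 4.0}
print(marketing_metrics(impressions=0, clicks=0, spend=0.0, revenue=0.0))
# {'ctr': None, 'cpc': None, 'roas': None}
```

In BigQuery, `SAFE_DIVIDE` plays the same role as `safe_div`: it returns `NULL` instead of raising a division-by-zero error.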

🚀 How to Use This Portfolio

This repository is the landing page for all my data engineering work.
Pick a project from the list above and open its README; each project is documented and designed to be reproducible.


👤 Author

Data & Marketing professional transitioning into Data Engineering.
