BelhsanHmida/NY-Taxi-Data-Engineering-Project

New-York Taxi Data Engineering Project

This project processes and analyzes New York taxi data on Google Cloud, using Compute Engine (VMs), BigQuery, and Looker Studio to build a dashboard for visualization.

Structure

The project is organized into the following components:

  • ETL Pipeline:

    • Python scripts that extract, transform, and load New York taxi data into BigQuery.
  • Data Analysis and Modeling:

    • Jupyter Notebooks for exploratory data analysis (EDA) and data modeling using Lucid.
  • Dashboard Creation:

    • A Looker Studio dashboard designed to visualize the analytical insights.
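
The ETL component above can be illustrated with a minimal sketch. This is not the project's actual pipeline code: the column names follow the public NYC yellow-taxi trip record layout (an assumption, since the dataset is not specified here), the sample rows are made up, and `extract`, `transform`, and `load` are hypothetical helper names. The `load` step assumes the `google-cloud-bigquery` package and valid GCP credentials, so it is defined but not called.

```python
import csv
import io
from datetime import datetime

# Made-up sample rows; column names are assumed from the public
# NYC yellow-taxi trip record layout.
SAMPLE_CSV = """tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,fare_amount
2016-03-01 00:00:00,2016-03-01 00:07:55,1,2.50,9.0
2016-03-01 00:00:00,2016-03-01 00:11:06,1,2.90,11.0
2016-03-01 00:00:00,2016-03-01 00:11:06,1,2.90,11.0
"""

TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def extract(csv_text):
    """Parse raw CSV text into a list of dicts (one per trip)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Drop exact duplicates, cast types, and derive time attributes."""
    seen = set()
    trips = []
    for row in rows:
        key = tuple(row.items())
        if key in seen:
            continue  # skip exact duplicate records
        seen.add(key)
        pickup = datetime.strptime(row["tpep_pickup_datetime"], TS_FORMAT)
        dropoff = datetime.strptime(row["tpep_dropoff_datetime"], TS_FORMAT)
        trips.append({
            "trip_id": len(trips),  # surrogate key assigned after dedup
            "pickup_datetime": pickup.isoformat(),
            "dropoff_datetime": dropoff.isoformat(),
            "pickup_hour": pickup.hour,
            "pickup_weekday": pickup.weekday(),  # Monday == 0
            "trip_minutes": round((dropoff - pickup).total_seconds() / 60, 2),
            "passenger_count": int(row["passenger_count"]),
            "trip_distance": float(row["trip_distance"]),
            "fare_amount": float(row["fare_amount"]),
        })
    return trips


def load(rows, table_id):
    """Load JSON-serializable rows into BigQuery.

    Assumes google-cloud-bigquery is installed and credentials are set up;
    table_id is a placeholder such as "project.dataset.table".
    """
    from google.cloud import bigquery

    client = bigquery.Client()
    client.load_table_from_json(rows, table_id).result()  # wait for the job


trips = transform(extract(SAMPLE_CSV))
```

Keeping the transform as a pure function over plain rows makes it easy to check locally before any data is written to BigQuery.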

Tools and Technologies Used

  • Google Cloud Platform (GCP):

    • Cloud Storage buckets
    • Compute Engine (VMs) for running data processing tasks.
    • BigQuery for storing and querying large datasets.
    • Looker Studio for building interactive dashboards.
  • Modern data pipeline tool: Mage (https://www.mage.ai/)

Dataset Used

Data Model

ETL Pipeline

Looker Studio Dashboard

Future improvements

  • Integration of real-time data processing capabilities.
  • Implementation of advanced analytics and machine learning models.
  • Dashboard improvements.

About

Data engineering project that uses Google Cloud VMs, Mage, BigQuery, and Looker Studio to process and analyze New York taxi data, culminating in a dashboard for visualization.
