This project processes and analyzes New York taxi data using Google Cloud services, including Compute Engine (VMs), BigQuery, and Looker, to build a dashboard for visualization.
The project is organized into the following components:
- ETL Pipeline:
  - Python scripts for extracting, transforming, and loading New York taxi data into BigQuery.
- Data Analysis and Modeling:
  - Jupyter Notebooks for exploratory data analysis (EDA) and data modeling using Lucid.
- Dashboard Creation:
  - Looker for designing and implementing a dashboard that visualizes analytical insights.
- Google Cloud Platform (GCP):
  - Google Cloud Storage buckets.
  - Compute Engine (VMs) for running data processing tasks.
  - BigQuery for storing and querying large datasets.
  - Looker Studio for building interactive dashboards.
- Modern Data Pipeline Tool: https://www.mage.ai/
- The TLC trip record data for yellow and green taxis contains information such as the pick-up and drop-off times and locations, the distances traveled during the trips, itemized fare details, rate categories, payment methods, and the number of passengers reported by the driver.
- Data dictionary: https://www.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf
- Integration of real-time data processing capabilities.
- Implementation of advanced analytics and machine learning models.
- Dashboard improvements.
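
As an illustration of the transform step in the ETL pipeline described above, here is a minimal sketch using only the Python standard library. It is not the project's actual script. The column names (`tpep_pickup_datetime`, `tpep_dropoff_datetime`, `passenger_count`, `trip_distance`, `total_amount`) follow the yellow taxi data dictionary; the CSV input format, the filtering rule, the output field names, the helper names `transform_trip` and `transform_csv`, and the sample rows are all assumptions made for this example. Loading the result into BigQuery is not shown.

```python
import csv
import io
from datetime import datetime

# Assumed timestamp layout for CSV exports of the trip records.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def transform_trip(row):
    """Return a cleaned trip dict, or None if the row should be dropped."""
    pickup = datetime.strptime(row["tpep_pickup_datetime"], TIMESTAMP_FORMAT)
    dropoff = datetime.strptime(row["tpep_dropoff_datetime"], TIMESTAMP_FORMAT)
    duration_min = (dropoff - pickup).total_seconds() / 60
    distance = float(row["trip_distance"])

    # Hypothetical cleaning rule: drop trips with no duration or no distance.
    if duration_min <= 0 or distance <= 0:
        return None

    return {
        "pickup_datetime": pickup.isoformat(),
        "dropoff_datetime": dropoff.isoformat(),
        "trip_duration_min": round(duration_min, 2),
        "trip_distance": distance,
        # passenger_count is driver-reported and may be empty.
        "passenger_count": int(float(row["passenger_count"] or 0)),
        "total_amount": float(row["total_amount"]),
    }


def transform_csv(text):
    """Parse CSV text and return the list of trips that pass cleaning."""
    reader = csv.DictReader(io.StringIO(text))
    cleaned = (transform_trip(row) for row in reader)
    return [trip for trip in cleaned if trip is not None]


# Invented sample rows, for illustration only.
SAMPLE = """tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,total_amount
2023-01-01 00:10:00,2023-01-01 00:25:30,1,3.2,18.5
2023-01-01 01:00:00,2023-01-01 01:00:00,2,0.0,3.0
"""

if __name__ == "__main__":
    for trip in transform_csv(SAMPLE):
        print(trip)
```

The second sample row has zero duration and zero distance, so it is filtered out. In a real pipeline the cleaned rows would then be written to a BigQuery table, for example from a Mage block or a script running on the Compute Engine VM.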