Skip to content

aryxnn/customer_behaviour_analytics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Customer Behaviour Data Pipeline & Analytics

An end-to-end data engineering + analytics project that transforms raw customer shopping data into actionable business insights using Python, MySQL, and Power BI.


Dashboard Preview

Dashboard

Interactive Power BI dashboard showing KPIs, customer segmentation, and revenue insights.


Project Overview

This project demonstrates a complete data pipeline:

  • Raw data ingestion from CSV
  • Data cleaning & preprocessing using Python (Pandas)
  • Data storage and querying using MySQL
  • Interactive dashboard creation using Power BI

The objective is to analyze customer purchasing behavior, revenue trends, and business performance.


Tech Stack

  • Python (Pandas) → Data cleaning & transformation
  • MySQL → Data storage & analytical queries
  • SQLAlchemy → Database connection
  • Power BI → Data visualization
  • Jupyter Notebook → Development environment

Project Structure

 Customer-Behaviour-Analytics
├── data/
│   └── cust_shopping_behavior.csv
├── notebooks/
│   └── project.ipynb
├── sql/
│   └── customer_behaviour.sql
├── dashboard/
│   └── customer_behaviour_dashboard.pbix
├── assets/
│   └── dashboard.png
├── README.md

Data Pipeline Workflow

1. Data Cleaning (Python)

  • Handled missing values (median imputation)
  • Converted data types
  • Ensured data consistency and quality

2. Data Migration (MySQL)

  • Created database: customer_behaviour
  • Loaded cleaned data into SQL tables
  • Used SQLAlchemy for seamless integration

3. Data Analysis (SQL)

Revenue by Gender

SELECT gender, SUM(purchase_amount) AS revenue
FROM customer
GROUP BY gender;

Top Rated Products

SELECT item_purchased, ROUND(AVG(review_rating),2)
FROM customer
GROUP BY item_purchased
ORDER BY AVG(review_rating) DESC
LIMIT 5;

Discount Effectiveness

SELECT item_purchased,
ROUND(100.0 * SUM(CASE WHEN discount_applied='Yes' THEN 1 ELSE 0 END)/COUNT(*),2)
FROM customer
GROUP BY item_purchased
ORDER BY 2 DESC
LIMIT 5;

More queries available in: sql/customer_behaviour.sql


Key Insights

  • Young adults generate the highest revenue
  • Clothing category dominates both revenue & sales
  • Discounts significantly influence purchasing behavior
  • Subscription status impacts customer spending patterns

How to Run

1. Clone Repository

git clone https://github.com/your-username/customer-behaviour-analytics.git
cd customer-behaviour-analytics

2. Install Dependencies

pip install pandas sqlalchemy pymysql

3. Run Notebook

jupyter notebook notebooks/project.ipynb

4. Setup Database

  • Create MySQL database
  • Run SQL file:
source sql/customer_behaviour.sql;

5. Open Dashboard

  • Open customer_behaviour_dashboard.pbix in Power BI

Author

Aryan Srivastava

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors