
Analyzing customer purchase behavior by cohort using the Online Retail II dataset from Kaggle.


adalloret/data_analytics


🛒 Customer Cohort Analysis — Online Retail II Dataset

This project performs a Cohort Analysis using the Online Retail II dataset from Kaggle. It analyzes customer purchase behavior over time to identify retention patterns, revenue trends, and average revenue per user (ARPU) by cohort.

📈 Objective

The goal of this analysis is to understand:

How customer activity evolves over their lifetime.

Whether newer cohorts perform better or worse in terms of revenue.

How average revenue per user changes month by month.

🧠 Methodology

Data Cleaning

Removed rows with missing CustomerID.

Calculated Revenue as Quantity × Price.
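
The two cleaning steps above could look roughly like the following in pandas. This is a minimal sketch, not the repository's actual script: the toy rows are made up for illustration, and the column names (CustomerID, Quantity, Price) are assumed to match the names used in this README, which may differ from the raw Kaggle file.

```python
import pandas as pd

# Toy transactions standing in for the real dataset (values are made up).
df = pd.DataFrame({
    "CustomerID": [17850.0, None, 13047.0],
    "Quantity": [6, 2, 3],
    "Price": [2.55, 3.39, 4.25],
})

# Drop rows that cannot be attributed to a customer.
clean = df.dropna(subset=["CustomerID"]).copy()

# Revenue per line item = Quantity x Price.
clean["Revenue"] = clean["Quantity"] * clean["Price"]
```

The `.copy()` keeps the new column from being written into a view of the original frame.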

Cohort Creation

Defined the first purchase month (FirstOrderMonth) for each customer.

Calculated the purchase month (InvoiceDateMonth) for each transaction.

Grouped by (FirstOrderMonth, InvoiceDateMonth) to compute:

Unique number of customers.

Total revenue per cohort and month.
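
As an illustration of the cohort creation steps above, here is a small pandas sketch. The toy data and the InvoiceDate column name are assumptions, and truncating dates to the first day of the month is one possible way to derive the month columns, not necessarily the one used in the repository.

```python
import pandas as pd

# Toy transactions (made-up values) with a Revenue column already computed.
df = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 3],
    "InvoiceDate": pd.to_datetime(
        ["2010-01-05", "2010-02-10", "2010-01-20", "2010-03-02", "2010-02-15"]
    ),
    "Revenue": [10.0, 20.0, 5.0, 15.0, 30.0],
})

# Month of each transaction, truncated to the first day of that month.
df["InvoiceDateMonth"] = df["InvoiceDate"].dt.to_period("M").dt.to_timestamp()

# A customer's cohort is the month of their earliest transaction.
df["FirstOrderMonth"] = df.groupby("CustomerID")["InvoiceDateMonth"].transform("min")

# One row per (cohort, purchase month): unique customers and summed revenue.
cohorts = (
    df.groupby(["FirstOrderMonth", "InvoiceDateMonth"])
    .agg(Customers=("CustomerID", "nunique"), Revenue=("Revenue", "sum"))
    .reset_index()
)
```

Using `nunique` rather than `count` matters here: a customer with several invoices in the same month should be counted once.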

Cohort Lifetime

Computed CohortLifetime as the difference in months between each transaction's purchase month (InvoiceDateMonth) and the cohort's first order month (FirstOrderMonth).
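
One way to compute this month difference is with calendar arithmetic on the year and month parts, as in the sketch below. The toy rows are made up, and this approach is an assumption about how the calculation could be done, not a copy of the repository's code.

```python
import pandas as pd

# Toy cohort rows (made-up values); both columns hold month-start timestamps.
cohorts = pd.DataFrame({
    "FirstOrderMonth": pd.to_datetime(["2010-01-01", "2010-01-01", "2010-11-01"]),
    "InvoiceDateMonth": pd.to_datetime(["2010-01-01", "2010-03-01", "2011-02-01"]),
})

first = cohorts["FirstOrderMonth"].dt
order = cohorts["InvoiceDateMonth"].dt

# Whole calendar months between the cohort month and the purchase month.
cohorts["CohortLifetime"] = (order.year - first.year) * 12 + (order.month - first.month)
```

Working on year and month parts gives exact integers and avoids dividing a day count by an approximate month length.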

Metrics

ARPU (Average Revenue per User): Revenue / Customers

Retention: Number of active customers per cohort over time.

Total Revenue: Sum of revenues per cohort and lifetime month.
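
The three metrics above can be sketched as one division plus three pivots, with one row per cohort and one column per lifetime month. The input frame below is toy data with made-up values, and the variable names (arpu, active, revenue) are hypothetical.

```python
import pandas as pd

# Toy aggregated cohort table (made-up values).
cohorts = pd.DataFrame({
    "FirstOrderMonth": ["2010-01", "2010-01", "2010-02"],
    "CohortLifetime": [0, 1, 0],
    "Customers": [4, 2, 5],
    "Revenue": [200.0, 50.0, 100.0],
})

# ARPU = Revenue / Customers for each (cohort, lifetime month) cell.
cohorts["ARPU"] = cohorts["Revenue"] / cohorts["Customers"]

# Reshape each metric into a cohort x lifetime matrix.
arpu = cohorts.pivot(index="FirstOrderMonth", columns="CohortLifetime", values="ARPU")
active = cohorts.pivot(index="FirstOrderMonth", columns="CohortLifetime", values="Customers")
revenue = cohorts.pivot(index="FirstOrderMonth", columns="CohortLifetime", values="Revenue")
```

Cells for lifetime months a cohort has not reached yet come out as NaN, which is what gives cohort matrices their triangular shape.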

Visualization

Created heatmaps using Seaborn for:

ARPU over time

Active customers over time

Total revenue per cohort
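
A heatmap like the ones listed above might be drawn as follows. This is a sketch under assumptions: the small ARPU matrix is made up, and the figure size, annotation format, and title are illustrative choices. The YlGnBu palette is the one this README names for the ARPU chart.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy cohort x lifetime ARPU matrix (made-up values); NaN marks months not yet reached.
arpu = pd.DataFrame(
    [[50.0, 25.0], [20.0, float("nan")]],
    index=pd.Index(["2010-01", "2010-02"], name="FirstOrderMonth"),
    columns=pd.Index([0, 1], name="CohortLifetime"),
)

fig, ax = plt.subplots(figsize=(6, 3))

# seaborn leaves NaN cells blank, so the triangular cohort shape stays visible.
sns.heatmap(arpu, annot=True, fmt=".1f", cmap="YlGnBu", ax=ax)
ax.set_title("ARPU over time")
fig.tight_layout()
```

Call `plt.show()` or `fig.savefig(...)` afterwards to display or save the figure. Swapping `cmap` for Purples or OrRd would give the other two charts from matrices of active customers and total revenue.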

📊 Key Visualizations

| Metric | Description | Color Palette |
| --- | --- | --- |
| ARPU Over Time | Shows average revenue per user per cohort. | YlGnBu |
| Number of Active Customers | Retention behavior of each cohort. | Purples |
| Total Revenue by Cohort | Total monthly revenue per cohort. | OrRd |

⚙️ Technologies Used

Python

Pandas

NumPy

Seaborn / Matplotlib

KaggleHub (to download the dataset directly)

▶️ How to Run

Clone this repository:

git clone https://github.com//cohort-analysis-online-retail.git
cd cohort-analysis-online-retail

Install dependencies:

pip install pandas numpy seaborn matplotlib kagglehub

Run the analysis:

python cohort_analysis_with_the_online_retail_ii_dataset.py
