This project performs a Pareto (80/20) analysis on transactional pet shop sales data to determine:
How many customers generate 80% of total revenue?
The analysis uses SQL window functions, views, and common table expressions (CTEs) to rank customers by revenue and compute cumulative contribution.
File: data/pet_shop_sales.csv
The dataset contains individual sales transactions with the following key fields:
- CustomerID: unique customer identifier
- Quantity: number of items sold
- UnitPrice: price per item
- InvoiceDate: transaction date
- Country: customer location
Revenue per transaction is calculated as:
revenue = Quantity × UnitPrice
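A minimal sketch of this calculation in BigQuery SQL follows. The inline sales rows are made-up sample values that stand in for the table loaded from the CSV; only the column names come from the dataset description.

```sql
-- Made-up sample rows; replace this CTE with the table that holds
-- the data loaded from data/pet_shop_sales.csv.
WITH sales AS (
  SELECT 'C1' AS CustomerID, 2 AS Quantity, 4.5 AS UnitPrice UNION ALL
  SELECT 'C2', 1, 12.0
)
SELECT
  CustomerID,
  Quantity,
  UnitPrice,
  Quantity * UnitPrice AS revenue  -- revenue per transaction
FROM sales;
```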
The analysis follows these steps:
- Calculate revenue per transaction
- Aggregate revenue by customer
- Rank customers by descending revenue
- Compute cumulative revenue using window functions
- Calculate cumulative revenue share
- Identify the minimum number of customers required to reach 80% of total revenue
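The steps above can be sketched as a single BigQuery query. This is an illustration under stated assumptions, not the contents of the project's SQL files: the CTE names, output column names, and inline sample rows are all invented for the example, and ties in revenue are broken by CustomerID so that the ranking is deterministic.

```sql
-- Made-up sample rows; replace this CTE with the real sales table.
WITH sales AS (
  SELECT 'A' AS CustomerID, 10 AS Quantity, 5.0 AS UnitPrice UNION ALL
  SELECT 'A', 2, 5.0 UNION ALL
  SELECT 'B', 5, 5.0 UNION ALL
  SELECT 'C', 2, 5.0 UNION ALL
  SELECT 'D', 1, 5.0
),
-- Steps 1-2: revenue per transaction, aggregated by customer
customer_revenue AS (
  SELECT
    CustomerID,
    SUM(Quantity * UnitPrice) AS revenue
  FROM sales
  GROUP BY CustomerID
),
-- Steps 3-5: rank customers, then compute running and total revenue
ranked AS (
  SELECT
    CustomerID,
    revenue,
    ROW_NUMBER() OVER (ORDER BY revenue DESC, CustomerID) AS customer_rank,
    SUM(revenue) OVER (
      ORDER BY revenue DESC, CustomerID
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_revenue,
    SUM(revenue) OVER () AS total_revenue,
    COUNT(*) OVER () AS total_customers
  FROM customer_revenue
)
-- Step 6: the lowest rank whose cumulative share reaches 80%
SELECT
  MIN(customer_rank) AS customers_needed,
  ANY_VALUE(total_customers) AS total_customers,
  ROUND(100 * MIN(customer_rank) / ANY_VALUE(total_customers), 1) AS pct_customers
FROM ranked
WHERE cumulative_revenue / total_revenue >= 0.8;
```

With the sample rows, customers A and B account for 85 of 100 revenue units, so the query reports 2 of 4 customers (50%). The explicit ROWS frame keeps the running total row-by-row; the default RANGE frame would instead lump tied revenue values together.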
Two implementations are provided:
- Views pipeline — modular, reusable layered queries
- CTE pipeline — single-query analytical workflow
sql/01_views.sql
Creates layered analytical views:
- Transaction-level revenue calculations
- Customer revenue aggregation
- Ranked customers with cumulative metrics
- Final Pareto percentage metrics
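A layered view design along these lines might look like the sketch below. It is not a copy of sql/01_views.sql: the dataset name pet_shop, the source table sales, and every view and column alias are hypothetical placeholders chosen for illustration.

```sql
-- All dataset, table, and view names are hypothetical placeholders.

-- Layer 1: transaction-level revenue
CREATE OR REPLACE VIEW pet_shop.v_transaction_revenue AS
SELECT
  CustomerID,
  Quantity * UnitPrice AS revenue
FROM pet_shop.sales;

-- Layer 2: revenue aggregated by customer
CREATE OR REPLACE VIEW pet_shop.v_customer_revenue AS
SELECT
  CustomerID,
  SUM(revenue) AS revenue
FROM pet_shop.v_transaction_revenue
GROUP BY CustomerID;

-- Layer 3: ranked customers with cumulative metrics
CREATE OR REPLACE VIEW pet_shop.v_customer_ranked AS
SELECT
  CustomerID,
  revenue,
  ROW_NUMBER() OVER (ORDER BY revenue DESC, CustomerID) AS customer_rank,
  SUM(revenue) OVER (
    ORDER BY revenue DESC, CustomerID
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS cumulative_revenue,
  SUM(revenue) OVER () AS total_revenue
FROM pet_shop.v_customer_revenue;

-- Layer 4: cumulative share as a percentage
CREATE OR REPLACE VIEW pet_shop.v_pareto AS
SELECT
  *,
  ROUND(100 * cumulative_revenue / total_revenue, 2) AS cumulative_share_pct
FROM pet_shop.v_customer_ranked;
```

Each view reads only from the layer directly beneath it, so an intermediate result such as per-customer revenue can be queried or reused on its own.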
sql/03_cte_pareto_analysis.sql
A single CTE-driven query that performs the full Pareto analysis and returns the final result.
This version demonstrates:
- Window functions
- Analytical ranking
- Cumulative aggregation
- Revenue segmentation
From the dataset:
150 out of 261 customers (~57%) generate 80% of total revenue
See:
outputs/pareto_results.csv
Key output metrics include:
- cumulative revenue
- total revenue
- percentage of customers needed to reach 80%
- cumulative sales share
- SQL aggregation and grouping
- Window functions (ROW_NUMBER, SUM OVER, COUNT OVER)
- Common Table Expressions (CTEs)
- Analytical ranking and segmentation
- Pareto analysis
- Modular query design
- Load pet_shop_sales.csv into your SQL environment
- Run either the views pipeline (sql/01_views.sql and sql/02_views_final.sql) or the CTE pipeline (sql/03_cte_pareto_analysis.sql)
This project was built using BigQuery SQL syntax. Minor adjustments may be required for other SQL dialects.
The analysis illustrates how the Pareto principle can be applied to customer segmentation and revenue concentration analysis.