Skip to content

Tarun110/Revenue-retail-optimization-sql

Repository files navigation

🛍️ Retail Revenue Optimization System (SQL)

This project analyzes sales performance for a retail company and identifies where revenue and profit are being won or lost across products, discounts, marketing channels, and customer behavior. The goal is to use data-driven insights to improve pricing strategy, promotion effectiveness, and overall profit performance.

📌 Overview
The primary objective of this project is to evaluate a retail revenue engine end-to-end — from customer acquisition channels to product-level profitability. By applying advanced SQL analytics techniques, the project uncovers revenue drivers, margin leaks, and optimization opportunities across the business.

This project simulates a real-world business intelligence and revenue analytics use case commonly encountered in e-commerce, retail, and marketplace companies.


🚀 Key Features

📊 Dataset
Uses a real-world retail transaction dataset (Online Retail) containing invoice-level line items, including product purchases, quantities, prices, customer IDs, and country information.
A lightweight sample CSV is included for reproducibility.
The full dataset can be downloaded using instructions in docs/project_notes.md.

🧠 Relational Data Modeling
Implements a clean analytics-ready schema with:

  • Staging table for raw ingestion
  • Dimension tables (customers, products, countries)
  • Fact table for sales transactions

📈 Revenue & Profit Metrics
Calculates:

  • Monthly net revenue trends
  • Gross revenue vs returns
  • Average order value (AOV)
  • Basket size distribution
  • Customer lifetime value (CLV)
  • Gross margin and profitability

🏷️ Return & Discount Analysis
Identifies:

  • Return rates by product and country
  • High-refund SKUs
  • Revenue leakage from returns
  • Price band performance

📣 Geographic Performance
Evaluates revenue, AOV, and customer concentration by country.

🧩 RFM Segmentation
Segments customers into:

  • High-value
  • Loyal
  • At-risk
  • Churn-risk
    Using Recency, Frequency, and Monetary scoring logic.

💻 SQL-First Analytics
Implements all analytics using SQL:

  • CTEs
  • Window functions
  • Aggregations
  • CASE statements
  • Reusable BI views

📝 Documentation
Includes structured insights and business recommendations.

🔁 Reproducibility
Provides schema + CSV loader scripts + analytics queries.


1️⃣ Data Preparation

Loaded raw retail transaction data from the CSV file.
Standardized data types and timestamps.
Filtered invalid records (negative quantity, null prices).
Prepared staging tables for clean ingestion into the analytics schema.


2️⃣ Schema Design & Ingestion

Designed a normalized PostgreSQL schema:

  • Staging table for raw ingestion
  • Dimension tables:
    • dim_customer
    • dim_product
    • dim_country
  • Fact table:
    • fact_sales

Created all tables, keys, and indexes in sql/schema.sql.
Loaded raw CSV data and populated dimensions and facts using sql/load_csv_postgres.sql.


3️⃣ Analytics Layer (SQL Queries)

Built production-grade analytics queries in sql/analytics.sql, including:

Monthly Revenue Trends
Calculates net revenue and growth trends month-over-month.

Return Rate Analysis
Identifies products and countries with the highest return ratios.

Top Products by Revenue
Ranks SKUs based on total contribution to revenue.

Country-Level AOV
Computes average order value by geography.

Basket Size Analysis
Measures average units per order and distribution.

Price Band Performance
Groups products into price tiers and evaluates revenue contribution.

RFM Segmentation
Scores customers based on:

  • Recency of purchase
  • Purchase frequency
  • Total monetary value

Customer Lifetime Value (CLV)
Estimates long-term customer revenue potential.

Revenue Concentration (80/20 Rule)
Identifies how much revenue is driven by top customers and products.


📈 Key Insights

A small percentage of products contribute to the majority of total revenue, confirming an 80/20 revenue distribution.
High-return products significantly reduce net revenue and should be reviewed for quality or fulfillment issues.
Certain countries generate higher AOV but lower purchase frequency, indicating premium customer segments.
Loyal customers drive a disproportionate share of revenue and should be prioritized for retention strategies.
Low-priced products drive volume but often underperform on profitability.


💡 Business Recommendations

Prioritize high-margin, low-return products for promotional campaigns.
Introduce quality control and fulfillment audits for high-return SKUs.
Launch loyalty programs targeting high-value and loyal customer segments.
Re-evaluate pricing for low-margin, high-volume products.
Expand marketing spend in high-AOV geographic regions.
Use RFM segmentation to personalize promotions and retention offers.


📊 Dataset Details

This project uses the Online Retail transactional dataset (invoice-level line items) originally distributed in the public Spark: The Definitive Guide repository.
The dataset captures detailed customer purchase behavior across products, quantities, prices, and countries.

Key Columns:

  • InvoiceNo (cancellations often start with 'C')
  • StockCode, Description
  • Quantity, UnitPrice
  • InvoiceDate
  • CustomerID (may be missing)
  • Country

📥 Full Dataset Access

The complete retail dataset exceeds GitHub’s file size limit.

To download the full dataset:
https://archive.ics.uci.edu/ml/machine-learning-databases/00352/Online%20Retail.xlsx

Steps:

  1. Download the Excel file
  2. Convert it to CSV
  3. Rename it to: online_retail.csv
  4. Place it in: data/raw/
  5. Run: sql/load_csv_postgres.sql

A lightweight sample CSV is included in this repository for reproducibility and quick testing.


🧠 Modeling Choices

  • Load the raw file into a staging table: stg_online_retail
  • Normalize into:
    • dim_product
    • dim_customer
    • Line-item fact table: fact_sales
  • Use invoice_no LIKE 'C%' and/or quantity < 0 as return/cancellation signals
  • Retain cancellations for net revenue and return-rate calculations

🎯 What to Highlight

  • Clean schema design (staging + star schema)
  • Business KPIs:
    • Monthly net revenue
    • Return rate
    • Top products
    • Average order value (AOV)
    • Country performance
  • RFM segmentation using SQL window functions

🚀 How to Run the Project (Optional)

If you choose to execute the project locally in PostgreSQL:

-- Step 1: Create schema and tables
sql/schema.sql

-- Step 2: Load CSV data
sql/load_csv_postgres.sql

-- Step 3: Run analytics queries
sql/analytics.sql

About

End-to-end SQL analytics system analyzing retail revenue, promotions, channels, and customer behavior using real transaction data.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors