
thed700/ecommerce-statistical-analysis


📊 E-Commerce Statistical Analysis Dashboard

Portfolio Project: applying core statistical distributions and theorems to solve real-world e-commerce business problems.

[Figure: Dashboard]


🎯 Project Overview

This project demonstrates how statistical theory translates into business decisions in an e-commerce context. Using 1,200 synthetic customer transactions, four key statistical concepts are modeled, visualized, and interpreted through a business lens.

Tech Stack: Python · NumPy · Pandas · SciPy · Matplotlib


πŸ“ Project Structure

ecommerce-statistical-analysis/
│
├── ecommerce_statistical_analysis.py   # Main analysis script
├── ecommerce_stats_dashboard.png       # Output dashboard (auto-generated)
└── README.md

🔒 Dataset

Synthetic dataset of 1,200 customer transactions with the following schema:

Column              Type       Description
Customer_ID         string     Unique customer identifier (e.g., CUST_00001)
Purchase_Amount     float      Transaction value in USD
Conversion_Success  int (0/1)  Whether the session led to a purchase
Arrival_Time        int        Number of orders arriving in a given hour
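A minimal sketch of how a table with this schema could be generated. This is not the project's own generator: the helper name make_synthetic_transactions, the seed, and the exponential model for Purchase_Amount (scale of 100 USD) are hypothetical. Only the column names, the 1,200-row count, p = 0.05, and λ = 12 come from this README.

```python
import numpy as np
import pandas as pd


def make_synthetic_transactions(n_rows: int = 1200, seed: int = 42) -> pd.DataFrame:
    """Build a hypothetical synthetic transaction table with the schema above.

    Distribution choices are assumptions for illustration only:
    - Purchase_Amount: exponential with a hypothetical scale of 100 USD
    - Conversion_Success: Bernoulli(p = 0.05), echoing the Binomial section
    - Arrival_Time: Poisson(lam = 12) hourly order counts, echoing the Poisson section
    """
    rng = np.random.default_rng(seed)  # hypothetical seed for reproducibility
    return pd.DataFrame({
        "Customer_ID": [f"CUST_{i:05d}" for i in range(1, n_rows + 1)],
        "Purchase_Amount": rng.exponential(scale=100.0, size=n_rows).round(2),
        "Conversion_Success": rng.binomial(n=1, p=0.05, size=n_rows),
        "Arrival_Time": rng.poisson(lam=12, size=n_rows),
    })


df = make_synthetic_transactions()
print(df.head())
print(df.dtypes)
```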

πŸ“ Statistical Analyses

① Normal Distribution & Z-Score

Goal: Detect anomalous purchase amounts (outliers).

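  • Purchase amounts modeled as N(μ = 105.26, σ = 101.40)
  • Customers with |Z| > 3 flagged as statistical outliers; because amounts cannot be negative, only the upper cutoff μ + 3σ ≈ $409.46 can actually be crossed
  • 22 outliers detected out of 1,200 transactions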

💼 Business Insight: Flagged customers are candidates for VIP upselling programs or fraud review queues, supporting both revenue growth and loss prevention.
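A minimal NumPy sketch of the |Z| > 3 rule. The helper name flag_zscore_outliers and the toy data are hypothetical. The README does not say whether σ is the population (ddof=0) or sample (ddof=1) standard deviation, so ddof=0 below is an assumption.

```python
import numpy as np


def flag_zscore_outliers(amounts, threshold: float = 3.0):
    """Return (z_scores, mask), where mask marks values with |Z| > threshold."""
    x = np.asarray(amounts, dtype=float)
    mu = x.mean()
    sigma = x.std()  # population std (ddof=0); the project's ddof choice is an assumption
    z = (x - mu) / sigma
    return z, np.abs(z) > threshold


# Toy data: 100 ordinary purchases of $100 plus one $1,000 purchase
amounts = np.append(np.full(100, 100.0), 1000.0)
z, mask = flag_zscore_outliers(amounts)
print("outliers:", int(mask.sum()), "| Z of the $1,000 purchase:", round(float(z[-1]), 2))
```

With one extreme value among n identical ones, its population Z-score is exactly √(n − 1), here √100 = 10, which is why the toy purchase is flagged while the other 100 sit near Z = −0.1.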


② Binomial Distribution

Goal: Model conversion probability across a batch of sessions.

  • Parameters: n = 200 sessions, p = 0.05 conversion rate
  • Calculates P(X = k) for all k using scipy.stats.binom
  • Example: P(X = 8) = 0.1137

💼 Business Insight: Knowing the exact probability of hitting a given conversion count helps marketing teams set data-backed KPIs and allocate ad spend without over- or underestimating campaign outcomes.
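The Binomial figures above can be reproduced with scipy.stats.binom, as sketched below. The KPI target of 12 conversions is a hypothetical value added for illustration; it shows how sf() answers "what is the chance of reaching at least k conversions?", which is often the KPI question rather than the exact count.

```python
import numpy as np
from scipy.stats import binom

n, p = 200, 0.05  # sessions per batch and conversion rate, as in this README

# Exact probability of one specific conversion count
p_eq_8 = binom.pmf(8, n, p)  # ≈ 0.1137

# Full distribution P(X = k) for k = 0..n (sums to 1)
k = np.arange(n + 1)
pmf = binom.pmf(k, n, p)

# KPI-style questions: expected conversions and the chance of reaching a target
expected = binom.mean(n, p)                  # n * p = 10
target = 12                                  # hypothetical KPI target
p_reach_target = binom.sf(target - 1, n, p)  # P(X >= 12) = 1 - P(X <= 11)

print(f"P(X = 8) = {p_eq_8:.4f}")
print(f"E[X] = {expected:.1f}, P(X >= {target}) = {p_reach_target:.4f}")
```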


③ Poisson Distribution

Goal: Model hourly order arrival rates and predict peak surges.

  • Average rate: λ = 12 orders/hour
  • Probability of a surge (>20 orders/hour): P(X > 20) = 0.0116

💼 Business Insight: Even a ~1% hourly surge probability adds up: for a site operating 24/7 (8,760 hours a year), 0.0116 × 8,760 ≈ 102 surge hours per year, and more across multiple sites. Poisson modeling lets logistics teams schedule warehouse capacity proactively, before the crunch hits.
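A short sketch of the surge calculation with scipy.stats.poisson. It uses sf(), the survival function P(X > k), instead of 1 − cdf(k) to avoid subtraction error in the tail. The 24/7 operating schedule behind the yearly estimate is an assumption; adjust hours_per_year for real opening hours.

```python
from scipy.stats import poisson

lam = 12    # average orders per hour, as in this README
surge = 20  # a surge means more than 20 orders in one hour

# P(X > 20) via the survival function
p_surge = poisson.sf(surge, lam)  # ≈ 0.0116

# Expected surge hours per year, assuming a 24/7 operation (8,760 hours)
hours_per_year = 24 * 365
expected_surge_hours = p_surge * hours_per_year

print(f"P(X > {surge}) = {p_surge:.4f}")
print(f"Expected surge hours/year: {expected_surge_hours:.0f}")
```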


④ Central Limit Theorem (CLT)

Goal: Demonstrate that the distribution of sample means approaches a Normal distribution regardless of the source distribution (provided the source has finite variance).

  • Source: Exponential distribution (heavily right-skewed)
  • Drew 2,000 samples at each size n = 5, 30, 100, 500
  • As n increases, the distribution of sample means approaches a Normal centered at μ ≈ 50, and its spread (the standard error σ/√n) shrinks toward 0

💼 Business Insight: The CLT justifies using sample-based customer surveys to estimate population-wide spending patterns, enabling confident business decisions without surveying every customer.
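A Monte Carlo sketch of the experiment described in this section. The exponential scale of 50 is inferred from μ ≈ 50; the seed and the printout format are assumptions. Because an exponential's standard deviation equals its mean, theory predicts a standard error of 50/√n and a skewness of 2/√n for the sample means, so the simulation can be checked against both.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)  # hypothetical seed
scale = 50.0                    # exponential mean (μ ≈ 50); its std is also 50
n_reps = 2000                   # simulated samples per sample size

results = {}
for n in (5, 30, 100, 500):
    # Each row is one sample of size n; average every row to get 2,000 sample means
    means = rng.exponential(scale=scale, size=(n_reps, n)).mean(axis=1)
    results[n] = {
        "mean": means.mean(),               # stays near 50 for every n
        "std": means.std(ddof=1),           # tracks the standard error 50 / sqrt(n)
        "theory_se": scale / np.sqrt(n),
        "skew": skew(means),                # shrinks toward 0 (theory: 2 / sqrt(n))
    }

for n, r in results.items():
    print(f"n={n:>3}  mean={r['mean']:6.2f}  std={r['std']:6.2f}  "
          f"50/sqrt(n)={r['theory_se']:6.2f}  skew={r['skew']:5.2f}")
```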


🚀 How to Run

1. Clone the repository:

git clone https://github.com/thed700/ecommerce-statistical-analysis.git
cd ecommerce-statistical-analysis

2. Install dependencies:

pip install numpy pandas scipy matplotlib

3. Run the analysis:

python ecommerce_statistical_analysis.py

The script will:

  • Generate the synthetic dataset
  • Print statistical results to console
  • Save ecommerce_stats_dashboard.png to the current directory
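The save step could look roughly like the sketch below: a hypothetical save_dashboard helper draws one panel per analysis and writes the PNG with Matplotlib's non-interactive Agg backend, so it also runs on machines without a display. The 2×2 layout, figure size, dpi, and the regenerated data (including the exponential purchase model) are assumptions, not the project's actual dashboard.

```python
import os

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render straight to file, no display needed
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import binom, poisson


def save_dashboard(path: str = "ecommerce_stats_dashboard.png", seed: int = 0) -> str:
    """Draw a hypothetical 2x2 dashboard (one panel per analysis) and save it as a PNG."""
    rng = np.random.default_rng(seed)
    fig, axes = plt.subplots(2, 2, figsize=(12, 8))
    k = np.arange(0, 31)  # count range shared by the Binomial and Poisson panels

    # Panel 1: purchase amounts (hypothetical exponential data) with the mu + 3 sigma cutoff
    amounts = rng.exponential(scale=100.0, size=1200)
    axes[0, 0].hist(amounts, bins=50)
    axes[0, 0].axvline(amounts.mean() + 3 * amounts.std(), color="red", linestyle="--")
    axes[0, 0].set_title("Purchase amounts (|Z| > 3 cutoff)")

    # Panel 2: Binomial(n=200, p=0.05) PMF
    axes[0, 1].bar(k, binom.pmf(k, 200, 0.05))
    axes[0, 1].set_title("Conversions per 200 sessions")

    # Panel 3: Poisson(lambda=12) PMF, surge counts (> 20) drawn in red
    colors = ["red" if v > 20 else "C0" for v in k]
    axes[1, 0].bar(k, poisson.pmf(k, 12), color=colors)
    axes[1, 0].set_title("Orders per hour")

    # Panel 4: CLT, histogram of 2,000 exponential sample means with n = 30
    means = rng.exponential(scale=50.0, size=(2000, 30)).mean(axis=1)
    axes[1, 1].hist(means, bins=40)
    axes[1, 1].set_title("Sample means, n = 30")

    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)  # free the figure's memory
    return os.path.abspath(path)


print("Saved", save_dashboard())
```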

📊 Key Results Summary

Analysis          Key Metric                                   Business Value
Normal + Z-Score  22 outliers detected (|Z| > 3)               VIP upselling / fraud review
Binomial          P(X = 8 | n = 200, p = 0.05) = 0.1137        Realistic KPI setting
Poisson           P(surge > 20 orders/hr) = 0.0116             Proactive staffing
CLT               Exponential sample means → Normal as n → ∞   Survey-based inference

🛠 Skills Demonstrated

  • Statistical distribution modeling (Normal, Binomial, Poisson)
  • Hypothesis-driven outlier detection with Z-scores
  • Monte Carlo simulation for CLT demonstration
  • Publication-quality data visualization with Matplotlib
  • Translating statistical outputs into business recommendations

👀 Author

Akmal Raxmatov

  • GitHub: @thed700
  • Focus: Data Analytics · Economic Analysis · Statistical Modeling

This project is part of a self-directed data analytics portfolio targeting Junior Data Analyst roles.

About

📈 E-Commerce Statistical Analysis: applying core probability distributions (Normal, Binomial, Poisson) and the Central Limit Theorem to solve real-world retail problems like outlier detection and demand forecasting.
