Portfolio Project: Applying core statistical distributions and theorems to solve real-world e-commerce business problems.
This project demonstrates how statistical theory translates into business decisions in an e-commerce context. Using 1,200 synthetic customer transactions, four key statistical concepts are modeled, visualized, and interpreted through a business lens.
Tech Stack: Python · NumPy · Pandas · SciPy · Matplotlib
ecommerce-statistical-analysis/
│
├── ecommerce_statistical_analysis.py   # Main analysis script
├── ecommerce_stats_dashboard.png       # Output dashboard (auto-generated)
└── README.md
Synthetic dataset of 1,200 customer transactions with the following schema:
| Column | Type | Description |
|---|---|---|
| `Customer_ID` | string | Unique customer identifier (e.g. CUST_00001) |
| `Purchase_Amount` | float | Transaction value in USD |
| `Conversion_Success` | int (0/1) | Whether the session led to a purchase |
| `Arrival_Time` | int | Number of orders arriving in a given hour |
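The generator behind the dataset is not described here, so the following is only a sketch of how a table with this schema could be produced. The function name `make_synthetic_transactions` and the exponential purchase-amount distribution (scale 100) are assumptions; the conversion rate p = 0.05 and arrival rate λ = 12 are borrowed from the analyses below.

```python
import numpy as np
import pandas as pd


def make_synthetic_transactions(n: int = 1200, seed: int = 42) -> pd.DataFrame:
    """Hypothetical generator matching the documented schema (not the project's code)."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        # CUST_00001, CUST_00002, ...
        "Customer_ID": [f"CUST_{i:05d}" for i in range(1, n + 1)],
        # Assumption: right-skewed USD amounts drawn from an Exponential(scale=100)
        "Purchase_Amount": rng.exponential(scale=100.0, size=n).round(2),
        # One Bernoulli trial per session with the p = 0.05 conversion rate
        "Conversion_Success": rng.binomial(1, 0.05, size=n),
        # Orders per hour drawn from a Poisson with lambda = 12
        "Arrival_Time": rng.poisson(lam=12, size=n),
    })


transactions = make_synthetic_transactions()
print(transactions.head())
```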
Goal (Normal distribution + Z-score): Detect anomalous purchase amounts (outliers).
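- Purchase amounts modeled as N(μ = 105.26, σ = 101.40)
- Customers with |Z| > 3, where Z = (x − μ) / σ, flagged as statistical outliers
- 22 outliers detected out of 1,200 transactions. Because purchase amounts cannot be negative and μ − 3σ < 0, every flagged transaction is an unusually large purchase (above about $409.46)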
💼 Business Insight: Flagged customers are candidates for VIP upselling programs or fraud review queues, supporting both revenue growth and loss prevention.
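A minimal sketch of the |Z| > 3 rule, assuming a pandas Series of purchase amounts. The helper `flag_zscore_outliers`, the `Is_Outlier` column, and the exponential example data are hypothetical and will not reproduce the 22 outliers reported above. The population standard deviation (`ddof=0`) matches the default of `scipy.stats.zscore`.

```python
import numpy as np
import pandas as pd


def flag_zscore_outliers(amounts: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Return a boolean mask that is True where |Z| > threshold."""
    mu = amounts.mean()
    sigma = amounts.std(ddof=0)      # population SD, as scipy.stats.zscore uses by default
    z = (amounts - mu) / sigma       # standardize each purchase amount
    return z.abs() > threshold


# Hypothetical right-skewed data, NOT the project's real dataset
rng = np.random.default_rng(42)
df = pd.DataFrame({"Purchase_Amount": rng.exponential(100.0, 1200).round(2)})
df["Is_Outlier"] = flag_zscore_outliers(df["Purchase_Amount"])

print(df["Is_Outlier"].sum(), "of", len(df), "transactions flagged")
# Smallest flagged amount, i.e. the effective cutoff mu + 3*sigma on this sample
print(df.loc[df["Is_Outlier"], "Purchase_Amount"].min())
```

Returning a mask rather than a filtered frame keeps the flag alongside each `Customer_ID`, which suits routing rows to an upsell list or a fraud-review queue.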
Goal (Binomial distribution): Model the number of conversions in a batch of sessions.
- Parameters: n = 200 sessions, p = 0.05 conversion rate
- Calculates P(X = k) for all k using `scipy.stats.binom`
- Example: P(X = 8) = 0.1137
💼 Business Insight: Knowing the exact probability of hitting a given conversion count helps marketing teams set data-backed KPIs and allocate ad spend without over- or underestimating campaign outcomes.
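A short sketch of the Binomial calculation with `scipy.stats.binom`, using the n = 200 and p = 0.05 stated above. The "15 or more conversions" stretch target is a hypothetical KPI added only to show the survival function `sf`.

```python
import numpy as np
from scipy.stats import binom

n, p = 200, 0.05            # sessions per batch, conversion rate
k = np.arange(n + 1)        # every possible conversion count, 0..n
pmf = binom.pmf(k, n, p)    # P(X = k) for all k in one vectorized call

print(f"P(X = 8)   = {pmf[8]:.4f}")                 # 0.1137
print(f"E[X] = np  = {binom.mean(n, p):.1f}")       # 10.0
print(f"Most likely count: {k[np.argmax(pmf)]}")    # 10

# Hypothetical stretch KPI: P(X >= 15) = P(X > 14) = sf(14)
print(f"P(X >= 15) = {binom.sf(14, n, p):.4f}")
```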
Goal (Poisson distribution): Model hourly order arrivals and estimate the probability of peak surges.
- Average rate: λ = 12 orders/hour
- Probability of a surge (>20 orders/hour): P(X > 20) = 0.0116
💼 Business Insight: Even a ~1% surge probability adds up: for a store taking orders around the clock, 0.0116 × 8,760 hours ≈ 100 potentially understaffed hours per year. Poisson modeling lets logistics teams schedule warehouse capacity proactively, before the crunch hits.
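A sketch of the surge calculation with `scipy.stats.poisson`, using λ = 12. The round-the-clock operating assumption (8,760 hours/year) and the 99% capacity target are hypothetical planning inputs, not figures from the original script.

```python
from scipy.stats import poisson

lam = 12                          # average orders per hour
p_surge = poisson.sf(20, lam)     # P(X > 20) = 1 - P(X <= 20)
print(f"P(X > 20) = {p_surge:.4f}")   # 0.0116

# Hypothetical: a store taking orders 24 hours a day, 365 days a year
hours_per_year = 24 * 365
print(f"Expected surge hours per year: {p_surge * hours_per_year:.0f}")

# Smallest hourly capacity c with P(X <= c) >= 0.99 (ppf = inverse CDF)
capacity = int(poisson.ppf(0.99, lam))
print(f"Capacity covering 99% of hours: {capacity} orders/hour")  # 21
```

Using `sf` rather than `1 - cdf` avoids cancellation error for small tail probabilities, and `ppf` turns a target service level directly into a staffing threshold.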
Goal (Central Limit Theorem): Demonstrate that the distribution of sample means approaches a Normal distribution regardless of the source distribution (provided its variance is finite).
- Source: Exponential distribution (heavily right-skewed)
- Simulated 2,000 samples at each sample size n = 5, 30, 100, 500
- As n increases, the distribution of sample means approaches a Normal distribution centered at μ ≈ 50, with standard error σ/√n shrinking toward 0
💼 Business Insight: The CLT justifies using moderately sized customer surveys to estimate population-wide average spending with quantifiable confidence, enabling business decisions without surveying every customer.
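A Monte Carlo sketch of the CLT demonstration, assuming an Exponential source with mean 50 (for which σ is also 50). The seed and printed layout are illustrative; the shrinking spread (theory σ/√n) and skewness (theory 2/√n for the mean of n Exponentials) show the convergence numerically.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
scale = 50.0      # assumed Exponential mean; its standard deviation is also 50
n_reps = 2_000    # simulated samples per sample size

results = {}
for n in (5, 30, 100, 500):
    # Each row is one sample of size n; averaging along axis=1 gives its mean
    sample_means = rng.exponential(scale, size=(n_reps, n)).mean(axis=1)
    results[n] = sample_means
    print(
        f"n={n:>3}  mean={sample_means.mean():6.2f}  "
        f"sd={sample_means.std(ddof=1):5.2f} (theory {scale / np.sqrt(n):5.2f})  "
        f"skew={skew(sample_means):5.2f} (theory {2 / np.sqrt(n):4.2f})"
    )
```

Plotting a histogram of each `results[n]` reproduces the familiar picture of a skewed shape at n = 5 tightening into a narrow bell curve by n = 500.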
1. Clone the repository: `git clone https://github.com/thed700/ecommerce-statistical-analysis.git`, then `cd ecommerce-statistical-analysis`
2. Install dependencies: `pip install numpy pandas scipy matplotlib`
3. Run the analysis: `python ecommerce_statistical_analysis.py`

The script will:
- Generate the synthetic dataset
- Print statistical results to console
- Save `ecommerce_stats_dashboard.png` to the current directory
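The project's plotting code isn't reproduced in this README, so the following is a hypothetical sketch of how a four-panel dashboard like `ecommerce_stats_dashboard.png` could be built and saved with Matplotlib. The function name `save_dashboard`, the panel layout, and the exponential example data are assumptions; the Binomial, Poisson, and CLT parameters mirror those described above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render straight to a file
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import binom, poisson


def save_dashboard(path: str = "ecommerce_stats_dashboard.png") -> str:
    """Hypothetical 2x2 dashboard; returns the path of the saved PNG."""
    rng = np.random.default_rng(0)
    fig, axes = plt.subplots(2, 2, figsize=(12, 8))

    # Panel 1: purchase amounts (assumed Exponential data, not the real dataset)
    axes[0, 0].hist(rng.exponential(100, 1200), bins=40)
    axes[0, 0].set_title("Purchase amounts (hypothetical data)")
    axes[0, 0].set_xlabel("USD")

    k = np.arange(0, 31)
    # Panel 2: Binomial PMF for n = 200 sessions, p = 0.05
    axes[0, 1].bar(k, binom.pmf(k, 200, 0.05))
    axes[0, 1].set_title("Binomial(n=200, p=0.05)")
    axes[0, 1].set_xlabel("Conversions")

    # Panel 3: Poisson PMF for lambda = 12 orders/hour
    axes[1, 0].bar(k, poisson.pmf(k, 12))
    axes[1, 0].set_title("Poisson(λ=12)")
    axes[1, 0].set_xlabel("Orders per hour")

    # Panel 4: 2,000 sample means of size 30 from an Exponential with mean 50
    means = rng.exponential(50, size=(2000, 30)).mean(axis=1)
    axes[1, 1].hist(means, bins=40)
    axes[1, 1].set_title("CLT: sample means, n=30")

    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)  # free the figure's memory after saving
    return path


if __name__ == "__main__":
    print("Saved", save_dashboard())
```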
| Analysis | Key Metric | Business Value |
|---|---|---|
| Normal + Z-Score | 22 outliers detected (\|Z\| > 3) | VIP targeting / fraud review |
| Binomial | P(X = 8 \| n = 200, p = 0.05) = 0.1137 | Realistic KPI setting |
| Poisson | P(surge > 20/hr) = 0.0116 | Proactive staffing |
| CLT | Exponential → Normal as n → ∞ | Survey-based inference |
Skills demonstrated:
- Statistical distribution modeling (Normal, Binomial, Poisson)
- Hypothesis-driven outlier detection with Z-scores
- Monte Carlo simulation for CLT demonstration
- Publication-quality data visualization with Matplotlib
- Translating statistical outputs into business recommendations
Akmal Raxmatov
- GitHub: @thed700
- Focus: Data Analytics · Economic Analysis · Statistical Modeling
This project is part of a self-directed data analytics portfolio targeting Junior Data Analyst roles.
