Retail Sales & Demand Forecasting Pipeline
This project is a modular machine learning framework designed to address the "cold start" and "seasonal trend" challenges in retail sales forecasting. It transforms raw transactional data into actionable insights through an automated pipeline that handles everything from feature engineering to model evaluation.
Architecture & Logic
The project is structured to separate logic from parameters:
- Configuration-Driven: All hyperparameters, lag periods (1, 7, 28 days), and seasonal cycles are managed via config.yaml.
- Feature Engineering: Automatically generates rolling averages, standard deviations, and Fourier transforms for seasonality.
- Model Zoo: Supports multiple algorithms to compare performance across different product families:
  - Prophet: For robust handling of holidays and yearly seasonality.
  - SARIMAX: For statistical univariate time-series modeling.
  - Random Forest: For capturing non-linear relationships using engineered features.
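The feature engineering step described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual features.py: the function name `build_features`, the column names `date` and `sales`, the 7-day window, and the choice of two yearly sine/cosine pairs are all assumptions. It also assumes one row per consecutive calendar day, so that shifting by k rows means shifting by k days.

```python
import numpy as np
import pandas as pd


def build_features(daily: pd.DataFrame, lags=(1, 7, 28), window=7,
                   period=365.25, order=2) -> pd.DataFrame:
    """Add lag, rolling, and Fourier features to a frame with 'date' and 'sales'.

    Hypothetical helper: assumes one row per consecutive calendar day.
    """
    out = daily.sort_values("date").reset_index(drop=True).copy()

    # Lag features: sales observed k days earlier.
    for k in lags:
        out[f"lag_{k}"] = out["sales"].shift(k)

    # Rolling statistics over past values only; shift(1) keeps the current
    # day's sales out of its own features and avoids target leakage.
    past = out["sales"].shift(1)
    out[f"roll_mean_{window}"] = past.rolling(window).mean()
    out[f"roll_std_{window}"] = past.rolling(window).std()

    # Fourier terms: sine/cosine pairs encoding position in the yearly cycle.
    t = (out["date"] - out["date"].min()).dt.days.to_numpy()
    for k in range(1, order + 1):
        angle = 2.0 * np.pi * k * t / period
        out[f"sin_{k}"] = np.sin(angle)
        out[f"cos_{k}"] = np.cos(angle)

    return out


# Synthetic example data: 60 days of linearly increasing sales.
dates = pd.date_range("2017-01-01", periods=60, freq="D")
daily = pd.DataFrame({"date": dates, "sales": np.arange(60, dtype=float)})
features = build_features(daily)
print(features.dropna().head())
```

The first 28 rows contain NaN in at least one lag column, so a tree model such as Random Forest would typically be trained on `features.dropna()`.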
Technical Stack
- Core: Python 3.x.
- ML Libraries: scikit-learn (Random Forest), prophet, statsmodels (SARIMAX).
- Data Science: pandas, numpy, scipy.
- Workflow: PyYAML for configuration management.
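As an illustration of configuration management with PyYAML, the sketch below parses a small YAML document and reads settings from it. The key names (`features`, `lags`, `rolling_window`, `models`, `random_forest`) and the values are hypothetical; the real config.yaml may be organized differently. The YAML is inlined as a string so the example runs on its own.

```python
import yaml

# Hypothetical config.yaml contents; the key names are illustrative only.
CONFIG_TEXT = """
features:
  lags: [1, 7, 28]
  rolling_window: 7
models:
  random_forest:
    n_estimators: 200
    max_depth: 10
"""


def load_config(text: str) -> dict:
    """Parse YAML text into a plain dict using the safe loader."""
    config = yaml.safe_load(text)
    if not isinstance(config, dict):
        raise ValueError("config must be a YAML mapping at the top level")
    return config


config = load_config(CONFIG_TEXT)
lags = config["features"]["lags"]
rf_params = config["models"]["random_forest"]
print(lags, rf_params)
```

`yaml.safe_load` is used instead of `yaml.load` because it only builds basic Python types and does not instantiate arbitrary objects from the file.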
This project builds an interactive sales forecasting dashboard using Streamlit and time series models (Prophet, ARIMA, SARIMAX, Random Forest).
It uses the Corporación Favorita Store Sales dataset (train/test/transactions/oil/holidays/stores) to forecast daily sales and visualize results.
salesforecast/
├── data/
│   ├── raw/                     # Original dataset files
│   │   ├── train.csv
│   │   ├── test.csv
│   │   ├── transactions.csv
│   │   ├── oil.csv
│   │   ├── holidays_events.csv
│   │   ├── stores.csv
│   │   └── sample_submission.csv
│   └── processed/               # Generated features and forecast
│       ├── sales_features.csv
│       └── sales_forecast.csv
│
├── models/                      # Saved trained models
│   ├── prophet_model.json
│   ├── rf_model.pkl
│   └── sarimax_model.pkl
│
├── notebooks/                   # Jupyter notebooks for EDA and modeling
│   ├── 01_eda.ipynb
│   ├── 02_features.ipynb
│   └── 03_models.ipynb
│
├── reports/                     # Business reports and visual outputs
│   ├── figures/
│   └── business_summary.md
│
├── src/                         # Source code modules
│   ├── config_loader.py
│   ├── data_prep.py
│   ├── evaluation.py
│   ├── exports_pdf.py
│   ├── features.py
│   ├── models.py
│   ├── pipeline.py
│   ├── pipeline_rf.py
│   ├── pipeline_compare.py
│   ├── report_generator.py
│   └── visualization.py
│
├── config.yaml                  # Configuration file
├── dashboard.py                 # Streamlit dashboard
├── main.py                      # Forecasting pipeline
├── requirements.txt             # Python dependencies
└── README.md                    # Project documentation
git clone <repository-url>
cd salesforecast
python -m venv venv
source venv/bin/activate # macOS/Linux
venv\Scripts\activate # Windows
pip install -r requirements.txt
Prophet requires the CmdStan backend:
pip install prophet cmdstanpy
python -c "import cmdstanpy; cmdstanpy.install_cmdstan()"
Run the forecasting pipeline:
python main.py
This will:
- Aggregate daily sales from train.csv
- Train Prophet (fallback to ARIMA if needed)
- Save sales_features.csv and sales_forecast.csv
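The aggregation step listed above might look roughly like the sketch below. It is not the project's data_prep.py: the function name `aggregate_daily_sales` is hypothetical, and a four-row in-memory sample stands in for data/raw/train.csv, using the column layout of the public Store Sales dataset (`id`, `date`, `store_nbr`, `family`, `sales`, `onpromotion`).

```python
import io

import pandas as pd

# Tiny stand-in for data/raw/train.csv; values are made up for illustration.
SAMPLE = io.StringIO(
    "id,date,store_nbr,family,sales,onpromotion\n"
    "0,2017-01-01,1,GROCERY I,100.0,0\n"
    "1,2017-01-01,2,GROCERY I,50.0,1\n"
    "2,2017-01-02,1,GROCERY I,80.0,0\n"
    "3,2017-01-02,2,BEVERAGES,20.0,0\n"
)


def aggregate_daily_sales(source) -> pd.DataFrame:
    """Sum sales across all stores and product families for each date."""
    raw = pd.read_csv(source, parse_dates=["date"])
    daily = raw.groupby("date", as_index=False)["sales"].sum()
    # Prophet expects the columns to be named 'ds' (date) and 'y' (target).
    return daily.rename(columns={"date": "ds", "sales": "y"})


daily = aggregate_daily_sales(SAMPLE)
print(daily)
```

Passing a file path such as `"data/raw/train.csv"` instead of the in-memory sample works the same way, since `pd.read_csv` accepts both paths and file-like objects.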
Launch the dashboard:
streamlit run dashboard.py
This will:
- Load the forecast
- Display charts, metrics, and tables interactively
Features
- Daily sales forecasting using Prophet
- ARIMA fallback for robustness
- Interactive Streamlit dashboard
- Saved models: Prophet, SARIMAX, Random Forest
- Modular pipeline with config and reporting
- Extendable with regressors (oil, holidays, promotions)
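The fallback behaviour listed above can be illustrated with a generic try-then-fall-back pattern. The sketch below is an assumption about the general shape, not the project's implementation: `failing_primary` and `seasonal_naive` are stand-ins for Prophet and ARIMA so that the example runs without either library installed, and `forecast_with_fallback` is a hypothetical helper name.

```python
def seasonal_naive(history, horizon, season=7):
    """Stand-in fallback model: repeat the last full season of observations."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    last = list(history[-season:])
    return [last[i % season] for i in range(horizon)]


def failing_primary(history, horizon):
    """Stand-in primary model that simulates a fitting failure."""
    raise RuntimeError("primary model failed to fit")


def forecast_with_fallback(history, horizon, primary, fallback):
    """Try the primary forecaster; on any error, use the fallback instead.

    Returns the forecast and a label saying which model produced it.
    """
    try:
        return primary(history, horizon), "primary"
    except Exception as exc:  # broad on purpose: any failure triggers the fallback
        print(f"primary failed ({exc}); using fallback")
        return fallback(history, horizon), "fallback"


history = [float(x) for x in range(1, 15)]  # two weeks of synthetic daily sales
forecast, used = forecast_with_fallback(history, 3, failing_primary, seasonal_naive)
print(used, forecast)
```

Returning the label alongside the forecast lets downstream code, such as a dashboard or report, record which model actually produced the numbers.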
Future Improvements
- Add filters for store_nbr and product family
- Compare model outputs side-by-side
- Deploy on Streamlit Cloud or Heroku
- Improve accuracy with external regressors
Developed by Rajinder Kumar
Backend Developer & Aspiring Founder | Python, Django, FastAPI, Streamlit