rcyberly/FUTURE_ML_01

Retail Sales & Demand Forecasting Pipeline

This project is a modular Machine Learning framework designed to solve the "cold start" and "seasonal trend" challenges in retail sales. It transforms raw transactional data into actionable insights through an automated pipeline that handles everything from feature engineering to model evaluation.

Architecture & Logic

The project is structured to separate logic from parameters:

Configuration-Driven: All hyperparameters, lag periods (1, 7, 28 days), and seasonal cycles are managed via config.yaml.

Feature Engineering: Automatically generates rolling averages, standard deviations, and Fourier transforms for seasonality.

Model Zoo: Supports multiple algorithms to compare performance across different product families:

Prophet: For robust handling of holidays and yearly seasonality.

SARIMAX: For statistical modeling of a single target series, optionally with exogenous regressors.

Random Forest: For capturing non-linear relationships using engineered features.
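The feature-engineering step described above could look roughly like the following. This is a minimal sketch, not the repository's features.py: the function name `build_features`, the column names `date` and `sales`, the 7-day window, and the reading of "Fourier transforms" as sine/cosine Fourier terms are all assumptions. Only the lag periods (1, 7, 28) come from the text.

```python
import numpy as np
import pandas as pd


def build_features(daily, lags=(1, 7, 28), window=7, period=365.25, order=2):
    """Add lag, rolling, and Fourier-term columns to a daily sales frame.

    Hypothetical helper: expects columns `date` (datetime) and `sales`.
    """
    out = daily.sort_values("date").reset_index(drop=True).copy()

    # Lag features: sales observed k days earlier.
    for k in lags:
        out[f"lag_{k}"] = out["sales"].shift(k)

    # Rolling statistics over the previous `window` days. Shifting by one
    # first keeps the current day's sales out of its own features.
    past = out["sales"].shift(1)
    out[f"roll_mean_{window}"] = past.rolling(window).mean()
    out[f"roll_std_{window}"] = past.rolling(window).std()

    # Fourier terms: sine/cosine pairs encoding position in the yearly cycle.
    t = out["date"].dt.dayofyear.to_numpy()
    for n in range(1, order + 1):
        out[f"sin_{n}"] = np.sin(2 * np.pi * n * t / period)
        out[f"cos_{n}"] = np.cos(2 * np.pi * n * t / period)
    return out


# Usage on synthetic data (values are made up for illustration).
dates = pd.date_range("2017-01-01", periods=60, freq="D")
demo = pd.DataFrame({"date": dates, "sales": np.arange(60, dtype=float)})
features = build_features(demo)
```

The first rows of each lag and rolling column are NaN because no earlier history exists. A tree model such as Random Forest in scikit-learn would typically need those rows dropped or filled before training.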

Technical Stack

Core: Python 3.x.

ML Libraries: scikit-learn (Random Forest), prophet, statsmodels (SARIMAX).

Data Science: pandas, numpy, scipy.

Workflow: PyYAML for configuration management.
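As an illustration of the configuration-driven approach, the sketch below parses a small YAML document with PyYAML and validates the lag periods. The key names (`features`, `lags`, `rolling_window`, `models`) and the `load_config` helper are hypothetical. The actual layout of config.yaml and the behavior of config_loader.py are not shown in this README.

```python
import yaml

# Hypothetical config layout; the real config.yaml may be organized differently.
CONFIG_TEXT = """
features:
  lags: [1, 7, 28]
  rolling_window: 7
models:
  random_forest:
    n_estimators: 200
"""


def load_config(text):
    """Parse YAML text and check that the lag periods are positive integers."""
    cfg = yaml.safe_load(text)
    lags = cfg["features"]["lags"]
    if not all(isinstance(k, int) and k > 0 for k in lags):
        raise ValueError(f"lags must be positive integers, got {lags!r}")
    return cfg


config = load_config(CONFIG_TEXT)
lags = config["features"]["lags"]
```

In a real run the text would come from the file, for example `load_config(open("config.yaml").read())`, so that changing a lag period or hyperparameter needs no code change.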

Overview

This project builds an interactive sales forecasting dashboard using Streamlit and time series models (Prophet, ARIMA, SARIMAX, Random Forest).
It uses the Corporación Favorita Store Sales dataset (train/test/transactions/oil/holidays/stores) to forecast daily sales and visualize results.


Project Structure

salesforecast/
├── data/
│   ├── raw/                      # Original dataset files
│   │   ├── train.csv
│   │   ├── test.csv
│   │   ├── transactions.csv
│   │   ├── oil.csv
│   │   ├── holidays_events.csv
│   │   ├── stores.csv
│   │   └── sample_submission.csv
│   └── processed/                # Generated features and forecast
│       ├── sales_features.csv
│       └── sales_forecast.csv
│
├── models/                       # Saved trained models
│   ├── prophet_model.json
│   ├── rf_model.pkl
│   └── sarimax_model.pkl
│
├── notebooks/                    # Jupyter notebooks for EDA and modeling
│   ├── 01_eda.ipynb
│   ├── 02_features.ipynb
│   └── 03_models.ipynb
│
├── reports/                      # Business reports and visual outputs
│   ├── figures/
│   └── business_summary.md
│
├── src/                          # Source code modules
│   ├── config_loader.py
│   ├── data_prep.py
│   ├── evaluation.py
│   ├── exports_pdf.py
│   ├── features.py
│   ├── models.py
│   ├── pipeline.py
│   ├── pipeline_rf.py
│   ├── pipeline_compare.py
│   ├── report_generator.py
│   └── visualization.py
│
├── config.yaml                   # Configuration file
├── dashboard.py                  # Streamlit dashboard
├── main.py                       # Forecasting pipeline
├── requirements.txt              # Python dependencies
└── README.md                     # Project documentation


Store Sales Forecasting Dashboard

Overview

This project builds a full-stack sales forecasting system using Streamlit, Prophet, ARIMA, SARIMAX, and Random Forest.
It uses the Corporación Favorita Store Sales dataset to predict daily sales and visualize results interactively.


Project Structure

See folder breakdown above for data, models, notebooks, reports, and modular source code.


Setup Instructions

1. Clone the repository

git clone <repository-url>
cd salesforecast

2. Create a virtual environment

python -m venv venv
source venv/bin/activate # macOS/Linux
venv\Scripts\activate # Windows

3. Install dependencies

pip install -r requirements.txt

Prophet requires the CmdStan backend:

pip install prophet cmdstanpy
python -c "import cmdstanpy; cmdstanpy.install_cmdstan()"


Running the App

Step 1: Generate processed data & forecast

python main.py

This will:

  • Aggregate daily sales from train.csv
  • Train Prophet (fallback to ARIMA if needed)
  • Save sales_features.csv and sales_forecast.csv
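The aggregation and fallback steps listed above can be sketched as follows. This is an assumption-laden illustration rather than the code in main.py: `aggregate_daily_sales` and `fit_with_fallback` are hypothetical names, and the two fitters are stand-ins so the example runs without Prophet or statsmodels installed. The `date` and `sales` columns match train.csv in the Store Sales dataset, and `ds`/`y` are the column names Prophet expects.

```python
import pandas as pd


def aggregate_daily_sales(train):
    """Sum sales across all stores and product families for each date."""
    daily = train.groupby("date", as_index=False)["sales"].sum()
    # Prophet expects a date column named `ds` and a target column named `y`.
    return daily.rename(columns={"date": "ds", "sales": "y"})


def fit_with_fallback(daily, fitters):
    """Try each (name, fit_fn) pair in order; return the first that succeeds."""
    errors = []
    for name, fit_fn in fitters:
        try:
            return name, fit_fn(daily)
        except Exception as exc:  # broad on purpose: any failure triggers the fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


# Stand-in fitters (hypothetical): the first simulates a missing Prophet backend,
# the second returns a trivial "model" in place of a fitted ARIMA.
def fit_prophet_stub(daily):
    raise ImportError("prophet backend unavailable")


def fit_arima_stub(daily):
    return {"mean": daily["y"].mean()}


# Tiny made-up sample with the same columns as train.csv.
train = pd.DataFrame({
    "date": pd.to_datetime(["2017-01-01", "2017-01-01", "2017-01-02"]),
    "store_nbr": [1, 2, 1],
    "family": ["GROCERY I", "GROCERY I", "GROCERY I"],
    "sales": [10.0, 5.0, 7.0],
})
daily = aggregate_daily_sales(train)
used, model = fit_with_fallback(
    daily, [("prophet", fit_prophet_stub), ("arima", fit_arima_stub)]
)
```

Passing the fitters as an ordered list keeps the fallback policy in one place, so adding SARIMAX or Random Forest as a further fallback would only mean appending another pair.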

Step 2: Launch the dashboard

streamlit run dashboard.py

This will:

  • Load the forecast
  • Display charts, metrics, and tables interactively
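The README does not say which metrics the dashboard displays. As an assumption, the sketch below computes two common forecast error measures, MAE and RMSE, and ranks candidate models by RMSE. The function names and the sample numbers are illustrative only and are not taken from evaluation.py or from real results.

```python
import numpy as np


def forecast_metrics(actual, predicted):
    """Return mean absolute error and root mean squared error."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = predicted - actual
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
    }


def rank_models(actual, forecasts):
    """Score each model's forecast and sort by RMSE, best first."""
    scored = {name: forecast_metrics(actual, pred) for name, pred in forecasts.items()}
    return sorted(scored.items(), key=lambda item: item[1]["rmse"])


# Made-up numbers purely to exercise the functions.
actual = [100.0, 120.0, 90.0]
forecasts = {
    "prophet": [110.0, 115.0, 95.0],
    "random_forest": [100.0, 118.0, 92.0],
}
ranking = rank_models(actual, forecasts)
```

Each entry in `ranking` is a `(model_name, metrics)` pair, which maps naturally onto a table row or a metric card in a dashboard.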

Features

  • Daily sales forecasting using Prophet
  • ARIMA fallback for robustness
  • Interactive Streamlit dashboard
  • Saved models: Prophet, SARIMAX, Random Forest
  • Modular pipeline with config and reporting
  • Extendable with regressors (oil, holidays, promotions)

Next Steps

  • Add filters for store_nbr and product family
  • Compare model outputs side-by-side
  • Deploy on Streamlit Cloud or Heroku
  • Improve accuracy with external regressors

Author

Developed by Rajinder Kumar
Backend Developer & Aspiring Founder | Python, Django, FastAPI, Streamlit

About

Enterprise-grade Machine Learning pipeline for retail demand forecasting. Features a configuration-driven architecture, automated feature engineering, and a comparative model tournament (Prophet, SARIMAX, Random Forest) for high-accuracy store-level predictions.
