- Introduction
- High-Level Architecture
- Data Pipeline Components
- Project Structure
- Getting Started
- Further Documentation
- Glossary
## Introduction

Welcome to Sales Vision 360, an end-to-end data analytics solution designed to provide comprehensive insights into sales performance. This project integrates various technologies to create a robust data pipeline, from raw data acquisition to advanced analytics and interactive visualization. The primary goal is to empower businesses with data-driven decision-making capabilities for sales forecasting and customer segmentation.
Sales Vision 360 automates the entire process, ensuring data quality, consistency, and accessibility, ultimately transforming raw sales data into actionable business intelligence.
## High-Level Architecture

The Sales Vision 360 architecture is designed for scalability and maintainability, leveraging cloud-native services and modern data stack tools. The pipeline orchestrates data flow from external sources, through a centralized data warehouse, into transformed models, and finally into analytical and visualization layers.
```mermaid
graph TD
    A[External Data Sources] --> B{Data Ingestion};
    B -- Web Scraper --> C[Raw Data Storage];
    B -- Snowflake Connector --> C;
    C --> D[Snowflake Data Warehouse];
    D -- dbt Transformations --> E[Transformed Data Models];
    E -- ML Pipelines --> F[Forecasting & Segmentation Outputs];
    F --> G[Streamlit Dashboard];
    E --> H[Power BI Dashboard];

    subgraph Data Flow
        C; D; E; F;
    end

    subgraph Analytics & Visualization
        G; H;
    end
```
## Data Pipeline Components

### Web Scraper

The initial stage of the pipeline gathers product data from various online marketplaces.

- Purpose: To collect product information, including titles, prices, ratings, and reviews, from e-commerce platforms such as Amazon and Flipkart.
- Technology: Python-based web scraper utilizing `playwright` for browser automation.
- Output: Raw data is initially saved in `.txt` files (JSONL format) and then converted into structured `.csv` files. This includes `products_catalog.csv` and `sales_transactions.csv`, which are enriched with synthetic sales transaction data.
- Relevant Files:
  - `sales_pipeline_scrapper.py`: Python script for scraping and synthetic data generation.
  - `data/sales_raw.txt`: Raw scraped data.
  - `data/products_catalog.csv`: Cleaned product catalog.
  - `data/sales_transactions.csv`: Generated synthetic sales transactions.
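The scraping itself needs a live browser session, but the JSONL-to-CSV conversion step can be sketched with the standard library alone. The `jsonl_to_csv` helper and its field names below are illustrative, not the actual functions in `sales_pipeline_scrapper.py`:

```python
import csv
import io
import json

def jsonl_to_csv(jsonl_text: str, fieldnames: list) -> str:
    """Convert raw JSONL scrape output (one JSON object per line) to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for line in jsonl_text.splitlines():
        if line.strip():  # skip blank lines in the raw dump
            writer.writerow(json.loads(line))
    return out.getvalue()

raw = (
    '{"title": "Widget", "price": 19.99, "rating": 4.3}\n'
    '{"title": "Gadget", "price": 5.49, "rating": 3.8}\n'
)
csv_text = jsonl_to_csv(raw, ["title", "price", "rating"])
```

`extrasaction="ignore"` quietly drops any fields scraped for one marketplace but not the other, which keeps the CSV schema stable across sources.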
### Snowflake Data Warehouse

Snowflake serves as the central data warehouse for all raw and transformed data.

- Purpose: To store large volumes of structured and semi-structured data, providing a scalable, high-performance environment for analytics.
- Technology: Snowflake Cloud Data Warehouse.
- Process: The `sales_transactions.csv` generated by the scraper is uploaded to Snowflake into the `MYDATABASE.ABSSALESVISION360.SALES_TRANSACTIONS` table.
- Relevant Files:
  - `snowflake_connector.py`: Python script to connect to Snowflake and upload CSV data.
  - `csv_download.py`: Python script to connect to Snowflake and download data (e.g., `SALES_TRANS.csv`).
  - `snowflake_credits_checker.py`: Utility script to check Snowflake connection details and available resources.
  - `.env`: Stores Snowflake connection credentials securely.
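A CSV upload to a Snowflake table typically means a `PUT` to the table's stage followed by a `COPY INTO`. The sketch below builds the `COPY INTO` statement with a hypothetical helper (`copy_into_sql` is not in the repo); the connection flow, which requires `snowflake-connector-python` and valid credentials, is shown only in comments so the runnable part stays dependency-free:

```python
def copy_into_sql(table: str, staged_file: str) -> str:
    """Build the COPY INTO statement run after PUT-ing the CSV to a stage."""
    return (
        f"COPY INTO {table} FROM {staged_file} "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)"
    )

sql = copy_into_sql(
    "MYDATABASE.ABSSALESVISION360.SALES_TRANSACTIONS",
    "@%SALES_TRANSACTIONS/sales_transactions.csv.gz",
)

# Hypothetical upload flow (requires snowflake-connector-python and a valid .env):
# import os
# import snowflake.connector
# conn = snowflake.connector.connect(
#     user=os.environ["SNOWFLAKE_USER"],
#     password=os.environ["SNOWFLAKE_PASSWORD"],
#     account=os.environ["SNOWFLAKE_ACCOUNT"],
# )
# conn.cursor().execute("PUT file://data/sales_transactions.csv @%SALES_TRANSACTIONS")
# conn.cursor().execute(sql)
```

`@%SALES_TRANSACTIONS` is the table's built-in stage, which avoids creating a named stage just for this one load.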
### dbt Transformations

dbt (data build tool) transforms raw data in Snowflake into clean, consistent, analysis-ready data models.

- Purpose: To define, manage, and execute data transformations, ensuring data quality and creating a single source of truth for downstream analytics.
- Technology: dbt with the Snowflake adapter.
- Process: Raw `SALES_TRANSACTIONS` data is transformed into dimension and fact tables such as `CUSTOMER_DIM`, `PRODUCT_DIM`, `ORDER_DIM`, `MARKETING_ENGAGEMENT_DIM`, `SHIPPING_FULFILLMENT_DIM`, `METADATA`, and `SALES_FACT`.
- Relevant Files:
  - `sales_vision_360/dbt_project.yml`: dbt project configuration.
  - `sales_vision_360/models/sales_transactions_transform.sql`: Core transformation model.
  - `sales_vision_360/models/dimensions/`: Directory containing dimension models.
  - `sales_vision_360/models/fact/financial_fact.sql`: Fact model for sales.
  - `sales_vision_360/models/source/sources.yml`: dbt source definition for raw sales data.
  - `sales_vision_360/logs/dbt.log`: dbt execution logs.
  - `sales_vision_360/README.md`: dbt-project-specific README.
### Machine Learning Pipelines

Machine learning models are integrated into the pipeline to generate predictive insights.

- Purpose: To perform sales forecasting and customer segmentation on the transformed data. Churn prediction, though initially explored, was dropped because the data carried no predictive signal for churn.
- Technology: Python with `scikit-learn`, `pandas`, `numpy`, `matplotlib`, `seaborn`, and `plotly`.
- Process:
  - Sales Forecasting: Uses `GradientBoostingRegressor` to predict daily revenue from historical trends and calendar features.
  - Customer Segmentation: Employs `KMeans` clustering on RFM (Recency, Frequency, Monetary) and other customer behavioral features to identify distinct customer segments (e.g., High-Value, Dormant, Frequent, Regular).
- Output: Forecasting results (`forecast_results.csv`), customer segments (`customer_segments.csv`), and a performance summary (`performance_summary.csv`) are saved. Visualizations such as `forecast_plot.png` and `segmentation_plot.png` are also generated.
- Relevant Files:
  - `modeling/ml_pipelines_all.py`: Python script implementing the ML pipelines.
  - `modeling/churn_eda_diagnostic.py`: EDA script (note: churn prediction is not part of the final ML pipeline).
  - `modeling/outputs/`: Directory for ML outputs (CSVs, PNGs, joblib models).
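The two modeling steps can be illustrated end-to-end on synthetic data. This is a minimal sketch assuming only `numpy` and `scikit-learn`; the real pipeline in `modeling/ml_pipelines_all.py` engineers far richer features, and the feature choices below are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(42)

# --- Sales forecasting: daily revenue with trend + weekly seasonality ---
days = np.arange(365)
revenue = 1000 + 2 * days + 200 * np.sin(2 * np.pi * days / 7) + rng.normal(0, 50, 365)

# Calendar features: linear trend and day-of-week
X = np.column_stack([days, days % 7])
model = GradientBoostingRegressor(random_state=42).fit(X[:300], revenue[:300])
pred = model.predict(X[300:])
mape = float(np.mean(np.abs((revenue[300:] - pred) / revenue[300:]))) * 100

# --- Customer segmentation: KMeans on RFM features ---
rfm = np.column_stack([
    rng.integers(1, 365, 500),   # Recency: days since last order
    rng.integers(1, 50, 500),    # Frequency: order count
    rng.uniform(10, 5000, 500),  # Monetary: total spend
])
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(rfm)
```

Note that tree ensembles cannot extrapolate a trend beyond the training range, which is one reason the real pipeline's calendar features matter: here the hold-out error grows toward the end of the horizon as the trend keeps climbing past the last training day.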
### Dashboards & Visualization

The final stage presents the insights through interactive dashboards.

- Purpose: To provide business users with an intuitive interface for exploring sales forecasts, customer segments, and key performance indicators.
- Technology: Streamlit for interactive web dashboards and Power BI for detailed business intelligence reports.
- Relevant Files:
  - `modeling/dashboard_app.py`: Streamlit application for visualizing ML outputs.
  - `Sales vision 360 dashboard.pbix`: Power BI Desktop file for detailed sales analytics.
## Project Structure

The project repository is organized as follows:
```mermaid
graph TD
    A[Sales Vision 360 Project] --> B[Data Layer]
    A --> C[Transformation Layer]
    A --> D[ML & Analytics Layer]
    A --> E[Visualization Layer]
    A --> F[Infrastructure]

    B --> B1[data/]
    B1 --> B2[products_catalog.csv]
    B1 --> B3[sales_raw.txt]
    B1 --> B4[sales_transactions.csv]

    C --> C1[sales_vision_360/ - dbt]
    C1 --> C2[dimensions/]
    C1 --> C3[fact/]
    C1 --> C4[sales_transactions_transform.sql]
    C2 --> C5[customer_dim.sql]
    C2 --> C6[product_dim.sql]
    C2 --> C7[order_dim.sql]
    C3 --> C8[financial_fact.sql]

    D --> D1[modeling/]
    D1 --> D2[ml_pipelines_all.py]
    D1 --> D3[churn_eda_diagnostic.py]
    D1 --> D4[dashboard_app.py]
    D1 --> D5[outputs/]
    D5 --> D6[Models & Results]

    E --> E1[Sales vision 360 dashboard.pbix]
    E --> E2[Streamlit Dashboard]

    F --> F1[csv_download.py]
    F --> F2[snowflake_connector.py]
    F --> F3[snowflake_credits_checker.py]

    style A fill:#e1f5ff
    style B fill:#fff4e1
    style C fill:#e8f5e9
    style D fill:#f3e5f5
    style E fill:#fce4ec
    style F fill:#f5f5f5
```
## Getting Started

To set up and run the Sales Vision 360 project, follow these steps.

### Prerequisites

Before you begin, ensure you have the following installed and configured:
- Python 3.8+: Download and install from python.org.
- Snowflake Account: Access to a Snowflake data warehouse with appropriate permissions.
- dbt Core: Install the Snowflake adapter: `pip install dbt-snowflake`.
- Environment Variables: A `.env` file in the root directory of the project with your Snowflake credentials:

  ```
  SNOWFLAKE_USER=your_username
  SNOWFLAKE_PASSWORD=your_password
  SNOWFLAKE_ACCOUNT=your_account_identifier
  ```

- Browser for Scraping: Google Chrome is used by `playwright`.
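Before running anything, it is worth confirming that all three Snowflake variables are actually present. A stdlib-only sanity check is sketched below; `parse_env` and `missing_keys` are hypothetical helpers, not part of the repo (the project's scripts would typically load `.env` via a library such as python-dotenv instead):

```python
REQUIRED = ["SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD", "SNOWFLAKE_ACCOUNT"]

def parse_env(text: str) -> dict:
    """Minimal .env parser: KEY=value lines; blank lines and # comments ignored."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

def missing_keys(env: dict) -> list:
    """Return required Snowflake variables that are absent or empty."""
    return [k for k in REQUIRED if not env.get(k)]

sample = "SNOWFLAKE_USER=alice\nSNOWFLAKE_ACCOUNT=xy12345\n# password not set yet\n"
```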
### Setup Steps

1. Clone the repository:

   ```bash
   git clone https://github.com/your-username/SalesVision360.git
   cd SalesVision360
   ```

2. Set up the Python environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   pip install -r requirements.txt  # assuming a requirements.txt is provided
   ```

3. Configure the dbt profile: Ensure your `profiles.yml` (typically located at `~/.dbt/profiles.yml`) is configured to connect to your Snowflake instance using the `sales_vision_360` profile. An example `profiles.yml` might look like this:

   ```yaml
   sales_vision_360:
     target: dev
     outputs:
       dev:
         type: snowflake
         account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
         user: "{{ env_var('SNOWFLAKE_USER') }}"
         password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
         role: SALES_360              # or your specific role
         warehouse: SALES_VISION_360  # or your specific warehouse
         database: MYDATABASE         # or your specific database
         schema: ABSSALESVISION360    # or your specific schema
         threads: 4
         client_session_keep_alive: False
         query_tag: dbt-sales-vision-360
   ```

   You can verify the connection by running `dbt debug` from within the `sales_vision_360` directory.

4. Run the data pipeline: Follow the detailed instructions in the Detailed Setup Guide for executing the scraping, data loading, dbt transformations, and ML pipelines.
## Further Documentation

For more in-depth information on specific aspects of the project, please refer to the following documents:
- Architectural Overview: Detailed explanation of the system architecture and design principles.
- Data Ingestion & Snowflake Setup: Guide on setting up Snowflake and ingesting raw data.
- dbt Data Transformation Guide: Comprehensive guide to the dbt project, models, and transformations.
- ML Pipelines (Forecasting & Segmentation): Details on the machine learning models, feature engineering, and performance.
- Dashboards & Visualization: Information on interacting with the Streamlit and Power BI dashboards.
- Detailed Setup Guide: Step-by-step instructions for environment setup, dependency installation, and running the entire pipeline.
## Glossary

- dbt: Data Build Tool, an open-source tool for data transformation.
- EDA: Exploratory Data Analysis.
- MAPE: Mean Absolute Percentage Error, a measure of prediction accuracy.
- ML: Machine Learning.
- Power BI: A business intelligence tool for data visualization.
- RMSE: Root Mean Square Error, a measure of prediction accuracy.
- Snowflake: A cloud-based data warehousing platform.
- Streamlit: An open-source app framework for machine learning and data science.
- SKEY: Surrogate Key, a system-generated primary key in a data warehouse.
- Snowflake Connector: Python library to connect to Snowflake.
- Web Scraping: The process of extracting data from websites.
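As a concrete illustration of the MAPE and RMSE entries above, both metrics fit in a few lines of plain Python (a sketch, not the pipeline's actual implementation, which uses scikit-learn/numpy):

```python
import math

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be nonzero)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Square Error, in the units of the target."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual, predicted = [100, 200, 400], [110, 190, 400]
```

MAPE is scale-free (useful for comparing forecast quality across products), while RMSE stays in revenue units and penalizes large misses more heavily.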