The Vendor Performance Analysis project is an end-to-end data analytics solution designed to evaluate vendor performance using procurement, inventory, sales, and pricing data. The project ingests raw CSV datasets into a MySQL-backed relational database, performs aggregations and KPI calculations, and produces analytical outputs suitable for business decision-making and executive dashboards.
This repository intentionally contains only source code, notebooks, and configuration files. All large datasets, database files, and output artifacts are hosted externally and referenced below.
- Consolidate vendor-related data from multiple operational sources
- Evaluate vendor performance using financial and operational KPIs
- Identify high-performing and under-performing vendors and brands
- Enable downstream visualization using Power BI connected directly to MySQL
Primary Data Source: MySQL
The Power BI dashboard connects directly to MySQL tables generated by the Python ingestion and transformation pipeline.
Data Flow
- Raw CSV files -> `data/` directory (downloaded externally)
- Ingestion into SQLite/MySQL using Python
- Vendor-level aggregation and KPI computation
- Final summary tables stored in MySQL
- Visualization using Power BI
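The ingestion step of the flow above might look like the following sketch, which loads every CSV in `data/` into a table named after the file. The function name and the idempotent `if_exists="replace"` behavior are assumptions for illustration, not the repository's exact implementation.

```python
import os
import pandas as pd
from sqlalchemy import create_engine

def ingest_csvs(data_dir: str, engine) -> None:
    """Load every CSV in data_dir into a DB table named after the file."""
    for fname in sorted(os.listdir(data_dir)):
        if fname.endswith(".csv"):
            table = os.path.splitext(fname)[0]
            df = pd.read_csv(os.path.join(data_dir, fname))
            # replace keeps re-runs idempotent during development
            df.to_sql(table, engine, if_exists="replace", index=False)

# Example usage (SQLite for local dev; swap in a
# "mysql+pymysql://user:pass@host/db" URL for MySQL):
# engine = create_engine("sqlite:///vendor_analysis.db")
# ingest_csvs("data", engine)
```

Because the destination is expressed as a SQLAlchemy engine URL, the same code serves both the SQLite development database and the MySQL deployment.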
vendor-performance-analysis/
│
├── data/ # Ignored (download separately)
├── logs/ # Application logs
│ └── ingestion_db.log
│
├── eda.ipynb # Exploratory Data Analysis
├── ingestion_db.ipynb # Notebook-based ingestion
├── ingestion_db.py # Raw data ingestion script
├── vendor_summary.py # Vendor KPI aggregation logic
├── performance_analysis.ipynb # Performance analysis
│
├── vendor_analysis.db # Ignored (SQLite – dev only)
├── vendor_sales_summary.csv # Ignored (final output)
├── vendor_analysis_dashboard.pbix # Ignored (Power BI file)
│
├── .gitignore
├── requirements.txt
└── README.md
All datasets and large artifacts are hosted on Google Drive.
Google Drive Link: https://drive.google.com/drive/folders/11wFFbi7JJ3dBptVhIKTwmVTtT1_B0Aay
- Database `.db` file – SQLite database generated via Python
- Dataset – all raw CSV files (place inside `data/`)
- Sales summary CSV – final aggregated output CSV
- Power BI files – dashboard `.pbix` file and screenshots
The ingestion pipeline programmatically creates the following tables:
- `sales`
- `begin_inventory`
- `end_inventory`
- `purchase_prices`
- `purchases`
- `vendor_invoice`
- `vendor_sales_summary`
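A quick sanity check that an ingestion run produced all of the tables listed above can use SQLAlchemy's inspector; the helper below is a sketch, not part of the repository.

```python
from sqlalchemy import create_engine, inspect

EXPECTED_TABLES = {
    "sales", "begin_inventory", "end_inventory",
    "purchase_prices", "purchases", "vendor_invoice",
    "vendor_sales_summary",
}

def missing_tables(engine) -> set:
    """Return the expected tables that are absent from the database."""
    present = set(inspect(engine).get_table_names())
    return EXPECTED_TABLES - present

# Example:
# engine = create_engine("sqlite:///vendor_analysis.db")
# missing_tables(engine)  # an empty set means ingestion created everything
```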
- Gross Profit
- Profit Margin (%)
- Stock Turnover
- Sales-to-Purchase Ratio
- Freight Cost Impact
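The first four KPIs follow directly from sales and purchase totals. A minimal sketch of the computation is shown below; the column names (`total_sales_dollars`, `total_purchase_dollars`, etc.) are illustrative assumptions, not the project's actual schema.

```python
import pandas as pd

def add_vendor_kpis(df: pd.DataFrame) -> pd.DataFrame:
    """Append KPI columns to a vendor-level summary frame."""
    out = df.copy()
    # Gross Profit = sales revenue minus purchase cost
    out["gross_profit"] = out["total_sales_dollars"] - out["total_purchase_dollars"]
    # Profit Margin (%) = gross profit as a share of sales revenue
    out["profit_margin_pct"] = 100 * out["gross_profit"] / out["total_sales_dollars"]
    # Stock Turnover = units sold per unit purchased
    out["stock_turnover"] = out["total_sales_quantity"] / out["total_purchase_quantity"]
    # Sales-to-Purchase Ratio = revenue per dollar of purchases
    out["sales_to_purchase_ratio"] = (
        out["total_sales_dollars"] / out["total_purchase_dollars"]
    )
    return out
```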
- All ingestion and processing steps are logged
- Logs are stored in the `logs/` directory
This project demonstrates:
- ETL & Data Engineering: Python, Pandas, SQLAlchemy, MySQL
- Database Design: Relational schema, table creation, data aggregation
- Data Analysis & Visualization: Jupyter Notebooks, KPI computation
- Business Intelligence: Power BI dashboard creation and reporting
- Version Control: Git & GitHub project management
- Logging & Debugging: Structured logging of ingestion and transformation
- Parameterized database configuration via `.env`
- Automated MySQL deployment
- Incremental ingestion
- Data validation checks
- CI/CD integration
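The `.env`-based configuration could be sketched as below. With `python-dotenv` installed, `load_dotenv()` would populate the environment from a local `.env` file; here the values are read straight from the environment, and the variable names are hypothetical.

```python
import os
from sqlalchemy import create_engine

def mysql_url_from_env() -> str:
    """Build a SQLAlchemy MySQL URL from environment variables.

    Variable names (DB_USER, DB_PASSWORD, DB_HOST, DB_NAME) are
    illustrative, not the project's actual configuration keys.
    """
    user = os.getenv("DB_USER", "root")
    password = os.getenv("DB_PASSWORD", "")
    host = os.getenv("DB_HOST", "localhost")
    name = os.getenv("DB_NAME", "vendor_analysis")
    return f"mysql+pymysql://{user}:{password}@{host}/{name}"

# engine = create_engine(mysql_url_from_env())
```

Keeping credentials out of source and in `.env` (which `.gitignore` should exclude) lets the same pipeline code run against local and deployed databases.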
