Supplier-selection-model

Machine Learning for Optimal Supplier Selection

This project develops a machine learning solution that automates and optimizes the Acme Corporation's daily supplier selection. The goal is to minimize the cost of completing each task by predicting which supplier will perform it most cheaply.

Business Problem

The Acme Corporation completes a unique task each day, choosing from a pool of 64 available suppliers. The final cost of completing a task is highly dependent on the chosen supplier's effectiveness. However, estimating this cost in advance requires significant resources. To address this, Acme Corporation commissioned the development of a machine learning model to predict the optimal supplier for new tasks, thereby minimizing operational costs.

Datasets

The analysis is based on three datasets provided by the client:

tasks.csv: Contains feature data for each task (e.g., TF1, TF2, ...), with each task uniquely identified by a Task ID.
suppliers.csv: Contains feature data for each of the 64 suppliers (e.g., SF1, SF2, ...), identified by a Supplier ID.
costs.csv.zip: A compressed CSV file detailing the historical cost (in millions of dollars) of a task when performed by a specific supplier.
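As a minimal sketch, the three files could be loaded with Pandas as shown here. The function name `load_datasets` and the assumption that all three files sit in one directory are illustrative, not taken from the project code.

```python
from pathlib import Path

import pandas as pd


def load_datasets(data_dir="."):
    """Read the three client files from data_dir (the location is an assumption)."""
    data_dir = Path(data_dir)
    tasks = pd.read_csv(data_dir / "tasks.csv")
    suppliers = pd.read_csv(data_dir / "suppliers.csv")
    # pandas infers zip compression from the ".zip" suffix; this works
    # only if the archive contains exactly one file.
    costs = pd.read_csv(data_dir / "costs.csv.zip")
    return tasks, suppliers, costs
```

Calling `tasks, suppliers, costs = load_datasets("data")` would then return three DataFrames, assuming a hypothetical `data` folder holds the files.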

Analytical Methodology

The project follows a comprehensive machine learning workflow, from data processing to model deployment and evaluation.

Data Preparation: This initial stage involves preparing the data for modeling. Key steps include handling missing values, performing feature selection to identify the most impactful variables, applying feature scaling to normalize the data, and filtering out underperforming suppliers to create a more manageable dataset.
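The preparation steps could look roughly like the sketch below, run on small synthetic stand-ins for the real files. The specific choices (mean imputation, a zero-variance filter, min-max scaling, and keeping only suppliers that are cheapest for at least one task) and the `Cost` column name are assumptions made for illustration; the project may use different rules.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Hypothetical stand-in for tasks.csv: six tasks, one constant feature.
tasks = pd.DataFrame(
    {"TF1": rng.random(6), "TF2": rng.random(6), "TF3": 1.0},
    index=pd.Index([f"T{i}" for i in range(1, 7)], name="Task ID"),
)
tasks.loc["T2", "TF1"] = np.nan  # simulate a missing value

# 1. Fill missing values with the column mean (assumed strategy).
imputed = SimpleImputer(strategy="mean").fit_transform(tasks)

# 2. Drop features that never vary (assumed selection rule).
selector = VarianceThreshold(threshold=0.0)
selected = selector.fit_transform(imputed)
kept = tasks.columns[selector.get_support()]

# 3. Scale the remaining features to the [0, 1] range.
scaled = MinMaxScaler().fit_transform(selected)
tasks_prepared = pd.DataFrame(scaled, index=tasks.index, columns=kept)

# Hypothetical stand-in for the cost data, one row per task-supplier pair.
costs = pd.DataFrame({
    "Task ID": ["T1", "T1", "T1", "T2", "T2", "T2"],
    "Supplier ID": ["S1", "S2", "S3"] * 2,
    "Cost": [0.30, 0.35, 0.50, 0.42, 0.40, 0.60],
})

# 4. Keep only suppliers that are the cheapest for at least one task.
best_rows = costs.groupby("Task ID")["Cost"].idxmin()
top_suppliers = costs.loc[best_rows, "Supplier ID"].unique()
costs_filtered = costs[costs["Supplier ID"].isin(top_suppliers)]
```

In this toy data the constant feature `TF3` is dropped and supplier `S3`, which is never the cheapest, is filtered out.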
Exploratory Data Analysis (EDA): Various EDA techniques are used to understand the distributions of task features, cost data, and overall supplier performance. This phase focuses on identifying patterns in the cost data and analyzing the distribution of selection errors for different suppliers.
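One way to look at selection errors per supplier is sketched below. The README does not define the error, so this example assumes it is the extra cost of a supplier on a task relative to the cheapest supplier for that task, summarized per supplier as a root-mean-square value. The data and the `Cost` column name are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical cost data, one row per task-supplier pair.
costs = pd.DataFrame({
    "Task ID": ["T1", "T1", "T1", "T2", "T2", "T2"],
    "Supplier ID": ["S1", "S2", "S3"] * 2,
    "Cost": [0.30, 0.35, 0.50, 0.42, 0.40, 0.60],
})

# Rows are tasks, columns are suppliers.
cost_matrix = costs.pivot(index="Task ID", columns="Supplier ID", values="Cost")

# Assumed error definition: cost above the cheapest supplier for the same task.
errors = cost_matrix.sub(cost_matrix.min(axis=1), axis=0)

# Root-mean-square error if one supplier were used for every task.
rmse_per_supplier = np.sqrt((errors ** 2).mean(axis=0)).sort_values()
print(rmse_per_supplier)
```

The `errors` table can also be passed to Matplotlib (for example as a box plot per supplier) to inspect the full distribution rather than a single summary value.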
Model Development and Evaluation:
- The task features, supplier features, and cost data are merged into a single, unified dataset.
- The data is split into training and testing sets, grouped by Task ID so that all rows for a given task fall entirely in either the training set or the testing set.
- A regression model is trained to predict the cost for any given task-supplier combination. The supplier with the lowest predicted cost is selected as the optimal choice.
- Model performance is evaluated against the test set using a custom scoring metric.
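The steps above can be sketched as follows on synthetic data. The use of `GroupShuffleSplit` for the grouped split, Ridge as the regression model, the 25% test share, and the `Cost` column name are assumptions for illustration; the README does not say which splitter or settings the project uses.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_tasks, n_suppliers = 20, 4

# Hypothetical stand-ins for the three datasets.
tasks = pd.DataFrame({
    "Task ID": [f"T{i}" for i in range(n_tasks)],
    "TF1": rng.random(n_tasks),
    "TF2": rng.random(n_tasks),
})
suppliers = pd.DataFrame({
    "Supplier ID": [f"S{j}" for j in range(n_suppliers)],
    "SF1": rng.random(n_suppliers),
})
costs = tasks[["Task ID"]].merge(suppliers[["Supplier ID"]], how="cross")
costs["Cost"] = rng.random(len(costs))

# Merge everything into one row per task-supplier pair.
data = costs.merge(tasks, on="Task ID").merge(suppliers, on="Supplier ID")
X = data[["TF1", "TF2", "SF1"]]
y = data["Cost"]
groups = data["Task ID"]

# Grouped split: every row of a task lands on the same side.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups))

# Train a cost regressor on the training tasks only.
model = Ridge(alpha=1.0).fit(X.iloc[train_idx], y.iloc[train_idx])

# For each test task, pick the supplier with the lowest predicted cost.
test_rows = data.iloc[test_idx].copy()
test_rows["Predicted"] = model.predict(X.iloc[test_idx])
chosen = test_rows.loc[test_rows.groupby("Task ID")["Predicted"].idxmin()]
```

`chosen` holds one row per test task, including the actual `Cost` of the selected supplier, which is what a selection-based metric would compare against the cheapest actual cost.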
Cross-Validation: To check model robustness, a Leave-One-Group-Out cross-validation strategy is implemented. The folds are grouped by Task ID, so each fold holds out every row of one task, and a custom scorer based on the project's error metric is used to produce the performance estimate.
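A minimal sketch of this setup is shown below. The scorer is hypothetical: it assumes the error for a held-out task is the cheapest actual cost minus the actual cost of the supplier the model picks, which is zero for a perfect pick and negative otherwise. The Ridge model and the synthetic data are also placeholders.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score


def selection_error_scorer(estimator, X, y):
    """Assumed metric: min true cost minus true cost of the predicted-cheapest row."""
    y = np.asarray(y)
    picked = np.argmin(estimator.predict(X))
    return float(y.min() - y[picked])


rng = np.random.default_rng(0)
n_tasks, n_suppliers = 10, 4
n_rows = n_tasks * n_suppliers

# Synthetic task-supplier rows; `groups` plays the role of Task ID.
X = rng.random((n_rows, 3))
y = X @ np.array([0.5, 0.2, 0.3]) + rng.normal(0, 0.01, n_rows)
groups = np.repeat(np.arange(n_tasks), n_suppliers)

# Each fold holds out all rows of exactly one task.
scores = cross_val_score(
    Ridge(alpha=1.0), X, y,
    groups=groups, cv=LeaveOneGroupOut(), scoring=selection_error_scorer,
)

# Summarize the per-task errors as a single root-mean-square value.
rmse = np.sqrt(np.mean(scores ** 2))
print(f"RMSE of selection errors: {rmse:.4f}")
```

Because each test fold contains a single task, taking the `argmin` over the fold's predictions is the same as choosing a supplier for that task.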
Hyperparameter Optimization: Grid search is used to systematically evaluate candidate hyperparameter configurations for the selected machine learning model. Each configuration is scored with the same Leave-One-Group-Out strategy, so no task contributes to both the training and validation folds while configurations are compared.
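With Scikit-learn, a grouped grid search might be set up as in this sketch. The Ridge model, the `alpha` grid, the synthetic data, and the scorer definition are all assumptions; only the combination of grid search with Leave-One-Group-Out comes from the description above.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, LeaveOneGroupOut


def selection_error_scorer(estimator, X, y):
    """Assumed metric: min true cost minus true cost of the predicted-cheapest row."""
    y = np.asarray(y)
    picked = np.argmin(estimator.predict(X))
    return float(y.min() - y[picked])


rng = np.random.default_rng(1)
n_tasks, n_suppliers = 10, 4
n_rows = n_tasks * n_suppliers
X = rng.random((n_rows, 3))
y = X @ np.array([0.5, 0.2, 0.3]) + rng.normal(0, 0.01, n_rows)
groups = np.repeat(np.arange(n_tasks), n_suppliers)  # stands in for Task ID

param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}  # hypothetical grid
search = GridSearchCV(
    Ridge(),
    param_grid=param_grid,
    scoring=selection_error_scorer,
    cv=LeaveOneGroupOut(),
)
# Passing groups makes every candidate use the same task-wise folds.
search.fit(X, y, groups=groups)

print(search.best_params_, search.best_score_)
```

Note that `GridSearchCV` ranks candidates by the mean of the fold scores, so with this scorer it favors the configuration with the smallest average selection error rather than the smallest root-mean-square error.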
Comparative Model Analysis: Additional regression models from the Scikit-learn library are selected and evaluated. The model development, cross-validation, and hyperparameter optimization stages (Steps 3-5) are repeated for each of these models to compare their performance and discuss the relative strengths and weaknesses of each approach.

Technology Stack

The solution is implemented entirely in Python, leveraging the following core data science libraries:

NumPy
Pandas
Matplotlib
Scikit-learn

Machine Learning Algorithms Employed

Lasso Regression
Ridge Regression
Support Vector Regression (SVR)
Random Forest Regression
k-Nearest Neighbors Regression
Histogram Gradient Boosting Regression

