devina-h/Supplier-selection-model

Supplier-selection-model

Machine Learning for Optimal Supplier Selection

This project develops a machine learning solution that automates and optimizes the Acme Corporation's daily supplier selection process. The goal is to minimize task completion costs by predicting the most cost-effective supplier for any given task.

Business Problem

The Acme Corporation completes a unique task each day, choosing from a pool of 64 available suppliers. The final cost of a task depends heavily on how effective the chosen supplier is, but estimating that cost in advance requires significant resources. To address this, Acme Corporation commissioned a machine learning model that predicts the optimal supplier for each new task and thereby reduces operational costs.

Datasets

The analysis is based on three datasets provided by the client:

  1. tasks.csv: Contains feature data for each task (e.g., TF1, TF2, ...), with each task uniquely identified by a Task ID.
  2. suppliers.csv: Contains feature data for each of the 64 suppliers (e.g., SF1, SF2, ...), identified by a Supplier ID.
  3. costs.csv.zip: A compressed CSV file detailing the historical cost (in millions of dollars) of a task when performed by a specific supplier.
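The three files above could be combined into one modelling table along the following lines. This is a minimal sketch, not the repository's code: the column names `Task ID`, `Supplier ID` and `Cost`, the helper `build_modelling_table`, and the tiny stand-in frames are all assumptions made for illustration, and the real files may be laid out differently (for example, they may need reshaping before they can be joined).

```python
import pandas as pd


def build_modelling_table(tasks, suppliers, costs):
    """Attach task and supplier features to each (task, supplier) cost row.

    Hypothetical helper: assumes all three frames are in long format and
    share the key columns "Task ID" and "Supplier ID".
    """
    merged = costs.merge(tasks, on="Task ID", how="inner")
    merged = merged.merge(suppliers, on="Supplier ID", how="inner")
    return merged


# Tiny stand-in frames (invented values, not taken from the client data).
tasks = pd.DataFrame({"Task ID": ["T1", "T2"], "TF1": [0.2, 0.8]})
suppliers = pd.DataFrame({"Supplier ID": ["S1", "S2"], "SF1": [10, 20]})
costs = pd.DataFrame(
    {
        "Task ID": ["T1", "T1", "T2", "T2"],
        "Supplier ID": ["S1", "S2", "S1", "S2"],
        "Cost": [0.40, 0.35, 0.50, 0.55],
    }
)

table = build_modelling_table(tasks, suppliers, costs)
print(table)
```

With the real data, each frame would typically come from `pd.read_csv`. pandas infers compression from a `.zip` extension, so a zip archive containing a single CSV can usually be read directly without unpacking it first.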

Analytical Methodology

The project follows a comprehensive machine learning workflow, from data processing to model deployment and evaluation.

  1. Data Preparation: This initial stage involves preparing the data for modeling. Key steps include handling missing values, performing feature selection to identify the most impactful variables, applying feature scaling to normalize the data, and filtering out underperforming suppliers to create a more manageable dataset.
  2. Exploratory Data Analysis (EDA): Various EDA techniques are used to understand the distributions of task features, cost data, and overall supplier performance. This phase focuses on identifying patterns in the cost data and analyzing the distribution of selection errors for different suppliers.
  3. Model Development and Evaluation:
    • The task features, supplier features, and cost data are merged into a single, unified dataset.
    • The data is split into training and testing sets, grouped by Task ID so that all rows for a given task fall in exactly one of the two sets.
    • A regression model is trained to predict the cost for any given task-supplier combination. The supplier with the lowest predicted cost is selected as the optimal choice.
    • Model performance is evaluated against the test set using a custom scoring metric.
  4. Cross-Validation: To check model robustness, a Leave-One-Group-Out cross-validation strategy is used with Task ID as the group, so each fold holds out every row of one task. A custom scorer based on the project's error metric provides the performance estimate.
  5. Hyperparameter Optimization: Grid search is used to systematically explore hyperparameter configurations for the selected model. The search uses the same Leave-One-Group-Out strategy, so rows from one task never leak between the training and validation folds while parameters are being chosen.
  6. Comparative Model Analysis: Additional regression models from the Scikit-learn library are selected and evaluated. The entire pipeline (Steps 3-5) is repeated for these models to compare their performance and discuss the relative strengths and weaknesses of each approach.
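Steps 3 to 5 can be sketched on synthetic data as follows. Several things here are assumptions rather than the project's actual choices: the data is randomly generated, `Ridge` is only a stand-in model, and the helpers `selection_error` and `selection_scorer` use one plausible definition of the error (the true minimum cost for a task minus the true cost of the supplier the model would pick), since the README does not spell out its custom metric.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, GroupShuffleSplit, LeaveOneGroupOut

# Synthetic stand-in data: 12 tasks x 4 suppliers, one row per pair.
rng = np.random.default_rng(0)
n_tasks, n_suppliers = 12, 4
task_ids = np.repeat(np.arange(n_tasks), n_suppliers)
supplier_ids = np.tile(np.arange(n_suppliers), n_tasks)
tf = rng.normal(size=n_tasks)[task_ids]          # one task feature
sf = rng.normal(size=n_suppliers)[supplier_ids]  # one supplier feature
X = np.column_stack([tf, sf, tf * sf])
y = 0.5 + 0.1 * tf * sf + rng.normal(scale=0.01, size=task_ids.size)


def selection_error(y_true, y_pred):
    """Assumed metric for ONE task: best achievable cost minus the true cost
    of the supplier with the lowest predicted cost. Always <= 0; 0 is ideal."""
    return float(np.min(y_true) - y_true[np.argmin(y_pred)])


def selection_scorer(estimator, X_fold, y_fold):
    """Scorer for folds that contain exactly one task (Leave-One-Group-Out)."""
    return selection_error(np.asarray(y_fold), estimator.predict(X_fold))


# Grouped hold-out split: no task contributes rows to both sets.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=task_ids))

# Grid search with Leave-One-Group-Out folds grouped by task.
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.01, 1.0, 100.0]},
    scoring=selection_scorer,
    cv=LeaveOneGroupOut(),
)
search.fit(X[train_idx], y[train_idx], groups=task_ids[train_idx])

# Evaluate the refitted best model task by task on the held-out set.
test_errors = []
for task in np.unique(task_ids[test_idx]):
    rows = test_idx[task_ids[test_idx] == task]
    test_errors.append(selection_error(y[rows], search.predict(X[rows])))

print("best params:", search.best_params_)
print("mean held-out selection error:", np.mean(test_errors))
```

Because each Leave-One-Group-Out fold holds out a single task, the scorer can compare the predicted-cheapest supplier with the true cheapest one inside that fold. With a different splitter, the error would have to be computed per task and then aggregated.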

Technology Stack

The solution is implemented entirely in Python, leveraging the following core data science libraries:

  • NumPy
  • Pandas
  • Matplotlib
  • Scikit-learn

Machine Learning Algorithms Employed

  1. Lasso Regression
  2. Ridge Regression
  3. Support Vector Regression (SVR)
  4. Random Forest Regression
  5. k-Nearest Neighbors Regression
  6. Histogram Gradient Boosting Regression
