Estimating Delivery Time

Problem Statement

You are working as a data scientist at a food delivery company. The company wants to improve the system that calculates the ETA for its delivery persons. Rather than relying on a fixed method or formula, the management has decided to develop intelligent software that can predict the time of arrival for the delivery persons.

Task

Develop a machine learning model that predicts the time a delivery person takes to deliver an order, given the relevant information.

Approach

E.D.A

Time taken to deliver the order is highest in Semi-Urban cities and lowest in Urban cities.

Delivering an order during a festival takes more time. There could be two reasons:
1. A large number of orders.
2. More traffic on the roads.

Time taken to deliver the order in Fog and Cloudy weather is greater than in other conditions.

It can be observed that the time taken drops suddenly for ratings of 4.5 and above.

Most of the orders are received and delivered during the evening (6 PM to 10 PM).

Time taken to deliver the order is lowest in Sunny weather with Medium traffic and highest in Foggy weather with Jam traffic.

Time taken to deliver the order is lowest in the mornings of Cloudy, Fog or Sunny weather and highest in the evenings.

Refer to the notebook for the complete analysis.

Feature Engineering

First, created 5 stratified folds of the data. Refer to create_folds.py
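Because delivery time is a continuous target, stratifying the folds needs some discretisation first. The following is a minimal sketch of one common approach, not the contents of create_folds.py: it bins the target into quantiles and stratifies on the bins. The column names `time_taken` and `kfold`, the bin-count heuristic, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold


def create_folds(df: pd.DataFrame, target: str, n_splits: int = 5, seed: int = 42) -> pd.DataFrame:
    """Add a `kfold` column, stratifying on quantile bins of a continuous target."""
    df = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    # Sturges' rule is one common heuristic for the number of bins (an assumption here).
    n_bins = int(np.floor(1 + np.log2(len(df))))
    bins = pd.qcut(df[target], q=n_bins, labels=False, duplicates="drop")
    df["kfold"] = -1
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fold, (_, valid_idx) in enumerate(skf.split(df, bins)):
        df.loc[valid_idx, "kfold"] = fold
    return df


# Synthetic stand-in for the real delivery data.
rng = np.random.default_rng(0)
demo = pd.DataFrame({"time_taken": rng.normal(26, 9, size=500)})
folded = create_folds(demo, target="time_taken")
print(folded["kfold"].value_counts().sort_index())
```

Each fold then sees a similar spread of short and long deliveries, which keeps the per-fold scores comparable.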

Filled null values in categorical variables with "NULL".
Fixed ratings and time.

Refer to the fixing data notebook

Extracted granular features from Date and Time columns
Created bins for Order time.
Calculated Distance metrics for location data.
Computed GeoHash of the Locations.
Greedily combined pairs of categorical columns.
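Three of the steps above can be sketched with the standard library alone. This is an illustration under assumptions rather than the code in feature_eng.py: the haversine formula is one possible distance metric, the part-of-day bin edges are guesses (only the 6 PM to 10 PM evening window appears in the analysis), and `combine_pairs` joins every pair of the given columns instead of choosing pairs greedily. All function names are hypothetical.

```python
import math
from datetime import time
from itertools import combinations


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def order_time_bin(t: time) -> str:
    """Map an order time to a coarse part-of-day label (bin edges are assumed)."""
    hour = t.hour
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 22:
        return "evening"
    return "night"


def combine_pairs(row: dict, columns: list) -> dict:
    """Build one combined categorical feature for every pair of columns."""
    return {f"{a}_{b}": f"{row[a]}_{row[b]}" for a, b in combinations(columns, 2)}


print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 2))
print(order_time_bin(time(19, 30)))
print(combine_pairs({"City": "Urban", "Festival": "No"}, ["City", "Festival"]))
```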

Refer to feature_eng.py
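For readers unfamiliar with the GeoHash step listed above, here is a small self-contained encoder that follows the standard algorithm (interleaved longitude and latitude bits, base-32 alphabet). The project may well use a library for this; the function name and the default precision are assumptions.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a (lat, lon) pair as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, value = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, precision=5))  # u4pru
```

Nearby locations share a common prefix, so the hash can serve as a categorical "area" feature at a chosen precision.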

Feature Encoding

Applied Label Encoding on Road_traffic_density, Festival and City columns.
Applied Target Mean encoding with cross validation on the remaining categorical columns
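Target mean encoding with cross validation can be sketched as follows: each row is encoded with the category mean computed on the other folds only, so a row's own target never leaks into its encoding. This is a minimal illustration, not the code in feature_encode.py; the column names `Weather`, `time_taken` and `kfold`, and the toy values, are assumptions.

```python
import pandas as pd


def target_mean_encode(df: pd.DataFrame, col: str, target: str, fold_col: str = "kfold") -> pd.Series:
    """Out-of-fold mean encoding: each row is encoded from the other folds only."""
    encoded = pd.Series(index=df.index, dtype="float64")
    for fold in df[fold_col].unique():
        train = df[df[fold_col] != fold]
        valid_mask = df[fold_col] == fold
        means = train.groupby(col)[target].mean()
        # Categories unseen in the training folds fall back to the training mean.
        encoded.loc[valid_mask] = (
            df.loc[valid_mask, col].map(means).fillna(train[target].mean())
        )
    return encoded


# Toy data for illustration only.
toy = pd.DataFrame({
    "Weather": ["Fog", "Fog", "Sunny", "Sunny", "Fog", "Sunny"],
    "time_taken": [30, 34, 20, 22, 32, 18],
    "kfold": [0, 1, 0, 1, 2, 2],
})
enc = target_mean_encode(toy, "Weather", "time_taken")
print(enc.tolist())  # [33.0, 31.0, 20.0, 19.0, 32.0, 21.0]
```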

Refer to feature_encode.py

Feature Selection

Dropped features with variance less than or equal to 0.1.
Kept the features selected by CatBoost, XGBoost and LightGBM.
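The variance filter in the first step can be sketched in a few lines of pandas. This is an assumed implementation, not the code in feature_selection.py; it uses the population variance, and the toy columns are made up. The model-based step is not shown because the README does not say how the three models' selections were combined.

```python
import pandas as pd


def drop_low_variance(df: pd.DataFrame, threshold: float = 0.1) -> pd.DataFrame:
    """Keep only the columns whose variance is strictly greater than `threshold`."""
    variances = df.var(ddof=0)
    keep = variances[variances > threshold].index
    return df[keep]


# Toy numeric features for illustration only.
features = pd.DataFrame({
    "a": [0, 0, 0, 1],          # variance 0.1875, kept
    "b": [1, 1, 1, 1],          # variance 0, dropped
    "c": [0, 10, 20, 30],       # variance 125, kept
    "d": [0.0, 0.1, 0.2, 0.3],  # variance 0.0125, dropped
})
selected = drop_low_variance(features)
print(list(selected.columns))  # ['a', 'c']
```

Variance depends on scale, so a fixed threshold such as 0.1 is only meaningful if the features are on comparable scales.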

Refer to feature_selection.py

Imputation

Iteratively imputed the data using LightGBM and CatBoost.
The imputed data is used only for the models that cannot handle null values.
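As a rough illustration of iterative imputation, the sketch below uses scikit-learn's `IterativeImputer`, which models each feature with missing values as a function of the others and repeats for several rounds. It is not the code in impute.py: a `RandomForestRegressor` stands in for the LightGBM and CatBoost regressors used in the project, and the data is synthetic.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables the next import)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: the third column depends on the first two.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 2] = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)

# Knock out roughly 10% of the entries at random.
missing = rng.random(X.shape) < 0.1
X_missing = X.copy()
X_missing[missing] = np.nan

imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=20, random_state=0),  # stand-in regressor
    max_iter=5,
    random_state=0,
)
X_imputed = imputer.fit_transform(X_missing)
print(np.isnan(X_imputed).sum())  # 0
```

Any regressor with the scikit-learn `fit`/`predict` interface can be passed as `estimator`, which is how a gradient-boosting model could be plugged in.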

Refer to impute.py

Model Selection

Performed Stratified K-fold cross validation on Regression models.

Model	R2 Score	RMSE
LightGBM	0.8274	3.8982
CatBoost without categorical encoding	0.8266	3.9077
Random Forest	0.8232	3.9461
CatBoost with categorical encoding	0.8188	3.9722
XGBoost	0.8167	4.0172
Gradient Boosting	0.7836	4.3654
AdaBoost	0.6095	5.8631
Linear Regression	0.5619	6.2112

Based on the above results, the selected models are LightGBM, CatBoost, Random Forest and XGBoost.
Also, CatBoost performs better when the categorical columns are left unencoded.

Hypertuning

Tuned the hyperparameters of the selected models using Optuna.
Results after hypertuning:

Model	R2 Score	RMSE
Catboost	0.8319	3.8476
XGBoost	0.8318	3.8484
LightGBM	0.8284	3.8876
Random Forest	0.8274	3.8990

Refer to hypertuning.py

Best Model and Result

After hypertuning, the best models are CatBoost and XGBoost.
Averaging their predictions gives slightly better results.
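The averaging step and the two reported metrics can be sketched with NumPy. The arrays below are made-up toy values, not the project's predictions, and are chosen so that the two models err in opposite directions, which is the situation where a simple average helps most.

```python
import numpy as np


def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2_score(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1 - ss_res / ss_tot)


# Toy values for illustration only.
y_true = np.array([20.0, 30.0, 25.0, 40.0])
pred_a = np.array([22.0, 27.0, 26.0, 37.0])  # stand-in for one model's predictions
pred_b = np.array([19.0, 32.0, 23.0, 42.0])  # stand-in for the other model's predictions

blend = (pred_a + pred_b) / 2  # simple unweighted average

for name, pred in [("model A", pred_a), ("model B", pred_b), ("average", blend)]:
    print(name, round(r2_score(y_true, pred), 4), round(rmse(y_true, pred), 4))
```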

R2 Score and RMSE

Model	R2 Score	RMSE
Catboost + XGBoost	0.8351	3.811
Catboost	0.8319	3.8476
XGBoost	0.8318	3.8484

Residual Analysis

Homoscedastic Test

Conformal Predictions

To run the project

git clone https://github.com/mohan-gupta/estimating-delivery-time.git  # clone
cd estimating-delivery-time
pip install -r requirements.txt  # install
cd app
streamlit run streamlit_app.py  #run

To train and save the best models

First, follow the approach described above (the steps after E.D.A) to prepare the data, then run the following commands.

cd src
sh run.sh  #run
