This repository is the submission for the AI challenge for S.E.A hosted by Grab. The selected challenge is Safety. The challenge can be found via this website.
by: Satsawat Natakarnkitkul (Net)
Email: n.satsawat@gmail.com
Country: Thailand
Motivation: This challenge is interesting in many ways, and I use Grab nearly every day. I therefore see it as the challenge with the most impact on Grab users.
- The notebook `Grab AI Challenge_Safety_Data Exploration.ipynb` is mainly used for data understanding and EDA of the sensor data provided by Grab. You do not need to run this notebook, but it provides some understanding and explanation of the telemetry sensor data.
- The notebook `Grab AI Challenge_ML model comparison.ipynb` trains and tests ML techniques to produce the final model, and tries out feature engineering and other data transformations.
- The notebook `Grab AI Challenge_GridSearchCV and Feature Engineering.ipynb` shows the grid search for the XGBoost algorithm; with the current model and feature engineering, it achieved an AUC score of 0.5945.
- This folder contains the final model object to be used for prediction.
- This folder contains the final python source code for manipulating, creating new features and predicting the data set.
- This folder contains the images embedded in the EDA and other notebooks.
- This folder contains the bookingID, predicted class, and prediction probability; this is the output of running the Python script.
The model predicts the safety of a trip. The assumption is that prediction happens offline rather than in real time (online), so the complete data for each booking ID is available. The transformation aggregates the records of each booking ID into a single observation, which is then fed into the model for prediction.
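
The aggregation step described above can be illustrated with a minimal, dependency-free sketch. It is not the code from `Safety_Prediction.py`: the function name `aggregate_by_booking`, the feature names `speed` and `acceleration_x`, and the choice of mean/max/min statistics are all hypothetical and only show the idea of collapsing many sensor readings into one row per booking ID.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_booking(rows, feature_names):
    """Collapse per-reading rows into one summary row per bookingID."""
    # Group the raw readings by trip.
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["bookingID"]].append(row)

    # Summarise each trip with a few simple statistics (hypothetical choice).
    aggregated = {}
    for booking_id, readings in grouped.items():
        summary = {"n_readings": len(readings)}
        for name in feature_names:
            values = [reading[name] for reading in readings]
            summary[f"{name}_mean"] = mean(values)
            summary[f"{name}_max"] = max(values)
            summary[f"{name}_min"] = min(values)
        aggregated[booking_id] = summary
    return aggregated


# Made-up telemetry readings: several rows per trip.
rows = [
    {"bookingID": 1, "speed": 10.0, "acceleration_x": 0.5},
    {"bookingID": 1, "speed": 20.0, "acceleration_x": -0.5},
    {"bookingID": 2, "speed": 5.0, "acceleration_x": 0.25},
]
features = aggregate_by_booking(rows, ["speed", "acceleration_x"])
```

Each value in `features` is a single observation for one trip, which is the shape a tabular model expects at prediction time.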
To run the prediction, please use Safety_Prediction.py in the code directory.
- The script in the `code` folder reads the feature data files in the `data/safety/features` folder.
  - If the data path changes, please set `DATA_DIR` to the correct folder.
- The script automatically runs the feature transformation and engineering.
- The script loads the XGBoost model object from the `model` directory to make the prediction.
- The script saves the prediction with bookingID to the `../output/all_prediction.csv` file.
- If `LABEL_IND = True` in the script, it evaluates the prediction against the true label.
  - The true label file should be in the `data/safety/labels` folder with the proper bookingID and label columns.
  - If the `labels` folder changes, please adjust `LABEL_DIR` in the script accordingly.
  - If the evaluation is not needed, you can turn it off by setting `LABEL_IND = False` in the script.
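
The optional evaluation step can be sketched as follows, assuming it reports AUC (the metric quoted for the grid search); the README does not state which metric the script computes. The helper names `roc_auc` and `evaluate` and all values are hypothetical. AUC is computed here from its pairwise definition, not with the library call the script may use.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def evaluate(predictions, true_labels):
    """Join predictions and labels on bookingID, then score the overlap."""
    shared_ids = sorted(set(predictions) & set(true_labels))
    labels = [true_labels[b] for b in shared_ids]
    scores = [predictions[b] for b in shared_ids]
    return roc_auc(labels, scores)


LABEL_IND = True  # same switch name as in the script; set to False to skip evaluation

# Made-up data: bookingID -> predicted probability, bookingID -> true label.
predictions = {1: 0.9, 2: 0.2, 3: 0.6, 4: 0.4}
true_labels = {1: 1, 2: 0, 3: 0, 4: 1}

auc = evaluate(predictions, true_labels) if LABEL_IND else None
```

Joining on bookingID before scoring means only trips present in both files are evaluated, so a label file covering a subset of bookings still works.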