The Machine Learning process is iterative and consists of several steps:
- Identifying a business problem and the related Machine Learning problem
- Data ingestion, integration and preparation
- Data visualization and analysis, feature engineering, model training and model evaluation
- Model deployment, model monitoring and debugging
These steps are generally repeated multiple times, both to better meet business goals and in response to changes in the source data, a decrease in model performance, etc.
The process can be represented with the following diagram:
After a model has been deployed, we might want to integrate it with our own application to provide insights to end users.
In this workshop we will go through the steps required to build a fully-fledged machine learning application on AWS. We will execute one iteration of the Machine Learning process to build, train and deploy a model using Amazon SageMaker, and then deploy a REST inference API with Amazon API Gateway to request inferences from a web client. Finally, we will look at how to automate the ML workflow and how to implement ML CI/CD.
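From the client's point of view, requesting an inference is just an HTTP POST to the API Gateway endpoint. The workshop's own client is a web page; the Python sketch below only illustrates the shape of such a request. The invoke URL, the `features` JSON key and the feature values are hypothetical placeholders (the values are taken from the dataset excerpt further down), not the workshop's actual contract.

```python
import json
import urllib.request

# Hypothetical invoke URL: the real one is shown by API Gateway after the API is deployed.
API_URL = "https://example.execute-api.us-east-1.amazonaws.com/prod/predict"


def build_inference_request(url, features):
    """Build (but do not send) a POST request carrying one record as JSON.

    The {"features": [...]} body layout is an assumption for illustration.
    """
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Illustrative record; a real request must carry every input the model expects.
req = build_inference_request(API_URL, ["M", 298.1, 308.6])
```

Calling `send(req)` would only succeed once a real API URL is in place; building the request separately keeps the payload format easy to inspect.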
The final architecture will be:
We are going to use the AI4I 2020 Predictive Maintenance Dataset from the UCI Machine Learning Repository. It is a synthetic dataset that reflects real predictive maintenance data encountered in industry.
The dataset consists of 10,000 records and 14 features, representing measurements collected on the machinery plus an indication of failure, if any.
⚠️ Note: this is a basic dataset that oversimplifies the Predictive Maintenance task. However, it keeps this workshop easy to execute while remaining representative of the various steps of the ML workflow.
Our goal is to build a simple Machine Learning model that predicts whether the machinery is going to fail (Predictive Maintenance).
Following is an excerpt from the dataset:
| UDI | Product ID | Type | Air temperature [K] | Process temperature [K] | ... | Machine failure |
|---|---|---|---|---|---|---|
| 1 | M14860 | M | 298.1 | 308.6 | ... | 0 |
| 2 | L47181 | L | 298.2 | 308.7 | ... | 0 |
| 3 | L47182 | L | 298.1 | 308.5 | ... | 0 |
| 51 | L47230 | L | 298.9 | 309.1 | ... | 1 |
The target variable is the Machine failure attribute. Since it is binary (1 indicates a failure, 0 no failure), the task calls for a binary classification model.
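To make the split between inputs and target concrete, here is a minimal sketch using only the columns visible in the excerpt. It drops the identifier columns, one-hot encodes `Type` and keeps `Machine failure` as the 0/1 label. The choice of one-hot encoding and the assumption that `Type` takes the three values L, M and H are illustrative; they are not necessarily what the workshop's preprocessing script does.

```python
import csv
import io

# Only the columns shown in the excerpt above; the full dataset has more.
EXCERPT = """UDI,Product ID,Type,Air temperature [K],Process temperature [K],Machine failure
1,M14860,M,298.1,308.6,0
2,L47181,L,298.2,308.7,0
3,L47182,L,298.1,308.5,0
51,L47230,L,298.9,309.1,1
"""

TYPES = ["L", "M", "H"]            # assumed product variants for one-hot encoding
ID_COLUMNS = {"UDI", "Product ID"}  # identifiers carry no predictive signal
TARGET = "Machine failure"


def to_features_and_target(rows):
    """Turn CSV rows into a numeric feature matrix X and a binary label vector y."""
    X, y = [], []
    for row in rows:
        # One-hot encode the categorical Type column.
        features = [1.0 if row["Type"] == t else 0.0 for t in TYPES]
        # Append the remaining numeric columns, skipping ids, Type and the target.
        for name, value in row.items():
            if name in ID_COLUMNS or name in ("Type", TARGET):
                continue
            features.append(float(value))
        X.append(features)
        y.append(int(row[TARGET]))
    return X, y


rows = list(csv.DictReader(io.StringIO(EXCERPT)))
X, y = to_features_and_target(rows)
```

For the first record, `X[0]` is `[0.0, 1.0, 0.0, 298.1, 308.6]`, and `y` holds one failure among the four sample rows.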
After building the model, we can host it and expose it as a REST API that responds to inference requests from client-side applications.
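Behind such an API, a Lambda function typically forwards the request to the SageMaker endpoint through the `sagemaker-runtime` client's `invoke_endpoint` call. The sketch below assumes API Gateway Lambda proxy integration (so the HTTP body arrives as a string in `event["body"]`), a JSON body with a hypothetical `features` key, a hypothetical `ENDPOINT_NAME` environment variable, and an endpoint that accepts `text/csv`. The workshop's actual function may differ.

```python
import json
import os

# Hypothetical environment variable holding the SageMaker endpoint name.
ENDPOINT_NAME = os.environ.get("ENDPOINT_NAME", "my-endpoint")


def build_csv_payload(features):
    """Serialize one record as a single CSV line (assumes a text/csv endpoint)."""
    return ",".join(str(value) for value in features)


def lambda_handler(event, context, runtime_client=None):
    # Create the SageMaker runtime client lazily so the handler can be exercised
    # with a stand-in client and without AWS credentials.
    if runtime_client is None:
        import boto3
        runtime_client = boto3.client("sagemaker-runtime")

    # With Lambda proxy integration, API Gateway passes the HTTP body as a string.
    body = json.loads(event["body"])
    payload = build_csv_payload(body["features"])  # hypothetical "features" key

    response = runtime_client.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="text/csv",
        Body=payload,
    )
    # The response body is a stream: read it and decode the model output.
    prediction = response["Body"].read().decode("utf-8").strip()

    # Proxy integration expects statusCode/headers/body in the return value.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"prediction": prediction}),
    }
```

Injecting `runtime_client` is a testing convenience; in Lambda the default path creates the real boto3 client.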
This workshop consists of eight modules:
- Module 01 - Open Amazon SageMaker Studio and clone the repository.
- Module 02 - Using Amazon SageMaker Studio Notebooks and standard Python libraries for data exploration, then performing data preprocessing and feature engineering with Amazon SageMaker Processing and SKLearn. [Optional] Use AWS Glue and Amazon Athena for data exploration.
- Module 03 - Training a binary classification model with the Amazon SageMaker open-source XGBoost container; the model will predict whether the machinery is going to fail. [Optional] Use SageMaker Debugger to monitor training progress with rules and visualize training metrics such as accuracy and feature importance.
- Module 04 - Deploying the feature engineering and ML models as a pipeline using Amazon SageMaker hosting (inference pipelines). [Optional] Use SageMaker Model Monitor to track data drift violations against the training data baseline.
- Module 05 - Building a REST API using Amazon API Gateway and implementing an AWS Lambda function that invokes the Amazon SageMaker endpoint for inference.
- Module 06 - Using a web client to invoke the REST API and get inferences.
- Module 07 - Use Amazon SageMaker Pipelines to orchestrate the model build workflow and store models in model registry.
- Module 08 - Use Amazon SageMaker Projects to enable ML CI/CD. Currently, only the code to add to the default model build and model deploy repositories is provided, based on the default project template for model building, training and deployment. Additional info: https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-templates-sm.html
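Module 03 produces a binary classifier, and evaluating it requires turning scores into labels. Assuming the XGBoost `binary:logistic` objective (the modules above do not state which objective is used), the model outputs a failure probability that must be thresholded. The sketch below applies a 0.5 threshold and computes accuracy, precision and recall on made-up probabilities and labels. Failures are usually rare in predictive maintenance data, so precision and recall on the failure class are typically more informative than accuracy alone.

```python
def classify(probabilities, threshold=0.5):
    """Map predicted failure probabilities to 0/1 labels."""
    return [1 if p >= threshold else 0 for p in probabilities]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision and recall for the positive (failure) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up scores and labels, for illustration only.
probs = [0.02, 0.10, 0.65, 0.40, 0.91]
labels = [0, 0, 0, 1, 1]
preds = classify(probs)
metrics = binary_metrics(labels, preds)
```

Here one failure is caught and one is missed, giving a recall of 0.5; lowering `threshold` would trade more false alarms for fewer missed failures.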
You must complete the modules in order, since the outputs of each module are inputs to the next one.
This workshop has been designed assuming that each participant is using an AWS account that has been provided and pre-configured by the workshop instructor(s). You can also use your own AWS account, but you'll have to execute some preliminary configuration steps as described here.
Once you are ready to go, please start with Module 01.
The contents of this workshop are licensed under the Apache 2.0 License.
Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
Giuseppe A. Porcelli - Principal, ML Specialist Solutions Architect - Amazon Web Services EMEA
Antonio Duma - Sr. Startup Solutions Architect - Amazon Web Services EMEA
Hasan Poonawala - ML Specialist Solution Architect - Amazon Web Services EMEA

