This project demonstrates a full end-to-end, near real-time machine learning workflow:
- ingestion of data
- data ETL
- model training & validation
- model deployment
- model monitoring
It captures blockchain transaction data in near real time and, by default, computes the average transaction fee per minute. These data are stored in SageMaker Feature Store. An MLOps pipeline uses the Amazon SageMaker DeepAR forecasting algorithm to train a forecasting model that predicts the average transaction fee over a prediction window of 30 data points (in our case, 30 minutes). Although a 1-minute window might not be meaningful for real analysis, data are aggregated and predicted at this granularity in order to quickly gather enough data, get results, and see the MLOps pipeline automation in action. Once enough data have been captured and a first model has been trained, you will see the model being deployed behind a SageMaker API endpoint and resources being provisioned to monitor it. If the model performance alarm threshold is breached, you will see alarms in the dashboard and the model training pipeline being triggered automatically to retrain a new model on the latest ingested data, thus fully automating the training, deployment, and monitoring lifecycle of the model.
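As a rough illustration of what the forecasting setup implies, the sketch below serializes a per-minute series in the JSON Lines layout DeepAR reads (`start` plus `target`) and sets a 30-point horizon at 1-minute frequency. The function name `to_deepar_line` is hypothetical, and the `context_length` and `epochs` values are placeholder assumptions, not the values used by this project.

```python
import json
from datetime import datetime


def to_deepar_line(start, values):
    """Serialize one time series as a JSON Lines record with "start" and "target"."""
    return json.dumps({
        "start": start.strftime("%Y-%m-%d %H:%M:%S"),
        "target": list(values),
    })


# Hyperparameters matching the description above: 1-minute frequency and a
# 30-point prediction window. context_length and epochs are assumptions.
hyperparameters = {
    "time_freq": "1min",
    "prediction_length": "30",
    "context_length": "30",
    "epochs": "100",
}

# Example: two per-minute average fee values starting at midnight.
line = to_deepar_line(datetime(2024, 1, 1, 0, 0, 0), [1.0, 2.5])
```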
Warning
Since the creation of this demo, AWS has deprecated the CodeCommit service, which is used by the MLOps project. It is no longer possible to create CodeCommit repositories, so the MLOps project will not work as is. An issue has been opened to update this demo: Issue #63
For this project, we decided to ingest blockchain transactions from the blockchain.com API (see documentation here). We focus on 3 simple metrics:
- The total number of transactions
- The total amount of transaction fees
- The average amount of transaction fees
These metrics are computed per minute. Although this might not be the best window to analyze blockchain transactions, it allows us to quickly gather many data points in a short period of time and avoids running the demo for too long, which would increase the AWS bill.
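To make the three metrics concrete, here is a minimal sketch in plain Python of a per-minute aggregation. It is not the project's code (the real aggregation runs in Managed Service for Apache Flink); the input keys `time` and `fee` and the output field names are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def aggregate_per_minute(transactions):
    """Group transactions into 1-minute windows and compute the three metrics.

    Each transaction is a dict with hypothetical keys:
      "time": Unix timestamp in seconds, "fee": transaction fee.
    """
    windows = defaultdict(list)
    for tx in transactions:
        # Truncate the timestamp to the start of its minute.
        window_start = tx["time"] - (tx["time"] % 60)
        windows[window_start].append(tx["fee"])

    results = []
    for window_start in sorted(windows):
        fees = windows[window_start]
        total_fee = sum(fees)
        results.append({
            "window_start": datetime.fromtimestamp(
                window_start, tz=timezone.utc
            ).isoformat(),
            "total_nb_trx": len(fees),          # total number of transactions
            "total_fee": total_fee,             # total amount of fees
            "avg_fee": total_fee / len(fees),   # average fee per transaction
        })
    return results
```

Windows with no transactions simply do not appear in the output, which is one design choice among several; a forecasting pipeline may prefer to emit explicit gaps.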
At a high level, the data are ingested as follows:
- A container running on AWS Fargate polls the data and writes them to EventBridge.
- EventBridge sends the data to a Lambda Function which writes filtered data into a Kinesis Data Stream.
- A Managed Service for Apache Flink application reads the data from the Kinesis Data Stream, aggregates them using a 1-minute tumbling window, and puts the results into a delivery Kinesis Data Stream.
- A Lambda Function reads the aggregated data from the delivery Kinesis Data Stream and writes them into SageMaker Feature Store.
See this documentation for more details.
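For the last step of the ingestion flow, the sketch below shows one way an aggregated window could be turned into the record layout expected by the Feature Store runtime `PutRecord` call (a list of `FeatureName` / `ValueAsString` pairs). The helper name, the feature names, and the feature group name in the comment are hypothetical; the project's Lambda Function may be structured differently.

```python
def to_feature_store_record(aggregate):
    """Convert one aggregated window (a flat dict) into a PutRecord-style record."""
    return [
        {"FeatureName": name, "ValueAsString": str(value)}
        for name, value in aggregate.items()
    ]


record = to_feature_store_record({"avg_fee": 1.5, "total_nb_trx": 2})

# Writing the record would then look roughly like this (requires AWS credentials;
# "my-feature-group" is a placeholder name):
#
#   import boto3
#   client = boto3.client("sagemaker-featurestore-runtime")
#   client.put_record(FeatureGroupName="my-feature-group", Record=record)
```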
The MLOps project contains 3 CodeCommit repositories with their own CodePipeline pipelines to train, deploy and monitor the model.
- The Model Build pipeline creates a SageMaker Pipeline orchestrating all the steps to train a model and, if it passes the validation threshold, registers the model, which then has to be manually approved.
- If the registered model is approved, the Model Deploy pipeline is automatically triggered and deploys the model behind 2 SageMaker API Endpoints, one for the staging environment and one for the production environment.
- If the new model baseline performance is better than the existing one, we update the SSM Parameter storing the model monitoring threshold with the new model performance. When the new model monitoring is deployed, the alarm threshold will be updated from the SSM Parameter value. Note that in a real-world scenario you might not want to update the model monitoring threshold like this, as it might be dictated by business criteria. We do this in this demo to test automated retraining and improvement of the model.
- Once the endpoints are IN_SERVICE, the Model Monitor pipeline is triggered automatically and deploys resources to monitor the model.
- Every hour, a SageMaker Pipeline is executed to test the model against the latest data. The performance of the model is tracked by the SageMaker Monitoring service and a custom CloudWatch Alarm. We use a custom CloudWatch Alarm because, as of now:
  - AWS does not support monitoring for custom metrics (like the weighted quantile loss we use for the DeepAR model), and
  - AWS does not provide any built-in mechanism to capture the alarms raised by the SageMaker Monitoring service when a model performance threshold is breached, in order to perform automatic retraining of the model.
- If our custom metric threshold is breached, the CloudWatch Alarm triggers a Lambda Function, which triggers the Model Build pipeline to retrain a new model, automatically looping back to step one of this MLOps pipeline.
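The monitoring and retraining logic described in the steps above can be sketched as follows. This is a simplified illustration for a single quantile, assuming the monitored metric is a loss (lower is better); the function names are hypothetical, and in the demo the threshold lives in an SSM Parameter and the comparison is performed by a CloudWatch Alarm rather than by Python code.

```python
def quantile_loss(actuals, predictions, tau):
    """Sum of quantile (pinball) losses for the quantile tau."""
    total = 0.0
    for y, q in zip(actuals, predictions):
        total += tau * max(y - q, 0.0) + (1.0 - tau) * max(q - y, 0.0)
    return total


def weighted_quantile_loss(actuals, predictions, tau):
    """Quantile loss normalized by the sum of absolute actual values."""
    denominator = sum(abs(y) for y in actuals)
    if denominator == 0:
        raise ValueError("actuals must not sum to zero in absolute value")
    return 2.0 * quantile_loss(actuals, predictions, tau) / denominator


def should_retrain(metric, threshold):
    """Mimic the alarm: retrain when the loss exceeds the threshold."""
    return metric > threshold


def updated_threshold(current_threshold, new_baseline):
    """Keep the new baseline only if it improves on (is lower than) the current one."""
    return min(current_threshold, new_baseline)


loss = weighted_quantile_loss([10, 20], [8, 24], tau=0.5)
retrain = should_retrain(loss, threshold=0.15)
```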
The list below is for a Windows environment:
- The AWS CLI (documentation)
- Docker Desktop (Documentation)
- NPM and Node.js (Documentation)
- The AWS CDK:
  npm install -g aws-cdk
- Configure your programmatic access to AWS for the CLI (see instructions here).
Most of the architecture is fully automated through the CDK. You do not have much to configure and can directly play with the application and watch it ingest and aggregate the data in near real time. Once you have ingested some data you can start to train a forecasting model by running the corresponding pipeline and see the entire MLOps process automation.
- Deploy the Environment & MLOps Pipelines
- The Data Ingestion
- The MLOps Pipeline
- Delete the Entire Project
This demo deploys many services (e.g. Fargate, DynamoDB, 2 Kinesis Data Streams, Kinesis Firehose, Managed Service for Apache Flink, SageMaker endpoints...) and must run for several days to collect enough data to start training a model and see the model being retrained. This demo does generate costs, which could be significant depending on your budget. The full demo costs about $480 per month in the Ireland region.


