This project uses Terraform to build an architecture that consumes, cleans, and stores data from the Punk API. In addition, it provides a machine learning model that can be accessed remotely, via an AWS Lambda, to predict the IBU (International Bitterness Units) of a beer.
To build this project, you must have Terraform installed and an AWS account.
Once both are configured, you can run the project. First of all, clone the repository and open it:
git clone https://github.com/joaorobson/aws_beer_classification.git
cd aws_beer_classification
Create a Python virtual environment and install the dependencies:
python3.9 -m venv env
source env/bin/activate
pip install -r notebooks/requirements.txt
Before building the main architecture, you need to create a Lambda function responsible for loading a pre-trained model from an S3 bucket and making predictions remotely. This step uses container images, given the size limitations AWS imposes on .zip deployment packages.
This can be done with the following steps:
- Set some environment variables:
export AWS_REGION=us-west-2
export BUCKET_NAME="beers-linear-regressor"
export IMAGE_NAME="ibu_prediction_image"
export IMAGE_TAG="latest"
- Create the ECR repository to store the generated image:
terraform apply -target=aws_ecr_repository.ibu_prediction_repository
- Set the REGISTRY_ID and IMAGE_URI environment variables:
export REGISTRY_ID=$(aws ecr \
describe-repositories \
--query 'repositories[?repositoryName == `'$IMAGE_NAME'`].registryId' \
--output text)
export IMAGE_URI=${REGISTRY_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${IMAGE_NAME}
- Authenticate the Docker client to the ECR registry using your AWS account ID:
aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin [aws_account_id].dkr.ecr.$AWS_REGION.amazonaws.com
- Build and push the Docker image:
cd code/model/
docker build -t $IMAGE_URI .
docker push $IMAGE_URI:$IMAGE_TAG
NOTE: Currently, the Lambda function will not work properly, because it depends on a model version stored in the S3 bucket. To make it work, follow the commands in the next sections.
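As a rough illustration, the handler packaged in the container image could look like the sketch below. The function names, the input fields, and the coefficient values are all assumptions for illustration; in the real handler, the regression parameters would come from the pre-trained model loaded from the S3 bucket rather than being hard-coded:

```python
import json


def predict_ibu(features, coefficients, intercept):
    """Apply a linear regression: IBU = intercept + sum(coef * feature)."""
    return intercept + sum(c * f for c, f in zip(coefficients, features))


def handler(event, context):
    # In the real Lambda, the parameters below would be read from a model
    # artifact downloaded from the S3 bucket; these values are placeholders.
    coefficients = [2.5, 10.0]
    intercept = 5.0
    # Input field names are hypothetical, not the project's exact schema.
    features = [event["abv"], event["ebc"]]
    ibu = predict_ibu(features, coefficients, intercept)
    return {"statusCode": 200, "body": json.dumps({"ibu": ibu})}
```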
After that, to build the architecture in AWS, run the following from the project's root directory:
terraform apply
This command creates all the resources used by the project. The behavior is fairly simple: every 5 minutes, a new beer record is retrieved and stored in two S3 buckets, one with the raw data and another with a cleaned version. The cleaned data bucket can then be used to train a machine learning model locally, as exemplified by this notebook.
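For illustration, the cleaning step that produces the second bucket's contents could be sketched as below. The exact fields the project keeps are not documented here, so the selection of `name`, `abv`, `ebc`, and `ibu` is an assumption:

```python
def clean_beer_record(raw):
    """Reduce a raw Punk API record to the fields needed for training.
    The field selection here is illustrative, not the project's exact schema."""
    return {
        "name": raw.get("name"),
        "abv": raw.get("abv"),
        "ebc": raw.get("ebc"),
        "ibu": raw.get("ibu"),
    }
```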
Now it is possible to train a model using the data collected and stored by the architecture. To do that, run the notebook located here:
./env/bin/jupyter notebook
After running it, you can make predictions via the Lambda created earlier, either from the notebook itself or via the CLI:
cd notebooks
./invoke_predict_ibu.sh
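The script above wraps a CLI invocation of the Lambda; the same request can be issued from Python with boto3. The function name and the payload fields below are assumptions for illustration:

```python
import json


def build_payload(abv, ebc):
    """Build the JSON payload the prediction Lambda is assumed to expect."""
    return json.dumps({"abv": abv, "ebc": ebc}).encode("utf-8")


def invoke_prediction(abv, ebc, function_name="predict_ibu"):
    """Invoke the prediction Lambda; requires boto3 and valid AWS
    credentials. The function name here is hypothetical."""
    import boto3

    client = boto3.client("lambda")
    response = client.invoke(
        FunctionName=function_name,
        Payload=build_payload(abv, ebc),
    )
    return json.loads(response["Payload"].read())
```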
- Terraform Docs - AWS Provider
- The most minimal AWS Lambda + Python + Terraform setup
- DEPLOYING AWS LAMBDA FUNCTIONS WITH TERRAFORM
- Building Lambda Functions with Terraform
- Building a serverless, containerized machine learning model API using AWS Lambda & API Gateway and Terraform
- Amazon ECR - Pushing a Docker image
- Setting Crawler Configuration Options