Set up an event-driven data pipeline using Amazon S3, Lambda, DynamoDB, and API Gateway. The aims of this solution are:
- New data arrives in an S3 bucket as a JSON file in an expected format.
- This triggers a Lambda function that ingests the data, deserializes the JSON, and inserts it into a DynamoDB table.
- Finally, all data is made available via a `GET` request to a public API.
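To make the ingestion step concrete, here is a minimal sketch of a Lambda handler for the S3-to-DynamoDB portion of the flow. The table name (`pipeline-data`) and the assumption that each file holds a single JSON object are placeholders, not the assignment's required interface; your actual names and schema come from the stack and from `app.py`.

```python
import json
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("pipeline-data")  # hypothetical table name
s3 = boto3.client("s3")


def handler(event, context):
    """Triggered by an S3 ObjectCreated event; loads the JSON file into DynamoDB."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        # Assumes each file contains one JSON object whose fields include the table's keys.
        item = json.loads(body)
        table.put_item(Item=item)
```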
This assumes that users have an active set of AWS credentials installed locally. Be sure that your AWS command-line keys correspond to the account you use to sign in to the AWS Console.
GitHub Codespaces can be used for this project, but you must install the AWS CLI:
$ sudo pip install awscli
Then configure the CLI with your keys:
$ aws configure
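If you want to confirm which account your configured keys resolve to, one way (sketched here with boto3, which reads the same credentials the CLI does) is to ask STS for the caller identity:

```python
import boto3

# Prints the account ID and ARN for the currently configured credentials,
# so you can check they match the account you use in the AWS Console.
identity = boto3.client("sts").get_caller_identity()
print(identity["Account"], identity["Arn"])
```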
This stack creates:
- An S3 bucket for new data files.
- An S3 bucket policy that allows the instructor account to `PUT` new objects in student buckets.
- A DynamoDB table with partition and sort keys (see the sketch after this list).
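The stack provisions the table for you, so you do not need to create it yourself; the following is only a sketch of what a table with partition and sort keys looks like in boto3 terms. The table and attribute names here are hypothetical.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Hypothetical schema: "device_id" as the partition (HASH) key and
# "timestamp" as the sort (RANGE) key, both stored as strings.
dynamodb.create_table(
    TableName="pipeline-data",
    AttributeDefinitions=[
        {"AttributeName": "device_id", "AttributeType": "S"},
        {"AttributeName": "timestamp", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "device_id", "KeyType": "HASH"},
        {"AttributeName": "timestamp", "KeyType": "RANGE"},
    ],
    BillingMode="PAY_PER_REQUEST",
)
```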
The bucket and DynamoDB table will be connected by a Lambda function that you will author and publish using Chalice. The files in this repository will get you started.
- Launch the resource stack using the button above.
- Fork and clone this repository to your local computer.
- Create a virtual environment and install dependencies from the `requirements-dev.txt` file.
- Finish the logic in `app.py` and then use `chalice deploy` to iterate on versions of your code (a sketch of a possible route is shown after this list).
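For the public `GET` endpoint, a minimal Chalice sketch of what `app.py` might grow into is shown below. The route path, app name, table name, and response shape are assumptions for illustration, not the assignment's required interface.

```python
import boto3
from chalice import Chalice

app = Chalice(app_name="data-pipeline")
TABLE_NAME = "pipeline-data"  # hypothetical; use the table created by your stack


@app.route("/items", methods=["GET"])
def list_items():
    # A full scan is fine for a small class dataset; a real service would
    # paginate or use targeted queries. Numeric attributes come back as Decimal.
    table = boto3.resource("dynamodb").Table(TABLE_NAME)
    return {"items": table.scan().get("Items", [])}
```

After each change, running `chalice deploy` republishes the function and its API Gateway endpoint, so you can test the URL it prints.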
You can use the `test-access.py` file to emulate data files arriving in your bucket.
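The repository's `test-access.py` handles this for you, but if you want to trigger the pipeline by hand, a sketch of uploading a sample JSON file with boto3 looks like the following; the bucket name and payload are placeholders.

```python
import json
import boto3

bucket = "your-student-bucket-name"  # placeholder: use the bucket created by your stack
payload = {"device_id": "sensor-1", "timestamp": "2024-01-01T00:00:00Z", "value": 42}

# Uploading an object fires the S3 event that invokes your Lambda function.
boto3.client("s3").put_object(
    Bucket=bucket,
    Key="sample-0001.json",
    Body=json.dumps(payload),
)
```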
Be sure to delete your CloudFormation stack when directed to do so by the instructor.

