Examples of Apache Flink® v2.1 applications showcasing the DataStream API, Table API in Java and Python, and Flink SQL, featuring AWS, GitHub, Terraform, Streamlit, and Apache Iceberg.
Automation framework to catalog AWS data sources using Glue
Smart City Realtime Data Engineering Project
A CLI tool to back up and restore AWS Glue catalog resources such as Databases, Tables, and Connections as JSON files. Useful when you don't have AWS Backup or versioning enabled in your account.
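A minimal sketch of what such a backup looks like with boto3 (hypothetical illustration, not this repo's actual CLI; the output directory name is made up). Glue returns audit-only fields that `create_table`/`update_table` reject, so they must be stripped before the JSON can be replayed on restore:

```python
"""Sketch: dump a Glue database's table definitions to restorable JSON files.

Hypothetical example, assuming boto3 and AWS credentials are configured.
"""
import json
import pathlib


def strip_readonly_fields(table: dict) -> dict:
    """Drop fields Glue returns but rejects on create_table/update_table."""
    readonly = {"DatabaseName", "CreateTime", "UpdateTime", "CreatedBy",
                "IsRegisteredWithLakeFormation", "CatalogId", "VersionId"}
    return {k: v for k, v in table.items() if k not in readonly}


def backup_database(glue, database: str, out_dir: str = "glue-backup") -> int:
    """Write each table as out_dir/<db>.<table>.json; return count written."""
    out = pathlib.Path(out_dir)
    out.mkdir(exist_ok=True)
    count = 0
    paginator = glue.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=database):
        for table in page["TableList"]:
            path = out / f"{database}.{table['Name']}.json"
            path.write_text(json.dumps(strip_readonly_fields(table),
                                       indent=2, default=str))
            count += 1
    return count


# Usage (with credentials configured):
#   import boto3
#   backup_database(boto3.client("glue"), "my_database")
```

Passing the client in as a parameter keeps the dump logic testable without AWS access.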
An AWS-based pipeline 📺 to ingest, process, and analyze YouTube video data, covering structured statistics as well as trending-video metrics.
Tool to migrate Delta Lake tables to Apache Iceberg using AWS Glue and S3
An ETL (Extract, Transform, Load) pipeline built on AWS using the Spotify API.
Interactive visualizations built with Streamlit, powered by Apache Flink in batch mode to surface insights from data.
Prototype of AWS data lake reference implementation written in Python and Spark: https://aws.amazon.com/solutions/implementations/data-lake-solution/
Creating an audit table for a DynamoDB table using CloudTrail, Kinesis Data Streams, Lambda, S3, Glue, Athena, and CloudFormation
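The Lambda step in a pipeline like this can be sketched as follows: decode the base64 payloads delivered by a Kinesis Data Stream and append them to S3 as JSON lines for Glue/Athena to query. This is a hedged illustration, not the project's actual code; the bucket and key names are made up:

```python
"""Sketch: Lambda handler turning Kinesis records into JSON-lines audit data.

Hypothetical example; bucket/key names are illustrative.
"""
import base64
import json


def records_to_jsonl(event: dict) -> str:
    """Decode each Kinesis record's base64 payload into one JSON line."""
    lines = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        lines.append(json.dumps(json.loads(payload)))
    return "\n".join(lines)


def handler(event, context, s3=None):
    # In a real deployment: s3 = boto3.client("s3")
    body = records_to_jsonl(event)
    if body and s3 is not None:
        s3.put_object(Bucket="audit-bucket",    # hypothetical bucket
                      Key="audit/batch.jsonl",  # hypothetical key
                      Body=body.encode())
    return {"lines": body.count("\n") + 1 if body else 0}
```

Partitioning the S3 key by date (e.g. `audit/dt=2024-01-01/…`) would keep Athena scans cheap, at the cost of slightly more key-building logic.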
End-to-end AWS data analytics pipeline for product risk detection and customer dissatisfaction analysis.
Demonstrates using Terraform to enable Tableflow in Kafka so that Iceberg table files are generated and stored in an AWS S3 bucket, then configuring Snowflake to read those Iceberg tables through the AWS Glue Data Catalog.
Enterprise track: Step Functions/EventBridge + Glue + data quality on top of the v1 serverless ELT
Working with Glue Data Catalog and Running the Glue Crawler On Demand
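Running a crawler on demand boils down to `start_crawler` plus polling `get_crawler` until the state returns to READY. A minimal sketch, assuming boto3 and credentials (the crawler client is injected so the loop is testable offline):

```python
"""Sketch: start a Glue crawler on demand and wait for it to finish."""
import time


def run_crawler(glue, name: str, poll_seconds: int = 15) -> str:
    """Start the crawler, poll until it is READY, return the last crawl status."""
    glue.start_crawler(Name=name)
    while True:
        state = glue.get_crawler(Name=name)["Crawler"]["State"]
        if state == "READY":  # crawler finished and returned to idle
            break
        time.sleep(poll_seconds)
    return glue.get_crawler(Name=name)["Crawler"]["LastCrawl"]["Status"]


# Usage (with credentials configured):
#   import boto3
#   run_crawler(boto3.client("glue"), "my-crawler")
```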
Example using the Iceberg register_table command with AWS Glue and Glue Data Catalog
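In pyiceberg, `register_table` points a catalog at an existing `metadata.json` without rewriting any data. A thin wrapper sketch (hypothetical names throughout; assumes `pyiceberg[glue]` is installed for the real usage shown in the comment):

```python
"""Sketch: register an existing Iceberg table with the Glue Data Catalog."""


def register_iceberg_table(catalog, database: str, table: str,
                           metadata_location: str):
    """Attach an existing metadata file to the catalog under db.table."""
    return catalog.register_table(f"{database}.{table}", metadata_location)


# Usage (hypothetical values; requires pyiceberg[glue] and AWS credentials):
#   from pyiceberg.catalog import load_catalog
#   catalog = load_catalog("glue", **{"type": "glue"})
#   register_iceberg_table(
#       catalog, "analytics", "events",
#       "s3://my-bucket/warehouse/analytics.db/events/metadata/00000-abc.metadata.json")
```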
Unveiling job market trends with Scrapy and AWS
🌟 Build a production-lite serverless ELT pipeline on AWS, enabling efficient data ingestion and transformation from S3 to Parquet with minimal overhead.
Developed an ETL pipeline for real-time ingestion of stock market data from the stock-market-data-manage.onrender.com API. Engineered the system to store data in Parquet format for optimized query processing and incorporated data quality checks to ensure accuracy prior to visualization.
End-to-end AWS Data Engineering project for cloud cost monitoring and automated reporting.