Genomic BigData Warehousing with Apache Spark and LakeHouse Architecture
Updated Jan 19, 2023 - Jupyter Notebook
This project aims to build a Retrieval-Augmented Generation (RAG) engine to provide context-aware recommendations based on user queries.
Builds a local Spark standalone cluster on Docker with MinIO integration
A quick look at the Delta tables that underpin Delta Lake
This project implements my master’s thesis on building a scalable, ACID-compliant data lakehouse architecture for IoT and industrial workloads in an AWS-native environment.
A quick look at the Iceberg tables that underpin an Iceberg data lake
A quick look at the Hudi tables that underpin a Hudi data lake
🚀 Automate nightly builds of MinIO Community Edition binaries and Docker images for easy access to the latest releases.