DP-203-Lab7

Using Delta Lake with Apache Spark in Azure Synapse Analytics

📊 Overview

Welcome to this project! In this exercise, you will learn how to use Delta Lake with Apache Spark in Azure Synapse Analytics. You will build a lakehouse architecture that processes both batch and streaming data, a common foundation for scalable data engineering workflows.

📝 What You Will Do

During this exercise, you will perform the following steps:

🔧 Set Up an Azure Synapse Analytics Workspace

Create Data Lake Storage Gen2: Set up a storage account with the hierarchical namespace enabled, which makes it an Azure Data Lake Storage Gen2 account, to hold your data.
Set Up Apache Spark Pool: Create an Apache Spark pool, which provides the necessary compute resources to run Spark-based processing jobs.

📁 Explore Data in the Data Lake

Load CSV Files as DataFrames: You will load CSV files into Spark as DataFrames, allowing you to process and transform the data.
Convert Data to Delta Tables: After loading the CSV data, you will save it as Delta tables. Delta stores the data as Parquet files plus a transaction log, which adds ACID transactions and schema enforcement that plain Parquet or CSV files lack.
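
The two steps above can be sketched in PySpark as follows. This is a minimal sketch, not the lab's exact code: the container name `files`, the account name `mydatalake`, and both file paths are hypothetical placeholders, and `spark` is assumed to be the session that a Synapse notebook provides.

```python
def abfss_uri(container, account, relative_path):
    """Build an ADLS Gen2 URI: abfss://container@account.dfs.core.windows.net/path."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{relative_path.lstrip('/')}"


def csv_to_delta(spark, csv_path, delta_path):
    """Load a CSV file with a header row and save it in Delta format at delta_path."""
    df = (
        spark.read
        .option("header", "true")       # first row holds the column names
        .option("inferSchema", "true")  # let Spark infer the column types
        .csv(csv_path)
    )
    # Delta writes Parquet data files plus a _delta_log transaction log folder.
    df.write.format("delta").mode("overwrite").save(delta_path)
    return df


# Hypothetical names: replace the container, account, and paths with your own.
source = abfss_uri("files", "mydatalake", "products/products.csv")
target = abfss_uri("files", "mydatalake", "delta/products-delta")

# In a Synapse notebook, where `spark` is predefined:
# csv_to_delta(spark, source, target)
```

Passing the session in as an argument keeps the functions usable both in a notebook and in a standalone Spark job.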

💾 Work with Delta Tables

Update Existing Data: Learn how to update records in a Delta table in place, which is essential for keeping datasets current.
Time Travel: Use Delta Lake’s time travel feature to query previous versions of a table. This lets you recover older data states or debug issues by comparing versions.
Create Catalog Tables (External and Managed): You will create external and managed Delta tables in the Synapse catalog. For a managed table, the catalog controls both the metadata and the data files, so dropping the table deletes the data. An external table only references data at a location you specify, so dropping it leaves the files in place.
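
The three operations above might look like this in PySpark. Treat it as a sketch under assumptions: the column names `Category` and `Price` and the table names `ProductsManaged` and `ProductsExternal` are illustrative, and `spark` is the session a Synapse notebook provides.

```python
def apply_discount(spark, delta_path, category, factor):
    """Multiply Price by `factor` for every row in the given category."""
    # Imported here so this file still loads where the delta package is absent.
    from delta.tables import DeltaTable

    table = DeltaTable.forPath(spark, delta_path)
    table.update(
        condition=f"Category = '{category}'",
        set={"Price": f"Price * {factor}"},  # values are SQL expressions
    )


def read_version(spark, delta_path, version):
    """Time travel: read the table as it was at a given version number."""
    return (
        spark.read.format("delta")
        .option("versionAsOf", version)
        .load(delta_path)
    )


def external_table_ddl(table_name, delta_path):
    """Build the statement that registers existing Delta files as an external table."""
    return f"CREATE TABLE IF NOT EXISTS {table_name} USING DELTA LOCATION '{delta_path}'"


def create_catalog_tables(spark, df, delta_path):
    """Create one managed and one external Delta table (illustrative names)."""
    # Managed: the catalog decides where the files live and owns them.
    df.write.format("delta").mode("overwrite").saveAsTable("ProductsManaged")
    # External: the catalog only points at files that already exist.
    spark.sql(external_table_ddl("ProductsExternal", delta_path))
```

Each write to a Delta table creates a new version, so after one call to `apply_discount`, `read_version(spark, path, 0)` returns the data as it was before the update.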

📡 Simulate Streaming Data

Process IoT Events with Delta as Sink: Simulate IoT events and write them in real time to a Delta table, which serves as a reliable sink for streaming data.
Store and Analyze Real-Time Data: You will store the streamed data in Delta format and query it while the stream is running.
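
A streaming pipeline of this kind can be sketched with Spark Structured Streaming. The event shape (`device`, `status`), the helper names, and all paths are assumptions made for illustration; the simulated events would be written as JSON files to the input folder that the stream watches.

```python
import json

# Assumed event shape, expressed as a Spark DDL schema string.
DEVICE_SCHEMA = "device STRING, status STRING"


def make_device_events(count):
    """Return `count` simulated IoT events as JSON strings (deterministic)."""
    statuses = ["ok", "ok", "error"]
    return [
        json.dumps({"device": f"Dev{i % 2 + 1}", "status": statuses[i % len(statuses)]})
        for i in range(count)
    ]


def stream_events_to_delta(spark, input_path, delta_path, checkpoint_path):
    """Read JSON files as a stream and append them to a Delta table."""
    events = spark.readStream.schema(DEVICE_SCHEMA).json(input_path)
    return (
        events.writeStream
        .format("delta")
        # The checkpoint records progress so a restarted stream resumes cleanly.
        .option("checkpointLocation", checkpoint_path)
        .start(delta_path)
    )


# In a Synapse notebook, with hypothetical paths:
# query = stream_events_to_delta(spark, "/data/in", "/delta/iot", "/delta/iot-checkpoint")
# spark.read.format("delta").load("/delta/iot").show()  # query the sink as a table
# query.stop()
```

Because the sink is an ordinary Delta table, it can be read with batch queries while the stream keeps appending to it.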

🧠 Use SQL

Query Delta Files Using Serverless SQL Pools: Finally, you will use Synapse’s serverless SQL pool to execute SQL queries directly on Delta tables, without needing to provision dedicated resources, providing flexibility and cost-efficiency.
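
A serverless SQL pool reads Delta folders through `OPENROWSET` with `FORMAT = 'DELTA'`. The sketch below builds such a query as a string so the examples in this README stay in one language; the helper name and the storage URL are hypothetical, and the resulting T-SQL is meant to be run in a Synapse SQL script connected to the built-in pool.

```python
def delta_openrowset_query(delta_folder_url, top=10):
    """Build a T-SQL query that reads a Delta folder from a serverless SQL pool."""
    return (
        f"SELECT TOP {int(top)} *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{delta_folder_url}',\n"
        "    FORMAT = 'DELTA'\n"
        ") AS result;"
    )


# Hypothetical storage URL: point this at the root folder of your Delta table.
query = delta_openrowset_query(
    "https://mydatalake.dfs.core.windows.net/files/delta/products-delta/"
)
print(query)
```

The `BULK` path must point at the folder that contains the `_delta_log` directory, not at individual Parquet files.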

🧹 Clean Up Resources

After completing the exercise, delete the resources you created to avoid unnecessary costs.

🔗 Resources

This exercise is based on the official Microsoft Learn material for the DP-203 certification:

Microsoft Learn GitHub - dp-203-azure-data-engineer

🎯 Key Learning Outcomes

By the end of this project, you will be able to:

Create, manage, and modify Delta tables in Azure Synapse Analytics.
Write streaming data to Delta tables.
Use Delta Time Travel to query historical data versions.
Understand the differences between external and managed Delta tables.
Run SQL queries on Delta files using the serverless SQL pool in Synapse.
