DP-203-Lab3

📊 DP-203_Lab3 - Transforming Data with Azure Synapse Serverless SQL Pools

This project is part of my learning journey through the Microsoft DP-203 Data Engineering course. In this lab, I learned how to query raw CSV data in a Data Lake, transform it with SQL, and save the results as Parquet files using Serverless SQL Pools in Azure Synapse Analytics.

✨ Lab Objective

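The main focus of this exercise was to apply SQL transformations to raw data stored in a Data Lake. A CETAS statement (CREATE EXTERNAL TABLE AS SELECT) runs a query, writes its results to files in the Data Lake, and registers an external table over those files, so the transformed data can be queried again without repeating the transformation.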

⚙️ Requirements & Tools

An active Azure account with sufficient permissions.
Azure Synapse Analytics workspace connected to a Data Lake.
Basic knowledge of SQL, PowerShell, and the Azure Portal.
GitHub repository dp-203 for the required setup scripts.

🛠️ Steps Completed

🔹 1. Setup & Source Files

Used Azure Cloud Shell (PowerShell) to deploy the lab environment.
Explored the CSV files located in the sales/csv folder via Synapse Studio.

🔹 2. Exploring Data with SQL

Queried the file contents using OPENROWSET to view the structure.
Analyzed sales data including Item, Quantity, OrderDate, UnitPrice, and TaxAmount.
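
An exploration query of this kind might look roughly like the sketch below. It is written as a small Python helper that renders the T-SQL text, so the storage URL can be swapped in. The helper name `build_explore_query`, the placeholder URL, and the `HEADER_ROW = TRUE` option (which assumes the first row of each file holds column names) are assumptions, not taken from the lab scripts.

```python
def build_explore_query(storage_url: str, top: int = 100) -> str:
    """Render a T-SQL query that previews CSV files through OPENROWSET."""
    return (
        f"SELECT TOP {top} *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{storage_url}',\n"
        "    FORMAT = 'CSV',\n"
        "    PARSER_VERSION = '2.0',\n"
        "    HEADER_ROW = TRUE  -- assumption: first row holds column names\n"
        ") AS [result];"
    )


# Placeholder URL: replace the account and container with your own values.
print(build_explore_query(
    "https://<storage-account>.dfs.core.windows.net/<container>/sales/csv/*.csv"
))
```

The printed statement can be pasted into a SQL script in Synapse Studio and run against the built-in serverless pool.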

🔹 3. Data Transformation

Created a separate database for the lab in Synapse Studio (hosted by the serverless SQL pool, not a dedicated SQL pool).
Defined an external data source and file format for Parquet.
Wrote a CETAS query to aggregate sales by product.
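
The three statements behind this step can be sketched as follows. The Python helper only renders the T-SQL text. The object names (`sales_data`, `ParquetFormat`, `ProductSalesTotals`), the output folder `productsales/`, the `GrossRevenue` measure, and the `HEADER_ROW = TRUE` option are illustrative assumptions rather than the exact lab script.

```python
def build_cetas_statements(lake_url: str) -> list[str]:
    """Render the data source, file format, and CETAS statements in run order."""
    # 1. External data source: the root folder the pool reads from and writes to.
    data_source = (
        "CREATE EXTERNAL DATA SOURCE sales_data\n"
        f"WITH (LOCATION = '{lake_url}');"
    )
    # 2. External file format: Parquet with Snappy compression.
    file_format = (
        "CREATE EXTERNAL FILE FORMAT ParquetFormat\n"
        "WITH (\n"
        "    FORMAT_TYPE = PARQUET,\n"
        "    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'\n"
        ");"
    )
    # 3. CETAS: run the SELECT and persist the result as Parquet files.
    cetas = (
        "CREATE EXTERNAL TABLE ProductSalesTotals\n"
        "WITH (\n"
        "    LOCATION = 'productsales/',\n"
        "    DATA_SOURCE = sales_data,\n"
        "    FILE_FORMAT = ParquetFormat\n"
        ")\n"
        "AS\n"
        "SELECT Item AS Product,\n"
        "       SUM(Quantity) AS ItemsSold,\n"
        "       SUM(Quantity * UnitPrice) AS GrossRevenue\n"
        "FROM OPENROWSET(\n"
        "    BULK 'csv/*.csv',\n"
        "    DATA_SOURCE = 'sales_data',\n"
        "    FORMAT = 'CSV',\n"
        "    PARSER_VERSION = '2.0',\n"
        "    HEADER_ROW = TRUE\n"
        ") AS orders\n"
        "GROUP BY Item;"
    )
    return [data_source, file_format, cetas]


# Placeholder URL pointing at the folder that contains the csv/ subfolder.
for statement in build_cetas_statements(
    "https://<storage-account>.dfs.core.windows.net/<container>/sales/"
):
    print(statement, end="\n\n")
```

Because both `BULK` and `LOCATION` are relative to the data source, the CSV input and the Parquet output end up under the same root folder in this sketch.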

🔹 4. Custom Additions

Added a computed column to extract the sales year (YEAR(OrderDate)).
Filtered the data to include only rows with a valid EmailAddress.
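
As a rough illustration of these two additions, the sketch below renders a SELECT that adds a `SalesYear` column and filters on `EmailAddress`. What counts as a "valid" address is an assumption here (not null, with at least one character on each side of an `@`). The helper name `build_custom_select` and the `sales_data` data source name are also hypothetical.

```python
def build_custom_select(source: str = "csv/*.csv") -> str:
    """Render a SELECT with a computed SalesYear column and an email filter."""
    return f"""\
SELECT YEAR(OrderDate) AS SalesYear,  -- computed column: year of the order
       Item,
       Quantity,
       UnitPrice,
       TaxAmount,
       EmailAddress
FROM OPENROWSET(
    BULK '{source}',
    DATA_SOURCE = 'sales_data',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE
) AS orders
-- assumption: "valid" means not null and shaped like something@something
WHERE EmailAddress IS NOT NULL
  AND EmailAddress LIKE '%_@_%';"""


print(build_custom_select())
```

The same SELECT could be placed after the `AS` keyword of a CETAS statement to persist the filtered rows as Parquet.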

🔹 5. Stored Procedure (Optional)

Built a stored procedure to automate the transformation process.
Results were written into the Data Lake as partitioned Parquet files.
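
A stored procedure that wraps a CETAS statement could be sketched along these lines. The procedure, table, and folder names are hypothetical, and the sketch assumes an external data source `sales_data` and a file format `ParquetFormat` already exist in the database. It writes to a single output folder and does not attempt any partitioning.

```python
def build_procedure(
    proc_name: str = "sp_GetYearlySales",
    table_name: str = "YearlySalesTotals",
    location: str = "yearlysales/",
) -> str:
    """Render a CREATE PROCEDURE statement that rebuilds an external table."""
    return f"""\
CREATE PROCEDURE {proc_name}
AS
BEGIN
    -- Drop only the table metadata; the Parquet files remain in the lake.
    IF EXISTS (SELECT * FROM sys.external_tables WHERE name = '{table_name}')
        DROP EXTERNAL TABLE {table_name};

    -- Recreate the table from the raw CSV files, aggregated by year.
    CREATE EXTERNAL TABLE {table_name}
    WITH (
        LOCATION = '{location}',
        DATA_SOURCE = sales_data,
        FILE_FORMAT = ParquetFormat
    )
    AS
    SELECT YEAR(OrderDate) AS SalesYear,
           SUM(Quantity) AS ItemsSold
    FROM OPENROWSET(
        BULK 'csv/*.csv',
        DATA_SOURCE = 'sales_data',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        HEADER_ROW = TRUE
    ) AS orders
    GROUP BY YEAR(OrderDate);
END;"""


print(build_procedure())
```

Dropping an external table leaves its files in place, so the output folder likely has to be deleted or emptied before the procedure is run a second time; otherwise the CETAS step is expected to fail because the location already contains data.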

🔹 6. Clean-up

Deleted the resource group in Azure Portal to avoid unnecessary charges.

📚 Resources & Inspiration

Microsoft Learn - DP-203
Azure Synapse Analytics Documentation
GitHub Repo: MicrosoftLearning/dp-203-Data-Engineer

🧠 What I Learned

This lab helped me:

Apply the fundamentals of Serverless SQL Pools in a hands-on scenario.
Understand the benefits of using Parquet for efficient storage.
Perform SQL transformations on semi-structured data.
Build data pipelines without relying on a dedicated Spark environment.
