senmer5/DP-203-Lab3

DP-203-Lab3

📊 DP-203_Lab3 - Transforming Data with Azure Synapse Serverless SQL Pools

This project is part of my learning journey through the Microsoft DP-203 Data Engineering course. In this lab, I learned how to analyze, transform, and store raw CSV data from a Data Lake as optimized Parquet files using Serverless SQL Pools in Azure Synapse Analytics.


✨ Lab Objective

The main focus of this exercise was to apply SQL transformations to raw data stored in a Data Lake. Using CETAS (CREATE EXTERNAL TABLE AS SELECT), I wrote the query results back to the Data Lake as files and exposed them through an external table in a single statement.


⚙️ Requirements & Tools

  • An active Azure account with sufficient permissions.
  • Azure Synapse Analytics workspace connected to a Data Lake.
  • Basic knowledge of SQL, PowerShell, and Azure Portal.
  • GitHub repository dp-203 for the required setup scripts.

🛠️ Steps Completed

🔹 1. Setup & Source Files

  • Used Azure Cloud Shell (PowerShell) to deploy the lab environment.
  • Explored the CSV files located in the sales/csv folder via Synapse Studio.

🔹 2. Exploring Data with SQL

  • Queried the file contents using OPENROWSET to view the structure.
  • Analyzed sales data including Item, Quantity, OrderDate, UnitPrice, and TaxAmount.
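
A minimal sketch of such an exploratory query, assuming the CSV files carry a header row. The `<storage-account>` and `<container>` parts of the URL are placeholders, not values taken from this lab.

```sql
-- Preview the raw CSV files in the sales/csv folder.
-- <storage-account> and <container> are placeholders for your own Data Lake.
SELECT TOP 100 *
FROM OPENROWSET(
        BULK 'https://<storage-account>.dfs.core.windows.net/<container>/sales/csv/*.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        HEADER_ROW = TRUE   -- assumption: the first row of each file holds the column names
    ) AS [result];
```

With `HEADER_ROW = TRUE`, the result columns should appear under names such as Item, Quantity, OrderDate, UnitPrice, and TaxAmount rather than generic C1, C2, and so on.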

🔹 3. Data Transformation

  • Created a separate database for the transformed data in Synapse Studio (hosted in the serverless SQL pool, not a dedicated SQL pool).
  • Defined an external data source and an external file format for Parquet.
  • Wrote a CETAS query to aggregate sales by product.
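
The three steps above can be sketched roughly as follows. All object names (`Sales`, `sales_data`, `ParquetFormat`, `ProductSalesTotals`), the output folder, and the storage URL are illustrative assumptions, and the revenue expression assumes UnitPrice is a per-unit price. The sketch also assumes the signed-in identity has write access to the Data Lake.

```sql
-- Database for the transformed data (name is an assumption).
CREATE DATABASE Sales
    COLLATE Latin1_General_100_BIN2_UTF8;
GO

USE Sales;
GO

-- External data source pointing at the Data Lake container (placeholder URL).
CREATE EXTERNAL DATA SOURCE sales_data
WITH (
    LOCATION = 'https://<storage-account>.dfs.core.windows.net/<container>/'
);
GO

-- File format used for the files that CETAS writes.
CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (
    FORMAT_TYPE = PARQUET,
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);
GO

-- CETAS: aggregate sales by product, write the result as Parquet,
-- and register an external table over the new files.
CREATE EXTERNAL TABLE ProductSalesTotals
WITH (
    LOCATION = 'sales/productsales/',   -- hypothetical output folder
    DATA_SOURCE = sales_data,
    FILE_FORMAT = ParquetFormat
)
AS
SELECT Item AS Product,
       SUM(Quantity) AS ItemsSold,
       SUM(Quantity * UnitPrice) AS GrossRevenue   -- assumption: UnitPrice is per unit
FROM OPENROWSET(
        BULK 'sales/csv/*.csv',
        DATA_SOURCE = 'sales_data',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        HEADER_ROW = TRUE
    ) AS orders
GROUP BY Item;
```

Once the statement completes, `SELECT * FROM ProductSalesTotals;` reads the Parquet files back through the external table.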

🔹 4. Custom Additions

  • Added a computed column to extract the sales year (YEAR(OrderDate)).
  • Filtered the data to include only rows with a valid EmailAddress.
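
A sketch of how these two additions might look in a query. The column alias `SalesYear` and the storage URL are placeholders, and the email check is only one possible reading of "valid": it keeps rows whose EmailAddress is present and roughly shaped like `name@domain.tld`.

```sql
-- Add the sales year and keep only rows with a plausible email address.
SELECT Item,
       Quantity,
       OrderDate,
       YEAR(OrderDate) AS SalesYear,   -- assumes OrderDate is inferred as a date type
       EmailAddress
FROM OPENROWSET(
        BULK 'https://<storage-account>.dfs.core.windows.net/<container>/sales/csv/*.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        HEADER_ROW = TRUE
    ) AS orders
WHERE EmailAddress IS NOT NULL
  AND EmailAddress LIKE '%_@_%._%';    -- assumption: simple shape check, not full validation
```

The same SELECT can be used as the body of a CETAS statement to persist the filtered rows as Parquet.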

🔹 5. Stored Procedure (Optional)

  • Built a stored procedure to automate the transformation process.
  • Results were written into the Data Lake as partitioned Parquet files.
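
A minimal sketch of such a procedure, assuming an external data source `sales_data` and a file format `ParquetFormat` already exist in the database. Those names, the procedure name, the table name, and the output folder are all hypothetical. This sketch writes one aggregated result set and does not attempt to control how the output files are split.

```sql
-- Wrap the CETAS transformation in a procedure so it can be re-run on demand.
CREATE PROCEDURE sp_GetYearlySales
AS
BEGIN
    -- Drop the table metadata if a previous run created it.
    -- Note: this does not delete the Parquet files in the Data Lake; as far as
    -- I know, the target folder must be removed before the procedure is re-run.
    IF EXISTS (SELECT * FROM sys.external_tables WHERE name = 'YearlySalesTotals')
        DROP EXTERNAL TABLE YearlySalesTotals;

    CREATE EXTERNAL TABLE YearlySalesTotals
    WITH (
        LOCATION = 'sales/yearlysales/',   -- hypothetical output folder
        DATA_SOURCE = sales_data,
        FILE_FORMAT = ParquetFormat
    )
    AS
    SELECT YEAR(OrderDate) AS CalendarYear,
           SUM(Quantity) AS ItemsSold
    FROM OPENROWSET(
            BULK 'sales/csv/*.csv',
            DATA_SOURCE = 'sales_data',
            FORMAT = 'CSV',
            PARSER_VERSION = '2.0',
            HEADER_ROW = TRUE
        ) AS orders
    GROUP BY YEAR(OrderDate);
END
GO

-- Run the transformation.
EXEC sp_GetYearlySales;
```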

🔹 6. Clean-up

  • Deleted the resource group in Azure Portal to avoid unnecessary charges.

📚 Resources & Inspiration

  • Microsoft Learn - DP-203
  • Azure Synapse Analytics Documentation
  • GitHub Repo: MicrosoftLearning/dp-203-Data-Engineer

🧠 What I Learned

This lab helped me:

  • Apply the fundamentals of Serverless SQL Pools in a hands-on scenario.
  • Understand the benefits of using Parquet for efficient storage.
  • Perform SQL transformations on semi-structured data.
  • Build data pipelines without relying on a dedicated Spark environment.

Screenshots

[Screenshots 1, 2, 3, 4, 5, 7, 8, 9: images not included]
