Independent educational resource; not endorsed by Databricks, Inc. "Databricks" and "Delta Lake" are trademarks of their respective owners.
Jakub Lasak — Helping you interview like seniors, execute like seniors, and think like seniors.
- 🔗 LinkedIn - Databricks projects and tips
- 📬 Substack Newsletter - Exclusive content for Data Engineers
- 🌐 DataEngineer.wiki - Training materials and resources
- 🚀 More Practice Labs - Delta Live Tables, table optimization, and more
Looking for more info on passing the Databricks Data Engineer Associate Certification exam?
Check out helpful resources including YouTube videos and official Databricks courses at dataengineer.wiki/certifications/data-engineer-associate.
This lab provides hands-on practice to prepare for the Databricks Certified Data Engineer Associate exam. You will build a production-grade, end-to-end data pipeline using real-world scenarios and datasets. The exercises are designed to be completed within the Databricks Free Community Edition, allowing you to develop practical skills without any cost.
The lab covers the entire data engineering lifecycle, including:
- Ingesting raw data from various sources using Auto Loader and COPY INTO.
- Implementing the Medallion Architecture (Bronze, Silver, Gold layers).
- Performing data transformations, quality checks, and implementing SCD Type 2.
- Orchestrating workflows with Databricks Jobs and multi-task dependencies.
- Managing data governance using Unity Catalog with role-based access control.
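The Medallion Architecture flow listed above can be sketched without any Spark or Delta dependencies. The snippet below is a minimal conceptual illustration in plain Python, using lists of dicts in place of Delta tables; the table contents, field names, and cleaning rules are illustrative assumptions, not the lab's actual code.

```python
# Bronze: raw records ingested as-is, including duplicates and bad rows.
bronze = [
    {"order_id": 1, "amount": "120.50", "country": "US"},
    {"order_id": 1, "amount": "120.50", "country": "US"},   # duplicate
    {"order_id": 2, "amount": None,     "country": "DE"},   # bad row
    {"order_id": 3, "amount": "75.00",  "country": "US"},
]

# Silver: deduplicated, validated, and typed.
seen, silver = set(), []
for row in bronze:
    if row["amount"] is None or row["order_id"] in seen:
        continue  # drop bad rows and duplicates
    seen.add(row["order_id"])
    silver.append({**row, "amount": float(row["amount"])})

# Gold: business-level aggregate (revenue per country).
gold = {}
for row in silver:
    gold[row["country"]] = gold.get(row["country"], 0.0) + row["amount"]

print(gold)  # {'US': 195.5} — the DE row was dropped in the Silver layer
```

In the lab itself, each layer is a Delta table and the cleaning and aggregation are done with PySpark transformations, but the layering logic is the same.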
This lab is structured to cover the key topics outlined in the official Databricks Data Engineer Associate exam guide. By completing the notebooks, you will gain practical experience in the following areas:
- What you'll practice: Enabling features that simplify data layout decisions, understanding the value of the Data Intelligence Platform, and identifying the applicable compute for specific use cases.
- What you'll practice: Leveraging Notebooks functionality, working with Auto Loader from various sources (JSON, CSV), using COPY INTO for batch incremental loads, and handling schema evolution.
- What you'll practice: Implementing the three layers of the Medallion Architecture (Bronze, Silver, Gold), performing data quality transformations, implementing Slowly Changing Dimensions (SCD Type 2), and computing complex aggregations with PySpark window functions.
- What you'll practice: Creating and configuring Databricks Jobs, implementing multi-task workflows with dependencies, using job parameters with widgets, implementing error handling and retry logic, and using serverless compute.
- What you'll practice: Understanding Unity Catalog's three-level namespace (catalog.schema.table), creating and managing catalogs, schemas, and volumes, creating managed tables, implementing access control with GRANT and REVOKE statements, and data quality validation patterns.
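To make the SCD Type 2 topic above concrete, here is a Spark-free sketch of the core idea: when a tracked attribute changes, the current dimension row is closed out and a new "current" row is appended. The column names (`is_current`, `valid_from`, `valid_to`) are common conventions assumed for illustration, not necessarily the lab's exact schema.

```python
from datetime import date

def scd2_upsert(dim, key, new_attrs, today):
    """Close the current row for `key` if its attributes changed, then append a new one."""
    current = next((r for r in dim if r["id"] == key and r["is_current"]), None)
    if current and all(current[k] == v for k, v in new_attrs.items()):
        return dim  # no change, nothing to do
    if current:
        current["is_current"] = False  # close out the old version
        current["valid_to"] = today
    dim.append({"id": key, **new_attrs, "is_current": True,
                "valid_from": today, "valid_to": None})
    return dim

# A customer moves from Berlin to Munich: history is preserved, not overwritten.
dim = [{"id": 7, "city": "Berlin", "is_current": True,
        "valid_from": date(2023, 1, 1), "valid_to": None}]
scd2_upsert(dim, 7, {"city": "Munich"}, date(2024, 6, 1))
# dim now holds two rows: the closed-out Berlin row and a current Munich row.
```

In the notebooks this pattern is expressed as a Delta `MERGE` over the Silver layer, but the row-versioning logic is the same.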
This lab focuses on core data engineering workflows. If you want to practice additional exam topics such as Delta Live Tables (DLT) pipelines or Delta table optimization techniques, check out my other hands-on labs at dataengineer.wiki/projects.
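Databricks Jobs handle the error handling and retry logic mentioned above natively (max retries, retry interval per task). The following plain-Python wrapper is only an assumption-laden sketch of that idea, not the Jobs API itself.

```python
import time

def run_with_retries(task, max_retries=3, delay_seconds=0.0):
    """Run `task`, retrying on failure up to `max_retries` attempts."""
    for attempt in range(1, max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the failure to the scheduler
            time.sleep(delay_seconds)  # back off before the next attempt

# Example: a flaky task that succeeds on its third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(run_with_retries(flaky))  # ok
```

In a real Job you would configure this declaratively on the task (retries and retry interval) rather than writing the loop yourself.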
Register for a "Virtual Learning Festival" and complete the required courses within the provided timeline to automatically receive 50% off the certification exam.
Look for upcoming Databricks Festival here - https://community.databricks.com/t5/events/eb-p/databricks-community-events
- Create a Databricks Account
  - Sign up for a Databricks Free Edition account if you don't already have one.
  - Familiarize yourself with the workspace, clusters, and notebook interface.
- Import this repository to Databricks
  - In Databricks, go to the Workspace sidebar, open the "Repos" section, and click "Add Repo".
  - Alternatively, go to your personal folder, click "Create", and select "Git folder".
  - Paste the GitHub URL for this repository.
  - Authenticate with GitHub if prompted, and select the main branch.
  - The repo will appear as a folder in your workspace, allowing you to edit notebooks, run them, and manage files directly from Databricks.
  - For more details, see the official Databricks documentation: Repos in Databricks.
- Open the `notebooks/` folder and run `00_Setup_Environment.py` to create the Unity Catalog infrastructure and generate data.
- Follow the numbered notebooks (`01` to `06`) to build the pipeline. Each exercise includes a `TODO` area for your code and a commented-out solution for verification.
