GitHub - Evusma/Databricks_DE_associate

This repo contains questions and notes that helped me prepare for the Databricks Data Engineer Associate Certification.

The questions were taken from different websites, and some answers might be wrong. Copilot, ChatGPT and the Databricks AI assistant were used to explain the answers to some questions and to put together some of the information in my notes (the .md documents). I cannot guarantee that everything is 100% correct (at the end of the day, AI assistants' responses are just probabilities, right?).

The questions are in questions_databricks_DE_associate.md; the notes in the other .md documents cover:

Auto Loader.md

Auto Loader
Benefits
Handling Data Inconsistencies
Options of Auto Loader
stream vs batch tables
Checkpoint
DataStreamWriter.trigger
Summary of Auto Loader

clusters.md TO DO

delta_table_vs_delta_live_tables.md

Difference between Delta Tables and Delta Live Tables

Delta Tables
Delta Live Tables
Delta Table Example (Storage Layer)
Delta Live Tables example (Pipeline Framework)
The bronze table of DLT
Stream vs batch
Batch mode in DLT

dlt_vs_workflows.md

From June 2025, Databricks introduced new naming for both DLT and Jobs/Workflows.
Difference between Databricks DLT (Delta Live Tables) and Workflows.
Delta Live Tables (DLT)
Databricks Workflows
Key Difference
Example Use Case
Languages Supported in DLT
Languages Supported in Databricks Workflows
Databricks UI
Jobs vs Pipelines

other_topics

Reason to restart the cluster in a Databricks notebook that uses multiple languages
%run magic command
How does the concept of a metastore contribute to data governance in Databricks?
Deduplicate rows
Databricks runtime version
DBFS
transaction log
VACUUM
Z-ordering
Service principal
Time travel feature

streaming_vs_batch_delta_tables.md

Stream vs Batch Tables
Example Python
Example SQL
spark.read.format vs spark.read.table
Stream and batch outside DLT
Outside DLT: Plain Spark / Structured Streaming
Using them (Plain Spark / Structured Streaming) in Workflows (Jobs)
Difference (Plain Spark / Structured Streaming) vs DLT
Structured Streaming job
Example: Python Structured Streaming Job
Example: SQL Structured Streaming Job
spark.table() vs spark.read.table()
spark.table() vs spark.readStream.format("delta").table("sales")
spark.readStream.format("cloudFiles")

volumes.md

Volume
Files in the volume
Files outside the volume
How querying a volume file directly compares with creating a Delta table
Updating a Delta table created from volume files or from files in an external cloud storage
Managed Delta table from CSV in a Volume or from CSV in an external storage
External Delta table from CSV in an external storage or from CSV in a Volume
External table
USING DELTA vs USING CSV
External table and Databricks catalog

Websites:

Other websites:

Website to learn Spark: sparkbyexamples

images
practices
.gitignore
Auto Loader.md
clusters.md
delta_tables_vs_lakeflow_declarative_pipelines.md
external_DB_JDBC.md
lakeflow_declarative_pipelines_vs_workflows.md
other topics.md
questions_databricks_DE_associate.md
readme.md
streaming_query_windows.md
streaming_vs_batch_delta_tables.md
volumes.md