Skip to content

Evusma/Databricks_DE_associate

Repository files navigation

In the repo, there are some questions and notes that helped me for the preparation of the Databricks Data Engineer Associate Certification.

The questions have been taken from different websites and some answers might be wrong. Copilot, ChatGPT and Databricks AI assistant have been used to explain the answers of some questions, and the information of my the notes in the .md documents. I cannot guarantee that it's 100% correct (at the end of the day, AI assistants' responses are just probability, right?).

The questions are in the document questions_databricks_DE_associate.md, the notes of the other .md document are about:

Auto Loader.md

  • Auto Loader
  • Benefits
  • Handling Data Inconsistencies
  • Options of Auto Loader
  • stream vs batch tables
  • Checkpoint
  • DataStreamWriter.trigger
  • Summary Auto Loader:

clusters.md TO DO

delta_table_vs_delta_live_tables.md Difference between Delta Tables and Delta Live Tables

  • Delta Tables
  • Delta Live Tables
  • Delta Table Example (Storage Layer)
  • Delta Live Tables example (Pipeline Framework)
  • The bronze table of DLT
  • Stream vs batch
  • Batch mode in DLT

dlt_vs_workflows.md

  • From June 2025, Databricks introduced new naming for both DLT and Jobs/Workflows.
  • Difference between Databricks DLT (Delta Live Tables) and Workflows.
  • Delta Live Tables (DLT)
  • Databricks Workflows
  • Key Difference
  • Example Use Case
  • Languages Supported in DLT
  • Languages Supported in Databricks Workflows
  • Databricks UI
  • Jobs vs Pipelines

other_topics

  • Reason to restart cluster in a databricks notebook which is using multiple languages
  • %run magic command
  • How does the concept of a metastore contribute to data governance in databricks
  • Deduplicate rows
  • Databricks runtime version
  • DBFS
  • transaction log
  • VACUUM
  • Z-ordering
  • Service principal
  • Time travel feature

streaming_vs_batch_delta_tables.md

  • Stream vs Batch Tables
  • Example Python
  • Example SQL
  • spark.read.format vs spark.read.table
  • Stream and batch outside DLT
  • Outside DLT: Plain Spark / Structured Streaming
  • Using them (Plain Spark / Structured Streaming) in Workflows (Jobs)
  • Difference (Plain Spark / Structured Streaming) vs DLT
  • Structured Streaming job
  • Example: Python Structured Streaming Job
  • Example: SQL Structured Streaming Job
  • spark.table() vs spark.read.table()
  • spark.table() vs spark.readStream.format("delta").table("sales")
  • spark.readStream.format("cloudFiles")

volumes.md

  • Volume
  • Files in the volume
  • Files outside the volume
  • How querying a volume file directly compares with creating a Delta table
  • Updating a Delta table created from volume files or from files in a external cloud storage
  • Managed Delta table from CSV in a Volume or from CSV in an external storage
  • External Delta table from CSV in an external storage or from CSV in a Volume
  • External table
  • USING DELTA vs USING CSV
  • External table and Databricks catalog

Websites:

Other websites:

Website to learn Spark: sparkbyexamples

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors