Add Auto Loader reference to SDP skill by jralfonsog · Pull Request #539 · databricks-solutions/ai-dev-kit

jralfonsog · 2026-05-19T11:40:02Z

Summary

Consolidates Auto Loader fundamentals into a single reference file in the SDP skill, with a pointer from the Structured Streaming skill for non-SDP users. Follows @QuentinAmbard's call in #ai-dev-kit-team (standalone Auto Loader skill was overkill; fundamentals belong as an SDP reference).

CREATE databricks-skills/databricks-spark-declarative-pipelines/references/auto-loader.md (957 lines) — consolidated from an earlier 8-file draft. Default examples assume SDP context (managed checkpoints, managed schema location). Includes a "raw Structured Streaming" quick-start for non-SDP usage with the explicit checkpoint-and-schema-path caveat.
UPDATE databricks-skills/databricks-spark-declarative-pipelines/references/python/2-ingestion.md — intro touch-up + pointer to ../auto-loader.md for the deep dive.
UPDATE databricks-skills/databricks-spark-structured-streaming/streaming-best-practices.md (section 2) — replaces the inline Auto Loader snippet with a pointer to the canonical SDP reference, preserving the non-SDP checkpoint caveat at the call site.
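
For reviewers, the SDP-vs-raw distinction described above might look roughly like the sketch below. It is not taken from auto-loader.md. The helper names (`auto_loader_options`, `start_raw_stream`) and all paths and table names are hypothetical, and `spark` is assumed to be an active SparkSession on Databricks. In SDP the schema location and checkpoint are managed, so only the format option would be set. In raw Structured Streaming both paths have to be supplied explicitly.

```python
from typing import Dict, Optional


def auto_loader_options(file_format: str,
                        schema_location: Optional[str] = None) -> Dict[str, str]:
    """Build reader options for the cloudFiles source (hypothetical helper).

    In an SDP pipeline, schema_location is left as None because SDP manages it.
    In raw Structured Streaming, the caller must pass an explicit path.
    """
    opts = {"cloudFiles.format": file_format}
    if schema_location is not None:
        opts["cloudFiles.schemaLocation"] = schema_location
    return opts


def start_raw_stream(spark, source_path: str, target_table: str,
                     schema_location: str, checkpoint_location: str):
    """Start a non-SDP Auto Loader stream (hypothetical helper).

    Both schema_location and checkpoint_location are required here,
    which is the caveat the raw-streaming quick-start calls out.
    """
    df = (
        spark.readStream.format("cloudFiles")
        .options(**auto_loader_options("json", schema_location))
        .load(source_path)
    )
    return (
        df.writeStream
        .option("checkpointLocation", checkpoint_location)
        .trigger(availableNow=True)  # process available files, then stop
        .toTable(target_table)
    )
```

`auto_loader_options("json")` yields only the format option, which mirrors the SDP default examples. Passing a schema path adds `cloudFiles.schemaLocation`, as the raw-streaming quick-start requires.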

Why

Auto Loader is fundamentally an ingestion pattern, and ingestion is owned by the SDP skill. The previous structure (a standalone draft) duplicated coverage and gave no obvious home for the SDP-vs-raw-streaming distinction. A single canonical reference plus a skill-level pointer keeps the content discoverable from both entry points without duplication.

Routing

Targeting main per the "send to main, we'll cherry-pick to experimental" pattern @calreynolds and @QuentinAmbard established on #498. Happy to rebase if a different routing is preferred.

Test plan

auto-loader.md renders cleanly (957 lines, sections: Quick start, SDP defaults, Raw Structured Streaming, Schema evolution, File listing modes, Backfill, Operational gotchas, Troubleshooting).
Internal links in 2-ingestion.md and streaming-best-practices.md resolve to ../auto-loader.md correctly.
Default SDP examples don't reference checkpointLocation / cloudFiles.schemaLocation (managed by SDP); the raw-streaming quick-start does.
Maintainer review of content fidelity and reference structure.
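
The link-resolution item in the test plan could be checked with a small script along these lines. This is a sketch under stated assumptions: the function name `broken_relative_links` is hypothetical, only inline Markdown links of the form `[text](target)` are considered, and a link counts as resolved if the target file exists relative to the linking file.

```python
import re
from pathlib import Path
from typing import List

# Matches inline Markdown links and captures the target, e.g. [deep dive](../auto-loader.md)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Matches targets with a URL scheme such as https: or mailto:
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_relative_links(md_file: Path) -> List[str]:
    """Return relative link targets in md_file that do not exist on disk."""
    broken = []
    text = md_file.read_text(encoding="utf-8")
    for target in LINK_RE.findall(text):
        # Skip external URLs and same-page anchors; only file links are checked.
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue
        path_part = target.split("#", 1)[0]  # drop any trailing #anchor
        if not (md_file.parent / path_part).exists():
            broken.append(target)
    return broken
```

Running it on 2-ingestion.md and streaming-best-practices.md from a checkout of the branch should return an empty list for each file if the pointers to auto-loader.md resolve.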

This pull request was AI-assisted by Isaac.

Consolidates Auto Loader fundamentals into a single reference file in the SDP skill. Default examples assume SDP context (managed checkpoints and schema location); a quick-start for raw Structured Streaming is included for non-SDP usage. Adds a pointer from the structured streaming skill so non-SDP users can find it. Co-authored-by: Isaac

jralfonsog wants to merge 1 commit into databricks-solutions:main from jralfonsog:feat/sdp-auto-loader-reference
