
Add Auto Loader reference to SDP skill #539

Open

jralfonsog wants to merge 1 commit into databricks-solutions:main from jralfonsog:feat/sdp-auto-loader-reference

@jralfonsog
Collaborator

Summary

Consolidates Auto Loader fundamentals into a single reference file in the SDP skill, with a pointer from the Structured Streaming skill for non-SDP users. Follows @QuentinAmbard's suggestion in #ai-dev-kit-team that a standalone Auto Loader skill would be overkill and that the fundamentals belong in an SDP reference.

  • CREATE databricks-skills/databricks-spark-declarative-pipelines/references/auto-loader.md (957 lines): consolidated from an earlier 8-file draft. Default examples assume SDP context (managed checkpoints, managed schema location). Includes a "raw Structured Streaming" quick-start for non-SDP usage, with an explicit caveat that checkpointLocation and cloudFiles.schemaLocation must then be set by hand.
  • UPDATE databricks-skills/databricks-spark-declarative-pipelines/references/python/2-ingestion.md: intro touch-up plus a pointer to ../auto-loader.md for the deep dive.
  • UPDATE databricks-skills/databricks-spark-structured-streaming/streaming-best-practices.md (section 2): replaces the inline Auto Loader snippet with a pointer to the canonical SDP reference, preserving the non-SDP checkpoint caveat at the call site.
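The SDP-versus-raw distinction in the first bullet comes down to which options the author has to supply. The sketch below is illustrative only: `autoloader_reader_options` and `stream_writer_options` are hypothetical helpers, not part of this PR or of any Databricks API. The option names themselves (`cloudFiles.format`, `cloudFiles.schemaLocation`, `checkpointLocation`) are the standard Auto Loader and Structured Streaming options. It assumes schema inference (no explicit `.schema(...)`), which is the case where Auto Loader needs a schema location.

```python
from typing import Optional

# Hypothetical helpers (not from the PR or any Databricks API). The option
# names are real Auto Loader / Structured Streaming options; the helpers only
# illustrate which of them each context needs.


def autoloader_reader_options(
    file_format: str,
    *,
    managed_by_sdp: bool,
    schema_location: Optional[str] = None,
) -> dict:
    """Reader options for spark.readStream.format("cloudFiles").options(**opts).

    Assumes schema inference (no explicit .schema(...)), the case where
    Auto Loader needs somewhere to persist the inferred schema.
    """
    opts = {"cloudFiles.format": file_format}
    if managed_by_sdp:
        # SDP manages the schema location for pipeline streaming tables.
        return opts
    if schema_location is None:
        raise ValueError("raw Structured Streaming needs cloudFiles.schemaLocation")
    opts["cloudFiles.schemaLocation"] = schema_location
    return opts


def stream_writer_options(
    *,
    managed_by_sdp: bool,
    checkpoint_location: Optional[str] = None,
) -> dict:
    """Writer options for df.writeStream.options(**opts)."""
    if managed_by_sdp:
        # SDP manages checkpoints, so nothing needs to be set here.
        return {}
    if checkpoint_location is None:
        raise ValueError("raw Structured Streaming needs checkpointLocation")
    return {"checkpointLocation": checkpoint_location}


# Raw (non-SDP) usage on Databricks would look roughly like this
# (all paths and the table name are hypothetical):
#   (spark.readStream.format("cloudFiles")
#        .options(**autoloader_reader_options(
#            "json", managed_by_sdp=False, schema_location="/tmp/demo/_schema"))
#        .load("/tmp/demo/landing")
#        .writeStream.options(**stream_writer_options(
#            managed_by_sdp=False, checkpoint_location="/tmp/demo/_checkpoint"))
#        .toTable("demo_bronze"))
```

Raising on a missing path in raw mode mirrors the caveat the reference keeps at the call site: outside SDP, forgetting either location is an error rather than a silent default.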

Why

Auto Loader is fundamentally an ingestion pattern, and ingestion is owned by the SDP skill. The previous structure (standalone draft) duplicated coverage and gave no obvious home for the SDP-vs-raw-streaming distinction. Single canonical reference + skill-level pointer keeps the content discoverable from both entry points without duplication.

Routing

Targeting main per the "send to main, we'll cherry-pick to experimental" pattern @calreynolds and @QuentinAmbard established on #498. Happy to rebase if a different routing is preferred.

Test plan

  • auto-loader.md renders cleanly (957 lines, sections: Quick start, SDP defaults, Raw Structured Streaming, Schema evolution, File listing modes, Backfill, Operational gotchas, Troubleshooting).
  • Internal links in 2-ingestion.md and streaming-best-practices.md resolve correctly to the new references/auto-loader.md.
  • Default SDP examples don't reference checkpointLocation / cloudFiles.schemaLocation (managed by SDP); the raw-streaming quick-start does.
  • Maintainer review of content fidelity and reference structure.
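The two file-level checks in the test plan (link resolution and the absence of SDP-managed options) can be scripted. A minimal sketch, assuming the repository paths listed in the Summary and plain relative Markdown links; `relative_link`, `resolves_to`, and `blocks_with_unmanaged_options` are hypothetical names. Resolving `../auto-loader.md` against each linking file's directory shows that it reaches the reference from `references/python/`, while a link from the structured-streaming skill directory needs a longer relative path.

```python
import posixpath
import re

# Paths come from the PR summary; the helper names are hypothetical.
REFERENCE = "databricks-skills/databricks-spark-declarative-pipelines/references/auto-loader.md"
INGESTION = "databricks-skills/databricks-spark-declarative-pipelines/references/python/2-ingestion.md"
STREAMING = "databricks-skills/databricks-spark-structured-streaming/streaming-best-practices.md"

# Options SDP manages, so default SDP examples should not mention them.
UNMANAGED_OPTIONS = ("checkpointLocation", "cloudFiles.schemaLocation")

# Built from a variable so this snippet can itself live inside a Markdown fence.
FENCE_MARK = "`" * 3
FENCED_BLOCK = re.compile(FENCE_MARK + r"[^\n]*\n(.*?)" + FENCE_MARK, re.DOTALL)


def relative_link(source: str, target: str = REFERENCE) -> str:
    """Relative link to write inside `source` so that it reaches `target`."""
    return posixpath.relpath(target, start=posixpath.dirname(source))


def resolves_to(source: str, link: str, target: str = REFERENCE) -> bool:
    """True if `link`, written inside `source`, points at `target`."""
    resolved = posixpath.normpath(posixpath.join(posixpath.dirname(source), link))
    return resolved == target


def blocks_with_unmanaged_options(markdown: str) -> list:
    """Bodies of fenced code blocks that mention an SDP-managed option."""
    return [
        body
        for body in FENCED_BLOCK.findall(markdown)
        if any(option in body for option in UNMANAGED_OPTIONS)
    ]
```

In practice `blocks_with_unmanaged_options` would be run only over the SDP-default sections, since the raw-streaming quick-start is expected to mention both options.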

This pull request was AI-assisted by Isaac.

Consolidates Auto Loader fundamentals into a single reference file in
the SDP skill. Default examples assume SDP context (managed checkpoints
and schema location); a quick-start for raw Structured Streaming is
included for non-SDP usage. Adds a pointer from the structured
streaming skill so non-SDP users can find it.

Co-authored-by: Isaac