Expand Structured Streaming SKILL.md from table of contents to actionable hub #212

Open
CheeYuTan wants to merge 1 commit into databricks-solutions:main from CheeYuTan:feat/structured-streaming-skill-hub

@CheeYuTan
Contributor

Summary

The Structured Streaming SKILL.md was only 66 lines — just a table of contents despite having 8 good reference files. Expanded to 247 lines with actionable content:

  • Quick starts: Kafka-to-Delta, foreachBatch MERGE, availableNow scheduled streaming
  • Trigger selection guide: realTime (DBR 16.4+), processingTime, availableNow, once
  • Watermark essentials with duration selection table
  • Stream join patterns: stream-stream with time bounds, stream-static with broadcast hints
  • Checkpoint best practices summary
  • Production checklist and common issues troubleshooting table

No new files — just an expanded SKILL.md that makes the existing reference files more discoverable and useful.

Test plan

  • All trigger syntax verified against PySpark docs
  • No install_skills.sh changes needed (no new files)
  • Cross-referenced with existing reference files for consistency

…able hub

The SKILL.md was 66 lines — just a table of contents despite having 8
good reference files. Expanded to 247 lines with:

- Actionable quick starts: Kafka-to-Delta, foreachBatch MERGE, availableNow
- Trigger selection guide (realTime, processingTime, availableNow)
- Watermark essentials with duration selection table
- Stream join patterns (stream-stream with time bounds, stream-static with broadcast)
- Checkpoint best practices summary
- Production checklist
- Common issues troubleshooting table
@CheeYuTan
Contributor Author

Test Results

| Test | Status | Details |
| --- | --- | --- |
| CI Validation (`validate_skills.py`) | PASS | All 26 skills validated |
| SKILL.md Frontmatter | PASS | name: 40 chars, lowercase+hyphens. description: 189 chars, no XML |
| Reference File Cross-links | PASS | All 9 existing reference files linked in SKILL.md |
| Trigger Syntax | PASS | `processingTime="30 seconds"`, `availableNow=True`, `realTime="5 minutes"` (DBR 16.4+); all correct |
| Watermark Syntax | PASS | `.withWatermark("event_time", "10 minutes")`; correct |
| foreachBatch MERGE | PASS | `DeltaTable.forName().merge().whenMatchedUpdateAll().execute()`; correct |
| Stream-Stream Join | PASS | Uses `expr()` with SQL `INTERVAL` for time bounds; correct |
| install_skills.sh | PASS | No changes needed (no new files added) |
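
For readers who want to see the three trigger forms from the Trigger Syntax row side by side, here is a minimal sketch. The helper name `trigger_kwargs` is hypothetical and not part of this PR or of PySpark; it only builds the keyword arguments that would be passed to `DataStreamWriter.trigger()`. The `realTime` string form is taken from this PR's description (DBR 16.4+) and is assumed here rather than verified independently.

```python
def trigger_kwargs(mode, interval=None):
    """Build keyword arguments for DataStreamWriter.trigger().

    Hypothetical helper for illustration only. Supported modes mirror the
    forms listed in this PR: processingTime, availableNow, realTime.
    """
    if mode == "availableNow":
        # availableNow is a boolean flag and takes no interval.
        if interval is not None:
            raise ValueError("availableNow takes no interval")
        return {"availableNow": True}
    if mode in ("processingTime", "realTime"):
        # Both forms take an interval string such as "30 seconds".
        if not interval:
            raise ValueError(f"{mode} requires an interval string")
        return {mode: interval}
    raise ValueError(f"unknown trigger mode: {mode!r}")


# Intended use with a streaming DataFrame `df` (not executed here):
#   df.writeStream.trigger(**trigger_kwargs("processingTime", "30 seconds"))
```

Keeping the mode-to-argument mapping in one place makes it harder to mix forms, for example passing an interval together with `availableNow`.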

What Changed

SKILL.md expanded from 66 → 247 lines. It was previously just a table of contents; it now includes actionable quick starts, a trigger selection guide, watermark essentials, stream join patterns, checkpoint best practices, and a common issues table. All 9 existing reference files are now more discoverable.

Advisory Note

Pre-existing reference files (trigger-and-cost-optimization.md, kafka-streaming.md) still use the older realTime=True syntax. This PR's SKILL.md correctly uses realTime="5 minutes". A follow-up PR could update those reference files for consistency.
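
As a starting point for that follow-up, the sketch below lists every Markdown line that still uses the boolean form. It is an illustration rather than part of this PR: the function name `find_old_realtime_syntax` is hypothetical, and the directory to scan is whatever path holds the skill's reference files in your checkout.

```python
import re
from pathlib import Path

# Matches the older boolean form, e.g. .trigger(realTime=True).
# Whitespace around "=" is tolerated.
OLD_REALTIME = re.compile(r"realTime\s*=\s*True\b")


def find_old_realtime_syntax(root):
    """Return (path, line number, stripped line) for each match under root.

    Only *.md files are scanned, recursively.
    """
    hits = []
    for path in sorted(Path(root).rglob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if OLD_REALTIME.search(line):
                hits.append((path, lineno, line.strip()))
    return hits
```

Calling `find_old_realtime_syntax(".")` from the skill directory would return one tuple per offending line, which can then be reviewed and updated by hand.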
