Add databricks-dqx skill for Data Quality Extensions #156

Open
aniljoshiaset wants to merge 1 commit into databricks-solutions:main from aniljoshiaset:feature/add-databricks-dqx-skill
@aniljoshiaset

Summary

  • Adds a comprehensive skill for the Databricks Labs DQX (Data Quality Extensions) framework
  • Covers quality rule definition, profiling, auto-generation, Lakeflow/DLT integration, streaming, and check functions
  • SKILL.md (256 lines) with 7 progressive-disclosure reference files following skill authoring best practices

Skill Structure

databricks-skills/databricks-dqx/
├── SKILL.md                          # Main skill (256 lines)
├── 1-installation-setup.md           # Install methods (pip, CLI, DABs, extras)
├── 2-defining-quality-rules.md       # DQRule classes, YAML metadata, storage
├── 3-applying-checks.md              # Apply checks, split valid/invalid, save
├── 4-profiler-auto-generation.md     # Profile datasets, auto-generate rules
├── 5-lakeflow-integration.md         # DLT/Lakeflow pipeline integration
├── 6-streaming-metrics.md            # Streaming support, quality monitoring
└── 7-check-functions-reference.md    # Complete built-in check function catalog

Test plan

  • SKILL.md has valid YAML frontmatter with name and description
  • Description is third-person, includes what the skill does and when to use it
  • SKILL.md is under 500 lines (256)
  • Progressive disclosure with reference files one level deep
  • No time-sensitive information
  • All file paths use forward slashes
  • Updated databricks-skills/README.md skills table
  • Updated main README.md skill count (19 → 20)
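
The frontmatter and line-count items above could be checked mechanically. The following is a minimal sketch, not part of this PR: `check_skill_md` is a hypothetical helper, and it assumes the frontmatter is a `---`-delimited block at the top of the file with top-level `key: value` lines. It uses only the standard library and does not fully parse YAML.

```python
MAX_LINES = 500  # limit quoted in the test plan


def check_skill_md(text: str) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    lines = text.splitlines()

    # Test plan item: the file stays under 500 lines.
    if len(lines) >= MAX_LINES:
        problems.append(f"{len(lines)} lines, expected under {MAX_LINES}")

    # Test plan item: frontmatter exists (assumed to be '---' delimited).
    if not lines or lines[0].strip() != "---":
        problems.append("missing opening '---' frontmatter delimiter")
        return problems
    try:
        end = next(
            i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"
        )
    except StopIteration:
        problems.append("missing closing '---' frontmatter delimiter")
        return problems

    # Collect top-level keys only; indented or commented lines are skipped.
    keys = {
        line.split(":", 1)[0].strip()
        for line in lines[1:end]
        if ":" in line and not line.startswith((" ", "\t", "#"))
    }
    for required in ("name", "description"):
        if required not in keys:
            problems.append(f"frontmatter lacks '{required}'")
    return problems
```

To run it against the file in this PR, pass the contents of `databricks-skills/databricks-dqx/SKILL.md` as `text`; an empty list means the three checks passed.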

🤖 Generated with Claude Code

Add comprehensive skill for Databricks Labs DQX (Data Quality Extensions)
framework covering quality rule definition, profiling, auto-generation,
Lakeflow/DLT integration, streaming, and check functions reference.

Includes SKILL.md (256 lines) with 7 progressive-disclosure reference files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@calreynolds
Collaborator

@aniljoshiaset instead of having us build DQX into ai-dev-kit, would it be possible to create a DQX skill (or set of skills) in your own repo that our install.sh can reference at install time? We want to keep our maintenance surface as small as possible 👍

@aniljoshiaset
Author

aniljoshiaset commented Mar 6, 2026

@calreynolds Absolutely! I've set up a standalone repo for the DQX skill here: https://github.com/aniljoshiaset/databricks-dqx-skill

The structure follows the same pattern as the MLflow skills — a databricks-dqx/ directory with SKILL.md + 7 reference guides that can be fetched via raw GitHub URLs:

databricks-dqx/
├── SKILL.md                        # Main orchestrator
├── 1-installation-setup.md
├── 2-defining-quality-rules.md
├── 3-applying-checks.md
├── 4-profiler-auto-generation.md
├── 5-lakeflow-integration.md
├── 6-streaming-metrics.md
└── 7-check-functions-reference.md

You should be able to add it to install.sh the same way you reference MLflow skills. Let me know if you need any changes to the structure!
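
For illustration, here is a rough sketch of how the raw URLs for these files could be built and fetched. It is not taken from install.sh, whose handling of the MLflow skills is not shown in this thread. The branch name `main` is an assumption, and `raw_url` and `fetch_skill` are hypothetical helper names.

```python
from pathlib import Path
from urllib.request import urlopen

OWNER_REPO = "aniljoshiaset/databricks-dqx-skill"  # from the repo URL above
BRANCH = "main"  # assumption: the default branch is not stated in this thread
SKILL_DIR = "databricks-dqx"
FILES = [
    "SKILL.md",
    "1-installation-setup.md",
    "2-defining-quality-rules.md",
    "3-applying-checks.md",
    "4-profiler-auto-generation.md",
    "5-lakeflow-integration.md",
    "6-streaming-metrics.md",
    "7-check-functions-reference.md",
]


def raw_url(filename: str, branch: str = BRANCH) -> str:
    """Build the raw.githubusercontent.com URL for one skill file."""
    return (
        f"https://raw.githubusercontent.com/{OWNER_REPO}/{branch}/"
        f"{SKILL_DIR}/{filename}"
    )


def fetch_skill(dest_dir: str, branch: str = BRANCH) -> None:
    """Download every skill file into dest_dir (performs network requests)."""
    target = Path(dest_dir) / SKILL_DIR
    target.mkdir(parents=True, exist_ok=True)
    for name in FILES:
        with urlopen(raw_url(name, branch)) as resp:
            (target / name).write_bytes(resp.read())
```

Calling `fetch_skill("skills")` would write the eight files under `skills/databricks-dqx/`, provided the branch assumption holds.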
