Add new Compute Management skill for cluster and warehouse selection by CheeYuTan · Pull Request #224 · databricks-solutions/ai-dev-kit

CheeYuTan · 2026-03-07T10:10:21Z

Summary

New skill documenting 6 MCP tools for Databricks compute management. No compute skill existed previously, so agents had no guidance on how to select, start, or monitor clusters and warehouses.

Tools documented: `list_clusters`, `list_warehouses`, `get_best_cluster`, `get_best_warehouse`, `get_cluster_status`, `start_cluster`

What's in the skill

- Decision table: when to use a cluster vs. a warehouse (SQL → warehouse, notebooks → cluster, streaming → cluster)
- Quick Start: 2-line patterns to get compute and run code
- MCP Tools Reference: all 6 tools with params, return formats, and response shapes
- Workflows: compute selection, cluster startup with polling, warehouse auto-start
- Serverless vs. Classic guidance
- Common Issues: 6 real error patterns with solutions
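As a rough illustration, the decision table can be sketched as a lookup. The workload labels (`"sql"`, `"notebook"`, `"streaming"`), `Compute`, `choose_compute`, and `SELECTOR_TOOL` are hypothetical names for this sketch and are not part of the MCP tools. Only the three mappings and the `get_best_*` tool names come from this PR.

```python
from enum import Enum


class Compute(Enum):
    WAREHOUSE = "warehouse"
    CLUSTER = "cluster"


# Hypothetical workload labels; the mappings mirror the skill's decision table.
DECISION_TABLE = {
    "sql": Compute.WAREHOUSE,
    "notebook": Compute.CLUSTER,
    "streaming": Compute.CLUSTER,
}

# MCP tool an agent would call next to pick a concrete compute resource.
SELECTOR_TOOL = {
    Compute.WAREHOUSE: "get_best_warehouse",
    Compute.CLUSTER: "get_best_cluster",
}


def choose_compute(workload: str) -> Compute:
    """Map a workload kind to the compute type the decision table recommends."""
    try:
        return DECISION_TABLE[workload.lower()]
    except KeyError:
        raise ValueError(f"no decision-table entry for workload {workload!r}") from None


print(SELECTOR_TOOL[choose_compute("SQL")])  # prints get_best_warehouse
```

Unknown workloads raise instead of silently defaulting, so an agent has to make an explicit choice rather than guess.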

Test evidence — 10 iterations of test-and-fix

Live MCP tool tests

| Tool | Input | Output | Status |
|---|---|---|---|
| `list_clusters` | `{}` | 1 cluster: `Steven Tan's Personal Compute Cluster`, state=`RUNNING` | PASS |
| `list_warehouses` | `{}` | 1 warehouse: `Serverless Starter Warehouse`, state=`RUNNING`, size=`Small` | PASS |
| `get_best_cluster` | `{}` | `{"cluster_id": "1203-135841-22k8xamq"}` | PASS |
| `get_best_warehouse` | `{}` | `"d41ad9fd669499ed"` (bare string) | PASS |
| `get_cluster_status` | `{cluster_id: "1203-135841-22k8xamq"}` | `{state: "RUNNING", message: "...running and ready for use."}` | PASS |
| `get_cluster_status` | `{cluster_id: "invalid-id"}` | `ResourceDoesNotExist` error | PASS (edge case) |
| `start_cluster` | `{cluster_id: "1203-135841-22k8xamq"}` (already running) | Returns `{state: "RUNNING"}` with no `previous_state` field | PASS (edge case) |
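The following is a minimal sketch of the "cluster startup with polling" workflow. It assumes a caller-supplied `call_tool(name, args)` that invokes an MCP tool and returns its JSON-decoded result. That helper is hypothetical and not part of this PR. The response shapes follow the live tests:

- `get_cluster_status` returns `state` and `message`.
- `start_cluster` on an already-running cluster returns `{state: "RUNNING"}` with no `previous_state`.

Treating `TERMINATED` and `ERROR` as failure states is an assumption, and so are the timeout and poll-interval defaults.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, args) -> decoded JSON result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]

# Assumed failure states; RUNNING is the success state seen in the live tests.
FAILED_STATES = {"TERMINATED", "ERROR"}


def start_and_wait(
    call_tool: ToolCaller,
    cluster_id: str,
    timeout_s: float = 600,
    poll_s: float = 15,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call start_cluster, then poll get_cluster_status until the cluster is RUNNING."""
    started = call_tool("start_cluster", {"cluster_id": cluster_id})
    # Already-running variant: {"state": "RUNNING"} and no "previous_state" key.
    if started.get("state") == "RUNNING":
        return started
    waited = 0.0
    while waited < timeout_s:
        status = call_tool("get_cluster_status", {"cluster_id": cluster_id})
        state = status.get("state")
        if state == "RUNNING":
            return status
        if state in FAILED_STATES:
            raise RuntimeError(
                f"cluster {cluster_id} entered {state}: {status.get('message')}"
            )
        sleep(poll_s)
        waited += poll_s
    raise TimeoutError(f"cluster {cluster_id} not RUNNING after {timeout_s}s")
```

Injecting `sleep` keeps the loop testable without real waiting. In an agent, the default `time.sleep` applies. An invalid ID would surface as the `ResourceDoesNotExist` error raised by the tool call itself.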

MCP tool schema verification

All 6 tool JSON schemas were read and cross-referenced with the skill documentation. Parameter names, types, and required/optional status all match.

What the 10 iterations found and fixed

| Iteration | What was wrong | Fix |
|---|---|---|
| 3 | `execute_sql` param was `sql=`; actual param is `sql_query=` | Fixed Quick Start example |
| 4 | `execute_databricks_command` param was `command=`; actual param is `code=` | Fixed Quick Start example |
| 6 | `get_best_warehouse` selection logic was wrong ("prefers smaller sizes"); actual logic is a 7-step priority algorithm | Rewrote with correct logic from MCP source code |
| 8 | `start_cluster` on an already-running cluster returns a different format (no `previous_state`) | Added variant to docs |
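The two parameter-name bugs from iterations 3 and 4 are easy to reintroduce. Below is a small sketch of a pre-call check. `KNOWN_RENAMES` and `check_tool_args` are hypothetical names. The mapping covers only the two renames listed in this table and does not validate the full tool schemas.

```python
from typing import Any, Dict, List

# Wrong -> correct parameter names found during the test-and-fix iterations.
KNOWN_RENAMES = {
    ("execute_sql", "sql"): "sql_query",
    ("execute_databricks_command", "command"): "code",
}


def check_tool_args(tool: str, args: Dict[str, Any]) -> List[str]:
    """Return one warning per argument that uses a known-wrong parameter name."""
    problems = []
    for name in args:
        correct = KNOWN_RENAMES.get((tool, name))
        if correct is not None:
            problems.append(f"{tool}: use {correct}= instead of {name}=")
    return problems


print(check_tool_args("execute_sql", {"sql": "SELECT 1"}))
# prints ['execute_sql: use sql_query= instead of sql=']
```

A corrected call such as `check_tool_args("execute_sql", {"sql_query": "SELECT 1"})` returns an empty list.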

Edge cases verified

| Scenario | Verified via | Result |
|---|---|---|
| Invalid cluster ID | Live test | `ResourceDoesNotExist` error |
| `start_cluster` on a `RUNNING` cluster | Live test | Returns current state, no error |
| No clusters available | Source code review | `get_best_cluster` returns `{"cluster_id": null}` |
| No warehouses available | Source code review | `get_best_warehouse` returns `null` |
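The two selectors need slightly different handling in callers:

- `get_best_cluster` wraps its result in an object.
- `get_best_warehouse` returns a bare string.
- Both can come back `null` when no compute is available.

The sketch below assumes the tool results have already been JSON-decoded into Python values, so `null` becomes `None`. The helper names are hypothetical.

```python
from typing import Any, Optional


def best_cluster_id(result: Any) -> Optional[str]:
    """get_best_cluster returns {"cluster_id": ...}; the ID is null when no cluster exists."""
    if isinstance(result, dict):
        return result.get("cluster_id")
    return None


def best_warehouse_id(result: Any) -> Optional[str]:
    """get_best_warehouse returns a bare ID string, or null when no warehouse exists."""
    return result if isinstance(result, str) and result else None


def require(compute_id: Optional[str], kind: str) -> str:
    """Fail loudly instead of passing a missing ID on to execute_sql or similar tools."""
    if compute_id is None:
        raise LookupError(f"no {kind} available")
    return compute_id
```

Normalizing both shapes to `Optional[str]` lets the rest of a workflow treat clusters and warehouses the same way.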

Test plan

- CI validation passes (`validate_skills.py`, 27 skills)
- SKILL.md frontmatter valid
- Registered in `install_skills.sh`
- All 6 MCP tool schemas verified against JSON descriptors
- 5 of 6 tools fully tested live (`start_cluster` was only tested on an already-running cluster)
- All response shapes verified against real output
- 2 edge cases tested live, 2 verified via source code
- Cross-reference to databricks-config skill verified

…a MCP

New skill documenting 6 MCP tools for compute management: list_clusters, list_warehouses, get_best_cluster, get_best_warehouse, get_cluster_status, start_cluster

Tested through 10 iterations against a live workspace. Key fixes:
- Fixed execute_sql param name (sql -> sql_query)
- Fixed execute_databricks_command param name (command -> code)
- Corrected get_best_warehouse selection logic to match actual 7-step priority algorithm from MCP source code
- Added start_cluster "already running" return variant

CheeYuTan changed the title ~~Add new Compute Management skill — cluster and warehouse selection via MCP~~ Add new Compute Management skill for cluster and warehouse selection Mar 7, 2026

CheeYuTan wants to merge 1 commit into databricks-solutions:main from CheeYuTan:feat/compute-management-skill

CheeYuTan commented Mar 7, 2026
