Add new Compute Management skill for cluster and warehouse selection #224

Open
CheeYuTan wants to merge 1 commit into databricks-solutions:main from CheeYuTan:feat/compute-management-skill

@CheeYuTan
Contributor

Summary

New skill documenting 6 MCP tools for Databricks compute management. No compute skill existed before this PR, so agents had no guidance on how to select, start, or monitor clusters and warehouses.

Tools documented: list_clusters, list_warehouses, get_best_cluster, get_best_warehouse, get_cluster_status, start_cluster

What's in the skill

  • Decision table — when to use cluster vs warehouse (SQL → warehouse, notebooks → cluster, streaming → cluster)
  • Quick Start — 2-line patterns to get compute and run code
  • MCP Tools Reference — all 6 tools with params, return formats, and response shapes
  • Workflows — compute selection, cluster startup with polling, warehouse auto-start
  • Serverless vs Classic guidance
  • Common Issues — 6 real error patterns with solutions
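
The decision table summarized above (SQL → warehouse, notebooks → cluster, streaming → cluster) can be sketched as a small lookup. This is only an illustration of the routing rule as described in this PR; the workload labels and the `choose_compute` helper are hypothetical names, not part of the skill or the MCP server.

```python
# Hypothetical lookup mirroring the decision table in the PR description.
# The keys are illustrative labels, not identifiers used by the skill.
COMPUTE_FOR_WORKLOAD = {
    "sql": "warehouse",
    "notebook": "cluster",
    "streaming": "cluster",
}


def choose_compute(workload: str) -> str:
    """Return "warehouse" or "cluster" for a given workload kind."""
    try:
        return COMPUTE_FOR_WORKLOAD[workload.lower()]
    except KeyError:
        raise ValueError(f"unknown workload kind: {workload!r}") from None
```

An agent following this rule would then call get_best_warehouse for a "warehouse" result and get_best_cluster for a "cluster" result.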

Test evidence — 10 iterations of test-and-fix

Live MCP tool tests

| Tool | Input | Output | Status |
|---|---|---|---|
| list_clusters | {} | 1 cluster: Steven Tan's Personal Compute Cluster, state=RUNNING | PASS |
| list_warehouses | {} | 1 warehouse: Serverless Starter Warehouse, state=RUNNING, size=Small | PASS |
| get_best_cluster | {} | {"cluster_id": "1203-135841-22k8xamq"} | PASS |
| get_best_warehouse | {} | "d41ad9fd669499ed" (bare string) | PASS |
| get_cluster_status | {cluster_id: "1203-135841-22k8xamq"} | {state: "RUNNING", message: "...running and ready for use."} | PASS |
| get_cluster_status | {cluster_id: "invalid-id"} | ResourceDoesNotExist error | PASS (edge case) |
| start_cluster | {cluster_id: "1203-135841-22k8xamq"} (already running) | Returns {state: "RUNNING"}, no previous_state field | PASS (edge case) |
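
The "cluster startup with polling" workflow mentioned in the skill contents could look roughly like the sketch below. It assumes the response shapes shown in the table above (a dict with a `state` key) and takes the two tool calls as injected callables, because how an agent actually invokes MCP tools is outside this PR. The function name, the poll interval, and the timeout are all assumptions, and the sketch does not handle failure states.

```python
import time
from typing import Any, Callable, Dict


def wait_until_running(
    cluster_id: str,
    get_status: Callable[[str], Dict[str, Any]],  # stand-in for get_cluster_status
    start: Callable[[str], Dict[str, Any]],       # stand-in for start_cluster
    poll_seconds: float = 15.0,                   # assumed interval
    timeout_seconds: float = 600.0,               # assumed limit
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Start the cluster if needed, then poll until it reports RUNNING."""
    status = get_status(cluster_id)
    if status.get("state") == "RUNNING":
        return status  # nothing to do

    start(cluster_id)
    waited = 0.0
    while waited < timeout_seconds:
        sleep(poll_seconds)
        waited += poll_seconds
        status = get_status(cluster_id)
        if status.get("state") == "RUNNING":
            return status
    raise TimeoutError(
        f"cluster {cluster_id} not RUNNING after {timeout_seconds:.0f}s"
    )
```

Injecting `sleep` keeps the loop testable without real waiting; a production version would also want to stop early on terminal error states.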

MCP tool schema verification

All 6 tool JSON schemas read and cross-referenced with skill documentation — parameter names, types, and required/optional status all match.

What the 10 iterations found and fixed

| Iteration | What was wrong | Fix |
|---|---|---|
| 3 | execute_sql param was sql=; actual param is sql_query= | Fixed Quick Start example |
| 4 | execute_databricks_command param was command=; actual param is code= | Fixed Quick Start example |
| 6 | get_best_warehouse selection logic was wrong ("prefers smaller sizes"); actual logic is a 7-step priority algorithm | Rewrote with correct logic from MCP source code |
| 8 | start_cluster on already-running cluster returns different format (no previous_state) | Added variant to docs |
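
Because start_cluster was found to omit previous_state when the cluster is already running, any code consuming its result should treat that field as optional. A minimal sketch, assuming the result is a dict with a `state` key and an optional `previous_state` key; the helper name and the example state values other than RUNNING are hypothetical:

```python
from typing import Any, Dict, Optional


def summarize_start_result(result: Dict[str, Any]) -> str:
    """Describe a start_cluster result, tolerating a missing previous_state."""
    state = result.get("state", "UNKNOWN")
    previous: Optional[str] = result.get("previous_state")
    if previous is None:
        # Variant observed in the PR: cluster was already running.
        return f"cluster is {state} (no transition reported)"
    return f"cluster moved from {previous} to {state}"
```

For the already-running case recorded in the live tests, `summarize_start_result({"state": "RUNNING"})` yields `"cluster is RUNNING (no transition reported)"`.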

Edge cases verified

| Scenario | Verified via | Result |
|---|---|---|
| Invalid cluster ID | Live test | ResourceDoesNotExist error |
| start_cluster on RUNNING cluster | Live test | Returns current state, no error |
| No clusters available | Source code review | get_best_cluster returns {"cluster_id": null} |
| No warehouses available | Source code review | get_best_warehouse returns null |
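
The two "best" tools return different shapes (a dict for clusters, a bare string for warehouses), and both can come back empty. The sketch below normalizes each to an optional ID, assuming JSON null arrives as Python `None`; the helper names are hypothetical and not part of the skill.

```python
from typing import Any, Optional


def extract_cluster_id(result: Any) -> Optional[str]:
    """get_best_cluster is reported to return {"cluster_id": <id or null>}."""
    if isinstance(result, dict):
        return result.get("cluster_id")
    return None


def extract_warehouse_id(result: Any) -> Optional[str]:
    """get_best_warehouse is reported to return a bare string, or null."""
    if isinstance(result, str) and result:
        return result
    return None
```

A caller can then branch on `None` to report "no compute available" instead of passing a null ID to a later tool call.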

Test plan

  • CI validation passes (validate_skills.py — 27 skills)
  • SKILL.md frontmatter valid
  • Registered in install_skills.sh
  • All 6 MCP tool schemas verified against JSON descriptors
  • 5 of 6 tools fully tested live (start_cluster was exercised only against an already-running cluster)
  • All response shapes verified against real output
  • 2 edge cases tested live, 2 verified via source code
  • Cross-reference to databricks-config skill verified

…a MCP

New skill documenting 6 MCP tools for compute management:
list_clusters, list_warehouses, get_best_cluster, get_best_warehouse,
get_cluster_status, start_cluster

Tested through 10 iterations against a live workspace. Key fixes:
- Fixed execute_sql param name (sql -> sql_query)
- Fixed execute_databricks_command param name (command -> code)
- Corrected get_best_warehouse selection logic to match actual 7-step
  priority algorithm from MCP source code
- Added start_cluster "already running" return variant
@CheeYuTan CheeYuTan changed the title Add new Compute Management skill — cluster and warehouse selection via MCP Add new Compute Management skill for cluster and warehouse selection Mar 7, 2026