diff --git a/databricks-skills/databricks-agent-bricks/SKILL.md b/databricks-skills/databricks-agent-bricks/SKILL.md
index 04be7dad..0cc9e116 100644
--- a/databricks-skills/databricks-agent-bricks/SKILL.md
+++ b/databricks-skills/databricks-agent-bricks/SKILL.md
@@ -204,6 +204,17 @@ manage_mas(
 - **[databricks-model-serving](../databricks-model-serving/SKILL.md)** - Deploy custom agent endpoints used as MAS agents
 - **[databricks-vector-search](../databricks-vector-search/SKILL.md)** - Build vector indexes for RAG applications paired with KAs
 
+## Common Issues
+
+| Issue | Solution |
+|-------|----------|
+| **KA endpoint stuck in PROVISIONING** | Endpoints take 5-15 minutes to provision. Use `manage_ka(action="get", tile_id="...")` to poll status. If stuck >20 min, delete and recreate |
+| **KA returns generic answers ignoring documents** | Ensure documents are indexed (check knowledge source status). Add specific instructions telling the KA to cite sources |
+| **MAS routes all questions to one agent** | Agent descriptions are critical for routing. Make each description specific about what that agent handles vs. doesn't handle |
+| **"Endpoint not found" when querying KA** | The endpoint name follows the pattern `ka-{tile_id_prefix}-endpoint`, where the prefix is the segment of the tile_id before the first hyphen |
+| **Examples not being added to KA** | Examples are queued while the endpoint is not yet ONLINE. They are added automatically once the endpoint becomes ready |
+| **Genie space returns no results** | Verify the warehouse is running and that the tables in `table_identifiers` exist and are accessible to the current user |
+
 ## See Also
 
 - `1-knowledge-assistants.md` - Detailed KA patterns and examples
diff --git a/databricks-skills/databricks-aibi-dashboards/SKILL.md b/databricks-skills/databricks-aibi-dashboards/SKILL.md
index e722cb0d..220ae656 100644
--- a/databricks-skills/databricks-aibi-dashboards/SKILL.md
+++ b/databricks-skills/databricks-aibi-dashboards/SKILL.md
@@ -843,6 +843,17 @@ result = create_or_update_dashboard(
 print(result["url"])
 ```
 
+## Common Issues
+
+| Issue | Solution |
+|-------|----------|
+| **Dashboard API returns 404** | Verify the dashboard ID is correct. Use `list_lakeview_dashboards` to find valid IDs. Draft dashboards use a different endpoint than published ones |
+| **SQL query works in editor but fails in dashboard** | Dashboard queries run as the dashboard owner. Ensure the owner has `SELECT` on all referenced tables and `USE CATALOG`/`USE SCHEMA` grants |
+| **Chart shows no data despite valid query** | Field names in `query.fields[].name` must exactly match `encodings[].fieldName`. See Troubleshooting section below for details |
+| **Widget layout overlaps or misaligned** | Positions use a 6-column grid. Ensure `x + width <= 6` for each widget. Heights are in grid units (1 unit ≈ 40px) |
+| **Filter widget not filtering other widgets** | Filters use `associatedQueries` to link to datasets. Verify the query name and column name match exactly |
+| **Published dashboard shows stale data** | Published dashboards use a schedule. Update the schedule or use `execute_sql` to refresh the underlying tables |
+
 ## Troubleshooting
 
 ### Widget shows "no selected fields to visualize"
diff --git a/databricks-skills/databricks-config/SKILL.md b/databricks-skills/databricks-config/SKILL.md
index 88382c33..b1f738a7 100644
--- a/databricks-skills/databricks-config/SKILL.md
+++ b/databricks-skills/databricks-config/SKILL.md
@@ -20,3 +20,14 @@ Use the `manage_workspace` MCP tool for all workspace operations. Do NOT edit `~
 4. Present the result. For `status`/`switch`/`login`: show host, profile, username. For `list`: formatted table with the active profile marked.
 
 > **Note:** The switch is session-scoped — it resets on MCP server restart. For permanent profile setup, use `databricks auth login -p <profile-name>` and update `~/.databrickscfg` with `cluster_id` or `serverless_compute_id = auto`.
+
+## Common Issues
+
+| Issue | Solution |
+|-------|----------|
+| **`manage_workspace` returns "no profiles found"** | Run `databricks auth login --host https://your-workspace.cloud.databricks.com` to create a profile in `~/.databrickscfg` |
+| **Switch doesn't persist after restart** | Expected — switches are session-scoped. For permanent changes, make the profile the default in `~/.databrickscfg` or set `DATABRICKS_CONFIG_PROFILE` |
+| **"Token expired" errors** | Re-authenticate with `databricks auth login`. OAuth tokens from `databricks auth login` auto-refresh; PATs do not |
+| **Wrong workspace after switching** | Use `action="status"` to verify which workspace is active. The MCP server may have restarted, resetting the switch |
+| **Multiple profiles for same host** | Use distinct profile names. The CLI picks the first matching host if no profile is specified |
+| **`DATABRICKS_CONFIG_PROFILE` not respected** | Env vars override `~/.databrickscfg` defaults. Unset conflicting env vars: `DATABRICKS_HOST`, `DATABRICKS_TOKEN` |
diff --git a/databricks-skills/databricks-dbsql/SKILL.md b/databricks-skills/databricks-dbsql/SKILL.md
index 24bf2694..c12e2965 100644
--- a/databricks-skills/databricks-dbsql/SKILL.md
+++ b/databricks-skills/databricks-dbsql/SKILL.md
@@ -298,3 +298,16 @@ Load these for detailed syntax, full parameter lists, and advanced patterns:
 - **Define PK/FK constraints** on dimensional models for query optimization
 - **Use `COLLATE UTF8_LCASE`** for user-facing string columns that need case-insensitive search
 - **Use MCP tools** (`execute_sql`, `execute_sql_multi`) to test and validate all SQL before deploying
+
+## Common Issues
+
+| Issue | Solution |
+|-------|----------|
+| **`execute_sql` times out on large queries** | Add `LIMIT` during development. For production, use `execute_sql_multi` to break into smaller statements |
+| **`ai_query` returns NULL or errors** | Ensure the Foundation Model API endpoint exists and is running. Check that the prompt column is not NULL. Use `ai_query('databricks-meta-llama-...', col)` with a valid model name |
+| **Pipe syntax `\|>` not recognized** | Pipe syntax requires DBR 16.2+. Check your warehouse version. Use traditional `SELECT ... FROM ... WHERE` as a fallback |
+| **`COLLATE` errors on string comparisons** | `COLLATE` requires DBR 16.0+. Define collation at column creation: `name STRING COLLATE UTF8_LCASE` |
+| **Materialized view refresh fails** | MVs require a SQL warehouse or DLT pipeline to refresh. They cannot be refreshed from an all-purpose cluster |
+| **`MERGE INTO` performance is slow** | Enable liquid clustering on the target table with `CLUSTER BY` on the merge key columns |
+| **`http_request` blocked or returns 403** | `http_request` requires allowlisting the target domain. Contact your workspace admin to configure network access |
+| **Recursive CTE hits iteration limit** | Default max recursion is 100. Restructure the query to reduce recursion depth, or check the docs for your DBR version for a supported way to raise the limit. Note that `OPTION (MAXRECURSION n)` is SQL Server syntax and does not work on Databricks |
diff --git a/databricks-skills/databricks-docs/SKILL.md b/databricks-skills/databricks-docs/SKILL.md
index 54bb157f..237978f8 100644
--- a/databricks-skills/databricks-docs/SKILL.md
+++ b/databricks-skills/databricks-docs/SKILL.md
@@ -55,6 +55,16 @@ The llms.txt file is organized by category:
 2. Read the specific docs to understand the feature
 3. Determine which skill/tools apply, then use them
 
+## Common Issues
+
+| Issue | Solution |
+|-------|----------|
+| **llms.txt is too large to process** | Don't fetch the entire file. Search for keywords in the URL index first, then fetch only the specific documentation pages you need |
+| **Documentation page returns 404** | Databricks docs URLs change when features are renamed. Search llms.txt for the feature name to find the current URL |
+| **Docs show a different API than what works** | Check the DBR/runtime version. Many features require specific minimum versions (e.g., pipe syntax needs DBR 16.2+) |
+| **Can't find docs for a preview feature** | Preview features may only be documented in release notes. Search for the feature name in the release notes page |
+| **Conflicting information between docs pages** | Prefer the more specific page (e.g., feature-specific guide over general overview). Check the page's last-updated date |
+
 ## Related Skills
 
 - **[databricks-python-sdk](../databricks-python-sdk/SKILL.md)** - SDK patterns for programmatic Databricks access
diff --git a/databricks-skills/databricks-mlflow-evaluation/SKILL.md b/databricks-skills/databricks-mlflow-evaluation/SKILL.md
index 45db5f61..a37f314d 100644
--- a/databricks-skills/databricks-mlflow-evaluation/SKILL.md
+++ b/databricks-skills/databricks-mlflow-evaluation/SKILL.md
@@ -139,6 +139,18 @@ For automatically improving a registered system prompt using `optimize_prompts()
 
 See `GOTCHAS.md` for complete list.
 
+## Common Issues
+
+| Issue | Solution |
+|-------|----------|
+| **`mlflow.evaluate()` vs `mlflow.genai.evaluate()`** | Use `mlflow.genai.evaluate()` for GenAI agents. The older `mlflow.evaluate()` has a different API and doesn't support GenAI scorers |
+| **`predict_fn` receives dict instead of kwargs** | The predict function receives `**unpacked` keyword arguments, not a single dict. Define it as `def predict(query, context=None)`, not `def predict(inputs)` |
+| **Scorer returns wrong type** | `@scorer` functions must return a `Score` object: `Score(value=0.8, rationale="...")`. Don't return raw floats or strings |
+| **Dataset `inputs` format error** | Inputs must be nested: `{"inputs": {"query": "..."}}` not `{"query": "..."}`. Each row's `inputs` dict is unpacked as kwargs to `predict_fn` |
+| **Built-in scorer fails with "no guidelines"** | `Guidelines` scorer requires a `guidelines` parameter. Pass it as: `Guidelines(name="helpful", guidelines="The response should be helpful")` |
+| **Evaluation runs but scores are all None** | Check that your scorer handles the response format correctly. If `predict_fn` returns a dict, the scorer receives that dict as `output` |
+| **MemAlign requires human labels** | MemAlign calibrates judge prompts from domain expert feedback. You need at least 20-50 labeled examples for meaningful alignment |
+
 ## Related Skills
 
 - **[databricks-docs](../databricks-docs/SKILL.md)** - General Databricks documentation reference
diff --git a/databricks-skills/databricks-python-sdk/SKILL.md b/databricks-skills/databricks-python-sdk/SKILL.md
index 1365666a..54ddb691 100644
--- a/databricks-skills/databricks-python-sdk/SKILL.md
+++ b/databricks-skills/databricks-python-sdk/SKILL.md
@@ -614,6 +614,18 @@ If I'm unsure about a method, I should:
 | Secrets | https://databricks-sdk-py.readthedocs.io/en/latest/workspace/workspace/secrets.html |
 | DBUtils | https://databricks-sdk-py.readthedocs.io/en/latest/dbutils.html |
 
+## Common Issues
+
+| Issue | Solution |
+|-------|----------|
+| **`ValueError: default auth` on `WorkspaceClient()`** | No valid credentials found. Run `databricks auth login --host <workspace-url>` or set `DATABRICKS_HOST` + `DATABRICKS_TOKEN` env vars |
+| **`PermissionDenied` on API calls** | The authenticated user/SP lacks permissions. Check grants with `w.grants.get()` or ask a workspace admin |
+| **SDK method signature changed** | The SDK is actively developed. Pin your version in `requirements.txt`. Check the [changelog](https://github.com/databricks/databricks-sdk-py/releases) for breaking changes |
+| **`w.jobs.list()` is very slow** | The workspace may have thousands of jobs. Use `w.jobs.list(name="prefix")` to filter, or add `limit=N` |
+| **Databricks Connect `SparkSession` fails** | Ensure `databricks-connect` version matches your DBR version. Use `serverless_compute_id="auto"` for serverless |
+| **`ImportError` for SDK service classes** | Import from the correct submodule: `from databricks.sdk.service.workspace import ImportFormat` not `from databricks.sdk import ImportFormat` |
+| **OAuth token refresh fails** | Re-run `databricks auth login`. If using a service principal, check that the client secret hasn't expired |
+
 ## Related Skills
 
 - **[databricks-config](../databricks-config/SKILL.md)** - profile and authentication setup
diff --git a/databricks-skills/databricks-spark-structured-streaming/SKILL.md b/databricks-skills/databricks-spark-structured-streaming/SKILL.md
index b1f59306..a071a3e8 100644
--- a/databricks-skills/databricks-spark-structured-streaming/SKILL.md
+++ b/databricks-skills/databricks-spark-structured-streaming/SKILL.md
@@ -63,3 +63,16 @@ df.writeStream \
 - [ ] Exactly-once verified (txnVersion/txnAppId)
 - [ ] Watermark configured for stateful operations
 - [ ] Left joins for stream-static (not inner)
+
+## Common Issues
+
+| Issue | Solution |
+|-------|----------|
+| **Checkpoint corruption after schema change** | Checkpoints are tied to the query plan. Schema changes require a new checkpoint location. Back up the old checkpoint before changing |
+| **OOM on stateful operations** | Enable RocksDB state store: `spark.conf.set("spark.sql.streaming.stateStore.providerClass", "com.databricks.sql.streaming.state.RocksDBStateStoreProvider")` |
+| **`availableNow` trigger processes no data** | Ensure the source has new data since the last checkpoint. Check that the checkpoint path is correct and accessible |
+| **Stream-static join returns stale data** | A Delta table on the static side is joined against its latest version in each micro-batch; other static sources are planned once at query start. Use a Delta table as the static side, or restart the query to pick up changes |
+| **`foreachBatch` MERGE has duplicates** | Deduplicate each micro-batch on the merge key before calling `deltaTable.merge(...).whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()`. For append-style writes, set the `txnAppId` and `txnVersion` writer options for idempotency |
+| **Auto Loader `cloudFiles` schema inference fails** | Set `cloudFiles.schemaLocation` to a persistent path. For schema evolution, use `cloudFiles.schemaEvolutionMode = "addNewColumns"` |
+| **Watermark delay too aggressive** | Late data arriving after the watermark is dropped silently. Set the watermark delay >= the max expected lateness of your data |
+| **Streaming query silently stops** | Check the Spark UI for exceptions. Add a `StreamingQueryListener` or monitor `query.lastProgress` for null batches |
diff --git a/databricks-skills/databricks-unity-catalog/SKILL.md b/databricks-skills/databricks-unity-catalog/SKILL.md
index 30f34e3d..a5c43f6f 100644
--- a/databricks-skills/databricks-unity-catalog/SKILL.md
+++ b/databricks-skills/databricks-unity-catalog/SKILL.md
@@ -113,6 +113,18 @@ mcp__databricks__execute_sql(
 - **[databricks-synthetic-data-gen](../databricks-synthetic-data-gen/SKILL.md)** - for generating data stored in Unity Catalog Volumes
 - **[databricks-aibi-dashboards](../databricks-aibi-dashboards/SKILL.md)** - for building dashboards on top of Unity Catalog data
 
+## Common Issues
+
+| Issue | Solution |
+|-------|----------|
+| **`PERMISSION_DENIED` on system tables** | System tables require explicit grants: `GRANT USE CATALOG ON CATALOG system TO <group>`, then `GRANT USE SCHEMA` and `GRANT SELECT` on the specific schema |
+| **System table query is slow** | Always filter by date: `WHERE event_date >= current_date() - 7`. System tables can have billions of rows |
+| **`GRANT` fails with "not owner"** | Only the object owner or metastore admin can grant permissions. Use `SHOW GRANTS ON <securable>` to check current ownership |
+| **Table not visible after creation** | Check that `USE CATALOG` and `USE SCHEMA` grants exist for the user/group. Three-level namespace requires grants at each level |
+| **Tags not appearing on table** | Tags are set via `ALTER TABLE ... SET TAGS`. Verify with `SELECT * FROM system.information_schema.table_tags` |
+| **External location permission denied** | The storage credential must have access to the cloud path. Check `SHOW EXTERNAL LOCATIONS` and verify IAM/SAS permissions |
+| **Delta Sharing recipient can't access share** | Verify the recipient's activation link was used. Check `SHOW GRANTS ON SHARE <share-name>` and ensure tables are added to the share |
+
 ## Resources
 
 - [Unity Catalog System Tables](https://docs.databricks.com/administration-guide/system-tables/)
diff --git a/databricks-skills/spark-python-data-source/SKILL.md b/databricks-skills/spark-python-data-source/SKILL.md
index 4f90c60c..2635cb80 100644
--- a/databricks-skills/spark-python-data-source/SKILL.md
+++ b/databricks-skills/spark-python-data-source/SKILL.md
@@ -136,6 +136,16 @@ Implement a batch writer for Snowflake with staged uploads
 Write a data source for REST API with OAuth2 authentication and pagination
 ```
 
+## Common Issues
+
+| Issue | Solution |
+|-------|----------|
+| **`DataSource.schema()` returns wrong types** | Spark types must match exactly. Use `StructType([StructField("col", StringType())])` — don't return Python dicts |
+| **Data source not found after registration** | Ensure `spark.dataSource.register(MyDataSource)` is called before `spark.read.format("my_source")`. The name comes from `MyDataSource.name()` |
+| **Serialization error in `read()`** | The `DataSourceReader.read()` method runs on executors. Don't reference SparkSession or driver-only objects inside it |
+| **Streaming source never triggers new batches** | `latestOffset()` must return a new offset when new data is available. If it returns the same offset, Spark skips the batch |
+| **Schema evolution not supported** | Python data sources have a fixed schema from `schema()`. To handle schema changes, return a superset schema and fill missing fields with NULL |
+
 ## Related
 
 - databricks-testing: Test data sources on Databricks clusters
diff --git a/databricks-tools-core/databricks_tools_core/auth.py b/databricks-tools-core/databricks_tools_core/auth.py
index 21913983..c3db9fb4 100644
--- a/databricks-tools-core/databricks_tools_core/auth.py
+++ b/databricks-tools-core/databricks_tools_core/auth.py
@@ -160,9 +160,7 @@ def get_workspace_client() -> WorkspaceClient:
     # Cross-workspace: explicit token overrides env OAuth so tool operations
     # target the caller-specified workspace instead of the app's own workspace
     if force and host and token:
-        return tag_client(
-            WorkspaceClient(host=host, token=token, auth_type="pat", **product_kwargs)
-        )
+        return tag_client(WorkspaceClient(host=host, token=token, auth_type="pat", **product_kwargs))
 
     # In Databricks Apps (OAuth credentials in env), explicitly use OAuth M2M.
     # Setting auth_type="oauth-m2m" prevents the SDK from also reading
@@ -185,9 +183,7 @@ def get_workspace_client() -> WorkspaceClient:
 
     # Development mode: use explicit token if provided
     if host and token:
-        return tag_client(
-            WorkspaceClient(host=host, token=token, auth_type="pat", **product_kwargs)
-        )
+        return tag_client(WorkspaceClient(host=host, token=token, auth_type="pat", **product_kwargs))
 
     if host:
         return tag_client(WorkspaceClient(host=host, **product_kwargs))
diff --git a/databricks-tools-core/tests/unit/test_sql.py b/databricks-tools-core/tests/unit/test_sql.py
index d1b661c6..42137ba5 100644
--- a/databricks-tools-core/tests/unit/test_sql.py
+++ b/databricks-tools-core/tests/unit/test_sql.py
@@ -121,8 +121,7 @@ def test_executor_without_query_tags_omits_from_api(self, mock_get_client):
         assert "query_tags" not in call_kwargs
 
 
-def _make_warehouse(id, name, state, creator_name="other@example.com",
-                    enable_serverless_compute=False):
+def _make_warehouse(id, name, state, creator_name="other@example.com", enable_serverless_compute=False):
     """Helper to create a mock warehouse object."""
     w = mock.Mock()
     w.id = id
@@ -141,33 +140,29 @@ class TestSortWithinTier:
     def test_serverless_first(self):
         """Serverless warehouses should come before classic ones."""
         classic = _make_warehouse("c1", "Classic WH", State.RUNNING)
-        serverless = _make_warehouse("s1", "Serverless WH", State.RUNNING,
-                                     enable_serverless_compute=True)
+        serverless = _make_warehouse("s1", "Serverless WH", State.RUNNING, enable_serverless_compute=True)
         result = _sort_within_tier([classic, serverless], current_user=None)
         assert result[0].id == "s1"
         assert result[1].id == "c1"
 
     def test_serverless_before_user_owned(self):
         """Serverless should be preferred over user-owned classic."""
-        classic_owned = _make_warehouse("c1", "My WH", State.RUNNING,
-                                        creator_name="me@example.com")
-        serverless_other = _make_warehouse("s1", "Other WH", State.RUNNING,
-                                           creator_name="other@example.com",
-                                           enable_serverless_compute=True)
-        result = _sort_within_tier([classic_owned, serverless_other],
-                                   current_user="me@example.com")
+        classic_owned = _make_warehouse("c1", "My WH", State.RUNNING, creator_name="me@example.com")
+        serverless_other = _make_warehouse(
+            "s1", "Other WH", State.RUNNING, creator_name="other@example.com", enable_serverless_compute=True
+        )
+        result = _sort_within_tier([classic_owned, serverless_other], current_user="me@example.com")
         assert result[0].id == "s1"
 
     def test_serverless_user_owned_first(self):
         """Among serverless, user-owned should come first."""
-        serverless_other = _make_warehouse("s1", "Other Serverless", State.RUNNING,
-                                           creator_name="other@example.com",
-                                           enable_serverless_compute=True)
-        serverless_owned = _make_warehouse("s2", "My Serverless", State.RUNNING,
-                                           creator_name="me@example.com",
-                                           enable_serverless_compute=True)
-        result = _sort_within_tier([serverless_other, serverless_owned],
-                                   current_user="me@example.com")
+        serverless_other = _make_warehouse(
+            "s1", "Other Serverless", State.RUNNING, creator_name="other@example.com", enable_serverless_compute=True
+        )
+        serverless_owned = _make_warehouse(
+            "s2", "My Serverless", State.RUNNING, creator_name="me@example.com", enable_serverless_compute=True
+        )
+        result = _sort_within_tier([serverless_other, serverless_owned], current_user="me@example.com")
         assert result[0].id == "s2"
         assert result[1].id == "s1"
 
@@ -177,8 +172,7 @@ def test_empty_list(self):
 
     def test_no_current_user(self):
         """Without a current user, only serverless preference applies."""
         classic = _make_warehouse("c1", "Classic", State.RUNNING)
-        serverless = _make_warehouse("s1", "Serverless", State.RUNNING,
-                                     enable_serverless_compute=True)
+        serverless = _make_warehouse("s1", "Serverless", State.RUNNING, enable_serverless_compute=True)
         result = _sort_within_tier([classic, serverless], current_user=None)
         assert result[0].id == "s1"
 
@@ -186,14 +180,12 @@ class TestGetBestWarehouseServerless:
     """Tests for serverless preference in get_best_warehouse."""
 
-    @mock.patch("databricks_tools_core.sql.warehouse.get_current_username",
-                return_value="me@example.com")
+    @mock.patch("databricks_tools_core.sql.warehouse.get_current_username", return_value="me@example.com")
     @mock.patch("databricks_tools_core.sql.warehouse.get_workspace_client")
     def test_prefers_serverless_within_running_shared(self, mock_client_fn, mock_user):
         """Among running shared warehouses, serverless should be picked."""
         classic_shared = _make_warehouse("c1", "Shared WH", State.RUNNING)
-        serverless_shared = _make_warehouse("s1", "Shared Serverless", State.RUNNING,
-                                            enable_serverless_compute=True)
+        serverless_shared = _make_warehouse("s1", "Shared Serverless", State.RUNNING, enable_serverless_compute=True)
         mock_client = mock.Mock()
         mock_client.warehouses.list.return_value = [classic_shared, serverless_shared]
         mock_client_fn.return_value = mock_client
@@ -201,14 +193,12 @@ def test_prefers_serverless_within_running_shared(self, mock_client_fn, mock_use
         result = get_best_warehouse()
         assert result == "s1"
 
-    @mock.patch("databricks_tools_core.sql.warehouse.get_current_username",
-                return_value="me@example.com")
+    @mock.patch("databricks_tools_core.sql.warehouse.get_current_username", return_value="me@example.com")
     @mock.patch("databricks_tools_core.sql.warehouse.get_workspace_client")
     def test_prefers_serverless_within_running_other(self, mock_client_fn, mock_user):
         """Among running non-shared warehouses, serverless should be picked."""
         classic = _make_warehouse("c1", "My WH", State.RUNNING)
-        serverless = _make_warehouse("s1", "Fast WH", State.RUNNING,
-                                     enable_serverless_compute=True)
+        serverless = _make_warehouse("s1", "Fast WH", State.RUNNING, enable_serverless_compute=True)
         mock_client = mock.Mock()
         mock_client.warehouses.list.return_value = [classic, serverless]
         mock_client_fn.return_value = mock_client
@@ -216,14 +206,12 @@ def test_prefers_serverless_within_running_other(self, mock_client_fn, mock_user
         result = get_best_warehouse()
         assert result == "s1"
 
-    @mock.patch("databricks_tools_core.sql.warehouse.get_current_username",
-                return_value="me@example.com")
+    @mock.patch("databricks_tools_core.sql.warehouse.get_current_username", return_value="me@example.com")
    @mock.patch("databricks_tools_core.sql.warehouse.get_workspace_client")
    def test_tier_order_preserved_over_serverless(self, mock_client_fn, mock_user):
        """A running shared classic should still beat a stopped serverless."""
        running_shared_classic = _make_warehouse("c1", "Shared WH", State.RUNNING)
-        stopped_serverless = _make_warehouse("s1", "Fast WH", State.STOPPED,
-                                             enable_serverless_compute=True)
+        stopped_serverless = _make_warehouse("s1", "Fast WH", State.STOPPED, enable_serverless_compute=True)
        mock_client = mock.Mock()
        mock_client.warehouses.list.return_value = [stopped_serverless, running_shared_classic]
        mock_client_fn.return_value = mock_client