1 change: 1 addition & 0 deletions databricks-skills/databricks-app-apx/SKILL.md
@@ -236,6 +236,7 @@ Create two markdown files:
**TypeScript errors**: Wait for OpenAPI regen, verify hook names match operation_ids
**OpenAPI not updating**: Check watcher status with `apx dev status`, restart if needed
**Components not added**: Run shadcn from project root with `--yes` flag
**React page crashes to blank after data loads (Error #310)**: `useMemo`/`useCallback` hooks placed after early returns (`if (loading) return <Spinner />`) violate React Rules of Hooks. Hooks must be called in the same order on every render. Move ALL hooks before any conditional returns and guard their internals instead: `useMemo(() => { if (!data.length) return []; ... }, [data])`

## Reference Materials

43 changes: 39 additions & 4 deletions databricks-skills/databricks-app-python/2-app-resources.md
@@ -31,21 +31,39 @@ Use `valueFrom` to reference resources — never hardcode IDs:
```yaml
env:
  - name: DATABRICKS_WAREHOUSE_ID
    valueFrom:
      resource: sql-warehouse

  - name: SERVING_ENDPOINT_NAME
    valueFrom:
      resource: serving-endpoint

  - name: DB_CONNECTION_STRING
    valueFrom:
      resource: database
```

Add resources via the Databricks Apps UI or CLI:

**Option 1: UI**
1. Navigate to Configure step
2. Click **+ Add resource**
3. Select resource type and set permissions
4. Assign a key (referenced in `valueFrom`)

**Option 2: CLI (API PATCH)** — required when deploying programmatically. Without resources attached, the gateway shows "App Not Available" even if the process is running:

```bash
databricks api patch /api/2.0/apps/<app-name> --json '{
  "resources": [
    {"name": "sql-warehouse", "sql_warehouse": {"id": "<warehouse-id>", "permission": "CAN_USE"}},
    {"name": "serving-endpoint", "serving_endpoint": {"name": "<endpoint-name>", "permission": "CAN_QUERY"}}
  ]
}' --profile <profile>
```

**CRITICAL**: Resources must be attached BEFORE deploying. Without them, the gateway will refuse to serve the app even though the process is running and healthy.
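If you script this call from Python (for example via `requests` against the workspace API, or the CLI through `subprocess`), building the payload in one place keeps it consistent. A minimal sketch; the field names simply mirror the example above and are not an authoritative schema:

```python
def resources_payload(warehouse_id: str, endpoint_name: str) -> dict:
    """Build the JSON body for the `databricks api patch` call shown above.

    Field names mirror the example payload; check the Apps API reference
    for the full schema before relying on them.
    """
    return {
        "resources": [
            {
                "name": "sql-warehouse",
                "sql_warehouse": {"id": warehouse_id, "permission": "CAN_USE"},
            },
            {
                "name": "serving-endpoint",
                "serving_endpoint": {"name": endpoint_name, "permission": "CAN_QUERY"},
            },
        ]
    }
```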

---

## Communication Strategies
@@ -112,9 +130,26 @@ For Lakebase patterns, see [5-lakebase.md](5-lakebase.md).

---

## Troubleshooting: `valueFrom` vs `value`

If `valueFrom: resource:` fails with "Error reading app.yaml", use hardcoded `value:` as a fallback:

```yaml
env:
  - name: DATABRICKS_WAREHOUSE_ID
    value: "<actual-warehouse-id>"
  - name: SERVING_ENDPOINT_NAME
    value: "<actual-endpoint-name>"
```

This can happen when resources aren't yet attached to the app or the resource key doesn't match. Prefer `valueFrom` when resources are properly configured, but use `value` to unblock deployment.
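Whichever form you use, a misconfigured variable otherwise surfaces only on the first query. A minimal startup guard can fail fast instead (a sketch; the variable names are taken from the examples above):

```python
import os

REQUIRED_VARS = ("DATABRICKS_WAREHOUSE_ID", "SERVING_ENDPOINT_NAME")

def missing_env_vars(required=REQUIRED_VARS) -> list[str]:
    """Return the names of required variables that are unset or empty,
    so a broken `valueFrom` mapping is caught at app startup."""
    return [name for name in required if not os.environ.get(name)]

# At startup, e.g.:
# if missing := missing_env_vars():
#     raise RuntimeError(f"Missing env vars {missing}: check app.yaml resources")
```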

---

## Best Practices

- Always use `valueFrom` — keeps apps portable between environments
- If `valueFrom` fails with "Error reading app.yaml", fall back to `value:` with hardcoded IDs (see above)
- Grant service principal minimum required permissions (e.g., `CAN USE` not `CAN MANAGE` for SQL warehouse)
- Use Lakebase for transactional workloads; SQL warehouse for analytical workloads
- For external services, use UC connections or secrets (never hardcode API keys)
8 changes: 5 additions & 3 deletions databricks-skills/databricks-app-python/3-frameworks.md
@@ -64,7 +64,7 @@ def get_connection():
| Detail | Value |
|--------|-------|
| Pre-installed version | 1.38.0 |
| app.yaml command | `["streamlit", "run", "app.py"]` — port, address, and headless are auto-configured by the runtime via `DATABRICKS_APP_PORT` |
| Auth header | `st.context.headers.get('x-forwarded-access-token')` |

**Databricks tips**:
@@ -152,7 +152,7 @@ def get_data():
| Detail | Value |
|--------|-------|
| Pre-installed version | 3.0.3 |
| app.yaml command | `["gunicorn", "app:app", "-w", "4", "-b", "0.0.0.0:8000"]` — uses `DATABRICKS_APP_PORT` default (8000) |
| Auth header | `request.headers.get('x-forwarded-access-token')` |

**Databricks tips**:
@@ -192,7 +192,7 @@ async def get_data(request: Request):
| Detail | Value |
|--------|-------|
| Pre-installed version | 0.115.0 |
| app.yaml command | `["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]` — uses `DATABRICKS_APP_PORT` default (8000) |
| Auth header | `request.headers.get('x-forwarded-access-token')` via `Request` |

**Databricks tips**:
@@ -244,5 +244,7 @@ class State(rx.State):
- Add only additional packages your app needs to `requirements.txt`
- SDK `Config()` auto-detects credentials from injected environment variables
- Apps must bind to `DATABRICKS_APP_PORT` env var (defaults to 8000). Streamlit is auto-configured by the runtime; for other frameworks, read the env var in code or hardcode 8000 in `app.yaml` command. **Never use 8080**
- **No external CDN dependencies** in frontend HTML — the app runtime blocks outbound CDN requests (React, Recharts, Google Fonts, Babel). Build self-contained HTML with inline JS/CSS only
- **Never delete apps to fix issues** — just redeploy. Deleting disrupts OAuth integration
- For framework-specific deployment commands, see [4-deployment.md](4-deployment.md)
- For authorization integration, see [1-authorization.md](1-authorization.md)
60 changes: 48 additions & 12 deletions databricks-skills/databricks-app-python/4-deployment.md
@@ -19,7 +19,8 @@ command:

env:
  - name: DATABRICKS_WAREHOUSE_ID
    valueFrom:
      resource: sql-warehouse
  - name: USE_MOCK_BACKEND
    value: "false"
```
@@ -28,13 +29,15 @@ env:

| Framework | Command |
|-----------|---------|
| Dash | `["python", "app.py"]` — bind to `DATABRICKS_APP_PORT` in code |
| Streamlit | `["streamlit", "run", "app.py"]` — port/address/headless auto-configured by runtime |
| Gradio | `["python", "app.py"]` — bind to `DATABRICKS_APP_PORT` in code |
| Flask | `["gunicorn", "app:app", "-w", "4", "-b", "0.0.0.0:8000"]` |
| FastAPI | `["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]` |
| Reflex | `["reflex", "run", "--env", "prod"]` |

**Port binding**: Apps must listen on `DATABRICKS_APP_PORT` (defaults to 8000). Streamlit is auto-configured. For Flask/FastAPI, 8000 in the command matches the default. For Dash/Gradio, read the env var in code: `int(os.environ.get("DATABRICKS_APP_PORT", 8000))`. **Never use 8080.**
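The Dash/Gradio pattern above can be factored into a small helper (a sketch; `resolve_app_port` is an illustrative name, not a platform API):

```python
import os

def resolve_app_port(default: int = 8000) -> int:
    """Port the Databricks Apps runtime expects the server to bind to.
    DATABRICKS_APP_PORT is injected by the platform; 8000 is the documented default."""
    return int(os.environ.get("DATABRICKS_APP_PORT", default))

# e.g. for Dash:
# app.run(host="0.0.0.0", port=resolve_app_port())
```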

### Step 2: Create and Deploy

```bash
@@ -57,9 +60,19 @@ databricks apps get <app-name>

### Redeployment

**NEVER delete and recreate an app to fix deployment issues** — just redeploy. Deleting disrupts OAuth integration and doesn't fix underlying problems.

**Clean stale files** before redeploying — leftover files (e.g., old `main.py`) in the workspace source path can cause conflicts:

```bash
databricks workspace delete /Workspace/Users/<user>/apps/<app-name> --recursive
databricks workspace import-dir . /Workspace/Users/<user>/apps/<app-name>
# Check for stale files
databricks workspace list /Workspace/Users/<user>/apps/<app-name>

# Remove stale files if needed
databricks workspace delete /Workspace/Users/<user>/apps/<app-name>/<stale-file>

# Sync and redeploy
databricks sync . /Workspace/Users/<user>/apps/<app-name> --full
databricks apps deploy <app-name> \
--source-code-path /Workspace/Users/<user>/apps/<app-name>
```
@@ -115,6 +128,35 @@ For programmatic app lifecycle management, see [6-mcp-approac

## Post-Deployment

### Attach Resources (CRITICAL)

Without resources attached, the gateway shows "App Not Available" even if the process is running. Attach resources via API PATCH **before** deploying:

```bash
databricks api patch /api/2.0/apps/<app-name> --json '{
  "resources": [
    {"name": "sql-warehouse", "sql_warehouse": {"id": "<warehouse-id>", "permission": "CAN_USE"}},
    {"name": "serving-endpoint", "serving_endpoint": {"name": "<endpoint-name>", "permission": "CAN_QUERY"}}
  ]
}' --profile <profile>
```

**Find the correct warehouse ID** for the target workspace:
```bash
databricks warehouses list --profile <profile>
```

### Configure Permissions

```bash
databricks api put /api/2.0/permissions/apps/<app-name> --json '{
  "access_control_list": [
    {"user_name": "<your-email>", "permission_level": "CAN_MANAGE"},
    {"group_name": "users", "permission_level": "CAN_USE"}
  ]
}' --profile <profile>
```

### Check Logs

```bash
databricks apps logs <app-name>
```
2. Check all pages load correctly
3. Verify data connectivity (look for backend initialization messages in logs)
4. Test user authorization flow if enabled
69 changes: 69 additions & 0 deletions databricks-skills/databricks-app-python/5-lakebase.md
@@ -134,8 +134,77 @@ asyncpg

**This is the most common cause of Lakebase app failures.**

## Troubleshooting: OAuth / Security Label Errors

When the app's service principal connects to Lakebase via OAuth (i.e. `PGPASSWORD` is not auto-injected), you may see:

```
FATAL: An oauth token was supplied but no role security label was configured in postgres for role "<SP_CLIENT_ID>"
```

**Root cause**: The SP's PostgreSQL role exists but lacks the `databricks_auth` security label that maps it to a Databricks identity.

**Fix**: Connect as the instance owner and set the security label:

```sql
-- 1. Find the SP's numeric ID (from Databricks workspace)
-- databricks service-principals list -o json | grep <SP_CLIENT_ID>
-- Look for the "id" field (numeric)

-- 2. Set the security label in Lakebase
SECURITY LABEL FOR databricks_auth ON ROLE "<SP_CLIENT_ID>"
IS 'id=<SP_NUMERIC_ID>,type=SERVICE_PRINCIPAL';

-- 3. Grant schema/table access
GRANT USAGE ON SCHEMA my_schema TO "<SP_CLIENT_ID>";
GRANT SELECT ON ALL TABLES IN SCHEMA my_schema TO "<SP_CLIENT_ID>";
```

You can verify with: `SELECT * FROM pg_seclabels WHERE objtype = 'role';`

**When PGPASSWORD is empty**: If Databricks auto-injects `PGHOST` and `PGUSER` but NOT `PGPASSWORD`, use the SDK to generate an OAuth token:

```python
from databricks.sdk import WorkspaceClient
import uuid

w = WorkspaceClient()
cred = w.database.generate_database_credential(
request_id=str(uuid.uuid4()),
instance_names=["my-lakebase-instance"],
)
# Use cred.token as the password
```
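The token is then used wherever a password is expected. A sketch that assembles psycopg-style connection arguments from the injected variables (the `dbname` default and the helper name are assumptions; adjust for your instance):

```python
import os

def lakebase_conn_kwargs(token: str, dbname: str = "databricks_postgres") -> dict:
    """Combine the env vars Databricks injects (PGHOST, PGUSER) with an
    OAuth token used as the password. Lakebase connections require SSL."""
    return {
        "host": os.environ["PGHOST"],
        "user": os.environ["PGUSER"],
        "password": token,
        "dbname": dbname,
        "sslmode": "require",
    }

# e.g.: conn = psycopg2.connect(**lakebase_conn_kwargs(cred.token))
```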

---

## Lakebase Sync (Reverse ETL from Delta)

To sync Unity Catalog Delta tables to Lakebase for low-latency serving:

1. Add primary keys to source Delta tables (required):

   ```sql
   ALTER TABLE catalog.schema.my_table ALTER COLUMN id SET NOT NULL;
   ALTER TABLE catalog.schema.my_table ADD CONSTRAINT pk PRIMARY KEY (id);
   ```

2. Create synced tables via CLI:

   ```bash
   databricks database create-synced-database-table --json '{
     "name": "catalog.schema.lb_my_table",
     "source_table_full_name": "catalog.schema.my_table",
     "scheduling_policy": {"snapshot": {}},
     "primary_key_columns": ["id"]
   }'
   ```

The `name` field is the **destination** UC table pointer (use a prefix like `lb_` to avoid conflicts with the source table).
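The prefix convention can be captured in a small helper (a sketch; `synced_table_name` is illustrative, not a CLI feature):

```python
def synced_table_name(source_full_name: str, prefix: str = "lb_") -> str:
    """Derive the destination UC table name from a three-part source name,
    e.g. catalog.schema.my_table -> catalog.schema.lb_my_table."""
    catalog, schema, table = source_full_name.split(".")
    return f"{catalog}.{schema}.{prefix}{table}"
```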

---

## Notes

- Lakebase is in **Public Preview**
- Each app gets its own PostgreSQL role with `Can connect and create` permission
- Lakebase is ideal alongside SQL warehouse: use Lakebase for app state, SQL warehouse for analytics
- When using Lakebase Sync, synced tables appear in the Lakebase schema with the prefix you chose
14 changes: 10 additions & 4 deletions databricks-skills/databricks-app-python/SKILL.md
@@ -17,6 +17,10 @@ Build Python-based Databricks applications. For full examples and recipes, see t
- **MUST** use `dash-bootstrap-components` for Dash app layout and styling
- **MUST** use `@st.cache_resource` for Streamlit database connections
- **MUST** deploy Flask with Gunicorn, FastAPI with uvicorn (not dev servers)
- **MUST NOT** use external CDN links in frontend HTML (React, Recharts, Google Fonts, Babel, etc.) — the app runtime blocks outbound CDN requests. Use self-contained inline JS/CSS only
- **MUST NOT** delete and recreate apps to fix deployment issues — just redeploy. Deleting disrupts OAuth integration
- **MUST NOT** upload `node_modules/`, `frontend/src/`, `__pycache__/`, or other dev-only files to the workspace when deploying. For React/Vite apps with a FastAPI backend, only upload: `app.py`, `backend.py`, `requirements.txt`, `app.yaml`, and the `static/` build output folder. Use targeted `databricks workspace import` for individual files and `databricks workspace import-dir static <ws-path>/static` for the build — never `import-dir .` from the app root
- **MUST** (React) place ALL hooks (`useState`, `useEffect`, `useMemo`, `useCallback`, `useRef`) BEFORE any early return statements in React components. React requires hooks to be called in the exact same order on every render. Placing `useMemo`/`useCallback` after `if (loading) return <Spinner />` causes "Rendered fewer hooks than expected" (React Error #310) — the component calls fewer hooks on the loading render than on the data-loaded render, crashing the entire page to blank. Move all hooks to the top of the function body, guard their internals with `if (!data.length) return []` instead

## Required Steps

@@ -35,9 +39,9 @@ Copy this checklist and verify each item:

| Framework | Best For | app.yaml Command |
|-----------|----------|------------------|
| **Dash** | Production dashboards, BI tools, complex interactivity | `["python", "app.py"]` — bind to `DATABRICKS_APP_PORT` in code |
| **Streamlit** | Rapid prototyping, data science apps, internal tools | `["streamlit", "run", "app.py"]` — port/address/headless auto-configured by runtime |
| **Gradio** | ML demos, model interfaces, chat UIs | `["python", "app.py"]` — bind to `DATABRICKS_APP_PORT` in code |
| **Flask** | Custom REST APIs, lightweight apps, webhooks | `["gunicorn", "app:app", "-w", "4", "-b", "0.0.0.0:8000"]` |
| **FastAPI** | Async APIs, auto-generated OpenAPI docs | `["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]` |
| **Reflex** | Full-stack Python apps without JavaScript | `["reflex", "run", "--env", "prod"]` |
@@ -174,7 +178,9 @@ class EntityIn(BaseModel):
| **Streamlit: set_page_config error** | `st.set_page_config()` must be the first Streamlit command |
| **Dash: unstyled layout** | Add `dash-bootstrap-components`; use `dbc.themes.BOOTSTRAP` |
| **Slow queries** | Use Lakebase for transactional/low-latency; SQL warehouse for analytical queries |
| **"App Not Available" after deploy** | Ensure resources are attached via API PATCH before deploying; verify app binds to `DATABRICKS_APP_PORT` |
| **Frontend loads blank/black** | External CDN requests (React, Recharts, Google Fonts, Babel) are blocked by the app runtime. Use self-contained inline JS/CSS only — no external `<script>` or `<link>` tags |
| **React page crashes to blank after data loads** | `useMemo`/`useCallback` hooks placed after early returns (`if (loading) return ...`) violate React Rules of Hooks. Move ALL hooks before any conditional returns. Guard hook internals instead: `useMemo(() => { if (!data.length) return []; ... }, [data])` |

---

## Platform Constraints
@@ -39,6 +39,7 @@ uc_toolkit = UCFunctionToolkit(
| Function | Purpose |
|----------|---------|
| `system.ai.python_exec` | Execute Python code |
| `system.ai.similarity_search` | Vector similarity search |

### Creating a UC Function
