Merged
40 commits
e042805
Add Lakebase autoscaling support to agent templates
jennsun Mar 5, 2026
f2a8398
rm dab postgres resource since not ready in apps yet
jennsun Mar 5, 2026
e74d9b8
restore to template scaffolding code
jennsun Mar 5, 2026
12cca49
revert gitignore fixes
jennsun Mar 5, 2026
faa24fc
.
jennsun Mar 5, 2026
0f1186e
revert dab blocks
jennsun Mar 5, 2026
b5631c7
.
jennsun Mar 6, 2026
d2e8416
revert quickstart test
jennsun Mar 6, 2026
b23405e
autoscaling lakebase claude skills
jennsun Mar 6, 2026
cebed77
specify sp client id
jennsun Mar 6, 2026
c17d8dc
rm PGENDPOINT
jennsun Mar 6, 2026
f0460b7
.
jennsun Mar 6, 2026
e9ba0c1
rm passing in endpoint
jennsun Mar 6, 2026
136b22e
bundle run
jennsun Mar 6, 2026
abb3b9c
append postgres resource and give explicit api for postgres
jennsun Mar 6, 2026
fe5924b
separate permission granting into separate script
jennsun Mar 6, 2026
6416e1e
include permission granting script in existing lakebase-setup skill
jennsun Mar 6, 2026
74d2c39
update autoscaling sdk to redeploy after adding postgres resource
jennsun Mar 9, 2026
854d12c
add autoscaling parameters to quickstart
jennsun Mar 9, 2026
1f29abc
update lakebase permission script
jennsun Mar 10, 2026
8553f99
update packages w autoscaling sdk release
jennsun Mar 10, 2026
98d03e4
use askuserquestion to prompt user of lakebase instance / only for st…
jennsun Mar 10, 2026
62a64f2
add app.yaml back in for stateful agent examples -> needed for app de…
jennsun Mar 11, 2026
193006c
sync yaml to remove p ostgres
jennsun Mar 11, 2026
3e3e13e
update quickstart to replace values in app.yaml/databricks.yml files …
jennsun Mar 11, 2026
b87f5b5
add grant lakebase permissions script to all templates
jennsun Mar 11, 2026
0bfad4e
update test quickstart
jennsun Mar 11, 2026
2cdc939
add tests for grant lakebase permissions
jennsun Mar 11, 2026
e12a92d
add lakebase configs in quickstart skill
jennsun Mar 12, 2026
6814088
.
jennsun Mar 12, 2026
6d622e0
revert pguser/pgdatabase/pghost vars needed for frontend
jennsun Mar 12, 2026
ea6c45d
add lakebase in .yml file
jennsun Mar 12, 2026
0302b49
validate lakebase autoscaling instance exists
jennsun Mar 12, 2026
543e341
rename lakebase creation in quickstart to make clearer it's for autos…
jennsun Mar 12, 2026
483c55d
test happy path for provisioned + autoscaling
jennsun Mar 12, 2026
1f13c01
restore template names for autoscaling app.yaml
jennsun Mar 12, 2026
0abb2d9
rm delete/make sure env vars commented out from template are cleaned up
jennsun Mar 12, 2026
97adc27
update deploy order so we grant permissions before re-deploying app, …
jennsun Mar 12, 2026
1be8598
pin min to 0.79.0: https://github.com/databricks/databricks-sdk-py/re…
jennsun Mar 12, 2026
9685dfd
remove hardcoded values
jennsun Mar 12, 2026
3 changes: 2 additions & 1 deletion .claude/skills/add-tools-langgraph/SKILL.md
@@ -59,7 +59,8 @@ See the `examples/` directory for complete YAML snippets:
| `sql-warehouse.yaml` | SQL warehouse | SQL execution |
| `serving-endpoint.yaml` | Model serving endpoint | Model inference |
| `genie-space.yaml` | Genie space | Natural language data |
| `lakebase.yaml` | Lakebase database | Agent memory storage |
| `lakebase.yaml` | Lakebase database | Agent memory storage (provisioned) |
| `lakebase-autoscaling.md` | Lakebase autoscaling postgres | Agent memory storage (autoscaling) |
| `experiment.yaml` | MLflow experiment | Tracing (already configured) |
| `custom-mcp-server.md` | Custom MCP apps | Apps starting with `mcp-*` |

157 changes: 157 additions & 0 deletions .claude/skills/add-tools-langgraph/examples/lakebase-autoscaling.md
@@ -0,0 +1,157 @@
# Autoscaling Postgres Lakebase Instances (not provisioned)

Autoscaling Lakebase postgres resources are **not yet supported as resource dependencies in `databricks.yml`**. Use `LAKEBASE_AUTOSCALING_PROJECT` and `LAKEBASE_AUTOSCALING_BRANCH` as static env vars, and add the postgres resource via API after deploy. The postgres resource serves two purposes: (1) granting the app's service principal access to Lakebase, and (2) on the next redeploy, injecting database connection env vars that the frontend (chat UI) needs.

## Steps

### 1. Add autoscaling env vars to `databricks.yml`

Add `LAKEBASE_AUTOSCALING_PROJECT` and `LAKEBASE_AUTOSCALING_BRANCH` as static `value:` env vars in your app's config block:

```yaml
# In databricks.yml - add to resources.apps.<app>.config.env:
- name: LAKEBASE_AUTOSCALING_PROJECT
  value: "<your-project-name>"
- name: LAKEBASE_AUTOSCALING_BRANCH
  value: "<your-branch-name>"
```
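If you generate or patch `databricks.yml` programmatically, the two entries above can be appended with a small helper. This is an illustrative sketch, not part of the template; the function name is hypothetical, and it assumes env vars are modeled as `{"name": ..., "value": ...}` dicts as in the YAML above.

```python
def add_autoscaling_env(env, project, branch):
    """Append the two autoscaling env vars, skipping any name already present."""
    present = {e["name"] for e in env}
    for name, value in [
        ("LAKEBASE_AUTOSCALING_PROJECT", project),
        ("LAKEBASE_AUTOSCALING_BRANCH", branch),
    ]:
        if name not in present:
            env.append({"name": name, "value": value})
    return env

# Example: an env list that already has one unrelated entry
env = [{"name": "SOME_OTHER_VAR", "value": "x"}]
add_autoscaling_env(env, "my-project", "main")
```

Calling the helper twice is safe: existing names are skipped, so the entries are never duplicated.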

### 2. Deploy your agent app

```bash
databricks bundle deploy
databricks bundle run <your-app-resource-name> # from databricks.yml resources.apps.*
```

### 3. Add the postgres resource via API

After the app is deployed, add the postgres resource using the Databricks API. This grants the app's service principal access to Lakebase. The frontend connection env vars are injected later when the app is redeployed (step 5). The agent backend reads PROJECT+BRANCH from the static env vars you set in step 1.

**Important:** The PATCH replaces the entire `resources` list, so you must fetch existing resources first and append the postgres resource to preserve other resources (e.g., MLflow experiments added by DAB).

```bash
# 1. Fetch existing resources
EXISTING=$(databricks api get /api/2.0/apps/<your-app-name> | jq -c '.resources // []')

# 2. Append the postgres resource
UPDATED=$(echo "$EXISTING" | jq -c '. + [{
  "name": "postgres",
  "postgres": {
    "branch": "projects/<project-id>/branches/<branch-id>",
    "database": "projects/<project-id>/branches/<branch-id>/databases/<database-id>",
    "permission": "CAN_CONNECT_AND_CREATE"
  }
}]')

# 3. Patch with the merged list
databricks api patch /api/2.0/apps/<your-app-name> \
  --json "{\"resources\": $UPDATED}"
```
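The fetch-and-append merge can also be done in Python. This sketch only builds the payload (the resource dict mirrors the JSON shape shown above); the surrounding GET/PATCH calls are assumed to be made separately, and the function names are illustrative:

```python
def postgres_resource(project_id, branch_id, database_id):
    """Build the postgres resource entry in the shape shown in the jq snippet above."""
    branch = f"projects/{project_id}/branches/{branch_id}"
    return {
        "name": "postgres",
        "postgres": {
            "branch": branch,
            "database": f"{branch}/databases/{database_id}",
            "permission": "CAN_CONNECT_AND_CREATE",
        },
    }


def merged_resources(existing, project_id, branch_id, database_id):
    # PATCH replaces the whole list, so always start from the existing resources
    return (existing or []) + [postgres_resource(project_id, branch_id, database_id)]
```

The `existing or []` guard matters: an app with no resources may return no `resources` field at all, which is why the jq snippet uses `.resources // []`.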

Replace the placeholders:
- `<your-app-name>`: Your deployed app name (e.g., `agent-langgraph-stm`)
- `<project-id>`, `<branch-id>`, `<database-id>`: Look these up using the **postgres API** (see below)

#### Finding your project, branch, and database IDs

Autoscaling Lakebase uses the **postgres API** (`/api/2.0/postgres/`), NOT the database API. Do NOT use `/api/2.0/database/` or `/api/2.0/lakebase/` — those are for provisioned instances.

```bash
# List projects — find your project ID
databricks api get /api/2.0/postgres/projects

# List branches for a project
databricks api get /api/2.0/postgres/projects/<project-id>/branches

# List databases for a branch
databricks api get /api/2.0/postgres/projects/<project-id>/branches/<branch-id>/databases
```

API docs: https://docs.databricks.com/api/workspace/postgres

### 4. Grant table permissions to the app's service principal

The app's service principal needs permissions on the memory tables **before** the app redeploys and runs migrations. Use the `scripts/grant_lakebase_permissions.py` script included in the template.

First, get the service principal **client ID** (UUID format):

```bash
databricks apps get <your-app-name> --output json | jq -r '.service_principal_client_id'
```

Then run the grant script. Pass `--instance-name` for provisioned instances, or `--project` + `--branch` for autoscaling (the script also reads these from `.env` if set):

```bash
# Provisioned:
uv run python scripts/grant_lakebase_permissions.py <sp-client-id> --instance-name <name>

# Autoscaling:
uv run python scripts/grant_lakebase_permissions.py <sp-client-id> --project <project> --branch <branch>
```

> On first use of the stateful agent, the memory schemas and tables won't exist yet. Run the grants anyway; if a `grant_table` or `grant_all_tables_in_schema` call fails because the table or schema doesn't exist, that's expected, not an error.
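The "missing table is expected" behavior can be sketched as a small wrapper. The helper name is hypothetical and the real script's internals may differ; the point is only the error-classification pattern:

```python
def grant_tolerantly(grant_fn, *args):
    """Run a grant call; treat 'does not exist' failures as expected on first use."""
    try:
        grant_fn(*args)
        return "granted"
    except Exception as exc:
        # Postgres reports missing relations/schemas with "does not exist"
        if "does not exist" in str(exc):
            return "skipped (not created yet)"
        raise  # anything else is a real error
```

Any other failure (bad credentials, network errors) is re-raised so it surfaces instead of being silently swallowed.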

### 5. Redeploy the app to pick up the postgres resource

After adding the postgres resource and granting permissions, you **must** redeploy the app so it picks up the database connection env vars injected by the resource. The SP now has CREATE on schemas, so drizzle migrations will succeed on startup.

```bash
# Use `databricks apps deploy` (NOT `databricks bundle deploy`, which would wipe the resource you just added)
databricks apps deploy <your-app-name> \
  --source-code-path <source-code-path>
```

To find the `source-code-path`, check the app's current deployment:
```bash
databricks apps get <your-app-name> --output json | jq -r '.active_deployment.source_code_path'
```

> **Note:** The postgres resource will be wiped on next `databricks bundle deploy`, so you must re-add it (step 3), re-grant (step 4), and redeploy (step 5) after each bundle deploy.

### 6. Use autoscaling env vars in your agent code

In `agent_server/agent.py`, read the project and branch from env vars:

```python
LAKEBASE_AUTOSCALING_PROJECT = os.getenv("LAKEBASE_AUTOSCALING_PROJECT") or None
LAKEBASE_AUTOSCALING_BRANCH = os.getenv("LAKEBASE_AUTOSCALING_BRANCH") or None
```

Then pass them to your memory store:

```python
from databricks_langchain import AsyncCheckpointSaver, AsyncDatabricksStore

# For short-term memory:
async with AsyncCheckpointSaver(
    project=LAKEBASE_AUTOSCALING_PROJECT,
    branch=LAKEBASE_AUTOSCALING_BRANCH,
) as saver:
    ...

# For long-term memory:
async with AsyncDatabricksStore(
    project=LAKEBASE_AUTOSCALING_PROJECT,
    branch=LAKEBASE_AUTOSCALING_BRANCH,
    embedding_endpoint=EMBEDDING_ENDPOINT,
    embedding_dims=EMBEDDING_DIMS,
) as store:
    ...
```

## Deploy Sequence Summary

1. `databricks bundle deploy` + `databricks bundle run` — uploads code and starts the app with PROJECT+BRANCH env vars
2. Add postgres resource via API (`PATCH /api/2.0/apps/<name>`) — grants the SP permissions to Lakebase
3. Grant table permissions via `scripts/grant_lakebase_permissions.py` — use the SP client ID from `databricks apps get`
4. **Redeploy the app** (`databricks apps deploy`) — injects frontend connection env vars from the postgres resource and restarts the app; migrations succeed because the SP already has CREATE on schemas

> **On subsequent `databricks bundle deploy`s:** DAB overwrites app resources, wiping the postgres resource. You must re-add it via API (step 2), re-grant (step 3), and redeploy (step 4) after each bundle deploy. The `LakebaseClient` grants persist and only need to be re-run if the service principal changes.

## Notes

- The agent backend uses `LAKEBASE_AUTOSCALING_PROJECT` and `LAKEBASE_AUTOSCALING_BRANCH` env vars to connect
- The postgres resource added via API grants the SP permissions to Lakebase; the frontend connection env vars are only injected when the app is redeployed (step 4)
- After adding the postgres resource, you **must redeploy** (`databricks apps deploy`) for the app to pick up those injected env vars
- For local development, set the same `LAKEBASE_AUTOSCALING_PROJECT` and `LAKEBASE_AUTOSCALING_BRANCH` in your `.env` file
- The permission grants persist across deployments, but must be re-run if the app's service principal changes
1 change: 1 addition & 0 deletions .claude/skills/add-tools-openai/SKILL.md
@@ -56,6 +56,7 @@ See the `examples/` directory for complete YAML snippets:
| `sql-warehouse.yaml` | SQL warehouse | SQL execution |
| `serving-endpoint.yaml` | Model serving endpoint | Model inference |
| `genie-space.yaml` | Genie space | Natural language data |
| `lakebase-autoscaling.md` | Lakebase autoscaling postgres | Agent memory storage (autoscaling) |
| `experiment.yaml` | MLflow experiment | Tracing (already configured) |
| `custom-mcp-server.md` | Custom MCP apps | Apps starting with `mcp-*` |

147 changes: 147 additions & 0 deletions .claude/skills/add-tools-openai/examples/lakebase-autoscaling.md
@@ -0,0 +1,147 @@
# Autoscaling Postgres Lakebase Instances (not provisioned)

Autoscaling Lakebase postgres resources are **not yet supported as resource dependencies in `databricks.yml`**. Use `LAKEBASE_AUTOSCALING_PROJECT` and `LAKEBASE_AUTOSCALING_BRANCH` as static env vars, and add the postgres resource via API after deploy. The postgres resource serves two purposes: (1) granting the app's service principal access to Lakebase, and (2) on the next redeploy, injecting database connection env vars that the frontend (chat UI) needs.

## Steps

### 1. Add autoscaling env vars to `databricks.yml`

Add `LAKEBASE_AUTOSCALING_PROJECT` and `LAKEBASE_AUTOSCALING_BRANCH` as static `value:` env vars in your app's config block:

```yaml
# In databricks.yml - add to resources.apps.<app>.config.env:
- name: LAKEBASE_AUTOSCALING_PROJECT
  value: "<your-project-name>"
- name: LAKEBASE_AUTOSCALING_BRANCH
  value: "<your-branch-name>"
```

### 2. Deploy your agent app

```bash
databricks bundle deploy
databricks bundle run <your-app-resource-name> # from databricks.yml resources.apps.*
```

### 3. Add the postgres resource via API

After the app is deployed, add the postgres resource using the Databricks API. This grants the app's service principal access to Lakebase. The frontend connection env vars are injected later when the app is redeployed (step 5). The agent backend reads PROJECT+BRANCH from the static env vars you set in step 1.

**Important:** The PATCH replaces the entire `resources` list, so you must fetch existing resources first and append the postgres resource to preserve other resources (e.g., MLflow experiments added by DAB).

```bash
# 1. Fetch existing resources
EXISTING=$(databricks api get /api/2.0/apps/<your-app-name> | jq -c '.resources // []')

# 2. Append the postgres resource
UPDATED=$(echo "$EXISTING" | jq -c '. + [{
  "name": "postgres",
  "postgres": {
    "branch": "projects/<project-id>/branches/<branch-id>",
    "database": "projects/<project-id>/branches/<branch-id>/databases/<database-id>",
    "permission": "CAN_CONNECT_AND_CREATE"
  }
}]')

# 3. Patch with the merged list
databricks api patch /api/2.0/apps/<your-app-name> \
  --json "{\"resources\": $UPDATED}"
```
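In Python, the same merge reduces to building the resource dict and concatenating. This is a sketch (the function name is illustrative; the surrounding API calls are assumed), with the resource shape mirroring the JSON above:

```python
def add_postgres(existing, project_id, branch_id, database_id):
    """Return the full resources list with the postgres resource appended."""
    branch = f"projects/{project_id}/branches/{branch_id}"
    resource = {
        "name": "postgres",
        "postgres": {
            "branch": branch,
            "database": f"{branch}/databases/{database_id}",
            "permission": "CAN_CONNECT_AND_CREATE",
        },
    }
    # PATCH replaces the entire list, so carry the existing resources forward
    return (existing or []) + [resource]
```

Because PATCH replaces rather than appends, sending only the new resource would silently drop anything DAB attached (such as the MLflow experiment).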

Replace the placeholders:
- `<your-app-name>`: Your deployed app name (e.g., `agent-openai-sdk-stm`)
- `<project-id>`, `<branch-id>`, `<database-id>`: Look these up using the **postgres API** (see below)

#### Finding your project, branch, and database IDs

Autoscaling Lakebase uses the **postgres API** (`/api/2.0/postgres/`), NOT the database API. Do NOT use `/api/2.0/database/` or `/api/2.0/lakebase/` — those are for provisioned instances.

```bash
# List projects — find your project ID
databricks api get /api/2.0/postgres/projects

# List branches for a project
databricks api get /api/2.0/postgres/projects/<project-id>/branches

# List databases for a branch
databricks api get /api/2.0/postgres/projects/<project-id>/branches/<branch-id>/databases
```

API docs: https://docs.databricks.com/api/workspace/postgres

### 4. Grant table permissions to the app's service principal

The app's service principal needs permissions on the memory tables **before** the app redeploys and runs migrations. Use the `scripts/grant_lakebase_permissions.py` script included in the template.

First, get the service principal **client ID** (UUID format):

```bash
databricks apps get <your-app-name> --output json | jq -r '.service_principal_client_id'
```

Then run the grant script. Pass `--instance-name` for provisioned instances, or `--project` + `--branch` for autoscaling (the script also reads these from `.env` if set):

```bash
# Provisioned:
uv run python scripts/grant_lakebase_permissions.py <sp-client-id> --instance-name <name>

# Autoscaling:
uv run python scripts/grant_lakebase_permissions.py <sp-client-id> --project <project> --branch <branch>
```

> On first use of the stateful agent, the memory schemas and tables won't exist yet. Run the grants anyway; if a `grant_table` or `grant_all_tables_in_schema` call fails because the table or schema doesn't exist, that's expected, not an error.

### 5. Redeploy the app to pick up the postgres resource

After adding the postgres resource and granting permissions, you **must** redeploy the app so it picks up the database connection env vars injected by the resource. The SP now has CREATE on schemas, so drizzle migrations will succeed on startup.

```bash
# Use `databricks apps deploy` (NOT `databricks bundle deploy`, which would wipe the resource you just added)
databricks apps deploy <your-app-name> \
  --source-code-path <source-code-path>
```

To find the `source-code-path`, check the app's current deployment:
```bash
databricks apps get <your-app-name> --output json | jq -r '.active_deployment.source_code_path'
```

> **Note:** The postgres resource will be wiped on next `databricks bundle deploy`, so you must re-add it (step 3), re-grant (step 4), and redeploy (step 5) after each bundle deploy.

### 6. Use autoscaling env vars in your agent code

In `agent_server/agent.py`, read the project and branch from env vars:

```python
LAKEBASE_AUTOSCALING_PROJECT = os.getenv("LAKEBASE_AUTOSCALING_PROJECT") or None
LAKEBASE_AUTOSCALING_BRANCH = os.getenv("LAKEBASE_AUTOSCALING_BRANCH") or None
```

Then pass them to your memory session:

```python
from databricks_openai.agents import AsyncDatabricksSession

async with AsyncDatabricksSession(
    project=LAKEBASE_AUTOSCALING_PROJECT,
    branch=LAKEBASE_AUTOSCALING_BRANCH,
) as session:
    result = await Runner.run(agent, input=messages, session=session)
```

## Deploy Sequence Summary

1. `databricks bundle deploy` + `databricks bundle run` — uploads code and starts the app with PROJECT+BRANCH env vars
2. Add postgres resource via API (`PATCH /api/2.0/apps/<name>`) — grants the SP permissions to Lakebase
3. Grant table permissions via `scripts/grant_lakebase_permissions.py` — use the SP client ID from `databricks apps get`
4. **Redeploy the app** (`databricks apps deploy`) — injects frontend connection env vars from the postgres resource and restarts the app; migrations succeed because the SP already has CREATE on schemas

> **On subsequent `databricks bundle deploy`s:** DAB overwrites app resources, wiping the postgres resource. You must re-add it via API (step 2), re-grant (step 3), and redeploy (step 4) after each bundle deploy. The `LakebaseClient` grants persist and only need to be re-run if the service principal changes.

## Notes

- The agent backend uses `LAKEBASE_AUTOSCALING_PROJECT` and `LAKEBASE_AUTOSCALING_BRANCH` env vars to connect
- The postgres resource added via API grants the SP permissions to Lakebase; the frontend connection env vars are only injected when the app is redeployed (step 4)
- After adding the postgres resource, you **must redeploy** (`databricks apps deploy`) for the app to pick up those injected env vars
- For local development, set the same `LAKEBASE_AUTOSCALING_PROJECT` and `LAKEBASE_AUTOSCALING_BRANCH` in your `.env` file
- The permission grants persist across deployments, but must be re-run if the app's service principal changes
11 changes: 11 additions & 0 deletions .claude/skills/deploy/SKILL.md
@@ -188,6 +188,17 @@ databricks apps get <app-name> --output json | jq '{app_status, compute_status}'
databricks apps get <app-name> --output json | jq -r '.url'
```

## Post-Deploy: Autoscaling Lakebase Resources

If the agent uses **autoscaling Lakebase** (user mentions "autoscaling", "project", or "branch" in the context of Lakebase), you must add the postgres resource via API **after** deploying, then redeploy:

1. Deploy the app first (`databricks bundle deploy` + `databricks bundle run`)
2. Add the postgres resource via API (`PATCH /api/2.0/apps/<name>`)
3. **Redeploy the app** (`databricks apps deploy`) — the app must be redeployed after adding the postgres resource so it picks up the database connection env vars injected by the resource (needed by the frontend/chat UI). Note: `databricks bundle run` does NOT redeploy; it only starts or restarts the app with the existing deployment, so new resource env vars won't be picked up. Use `databricks apps deploy` instead.
4. Grant table permissions to the app's service principal — fetch the SP client ID via `databricks apps get <name> --output json | jq -r '.service_principal_client_id'`

**See `.claude/skills/add-tools/examples/lakebase-autoscaling.md` for complete steps.**
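Whether the re-add loop is needed can be checked from the app's JSON (as returned by `databricks apps get <name> --output json`). A sketch, assuming the `resources` list shape shown in the lakebase-autoscaling examples; the function name is illustrative:

```python
import json


def postgres_resource_missing(app_json):
    """True when the app has no postgres resource attached,
    e.g. after a `databricks bundle deploy` wiped it."""
    app = json.loads(app_json)
    return not any("postgres" in r for r in app.get("resources", []))
```

A `True` result means steps 2-4 above must be repeated before the frontend will have its connection env vars.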

## Important Notes

- **App naming convention**: App names must be prefixed with `agent-` (e.g., `agent-my-assistant`, `agent-data-analyst`)