Update agent examples and quickstart to support Lakebase Autoscaling #151
jennsun wants to merge 18 commits into databricks:main
Conversation
- Add autoscaling (project/branch) as an alternative to provisioned Lakebase instances across all 3 memory templates
- Detect app environment via DATABRICKS_APP_NAME and use PGENDPOINT (auto-injected by platform) for autoscaling in deployed apps
- Update quickstart to support creating new Lakebase instances or connecting to existing provisioned/autoscaling instances
- Use autoscaling_endpoint parameter (renamed in databricks-ai-bridge)
- Add databricks-ai-bridge git dependency for autoscaling support
- Update tests to remove update_databricks_yml_lakebase references

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from 248b026 to e39267c
# Note: Using git reference until official release
"databricks-langchain[memory] @ git+https://github.com/databricks/databricks-ai-bridge@lakebase-autoscaling-support#subdirectory=integrations/langchain",
"databricks-ai-bridge @ git+https://github.com/databricks/databricks-ai-bridge@lakebase-autoscaling-support",
"databricks-agents>=1.9.3",
will remove after release
dhruv0811 left a comment:
Looks good overall!! Left some comments for local testing cleanup and some clarifications on lakebase options.
database:
  instance_name: '<your-lakebase-instance-name>'
  database_name: 'databricks_postgres'
- name: 'postgres'
nit: could we add a couple of comments around here noting that this is for autoscaling, and see below for provisioned?
# TODO: Update with your Lakebase instance for session storage
# Option 1: Provisioned instance (set instance name)
# LAKEBASE_INSTANCE_NAME=
# Option 2: Autoscaling instance (set project and branch)
Did we not have a third option with just autoscaling_endpoint?
The thinking here is that when users develop locally, they do so by passing in the project/branch name. The autoscaling_endpoint is only used when we are in the app environment and can read directly from PGENDPOINT (since it's not as user-friendly to pass in), hence adding this check: https://github.com/databricks/app-templates/pull/151/changes#diff-1e8ffff9ea9c91fc9dce7d8af129175d1fcfe357eabbcec14705feff13eeacacR53
# Check for Lakebase access/connection errors
if any(keyword in error_msg for keyword in ["permission"]):
    logger.error(f"Lakebase access error: {e}")
    lakebase_desc = LAKEBASE_INSTANCE_NAME or LAKEBASE_AUTOSCALING_ENDPOINT or f"{LAKEBASE_AUTOSCALING_PROJECT}/{LAKEBASE_AUTOSCALING_BRANCH}"
Is it worth doing some validation like we did in the SDK to ensure that we only have one valid combination of variables set here? Say a user has project and branch set, but accidentally also set instance name, they should be notified rather than silently picking up instance name and short circuiting here.
Since the SDK covers all these cases and will throw errors, I think adding validation here would be a bit redundant.
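For reference, a standalone check along the lines of the suggestion could look like this. This is a hypothetical sketch, not the SDK's actual implementation; the function name and argument names are assumptions modeled on the env vars in this PR.

```python
# Hypothetical validation sketch: fail fast when provisioned and
# autoscaling settings are mixed, instead of silently preferring
# the instance name.
def validate_lakebase_config(instance_name=None, project=None, branch=None):
    has_provisioned = bool(instance_name)
    has_autoscaling = bool(project or branch)
    if has_provisioned and has_autoscaling:
        raise ValueError(
            "Set either LAKEBASE_INSTANCE_NAME or "
            "LAKEBASE_AUTOSCALING_PROJECT/LAKEBASE_AUTOSCALING_BRANCH, not both."
        )
    # Project and branch only make sense as a pair.
    if bool(project) != bool(branch):
        raise ValueError(
            "LAKEBASE_AUTOSCALING_PROJECT and LAKEBASE_AUTOSCALING_BRANCH "
            "must be set together."
        )
```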
LAKEBASE_AUTOSCALING_ENDPOINT = os.getenv("PGENDPOINT") if _is_app_env else None
LAKEBASE_AUTOSCALING_PROJECT = os.getenv("LAKEBASE_AUTOSCALING_PROJECT") or None
LAKEBASE_AUTOSCALING_BRANCH = os.getenv("LAKEBASE_AUTOSCALING_BRANCH") or None
Bit of a chicken-and-egg problem here while DABs are not ready:
- In order to pass in PGENDPOINT as a var, we need the autoscaling instance as a resource on the app.
- DAB deployment will overwrite app resources, so it would remove a manually-added postgres resource when we deploy.
- We will therefore use LAKEBASE_AUTOSCALING_PROJECT and LAKEBASE_AUTOSCALING_BRANCH as static env vars supported by our agent SDK.

If we want the endpoint extension in the future, the code will look something like:
# Autoscaling params: in the app environment, PGENDPOINT is provided automatically;
# for local dev, use project/branch names directly.
_is_app_env = bool(os.getenv("DATABRICKS_APP_NAME"))
LAKEBASE_AUTOSCALING_ENDPOINT = os.getenv("PGENDPOINT") if _is_app_env else None
LAKEBASE_AUTOSCALING_PROJECT = os.getenv("LAKEBASE_AUTOSCALING_PROJECT") or None
LAKEBASE_AUTOSCALING_BRANCH = os.getenv("LAKEBASE_AUTOSCALING_BRANCH") or None
dhruv0811 left a comment:
Added some comments based on a dry run with Claude.
  content = databricks_yml.read_text()
- return "LAKEBASE_INSTANCE_NAME" in content
+ return (
+     "LAKEBASE_INSTANCE_NAME" in content
nit: Is this supposed to be `LAKEBASE_INSTANCE_NAME` OR (`LAKEBASE_AUTOSCALING_PROJECT` AND `LAKEBASE_AUTOSCALING_BRANCH`)?
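If so, the full check might look like the sketch below. The function name `uses_lakebase` and the `Path` argument are hypothetical; only the env-var names come from the diff.

```python
from pathlib import Path


def uses_lakebase(databricks_yml: Path) -> bool:
    """Treat the template as Lakebase-enabled if it names a provisioned
    instance, OR both an autoscaling project and branch (sketch of the
    suggested fix, not the merged code)."""
    content = databricks_yml.read_text()
    return (
        "LAKEBASE_INSTANCE_NAME" in content
        or (
            "LAKEBASE_AUTOSCALING_PROJECT" in content
            and "LAKEBASE_AUTOSCALING_BRANCH" in content
        )
    )
```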
    help="Databricks workspace URL (for initial setup)",
    metavar="URL",
)
parser.add_argument(
Could we add arguments for lakebase project and branch here? Makes it easier for agents to run quickstart if we support non-interactive mode.
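The flags this asks for could be sketched as follows. Flag names and env-var defaults are assumptions, not the final quickstart CLI.

```python
# Sketch: non-interactive flags for the autoscaling Lakebase path,
# falling back to env vars so agents can run the quickstart unattended.
import argparse
import os

parser = argparse.ArgumentParser(description="quickstart (sketch)")
parser.add_argument(
    "--lakebase-project",
    default=os.getenv("LAKEBASE_AUTOSCALING_PROJECT"),
    metavar="NAME",
    help="Autoscaling Lakebase project (skips the interactive prompt)",
)
parser.add_argument(
    "--lakebase-branch",
    default=os.getenv("LAKEBASE_AUTOSCALING_BRANCH"),
    metavar="NAME",
    help="Autoscaling Lakebase branch (skips the interactive prompt)",
)

# Example non-interactive invocation:
args = parser.parse_args(["--lakebase-project", "demo", "--lakebase-branch", "main"])
```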
Do we want to sync this script like we sync quickstart btw?
)
parser.add_argument(
    "--autoscaling-endpoint",
    default=os.getenv("LAKEBASE_AUTOSCALING_ENDPOINT"),
This script is the only place we use LAKEBASE_AUTOSCALING_ENDPOINT, which seems confusing for agents. Can we add documentation for this env var explaining what it refers to?
- name: LAKEBASE_INSTANCE_NAME
  value: "<your-lakebase-instance-name>"
# Autoscaling Lakebase config
- name: LAKEBASE_AUTOSCALING_PROJECT
Does it make it easier if we add a step in the quickstart script to update these variables based on what the user provides?
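A quickstart step like that could be sketched as a simple placeholder substitution. The placeholder strings and the helper name are assumptions modeled on the snippet above, not the actual quickstart code.

```python
# Sketch: write the user's answers into app.yaml in place of the
# autoscaling placeholders (hypothetical placeholder names).
from pathlib import Path


def fill_lakebase_vars(app_yaml: Path, project: str, branch: str) -> None:
    content = app_yaml.read_text()
    content = content.replace("<your-autoscaling-project>", project)
    content = content.replace("<your-autoscaling-branch>", branch)
    app_yaml.write_text(content)
```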
  value: "300"
- name: MLFLOW_EXPERIMENT_ID
  value_from: "experiment"
- name: LAKEBASE_INSTANCE_NAME
Do we want to keep this var? Maybe just have it commented out, in case people don't go down the autoscaling path?
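Keeping it commented out might look like this (a sketch following the env list in the diff above; the comment wording is an assumption):

```yaml
  - name: MLFLOW_EXPERIMENT_ID
    value_from: "experiment"
  # Uncomment to use a provisioned (non-autoscaling) Lakebase instance:
  # - name: LAKEBASE_INSTANCE_NAME
  #   value: "<your-lakebase-instance-name>"
```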
Updated quickstart experience to select instance (screenshots omitted).

Examples work (screenshot omitted).
ex app: https://dbc-fb904e0e-bca7.staging.cloud.databricks.com/apps/agent-longtermj?o=4146361371977516