This file provides guidance to AI assistants when working with code in this repository.
This is the Databricks CLI, a command-line interface for interacting with Databricks workspaces and managing Databricks Asset Bundles (DABs). The project is written in Go and follows a modular architecture.
When moving code from one place to another, please don't unnecessarily change the code or omit parts.
- `make build` - Build the CLI binary
- `make test` - Run unit tests for all packages
- `go test ./acceptance -run TestAccept/bundle/<path>/<to>/<folder> -tail -test.v` - Run a single acceptance test
- `make integration` - Run integration tests (requires environment variables)
- `make cover` - Generate test coverage reports
- `make lint` - Run linter on changed files only (uses lintdiff.py)
- `make lintfull` - Run full linter with fixes (golangci-lint)
- `make ws` - Run whitespace linter
- `make fmt` - Format code (Go, Python, YAML)
- `make checks` - Run quick checks (tidy, whitespace, links)
- `make schema` - Generate bundle JSON schema
- `make docs` - Generate bundle documentation
- `make generate` - Generate CLI code from OpenAPI spec (requires universe repo)
Use `git rm` to remove and `git mv` to rename files instead of directly modifying files on the filesystem.
If asked to rebase, always prefix each git command with settings that prevent it from launching an interactive editor: `GIT_EDITOR=true GIT_SEQUENCE_EDITOR=true VISUAL=true GIT_PAGER=cat git fetch origin main && GIT_EDITOR=true GIT_SEQUENCE_EDITOR=true VISUAL=true GIT_PAGER=cat git rebase origin/main`
`cmd/` - CLI command structure using the Cobra framework
- `cmd/cmd.go` - Main command setup and subcommand registration
- `cmd/bundle/` - Bundle-related commands (deploy, validate, etc.)
- `cmd/workspace/` - Workspace API commands (auto-generated)
- `cmd/account/` - Account-level API commands (auto-generated)
`bundle/` - Core bundle functionality for Databricks Asset Bundles
- `bundle/bundle.go` - Main Bundle struct and lifecycle management
- `bundle/config/` - Configuration loading, validation, and schema
- `bundle/deploy/` - Deployment logic (Terraform and direct modes)
- `bundle/mutator/` - Configuration transformation pipeline
- `bundle/phases/` - High-level deployment phases
`libs/` - Shared libraries and utilities
- `libs/dyn/` - Dynamic configuration value manipulation
- `libs/filer/` - File system abstraction (local, DBFS, workspace)
- `libs/auth/` - Databricks authentication handling
- `libs/sync/` - File synchronization between local and remote
Bundles: Configuration-driven deployments of Databricks resources (jobs, pipelines, etc.). The bundle system uses a mutator pattern where each transformation is a separate, testable component.
Mutators: Transform bundle configuration through a pipeline. Located in `bundle/config/mutator/` and `bundle/mutator/`. Each mutator implements the `Mutator` interface, as sketched below.
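A minimal sketch of the pattern (the `Bundle` type, the exact `Mutator` method signatures, and the `defaultName` mutator below are simplified placeholders, not the real definitions from `bundle/`):

```go
package mutator

import "context"

// Bundle is a stand-in for the real bundle type, included only to make the
// sketch self-contained.
type Bundle struct {
	Name string
}

// Mutator is a single, testable configuration transformation.
type Mutator interface {
	// Name identifies the mutator in logs and diagnostics.
	Name() string
	// Apply transforms the bundle in place.
	Apply(ctx context.Context, b *Bundle) error
}

// defaultName is a hypothetical mutator that fills in a missing bundle name.
type defaultName struct{}

func (m defaultName) Name() string { return "DefaultName" }

func (m defaultName) Apply(ctx context.Context, b *Bundle) error {
	if b.Name == "" {
		b.Name = "my_bundle"
	}
	return nil
}
```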
Direct vs Terraform Deployment: The CLI supports two deployment modes, controlled by the `DATABRICKS_BUNDLE_ENGINE` environment variable:
- `terraform` (default) - Uses Terraform for resource management
- `direct` - Direct API calls without Terraform
Please make sure code that you author is consistent with the codebase and concise.
The code should be self-documenting based on the code and function names.
Functions should be documented with a doc comment as follows:
```go
// SomeFunc does something.
func SomeFunc() { ... }
```
Note how the comment starts with the name of the function and is followed by a period.
Avoid redundant and verbose comments. Keep comments terse, and add one only when it complements, rather than repeats, the code.
Focus on making the implementation as small and elegant as possible. Avoid unnecessary loops and allocations. If you see an opportunity to make things simpler by dropping or relaxing some requirements, ask the user about the trade-off.
Use modern idiomatic Golang features (version 1.24+). Specifically:
- Use for-range for integer iteration where possible: instead of `for i := 0; i < X; i++ {}`, write `for i := range X {}`.
- Use the builtin `min()` and `max()` where possible (they work on any ordered type and accept any number of values).
- There is no need to copy the for-range variable before capturing it; since Go 1.22, a new copy of the variable is created for each loop iteration.
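A small self-contained illustration of these idioms (the values are arbitrary, for demonstration only):

```go
package main

import "fmt"

func main() {
	// Range over an integer instead of a three-clause loop.
	for i := range 3 {
		fmt.Println("iteration", i)
	}

	// Builtin min and max work on ordered types and take any number of values.
	fmt.Println(min(3, 1, 2), max(1.5, 2.5))
}
```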
- Bundle config uses `dyn.Value` for dynamic typing
- Config loading supports includes, variable interpolation, and target overrides
- Schema generation is automated from Go struct tags
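As a rough illustration of the struct-tag-driven approach (the struct and its fields below are hypothetical, not an actual resource definition):

```go
package config

// exampleResource is a hypothetical resource definition; `make schema`
// derives the bundle JSON schema from struct tags like these.
type exampleResource struct {
	Name        string            `json:"name"`
	Description string            `json:"description,omitempty"`
	Tags        map[string]string `json:"tags,omitempty"`
}
```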
When writing Python scripts, we bias toward conciseness. We treat Python in this codebase as scripting.
- Use Python 3.11.
- Do not catch exceptions just to produce nicer messages; only catch if you can add critical information.
- Use `pathlib.Path` over `os.path` in almost all cases, unless it makes the code longer.
- Do not add redundant comments.
- Keep your code small and the number of abstractions low.
- After you are done, format your code with `ruff format -n`.
- Use the `#!/usr/bin/env python3` shebang.
- Unit tests: Standard Go tests alongside source files
- Integration tests: `integration/` directory, requires a live Databricks workspace
- Acceptance tests: `acceptance/` directory, uses a mock HTTP server
Each file like process_target_mode.go should have a corresponding test file like process_target_mode_test.go. If you add new functionality to a file, the test file should be extended to cover the new functionality.
Tests should look like the following:
```go
package mutator_test

func TestApplySomeChangeReturnsDiagnostics(t *testing.T) { ... }

func TestApplySomeChangeFixesThings(t *testing.T) {
	ctx := context.Background()
	b, err := ...some operation...
	require.NoError(t, err)
	...
	assert.Equal(t, ...)
}
```
Notice that:
- Tests are often in the same package but suffixed with _test.
- The test names are prefixed with Test and are named after the function or module they are testing.
- 'require' and 'require.NoError' are used to check for things that would cause the rest of the test case to fail.
- 'assert' is used to check for expected values where the rest of the test is not expected to fail.
When writing tests, please don't include an explanation in each test case in your responses. I am just interested in the tests.
- Located in `acceptance/` with a nested directory structure
- Each test directory contains `databricks.yml`, `script`, and `output.txt`
- Run with `go test ./acceptance -run TestAccept/bundle/<path>/<to>/<folder> -tail -test.v`
- Use the `-update` flag to regenerate expected output files
- When a test fails because it has an old output, just run it one more time with the `-update` flag instead of changing `output.txt` directly
When asked to update acceptance tests, follow this workflow:
1. Run the update command:
   - For all acceptance tests: `make test-update`
   - When asked to update acceptance tests for templates specifically: `make test-update-templates`
2. Verify code quality:
   - Run `make fmt` and `make lint`
   - Critical: If these commands modify any files in `acceptance/`, this indicates an issue in the source files (e.g., in `libs/template/templates/` for template tests)!
3. Fix the root cause:
   - Never manually edit files in `acceptance/` - they are auto-generated outputs
   - Find and fix the corresponding source file that generated the problematic acceptance test output
   - For template tests: fix files in `libs/template/templates/`
   - Common issues: trailing whitespace, missing/extra newlines, formatting problems
4. Regenerate after fixing:
   - After fixing the source files, run the update command again (e.g., `make test-update-templates`)
   - This regenerates the acceptance test outputs from the corrected sources
   - Now `make fmt` and `make lint` should pass with no changes
Example workflow:
```bash
# Update acceptance tests
make test-update  # or make test-update-templates for templates only

# Check for issues - if these modify files in acceptance/, you have a source file problem
make fmt
make lint

# If there are modifications in acceptance/:
# 1. Find the corresponding source file (e.g., in libs/template/templates/ for templates)
# 2. Fix the issue there (e.g., whitespace, newlines)
# 3. Regenerate from the fixed source
make test-update  # or make test-update-templates

# Verify everything is clean
make fmt   # Should show no changes now
make lint  # Should show no issues now
```

Key principle: Files in `acceptance/` are outputs, not sources. Always fix the source files and regenerate.
Use the following for logging:
```go
import "github.com/databricks/cli/libs/log"

log.Infof(ctx, "...")
log.Debugf(ctx, "...")
log.Warnf(ctx, "...")
log.Errorf(ctx, "...")
```
Note that `ctx` should be passed in as an argument by the caller. Do not use `context.Background()` the way we do in tests.
Use `cmdio.LogString` to print to stdout:

```go
import "github.com/databricks/cli/libs/cmdio"

cmdio.LogString(ctx, "...")
```
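For example, a minimal sketch (the `reportStatus` helper and its messages are hypothetical) of a function that receives `ctx` from its caller and uses both:

```go
package example

import (
	"context"

	"github.com/databricks/cli/libs/cmdio"
	"github.com/databricks/cli/libs/log"
)

// reportStatus is a hypothetical helper; ctx comes from the caller,
// never from context.Background() outside of tests.
func reportStatus(ctx context.Context, target string) {
	log.Debugf(ctx, "resolved target %s", target)
	cmdio.LogString(ctx, "Deploying to "+target)
}
```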
A databricks_template_schema.json file is used to configure bundle templates.
Below is a good reference template:
```json
{
  "welcome_message": "\nWelcome to the dbt template for Databricks Asset Bundles!\n\nA workspace was selected based on your current profile. For information about how to change this, see https://docs.databricks.com/dev-tools/cli/profiles.html.\nworkspace_host: {{workspace_host}}",
  "properties": {
    "project_name": {
      "type": "string",
      "pattern": "^[A-Za-z_][A-Za-z0-9-_]+$",
      "pattern_match_failure_message": "Name must consist of letters, numbers, dashes, and underscores.",
      "default": "dbt_project",
      "description": "\nPlease provide a unique name for this project.\nproject_name",
      "order": 1
    },
    "http_path": {
      "type": "string",
      "pattern": "^/sql/.\\../warehouses/[a-z0-9]+$",
      "pattern_match_failure_message": "Path must be of the form /sql/1.0/warehouses/",
      "description": "\nPlease provide the HTTP Path of the SQL warehouse you would like to use with dbt during development.\nYou can find this path by clicking on \"Connection details\" for your SQL warehouse.\nhttp_path [example: /sql/1.0/warehouses/abcdef1234567890]",
      "order": 2
    },
    "default_catalog": {
      "type": "string",
      "default": "{{default_catalog}}",
      "pattern": "^\\w*$",
      "pattern_match_failure_message": "Invalid catalog name.",
      "description": "\nPlease provide an initial catalog{{if eq (default_catalog) \"\"}} (leave blank when not using Unity Catalog){{end}}.\ndefault_catalog",
      "order": 3
    },
    "personal_schemas": {
      "type": "string",
      "description": "\nWould you like to use a personal schema for each user working on this project? (e.g., 'catalog.{{short_name}}')\npersonal_schemas",
      "enum": [
        "yes, use a schema based on the current user name during development",
        "no, use a shared schema during development"
      ],
      "order": 4
    },
    "shared_schema": {
      "skip_prompt_if": {
        "properties": {
          "personal_schemas": {
            "const": "yes, use a schema based on the current user name during development"
          }
        }
      },
      "type": "string",
      "default": "default",
      "pattern": "^\\w+$",
      "pattern_match_failure_message": "Invalid schema name.",
      "description": "\nPlease provide an initial schema during development.\ndefault_schema",
      "order": 5
    }
  },
  "success_message": "\n📊 Your new project has been created in the '{{.project_name}}' directory!\nIf you already have dbt installed, just type 'cd {{.project_name}}; dbt init' to get started.\nRefer to the README.md file for full \"getting started\" guide and production setup instructions.\n"
}
```
Notice that:
- The welcome message has the template name.
- By convention, property messages include the property name after a newline; e.g. default_catalog above has a description that says "\nPlease provide an initial catalog [...].\ndefault_catalog".
- Each property defines a variable that is used for the template.
- Each property has a unique 'order' value that increments by 1 with each property.
- Enums use `"type": "string"` and have an `enum` field with a list of possible values.
- Helpers such as {{default_catalog}} and {{short_name}} can be used within property descriptors.
- Properties can be referenced in messages and descriptions using {{.property_name}}. {{.project_name}} is an example.
- Run `make checks fmt lint` before committing
- Use `make test-update` to regenerate acceptance test outputs after changes
- The CLI binary supports both `databricks` and `pipelines` command modes based on the executable name
- Resource definitions in `bundle/config/resources/` are auto-generated from OpenAPI specs