Add marketplace-provisioning app #732
Open: michellejm wants to merge 4 commits into databrickslabs:main from michellejm:marketplace-provisioning
Commits:
561dbf2 Add marketplace-provisioning app (michellejm)
5d315df Add marketplace-provisioning README frontmatter and CODEOWNERS entry (michellejm)
0bca3d0 Security hardening and Marketplace-readiness pass on marketplace-prov… (michellejm)
ae6c676 Address Copilot review feedback on marketplace-provisioning (michellejm)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,30 @@ | ||
| # Python | ||
| __pycache__/ | ||
| *.py[cod] | ||
| .venv/ | ||
| venv/ | ||
| *.egg-info/ | ||
|
|
||
| # Node | ||
| node_modules | ||
| npm-debug.log* | ||
|
|
||
| # SQLite | ||
| *.db | ||
| *.db-wal | ||
| *.db-shm | ||
|
|
||
| # Secrets / environment | ||
| .env | ||
|
|
||
| # IDE | ||
| .idea/ | ||
| .vscode/ | ||
| *.swp | ||
|
|
||
| # Databricks | ||
| .databricks/ | ||
|
|
||
| # OS | ||
| .DS_Store | ||
| Thumbs.db |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,219 @@ | ||
| --- | ||
| title: "Marketplace Provisioning App" | ||
| language: python | ||
| author: "Michelle McSweeney" | ||
| date: 2026-04-20 | ||
|
|
||
| tags: | ||
| - marketplace | ||
| - genie | ||
| - databricks-apps | ||
| - free-edition | ||
| - provisioning | ||
| --- | ||
|
|
||
| # Marketplace Provisioning App | ||
|
|
||
| ## Goal | ||
|
|
||
| Provision Genie spaces and AI/BI dashboards for non-technical users on Databricks Free Edition — without requiring them to run a notebook. | ||
|
|
||
| ## Approach: Marketplace App (ephemeral provisioner) | ||
|
|
||
| 1. User clicks "Get" on a Marketplace App listing | ||
| 2. App installs and runs in their workspace | ||
| 3. App's service principal creates Genie spaces + dashboards via REST API | ||
| 4. App grants permissions to workspace users | ||
| 5. User deletes the app (frees up the single Free Edition app slot) | ||
|
|
||
| ### Why this approach | ||
|
|
||
| | Alternative considered | Why it doesn't work | | ||
| |---|---| | ||
| | Marketplace data product | Only supports data assets (tables, volumes, models) — not Genie spaces or dashboards | | ||
| | Databricks Asset Bundles (DABs) | Requires CLI installation + auth, so not viable for non-technical users | | ||
| | "Open in Databricks" | Engineering done but not launched — no timeline | | ||
| | Hosted external web app | Major build (OAuth integration), unclear Free Edition support | | ||
|
|
||
| ### Why a Marketplace App is feasible for us | ||
|
|
||
| Marketplace Apps are first-party only: the source must live in the `databricks` or `databrickslabs` GitHub org. That is fine here, since this is a Databricks-internal project, but listing still requires going through the Marketplace listing SOP with the Marketplace team. | ||
|
|
||
| ## API Details | ||
|
|
||
| ### Genie Space Creation | ||
|
|
||
| - **Endpoint**: `POST /api/2.0/genie/spaces` (GA since March 2026) | ||
| - **Python SDK**: `w.genie.create_space(warehouse_id, serialized_space, title, parent_path)` | ||
| - **Key payload**: `serialized_space` JSON string containing tables, instructions, sample questions, example SQL, join specs | ||
| - **Limits**: Max 30 tables/space, 100 instructions, 100 example SQL queries | ||
| - **Permissions after creation**: Must use `PUT /api/2.0/permissions/genie/{space_id}` (Genie API has no built-in permissions management) | ||
|
|
||
| #### Serialized Space Structure | ||
|
|
||
| ```json | ||
| { | ||
|   "version": 2, | ||
|   "config": { | ||
|     "sample_questions": [{"id": "<32-char-hex>", "question": ["..."]}] | ||
|   }, | ||
|   "data_sources": { | ||
|     "tables": [ | ||
|       { | ||
|         "identifier": "catalog.schema.table", | ||
|         "description": ["..."], | ||
|         "column_configs": [{"column_name": "...", "description": ["..."], "synonyms": [...]}] | ||
|       } | ||
|     ] | ||
|   }, | ||
|   "instructions": { | ||
|     "text_instructions": [{"id": "<32-char-hex>", "content": ["..."]}], | ||
|     "example_question_sqls": [{"id": "<32-char-hex>", "question": ["..."], "sql": ["..."]}] | ||
|   } | ||
| } | ||
| ``` | ||
|
|
||
| IDs must be 32-character lowercase hex. Collections must be sorted alphabetically by identifier. | ||
|
|
||
| ### Dashboard Creation + Publishing | ||
|
|
||
| - **Create draft**: `POST /api/2.0/lakeview/dashboards` | ||
| - **Publish**: `POST /api/2.0/lakeview/dashboards/{dashboard_id}/published` | ||
| - **Python SDK**: `w.lakeview.create(dashboard)` then `w.lakeview.publish(dashboard_id, embed_credentials, warehouse_id)` | ||
|
|
||
| #### Create Request | ||
|
|
||
| ```python | ||
| from databricks.sdk import WorkspaceClient | ||
| from databricks.sdk.service.dashboards import Dashboard | ||
|
|
||
| w = WorkspaceClient() | ||
|
|
||
| dashboard = w.lakeview.create( | ||
|     dashboard=Dashboard( | ||
|         display_name="My Dashboard", | ||
|         serialized_dashboard='{"pages": [...], "datasets": [...]}', | ||
|         warehouse_id="<warehouse_id>", | ||
|         parent_path="/Users/user@example.com", | ||
|     ) | ||
| ) | ||
|
|
||
| w.lakeview.publish( | ||
|     dashboard_id=dashboard.dashboard_id, | ||
|     embed_credentials=True, | ||
|     warehouse_id="<warehouse_id>", | ||
| ) | ||
| ``` | ||
|
|
||
| #### Serialized Dashboard Structure | ||
|
|
||
| ```json | ||
| { | ||
|   "pages": [ | ||
|     { | ||
|       "name": "<uuid>", | ||
|       "displayName": "Page Title", | ||
|       "layout": [ | ||
|         { | ||
|           "widget": {"name": "<uuid>", "queries": [...], "spec": {...}}, | ||
|           "position": {"x": 0, "y": 0, "width": 6, "height": 4} | ||
|         } | ||
|       ] | ||
|     } | ||
|   ], | ||
|   "datasets": [ | ||
|     { | ||
|       "name": "<uuid>", | ||
|       "displayName": "Dataset Name", | ||
|       "query": "SELECT * FROM catalog.schema.table" | ||
|     } | ||
|   ] | ||
| } | ||
| ``` | ||
|
|
||
| No formal JSON schema exists — reverse-engineer by exporting a prototype dashboard. | ||
|
|
||
| #### embed_credentials | ||
|
|
||
| - `true` ("Shared data permission"): Viewers query using publisher's credentials. Better performance (shared cache). Publisher must have SELECT on all tables. | ||
| - `false` ("Individual data permission"): Each viewer needs their own table access. | ||
| - **Known issue**: Delta Sharing tables do NOT work with `embed_credentials: true`. | ||
|
|
||
| ### Permissions Management | ||
|
|
||
| After creating objects, the app must grant access: | ||
|
|
||
| - **Dashboards**: `PUT /api/2.0/permissions/dashboards/{workspace_object_id}` | ||
| - **Genie spaces**: `PUT /api/2.0/permissions/genie/{space_id}` | ||
|
|
||
| The service principal is the owner by default — other users have no access until explicitly granted. | ||
|
|
||
| ## Free Edition Constraints | ||
|
|
||
| | Constraint | Impact | | ||
| |---|---| | ||
| | 1 app per account | App must be deleted after provisioning to free the slot | | ||
| | Apps auto-stop after 24 hours | Fine — provisioning takes seconds | | ||
| | SP creation reported as unreliable | **Biggest risk** — must validate early | | ||
| | Serverless SQL warehouse (quota-limited) | Genie + dashboards need a warehouse — should work | | ||
| | Genie available on Free Edition | Confirmed via Slack (with some bugs tracked) | | ||
| | Dashboards available on Free Edition | Confirmed via docs and blog posts | | ||
|
|
||
| ## Build Plan | ||
|
|
||
| ### Step 1: Validate (do this first) | ||
|
|
||
| - [ ] Confirm a Marketplace App can install and run its service principal on Free Edition | ||
| - [ ] Confirm the SP can create a Genie space on Free Edition | ||
| - [ ] Confirm the SP can create + publish a dashboard on Free Edition | ||
| - [ ] Confirm the SP can grant permissions to workspace users | ||
|
|
||
| ### Step 2: Build prototype | ||
|
|
||
| - [ ] Create target Genie space + dashboard manually in a dev workspace | ||
| - [ ] Export `serialized_space` and `serialized_dashboard` JSON via API | ||
| - [ ] Build minimal app (Streamlit/Flask) that provisions from bundled configs | ||
| - [ ] App flow: startup -> detect warehouse -> create Genie space -> create + publish dashboard -> grant permissions -> show "Done" screen with links | ||
|
|
||
| ### Step 3: Marketplace listing | ||
|
|
||
| - [ ] Host app in `databricks` or `databrickslabs` GitHub org | ||
| - [ ] Go through Marketplace listing SOP | ||
| - [ ] Contacts: Jianyu Zhou (Marketplace Apps eng), DJ Sharkey (Marketplace), Tia Chang (Marketplace PM) | ||
|
|
||
| ### Step 4: Self-cleanup | ||
|
|
||
| - [ ] Test whether the app can call `DELETE /api/2.0/apps/{app_name}` on itself | ||
| - [ ] Fallback: show "Done — you can delete this app" with instructions | ||
|
|
||
| ## Key Contacts | ||
|
|
||
| | Person | Role | Relevance | | ||
| |---|---|---| | ||
| | Jianyu Zhou | Marketplace Apps engineer | Marketplace App listing process | | ||
| | DJ Sharkey | Marketplace | Listing process | | ||
| | Tia Chang | Marketplace PM | Feature requests, listing approval | | ||
| | Naim Achahboun | Apps team | Recommended DABS pattern, Apps technical questions | | ||
| | Will Valori | Free Edition PM | Free Edition capabilities/limitations | | ||
| | Miranda Luna | PM, AI/BI Genie | Genie API, limits, roadmap | | ||
| | Hanlin Sun | PM/Eng, AI/BI Genie | Pricing, API limits | | ||
|
|
||
| ## Relevant Slack Channels | ||
|
|
||
| - `#ai-bi-genie` — Genie questions | ||
| - `#eng-databricks-apps` — Databricks Apps + Genie resources | ||
| - `#free-edition` — Free Edition feature availability | ||
| - `#apa-agent-bricks` — Agent Bricks + Genie integration | ||
|
|
||
| ## Research Sources | ||
|
|
||
| - [Genie API reference](https://docs.databricks.com/api/workspace/genie) | ||
| - [Genie Space Import/Export guide](https://docs.google.com/document/d/14hsOeDAtylMlSKVkakbYMGBIFr_SihR-Rrij5NCnT4k) | ||
| - [Dashboard CRUD API tutorial](https://docs.databricks.com/aws/en/dashboards/tutorials/dashboard-crud-api) | ||
| - [Lakeview API reference](https://docs.databricks.com/api/workspace/lakeview/create) | ||
| - [Dashboard permissions](https://docs.databricks.com/aws/en/dashboards/tutorials/manage-permissions) | ||
| - [Python SDK - Genie](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/dashboards/genie.html) | ||
| - [Python SDK - Lakeview](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/dashboards/lakeview.html) | ||
| - [Apps FAQ (go/apps/faq)](https://go/apps/faq) | ||
| - [AI/BI FAQ (go/aibi/faq)](https://docs.google.com/document/d/1vjcYSiTilHDRppuas9Kh8uAw_TOFC-62U2-nweRmx_I) | ||
| - [Google Doc with full research](https://docs.google.com/document/d/17UNJYk2jk_T21RtTMM7Elky_9Id2wGmmmb6jAi5VE9A/edit) | ||
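The README's rules for `serialized_space` (32-character lowercase hex IDs, collections sorted alphabetically by identifier) can be illustrated with a small pure-Python sketch. This is not the app's `build_serialized_space()`; the helper names `new_id` and `sketch_serialized_space` are hypothetical, and it assumes that tables sort by their `identifier` field while ID-bearing collections sort by `id`, which the README does not spell out.

```python
import json
import re
import uuid

# Pattern for the README's "32-character lowercase hex" requirement.
HEX32 = re.compile(r"^[0-9a-f]{32}$")


def new_id() -> str:
    """Return a 32-character lowercase hex string (a uuid4 without dashes)."""
    return uuid.uuid4().hex


def sketch_serialized_space(tables, sample_questions, text_instructions):
    """Build a serialized_space JSON string shaped like the README example.

    tables: dict mapping "catalog.schema.table" -> description text.
    Assumption: tables sort by "identifier", other collections by "id".
    """
    questions = sorted(
        ({"id": new_id(), "question": [q]} for q in sample_questions),
        key=lambda item: item["id"],
    )
    instructions = sorted(
        ({"id": new_id(), "content": [t]} for t in text_instructions),
        key=lambda item: item["id"],
    )
    table_entries = sorted(
        ({"identifier": name, "description": [desc]} for name, desc in tables.items()),
        key=lambda item: item["identifier"],
    )
    space = {
        "version": 2,
        "config": {"sample_questions": questions},
        "data_sources": {"tables": table_entries},
        "instructions": {"text_instructions": instructions},
    }
    # The API expects a JSON string, not a nested object.
    return json.dumps(space)
```

The returned string is what would be passed as the `serialized_space` argument when creating a space. Generating IDs before sorting keeps the ordering rule in one place, so adding a question never requires re-sorting by hand.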
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,73 @@ | ||
| # Security Policy | ||
|
|
||
| ## Reporting a Vulnerability | ||
|
|
||
| If you discover a security vulnerability in this project, please report it responsibly. | ||
|
|
||
| **Do not open a public GitHub issue for security vulnerabilities.** | ||
|
|
||
| Instead, please email [security@databricks.com](mailto:security@databricks.com) with: | ||
|
|
||
| - A description of the vulnerability | ||
| - Steps to reproduce the issue | ||
| - Any potential impact assessment | ||
|
|
||
| You will receive an acknowledgment within 48 hours and a detailed response within 5 business days. | ||
|
|
||
| ## Supported Versions | ||
|
|
||
| | Version | Supported | | ||
| |---------|-----------| | ||
| | Latest | Yes | | ||
|
|
||
| ## Authentication Model | ||
|
|
||
| This app uses the **Databricks App service principal** for all Databricks | ||
| operations (SQL statements, Files API, Genie space CRUD, SCIM lookups, | ||
| permission grants). Credentials are obtained from the | ||
| `databricks.sdk.WorkspaceClient` at runtime — the app never reads a | ||
| `DATABRICKS_TOKEN` environment variable and never accepts a PAT from the | ||
| user. | ||
|
|
||
| ### Why service-principal-only (no on-behalf-of-user) | ||
|
|
||
| Provisioning a scenario requires privileged operations that an unprivileged | ||
| end user typically cannot perform in a shared workspace: | ||
|
|
||
| - Creating Unity Catalog catalogs, schemas, volumes, and tables | ||
| - Uploading CSV/Parquet data into managed volumes | ||
| - Granting `USE_CATALOG`, `USE_SCHEMA`, and `SELECT` to the end user | ||
| - Creating Genie spaces and granting the user `CAN_RUN` | ||
|
|
||
| The Marketplace deployment targets Databricks Free Edition workspaces where | ||
| the installing user *is* the workspace admin, so the app service principal | ||
| runs with admin-equivalent authority by design. Users only receive the | ||
| minimum grants needed to query their provisioned scenario. | ||
|
|
||
| ### What the app never does | ||
|
|
||
| - Does not store or log user tokens, session cookies, or PATs | ||
| - Does not accept credentials via the UI or API payloads | ||
| - Does not call external (non-Databricks) services | ||
| - Does not persist customer data beyond the user's own workspace | ||
|
|
||
| ## Input Validation | ||
|
|
||
| - All Unity Catalog identifiers (catalog, schema, table, column names) are | ||
| validated against a strict identifier regex before being interpolated | ||
| into SQL. See `provisioner._check_ident` / `_check_dotted_ident`. | ||
| - Parameterized SQL bindings are used for all user-supplied values. | ||
| - Mystery text is length-capped and passed through a word-boundary | ||
| profanity filter. | ||
|
|
||
| ## Security Practices | ||
|
|
||
| - Dependencies are tracked in `requirements.txt` (Python) and | ||
| `package.json` (Node.js). | ||
| - No user credentials are stored by the application. | ||
| - The app uses SQLite for transient game state only (reset on each | ||
| deployment). | ||
| - The in-memory provisioning-status dict is capped to prevent unbounded | ||
| growth. | ||
| - Exception messages surfaced to the UI are sanitized; full tracebacks are | ||
| only written to server logs. |
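The Input Validation section above describes checking Unity Catalog identifiers against a strict regex before they are interpolated into SQL. A minimal sketch of that idea follows. It is not the project's `provisioner._check_ident` or `_check_dotted_ident`, whose code is not shown here; the function names, the regex, and the 255-character cap are all assumptions made for illustration.

```python
import re

# Assumed pattern: an ASCII letter or underscore, followed by letters, digits,
# or underscores, at most 255 characters. The project's real regex may differ.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,254}")


def check_ident_sketch(name: str) -> str:
    """Return name unchanged if it is a safe bare identifier, else raise."""
    if not isinstance(name, str) or not _IDENT.fullmatch(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return name


def check_dotted_ident_sketch(name: str, parts: int = 3) -> str:
    """Validate a dotted name such as catalog.schema.table, piece by piece."""
    if not isinstance(name, str):
        raise ValueError(f"invalid identifier: {name!r}")
    pieces = name.split(".")
    if len(pieces) != parts:
        raise ValueError(f"expected {parts} dotted parts in {name!r}")
    for piece in pieces:
        check_ident_sketch(piece)
    return name


def select_all_sql(table: str) -> str:
    """Interpolate a table name into SQL only after it passes validation."""
    return f"SELECT * FROM {check_dotted_ident_sketch(table)}"
```

Identifiers cannot be bound as SQL parameters, which is why an allow-list check like this sits alongside parameterized bindings for ordinary values rather than replacing them.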
The README's Genie API section states `serialized_space` uses `"version": 1`, but `build_serialized_space()` currently emits `version: 2`. If version 2 is required, update the README example to avoid misleading future changes/debugging; if not, align the implementation with the documented version.