Add marketplace-provisioning app #732
Please sign all commits
3e3097a to d6eb4c5
Pull request overview
Adds a new marketplace-provisioning/ Databricks App (“Data Detective”) that provisions Unity Catalog data + Genie spaces for several prebuilt business-mystery scenarios, with a React frontend for gameplay and a FastAPI backend for provisioning/scoring.
Changes:
- Introduces a FastAPI backend (app.py) with SQLite persistence, background provisioning, SSE/polling status APIs, and keyword-based scoring.
- Adds a provisioning engine (provisioner.py) that creates catalogs/schemas/volumes/tables and creates/updates Genie spaces from bundled scenario configs.
- Adds a React/Vite frontend (and committed build output under static/) plus scenario JSON/CSV assets and Marketplace metadata.
Reviewed changes
Copilot reviewed 46 out of 55 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| marketplace-provisioning/uv.lock | Adds uv lockfile stub for the new Python project. |
| marketplace-provisioning/static/index.html | Adds built frontend entry HTML referencing bundled assets. |
| marketplace-provisioning/static/assets/index-DrcDyRBK.css | Adds bundled frontend CSS output. |
| marketplace-provisioning/scenarios/registry.py | Defines scenario registry and answer keys used for scoring. |
| marketplace-provisioning/scenarios/operations/genie_space.json | Genie space config for Operations scenario (tables/instructions/sample SQL). |
| marketplace-provisioning/scenarios/operations/data/products.csv | Operations scenario dataset: products. |
| marketplace-provisioning/scenarios/operations/data/fulfillment_partners.csv | Operations scenario dataset: fulfillment partners. |
| marketplace-provisioning/scenarios/marketing/genie_space.json | Genie space config for Marketing scenario. |
| marketplace-provisioning/scenarios/marketing/data/campaigns.csv | Marketing scenario dataset: campaigns. |
| marketplace-provisioning/scenarios/financial-planning/genie_space.json | Genie space config for Financial Planning scenario. |
| marketplace-provisioning/scenarios/financial-planning/data/exception_approvals.csv | Financial Planning dataset: exception approvals. |
| marketplace-provisioning/scenarios/financial-planning/data/cost_center_master.csv | Financial Planning dataset: cost center master. |
| marketplace-provisioning/scenarios/__init__.py | Marks scenarios as a Python package. |
| marketplace-provisioning/scenarios/HR/genie_space.json | Genie space config for HR scenario. |
| marketplace-provisioning/scenarios/HR/data/resignations.csv | HR dataset: resignations. |
| marketplace-provisioning/scenarios/HR/data/managers.csv | HR dataset: managers. |
| marketplace-provisioning/requirements.txt | Python runtime deps for backend/provisioner. |
| marketplace-provisioning/pyproject.toml | Project metadata + Ruff config and per-file ignores. |
| marketplace-provisioning/provisioner.py | Implements UC + Genie provisioning, retries, SQL execution, and idempotency checks. |
| marketplace-provisioning/manifest.yaml | Marketplace manifest (scopes/description). |
| marketplace-provisioning/frontend/vite.config.ts | Vite config to build into ../static and proxy /api locally. |
| marketplace-provisioning/frontend/tsconfig.json | TypeScript config for the React frontend. |
| marketplace-provisioning/frontend/src/types.ts | Shared frontend TS types for accounts/submissions/provisioning events. |
| marketplace-provisioning/frontend/src/profanityFilter.ts | Frontend nickname profanity filter helper. |
| marketplace-provisioning/frontend/src/main.tsx | Frontend entrypoint mounting the React app. |
| marketplace-provisioning/frontend/src/index.css | Frontend source CSS (also used for build output). |
| marketplace-provisioning/frontend/src/components/QuestionView.tsx | Main gameplay view: evidence entry, scoring, answer reveal, leaderboard link. |
| marketplace-provisioning/frontend/src/components/ProvisioningStatus.tsx | Provisioning progress UI using SSE with polling fallback. |
| marketplace-provisioning/frontend/src/components/NicknameEntry.tsx | Entry flow: nickname + scenario selection. |
| marketplace-provisioning/frontend/src/components/EvidenceField.tsx | Evidence input component (text/image/CSV with drag-drop). |
| marketplace-provisioning/frontend/src/components/CaseFile.tsx | Read-only view of last submission/evidence/score. |
| marketplace-provisioning/frontend/src/api.ts | Frontend API client for backend endpoints. |
| marketplace-provisioning/frontend/src/App.tsx | App router/state management + session persistence. |
| marketplace-provisioning/frontend/package.json | Frontend deps/scripts (React/Vite/TypeScript). |
| marketplace-provisioning/frontend/index.html | Vite dev HTML entry. |
| marketplace-provisioning/databricks.yml | Bundle config for deploying the app. |
| marketplace-provisioning/app.yaml | App command definition (uvicorn). |
| marketplace-provisioning/app.py | FastAPI backend: SQLite models, provisioning threads, SSE, scoring, static hosting. |
| marketplace-provisioning/SECURITY.md | Documents auth model, validation, and security posture. |
| marketplace-provisioning/README.md | Design/approach documentation and operational notes. |
| marketplace-provisioning/.gitignore | Ignores venv, node_modules, sqlite DB files, etc. |
| CODEOWNERS | Adds code owner entry for marketplace-provisioning. |
Files not reviewed (1)
- marketplace-provisioning/frontend/package-lock.json: Language not supported
class EvidenceItemModel(BaseModel):
    field_order: int = Field(..., ge=1, le=MAX_EVIDENCE)
    type: str = Field(..., pattern="^(text|image|csv)$")
    content: str = ""


class CreateSubmission(BaseModel):
    account_id: str
    solution: str = ""
    recommendation: str = ""
    evidence: list[EvidenceItemModel] = []
EvidenceItemModel.content has no size limit, and the server doesn’t validate payload size for evidence uploads. A client can submit arbitrarily large text/base64 content (especially for images), which can bloat SQLite and exhaust memory/disk (DoS risk). Add max length constraints in the Pydantic models and/or enforce per-item and total evidence size limits in create_submission().
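The per-item and total limits suggested above could be enforced with a check along these lines. This is a minimal, framework-agnostic sketch: the function name `check_evidence_sizes` and both byte limits are assumptions chosen for illustration, not values read from app.py.

```python
# Hypothetical caps; tune them to what the SQLite store and app memory can tolerate.
MAX_ITEM_BYTES = 1_500_000    # per evidence item
MAX_TOTAL_BYTES = 6_000_000   # all evidence in one submission


def check_evidence_sizes(contents: list[str]) -> int:
    """Return the total UTF-8 size in bytes, or raise ValueError if a cap is exceeded."""
    total = 0
    for index, content in enumerate(contents, start=1):
        size = len(content.encode("utf-8"))
        if size > MAX_ITEM_BYTES:
            raise ValueError(
                f"evidence item {index} is {size} bytes (limit {MAX_ITEM_BYTES})"
            )
        total += size
        if total > MAX_TOTAL_BYTES:
            raise ValueError(f"submission evidence exceeds {MAX_TOTAL_BYTES} bytes")
    return total
```

Measuring encoded bytes rather than character count keeps the limit meaningful for base64 image payloads and multi-byte text alike. A handler such as create_submission() could call this before writing anything to the database and map the ValueError to an HTTP 4xx response.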
_check_ident(scenario_key, "scenario_key")
provision_scenario() rejects scenario keys containing hyphens via _check_ident(scenario_key, ...), but scenarios/registry.py defines at least one scenario key as financial-planning and the directory name matches that. This will raise ValueError and prevent provisioning that scenario. Consider relaxing validation for scenario_key (it’s only used for filesystem path/locking) or using a separate path-safe validator that allows - while keeping strict SQL identifier validation for catalog/table/column names.
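One way to separate the two concerns is a pair of validators: a strict one for anything interpolated into SQL and a looser, path-safe one for scenario keys. The sketch below is an assumption about how that split might look; the function names and both regular expressions are hypothetical and are not the ones in provisioner.py.

```python
import re

# Strict pattern for names interpolated into SQL (catalog, table, column).
_SQL_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Looser pattern for names used only as directory names or lock keys:
# alphanumeric/underscore segments joined by single hyphens, so no dots or slashes.
_PATH_IDENT = re.compile(r"[A-Za-z0-9_]+(?:-[A-Za-z0-9_]+)*")


def check_sql_ident(value: str, label: str) -> str:
    """Return value unchanged if it is a safe SQL identifier, else raise ValueError."""
    if not _SQL_IDENT.fullmatch(value):
        raise ValueError(f"invalid {label}: {value!r}")
    return value


def check_path_ident(value: str, label: str) -> str:
    """Return value unchanged if it is a safe path segment, else raise ValueError."""
    if not _PATH_IDENT.fullmatch(value):
        raise ValueError(f"invalid {label}: {value!r}")
    return value
```

With this split, `check_path_ident("financial-planning", "scenario_key")` passes while the same string is still rejected as a catalog or table name, and traversal attempts such as `../secrets` fail both checks.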
# Grant access on catalog
report("grants", "Securing access...", 6)
execute_sql(host, headers, warehouse_id,
            f"GRANT USE CATALOG ON CATALOG {catalog_name} TO `account users`")
execute_sql(host, headers, warehouse_id,
            f"GRANT USE SCHEMA ON CATALOG {catalog_name} TO `account users`")
execute_sql(host, headers, warehouse_id,
            f"GRANT SELECT ON CATALOG {catalog_name} TO `account users`")
The GRANT statements appear to use incorrect Unity Catalog syntax: GRANT USE SCHEMA ON CATALOG ... and GRANT SELECT ON CATALOG ... are not valid in standard UC SQL (USE SCHEMA is granted on a schema, and SELECT is typically granted on schemas/tables). This will likely fail at runtime and stop provisioning. Update these grants to target the schema (e.g., catalog_name.default) and grant SELECT at the appropriate scope (schema or specific tables).
Suggested change:
- # Grant access on catalog
- report("grants", "Securing access...", 6)
- execute_sql(host, headers, warehouse_id,
-             f"GRANT USE CATALOG ON CATALOG {catalog_name} TO `account users`")
- execute_sql(host, headers, warehouse_id,
-             f"GRANT USE SCHEMA ON CATALOG {catalog_name} TO `account users`")
- execute_sql(host, headers, warehouse_id,
-             f"GRANT SELECT ON CATALOG {catalog_name} TO `account users`")
+ # Grant access on catalog and schema
+ report("grants", "Securing access...", 6)
+ execute_sql(host, headers, warehouse_id,
+             f"GRANT USE CATALOG ON CATALOG {catalog_name} TO `account users`")
+ execute_sql(host, headers, warehouse_id,
+             f"GRANT USE SCHEMA ON SCHEMA {catalog_name}.`default` TO `account users`")
+ execute_sql(host, headers, warehouse_id,
+             f"GRANT SELECT ON SCHEMA {catalog_name}.`default` TO `account users`")
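The schema-scoped grants in the suggestion can also be generated in one place, which makes them easy to unit-test without a warehouse. This is only a sketch: `build_grant_statements` is a hypothetical helper, and it assumes the catalog name has already passed identifier validation before being interpolated.

```python
def build_grant_statements(
    catalog: str,
    schema: str = "default",
    principal: str = "account users",
) -> list[str]:
    """Build the three GRANT statements from the suggestion, in execution order.

    Assumes `catalog` was validated as a plain identifier by the caller;
    the schema and principal are backtick-quoted as in the suggested change.
    """
    target = f"{catalog}.`{schema}`"
    grantee = f"`{principal}`"
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {grantee}",
        f"GRANT USE SCHEMA ON SCHEMA {target} TO {grantee}",
        f"GRANT SELECT ON SCHEMA {target} TO {grantee}",
    ]
```

The provisioning step could then loop over the returned list and pass each statement to execute_sql(), keeping the SQL text out of the control flow.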
@@ -0,0 +1,301 @@
@import url('https://fonts.googleapis.com/css2?family=Playfair+Display:ital,wght@0,400;0,600;0,700;1,400&family=Lora:ital,wght@0,400;0,500;0,600;1,400&display=swap');
This CSS imports Google Fonts from fonts.googleapis.com, which triggers external network requests from the user's browser. That conflicts with the PR description/manifest claim of “no external network calls” and may also fail in locked-down environments. Consider bundling fonts locally or removing the external @import and using system fonts only.
// Also check if any profane word appears as a substring in the full input
const compressed = input.toLowerCase().replace(/[^a-z]/g, "");
for (const profane of PROFANE_WORDS) {
  if (compressed.includes(profane)) return true;
}
The client-side profanity filter checks for profane words as substrings of the compressed input. This will create false positives for benign nicknames (e.g., words that contain a shorter profane token) and also diverges from the backend’s word-boundary-only behavior, leading to inconsistent acceptance/rejection. Consider removing substring matching and keeping word-boundary checks aligned with the server.
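The difference between the two matching strategies is easy to see side by side. The sketch below is written in Python to mirror the backend's word-boundary behavior described in the comment; the block list is a harmless placeholder and both function names are hypothetical.

```python
import re

# Placeholder block list for illustration only.
BLOCKED_WORDS = {"heck", "darn"}


def has_blocked_word(nickname: str) -> bool:
    """Word-boundary check: flag only whole alphabetic tokens found in the block list."""
    tokens = re.findall(r"[a-z]+", nickname.lower())
    return any(token in BLOCKED_WORDS for token in tokens)


def has_blocked_substring(nickname: str) -> bool:
    """Substring check on the compressed input, as in the reviewed snippet."""
    compressed = re.sub(r"[^a-z]", "", nickname.lower())
    return any(word in compressed for word in BLOCKED_WORDS)
```

For a nickname like `Checker42`, the substring check fires because "checker" contains "heck", while the word-boundary check accepts it. Both checks still reject `oh heck`.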
```json
{
  "version": 1,
  "config": {
    "sample_questions": [{"id": "<32-char-hex>", "question": ["..."]}]
  },
  "data_sources": {
    "tables": [
      {
        "identifier": "catalog.schema.table",
        "description": ["..."],
        "column_configs": [{"column_name": "...", "description": ["..."], "synonyms": [...]}]
      }
    ]
  },
  "instructions": {
    "text_instructions": [{"id": "<32-char-hex>", "content": ["..."]}],
    "example_question_sqls": [{"id": "<32-char-hex>", "question": ["..."], "sql": ["..."]}]
  }
}
```

IDs must be 32-character lowercase hex. Collections must be sorted alphabetically by identifier.
The README’s Genie API section states serialized_space uses "version": 1, but build_serialized_space() currently emits version: 2. If version 2 is required, update the README example to avoid misleading future changes/debugging; if not, align the implementation with the documented version.
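Keeping the version number in a single constant that both the builder and the documentation refer to avoids this kind of drift. The sketch below is not build_serialized_space() itself: it is a reduced, hypothetical builder that copies only part of the field layout from the README excerpt, has not been checked against the Genie API, and assumes that sample questions are sorted by their generated ID.

```python
import json
import uuid

SERIALIZED_SPACE_VERSION = 2  # the value the review says the implementation emits


def new_id() -> str:
    """Return a 32-character lowercase hex ID."""
    return uuid.uuid4().hex


def build_space(table_identifiers: list[str], questions: list[str]) -> dict:
    """Build a reduced serialized-space payload with sorted collections."""
    tables = [{"identifier": ident} for ident in sorted(table_identifiers)]
    samples = sorted(
        ({"id": new_id(), "question": [q]} for q in questions),
        key=lambda item: item["id"],
    )
    return {
        "version": SERIALIZED_SPACE_VERSION,
        "config": {"sample_questions": samples},
        "data_sources": {"tables": tables},
    }


if __name__ == "__main__":
    print(json.dumps(build_space(["c.s.orders", "c.s.customers"], ["Why?"]), indent=2))
```

`uuid.uuid4().hex` yields exactly 32 lowercase hexadecimal characters, which matches the ID format the README describes.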
| @@ -0,0 +1 @@ | |||
| @import"https://fonts.googleapis.com/css2?family=Playfair+Display:ital,wght@0,400;0,600;0,700;1,400&family=Lora:ital,wght@0,400;0,500;0,600;1,400&display=swap";*,*:before,*:after{box-sizing:border-box;margin:0;padding:0}body{font-family:Lora,Georgia,Times New Roman,serif;background:#1a1a2e;background-image:radial-gradient(ellipse at 20% 50%,rgba(72,52,40,.3) 0%,transparent 60%),radial-gradient(ellipse at 80% 20%,rgba(45,27,55,.3) 0%,transparent 60%);color:#e8e0d4;line-height:1.6;min-height:100vh}.container{max-width:720px;margin:0 auto;padding:24px 16px}h1{font-family:Playfair Display,Georgia,serif;font-size:2rem;margin-bottom:16px;color:#d4a853;letter-spacing:.02em}h2{font-family:Playfair Display,Georgia,serif;font-size:1.3rem;margin-bottom:12px;color:#d4a853}label{display:block;font-weight:600;margin-bottom:4px;font-size:.9rem;color:#c4b89a}input[type=text],input[type=password],textarea{width:100%;padding:10px 12px;border:1px solid #4a3f35;border-radius:4px;font-size:.95rem;font-family:Lora,Georgia,serif;background:#2a2440;color:#e8e0d4;transition:border-color .2s,box-shadow .2s}input[type=text]::placeholder,input[type=password]::placeholder,textarea::placeholder{color:#7a7068;font-style:italic}input[type=text]:focus,input[type=password]:focus,textarea:focus{outline:none;border-color:#d4a853;box-shadow:0 0 0 2px #d4a85333}textarea{resize:vertical;min-height:80px}button{cursor:pointer;padding:10px 20px;border:none;border-radius:4px;font-size:.95rem;font-weight:600;font-family:Playfair Display,Georgia,serif;letter-spacing:.03em;transition:background .2s,opacity .2s,transform .1s}button:disabled{opacity:.4;cursor:not-allowed}.btn-primary{background:linear-gradient(135deg,#8b1a1a,#6b0f0f);color:#f0e6d3;border:1px solid #a0282880;text-transform:uppercase;letter-spacing:.08em}.btn-primary:hover:not(:disabled){background:linear-gradient(135deg,#a02020,#7a1414);transform:translateY(-1px)}.btn-secondary{background:#3a3250;color:#c4b89a;border:1px solid 
#524a68}.btn-secondary:hover:not(:disabled){background:#4a4260}.btn-danger{background:#6b0f0f;color:#f0e6d3;border:1px solid #8b1a1a80}.btn-danger:hover:not(:disabled){background:#7a1414}.btn-small{padding:6px 14px;font-size:.82rem}.card{background:linear-gradient(145deg,#25203a,#1e1a30);border:1px solid #3a3250;border-radius:8px;padding:24px;box-shadow:0 2px 8px #0000004d,inset 0 1px #ffffff08;margin-bottom:16px}.card-instructions{background:linear-gradient(145deg,#2a2218,#221c14);border:1px solid #5a4a35;border-left:3px solid #d4a853}.evidence-field{background:#1e1a30;border:1px solid #3a3250;border-radius:6px;padding:12px;margin-bottom:10px}.evidence-header{display:flex;align-items:center;justify-content:space-between;margin-bottom:8px}.evidence-toggle{display:flex;gap:6px}.evidence-toggle button{padding:4px 10px;font-size:.8rem;border-radius:4px}.evidence-toggle button.active{background:#8b1a1a;color:#f0e6d3;border:1px solid #a0282880}.evidence-toggle button:not(.active){background:#3a3250;color:#8a7e6a;border:1px solid #524a68}.badge{display:inline-block;padding:3px 10px;border-radius:12px;font-size:.8rem;font-weight:600;font-family:Lora,Georgia,serif}.badge-waiting{background:#3a3250;color:#8a7e6a;border:1px solid #524a68}.badge-active{background:#27593466;color:#6abf7b;border:1px solid rgba(106,191,123,.3)}.badge-ended{background:#6b0f0f66;color:#e07070;border:1px solid rgba(224,112,112,.3)}.mt-8{margin-top:8px}.mt-16{margin-top:16px}.mt-24{margin-top:24px}.mb-8{margin-bottom:8px}.mb-16{margin-bottom:16px}.flex-row{display:flex;gap:8px;align-items:center}.text-center{text-align:center}.text-muted{color:#8a7e6a;font-size:.9rem}.text-success{color:#6abf7b}.text-error{color:#e07070}.nav-tabs{display:flex;gap:4px;margin-bottom:20px;border-bottom:1px solid #3a3250;padding-bottom:0}.nav-tabs button{background:none;border:none;padding:8px 16px;font-size:.9rem;color:#8a7e6a;border-bottom:2px solid transparent;margin-bottom:-1px;border-radius:0}.nav-tabs 
button.active{color:#d4a853;border-bottom-color:#d4a853}.divider{text-align:center;color:#5a4a35;margin:8px 0;font-size:.9rem;letter-spacing:.3em}.btn-genie{display:inline-block;padding:12px 32px;background:linear-gradient(135deg,#1a6b3a,#0f5a2a);color:#f0e6d3;border:1px solid rgba(106,191,123,.4);border-radius:6px;font-size:1.05rem;font-weight:600;font-family:Playfair Display,Georgia,serif;letter-spacing:.06em;text-transform:uppercase;text-decoration:none;transition:background .2s,transform .1s;cursor:pointer}.btn-genie:hover{background:linear-gradient(135deg,#228b4a,#1a6b3a);transform:translateY(-1px)}@keyframes pulse{0%,to{opacity:1;transform:scale(1)}50%{opacity:.6;transform:scale(1.1)}} | |||
This bundled CSS still contains a Google Fonts @import URL, so the deployed app will make external network requests at runtime. If “no external network calls” is a requirement for Marketplace, the build output should not reference external font URLs (bundle fonts locally or remove the import).
- Root cause deduction (LLM-judged): up to +100
- Business recommendation (LLM-judged): up to +250
The module docstring claims root cause and recommendation scoring are “LLM-judged”, but the implementation below is keyword-overlap based (no LLM calls). This mismatch is confusing for maintainers and users; update the docstring (and any UI text) to match the actual scoring approach.
Suggested change:
- - Root cause deduction (LLM-judged): up to +100
- - Business recommendation (LLM-judged): up to +250
+ - Root cause deduction (keyword-overlap scored): up to +100
+ - Business recommendation (keyword-overlap scored): up to +250
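For readers unfamiliar with the term, keyword-overlap scoring awards points in proportion to how many expected keywords appear in the answer, with no model call involved. The function below is a hypothetical illustration of that idea; the tokenization and the linear formula are assumptions and may differ from what app.py actually does.

```python
import re


def keyword_overlap_score(answer: str, keywords: set[str], max_points: int) -> int:
    """Score an answer by the fraction of expected keywords it mentions."""
    wanted = {k.lower() for k in keywords}
    if not wanted:
        return 0
    # Lowercase and split on anything that is not a letter or digit.
    tokens = set(re.findall(r"[a-z0-9]+", answer.lower()))
    hits = len(tokens & wanted)
    return round(max_points * hits / len(wanted))
```

With the expected keywords `supplier`, `delay`, `warehouse` and `shortage`, the answer "The supplier delay caused a shortage." matches three of four and scores 75 out of a 100-point maximum.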
    <div style={{ fontWeight: 600 }}>Root Cause Deduction</div>
    <div className="text-muted" style={{ fontSize: "0.8rem" }}>AI-judged match to correct answer</div>
  </div>
  <span style={{ color: "#d4a853", fontWeight: 600, whiteSpace: "nowrap" }}>up to 100 pts</span>
</div>
<div style={{ display: "flex", justifyContent: "space-between", padding: "8px 0", borderBottom: "1px solid #3a3250" }}>
  <div>
    <div style={{ fontWeight: 600 }}>Business Recommendation</div>
    <div className="text-muted" style={{ fontSize: "0.8rem" }}>Relevance, actionability, business value</div>
  </div>
  <span style={{ color: "#d4a853", fontWeight: 600, whiteSpace: "nowrap" }}>up to 250 pts</span>
The scoring rubric UI text says “AI-judged match” for root cause and recommendation, but the backend scoring is keyword-based. This is a user-facing inconsistency (and conflicts with the PR description of no LLM dependency). Update the rubric copy to reflect keyword-based scoring so users understand what affects their score.
_set_provisioning_status(account_id, {
    "step": "done",
    "message": "Your case is ready!",
    "progress": 8,
    "total": 8,
    "genie_url": result["genie_url"],
})
Provisioning progress totals are inconsistent: provisioner.provision_scenario() reports total_steps = 9, but the backend status initialization/final status uses total: 8 and progress: 8. This can make the UI progress bar jump backwards or never reach 100% depending on which event arrives last. Align the step count across provisioner, _run_provisioning, and the frontend default (ProvisioningStatus).
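A single shared constant is one way to keep the step count aligned between the provisioner and the backend. The sketch below assumes a hypothetical `TOTAL_STEPS` constant and helper names. The value 8 is only illustrative, and the frontend default would still need to be kept in sync separately.

```python
TOTAL_STEPS = 8  # hypothetical single source of truth for the step count


def make_status(step: str, message: str, progress: int) -> dict:
    """Build a status event whose total always comes from TOTAL_STEPS."""
    if not 0 <= progress <= TOTAL_STEPS:
        raise ValueError(f"progress {progress} outside 0..{TOTAL_STEPS}")
    return {
        "step": step,
        "message": message,
        "progress": progress,
        "total": TOTAL_STEPS,
    }


def percent_complete(status: dict) -> int:
    """Percentage a progress bar would show for this status event."""
    return round(100 * status["progress"] / status["total"])
```

Because every event is built through `make_status`, a step reporting progress beyond the total fails loudly during development instead of making the progress bar jump backwards in the UI.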
Game + Genie provisioner for Marketplace/Free Edition users. Flask backend (app.py, provisioner.py) plus React/Vite frontend and scenarios data. Co-authored-by: Isaac
Per CONTRIBUTING.md requirements for new experiments. Co-authored-by: Isaac
…isioning
- Enforce service-principal-only auth in provisioner.get_auth (no DATABRICKS_TOKEN fallback)
- Validate all Unity Catalog identifiers via _check_ident / _check_dotted_ident before SQL interpolation
- Fix profanity filter substring bug (word-boundary match instead of "in" check)
- Cap in-memory provisioning status dict to prevent unbounded growth
- Sanitize user-facing error messages; log full tracebacks server-side only
- Remove dead code: get_current_user(), init_db UNIQUE migration, Leaderboard UI + api stubs + types + CSS
- Strip stale warehouse_id from all scenario genie_space.json files
- Rewrite manifest.yaml to Marketplace schema (version, name, description, user_api_scopes)
- Drop serving_endpoint resource from databricks.yml
- Expand SECURITY.md with auth model, scope rationale, and input-validation notes
- Rebuild frontend bundle
Co-authored-by: Isaac
Blockers / runtime correctness:
- provisioner: fix UC GRANT syntax: schema-scoped grants instead of catalog
- provisioner: allow hyphens in scenario_key via _check_path_ident (financial-planning was rejected)
- provisioner: add default 60s request timeout so stalled HTTP can't hang threads
Marketplace policy (no external network calls):
- index.css: drop fonts.googleapis.com @import; rely on system serif fallback chain
- frontend bundle: rebuild so static/assets no longer references external fonts
- QuestionView: leaderboard form URL is now config-driven via LEADERBOARD_FORM_URL env; hardcoded Google Forms URL removed. Marketplace builds default to no leaderboard
Safety / DoS:
- app: cap evidence content (1.5MB per item) and total submission (6MB)
- app: cap solution/recommendation text fields at 5000 chars
Behavior:
- app: provision-on-demand by default; PRE_PROVISION_ALL env opt-in for legacy startup behavior
- app: SSE timeout event now includes progress/total fields
- app + provisioner: align step count to 8 across provisioner, backend, and frontend default
Consistency:
- profanityFilter (frontend): drop substring match; word-boundary only, matches backend
- README: Genie permissions URL corrected to /api/2.0/permissions/genie/{space_id}
- README: serialized_space example bumped to version 2 to match implementation
- app docstring + scoring rubric UI: say "keyword-overlap" / "Keyword match" instead of LLM-judged
Co-authored-by: Isaac
d6eb4c5 to ae6c676
Summary
Adds marketplace-provisioning/, a Databricks App that provisions Genie spaces and Unity Catalog datasets for Marketplace / Free Edition users. Players pick a business-mystery scenario and the app creates a catalog, tables, volumes, and a configured Genie space in their workspace, then scores submitted solutions against expected evidence.
- databricks.sdk.WorkspaceClient (no PAT storage, no on-behalf-of-user)
- manifest.yaml and minimal app.yaml
Test plan
- databricks bundle deploy to a Free Edition workspace
This pull request and its description were written by Isaac.