Databricks Vibe Modelling Agent

Generate production-grade enterprise data models from natural language

Describe your business. Get a data model. Vibe it until it's perfect.

Concepts · Getting Started · Widget Reference · Vibing Workflow · Action Catalog · Troubleshooting

Documentation
What Is Vibe Modelling?
Concepts
Getting Started
- Quick Start Recipes
Operations
The Vibing Workflow
Widget Reference
- Core Widgets (01–11)
- Convention Widgets (12–24)
Auto-Generated Next Vibes
The Complete Action Catalog
Pipeline Stages
Output Artifacts
Viewing a Generated Model
Quality Rules & Enforcement
LLM Architecture
Metric Views
Troubleshooting
Glossary
Recent fixes (v0.6.x → v0.8.x)

📚 Documentation

Document	Description
docs/	Documentation index — whitepaper, design guide, integration guide
docs/design-guide.md	Technical design reference
docs/integration-guide.md	UI/consumer integration protocol
docs/whitepaper.md	Philosophy and complete rules catalog
runner/readme.md	Pipeline orchestrator guide
tests/readme.md	Test suite reference

What Is Vibe Modelling?

flowchart LR
    IN["Business description<br/>+ natural-language vibes"] --> AG["Vibe Modelling Agent<br/>(multi-model LLM ensemble)"]
    AG --> MM["Metamodel<br/>(_business, _domain,<br/>_product, _attribute)"]
    AG --> PS["Physical schemas<br/>(UC catalogs, schemas,<br/>tables, FKs, tags)"]
    AG --> SM["Sample data<br/>(CSV per table)"]
    AG --> DC["Docs + artifacts<br/>(README, Excel, DBML,<br/>ontology, dictionary)"]
    AG --> NV["Next Vibes<br/>(self-suggested<br/>improvements)"]

Vibe Modelling is a Databricks-native, LLM-powered approach to generating enterprise data models from natural language. Instead of manually drawing ER diagrams, writing DDL, or importing pre-built industry templates, you describe your business in plain English and the agent builds a complete, production-grade data model — domains, tables, columns, foreign keys, tags, sample data, and documentation — end to end.

The name "Vibe" reflects the core workflow:

Generate a base model → review → vibe it with natural language → repeat → deploy

Each iteration produces a new version. The agent carries forward your context so nothing is lost between runs. You are never locked into a static template — the model evolves with your business.

Current version: v0.7.7 — see Version history and Recent fixes below.

Note on numbering: the dev branch tracked four internal iterations labelled v0.7.1, v0.7.2, v0.7.3, v0.7.4 between the v0.7.0 and v0.7.1 release tags. The v0.7.1 release tag on main consolidates all of those iterations into a single public release, so __AGENT_VERSION__ = "0.7.1" is stamped into every model.json produced from this point forward. The earlier per-iteration version-history rows below are kept verbatim as the engineering trace.

🚀 Three Ways to Run

1. Quick mode (5-minute start)

Fill only these 4 widgets and run the agent notebook:

Widget	Value
`business_name`	e.g. `Airlines`
`business_description`	2–3 sentences describing the business
`data_model_scopes`	`Minimum Viable Model - MVM` (lean) or `Expanded Coverage Model - ECM` (full)
`deployment_catalog`	Unity Catalog target, e.g. `industry_mvm_v1`

Everything else auto-fills. You get a complete metamodel + physical schemas + sample data + documentation + next_vibes.txt for iteration.

2. Runner mode (both ECM + MVM with test install)

When you want both ECM and MVM produced and installed in one go — plus a staging round-trip to prove the install works before landing in the permanent catalogs — use the vibe_runner notebook instead of the agent directly.

Runner launches a 4-task Databricks Job:

ECM generate — builds the full ECM into a staging catalog
ECM install — deploys the ECM into {business}_ecm_v1
MVM shrink — derives MVM from ECM in a staging catalog (reuses the ECM so you don't regenerate from scratch)
MVM install — deploys the MVM into {business}_mvm_v1

Tasks 2 and 3 run in parallel once Task 1 is done; Task 4 waits for Task 3.

Inputs: a single my-industries.json file listing one or more businesses (name + description + optional vibes) plus the standard widget defaults. Runner reads the file, auto-discovers the latest dbx_vibe_modelling_agent_v* notebook in the same directory, and orchestrates the job.

See runner/readme.md for the exact file format and submit recipe.

3. Full mode (every widget)

Use the Widget Reference section below when you need fine control over naming conventions, tag prefixes, sample data volume, or installation behavior.

Widget semantics note (v0.5.8+): for schema_prefix, schema_suffix, catalog_prefix, catalog_suffix, tag_prefix, tag_suffix — an explicitly empty widget value overrides whatever is in the model.json. For all other widgets, empty widget = "use file value if present".

📖 Concepts

Industry Data Models vs. Business Data Models

What is an Industry Data Model?

An industry data model is a generic, one-size-fits-all template designed for an entire vertical — retail, banking, healthcare, telecoms, etc. Organizations like the TM Forum (telecoms), ARTS (retail), ACORD (insurance), and HL7 (healthcare) publish canonical schemas that attempt to cover every conceivable entity and relationship across the industry.

The problem: When you adopt an industry model, you inherit everything — including the 60–80% of tables that your business will never use. You then spend months trimming, renaming, and reshaping the model to fit your actual processes.

What is a Business Data Model?

A business data model is tailored, contextualized, and specific to YOUR organization. It reflects your actual business processes, your product lines, your org structure, your regulatory environment, your terminology, and your governance requirements.

Vibe Modelling generates business data models. The LLM understands your industry deeply (via the complexity tier system), but the model it produces is shaped entirely by your context.

Aspect	Industry Data Model	Business Data Model (Vibe)
Scope	Entire industry vertical	Your specific business
Customization	Post-delivery (manual pruning)	Built-in (LLM-driven from your context)
Relevance	20–40% directly applicable	90–100% directly applicable
Time to production	Months of adaptation work	Hours with iterative vibing
Naming	Committee-standard naming	Your business terminology and conventions
Evolution	New version = re-adoption	Vibe the next version from the previous one
Cost	License fees + adaptation labor	Compute cost of LLM generation

The Vibe Philosophy

1. GENERATE  →  Describe your business, get a base model
2. VIBE IT   →  Review output, provide natural-language refinements
3. REPEAT    →  Each iteration = new version; agent auto-suggests next vibes
4. DEPLOY    →  Physical Unity Catalog schemas, tables, FKs, tags, sample data

The Four-Level Hierarchy: Divisions → Domains → Products → Attributes

Every Vibe model follows a strict four-level hierarchy:

graph TD
    B["🏢 BUSINESS<br/><i>e.g., Contoso Manufacturing</i>"]
    B --> D1["⚙️ Operations Division"]
    B --> D2["💼 Business Division"]
    B --> D3["🏛️ Corporate Division"]
    
    D1 --> DOM1["📦 logistics"]
    D1 --> DOM2["🏭 production"]
    D1 --> DOM3["📋 inventory"]
    
    D2 --> DOM4["👤 customer"]
    D2 --> DOM5["💰 sales"]
    D2 --> DOM6["🧾 billing"]
    
    D3 --> DOM7["👥 hr"]
    D3 --> DOM8["📊 finance"]
    
    DOM5 --> P1["🗂️ order"]
    DOM5 --> P2["🗂️ order_item"]
    DOM5 --> P3["🗂️ quote"]
    
    P1 --> A1["🔑 order_id <i>(PK, BIGINT)</i>"]
    P1 --> A2["📅 order_date <i>(DATE)</i>"]
    P1 --> A3["🔗 customer_id <i>(FK → customer.customer)</i>"]
    P1 --> A4["💲 total_amount <i>(DECIMAL)</i>"]
    P1 --> A5["📌 status <i>(STRING)</i>"]

Level 1: Divisions

Divisions are the top-level organizational grouping. Every domain belongs to exactly one division.

Division	Purpose	Typical Domains
Operations	Core operational backbone — mechanisms, infrastructure, processes	Logistics, production, inventory, service delivery, supply chain, quality control
Business	Revenue-generating and customer-facing functions	Customer/party, billing/revenue, product catalog, sales, subscriptions
Corporate	Supporting functions for governance (NOT directly revenue-generating)	HR, finance, legal/compliance, marketing, procurement, IT

Division Balance Rules:

Operations + Business MUST be >= 80% of all domains

Corporate is capped at <= 20% of total domains

No Corporate domain allowed until Operations AND Business each have >= 2 domains

Level 2: Domains

A domain is a logical grouping of related data products (tables). When deployed, each domain maps 1:1 to a Unity Catalog schema.

Named in snake_case, exactly one word (e.g., customer, fulfillment, logistics)
Count depends on model scope (MVM vs ECM) and industry complexity tier
The shared domain is reserved — the pipeline auto-creates it during SSOT consolidation; never create it manually

Level 3: Products (Tables)

A product is a data table within a domain — a first-class business entity with its own identity, lifecycle, and attributes.

Every product gets a PK column: <product_name>_<pk_suffix> (default: <product_name>_id)
Classified as CORE (entities stakeholders query directly) or HELPER (supporting entities)
Tagged by data type: master_data | transactional_data | reference_data | association_table

The First-Class Entity Test: A product must have its own identity, its own lifecycle, and at least 5 unique business attributes. Anything less should be an attribute on another table or merged.

Level 4: Attributes (Columns)

Each attribute has:

Property	Example	Purpose
`attribute`	`customer_email`	Column name (snake_case)
`type`	`STRING`	Apache Spark SQL data type
`tags`	`restricted,pii_email`	Classification + PII tags
`value_regex`	`^[a-zA-Z0-9._%+-]+@`	Validation pattern or enum
`business_glossary_term`	`Customer Email Address`	Human-readable business name
`description`	`Primary contact email for the customer`	What this attribute represents
`reference`	`GDPR Article 6`	Regulatory or standard reference
`foreign_key_to`	`customer.customer.customer_id`	FK target (`domain.product.pk`), or empty

DAG Enforcement — No Circular Dependencies

FK relationships MUST form a Directed Acyclic Graph (DAG):

✅ VALID:   order → customer → address
❌ INVALID: order → customer → address → order  (cycle!)

Why it matters: Circular dependencies prevent clean data loading order, break ETL pipelines, and indicate modeling errors.

How the agent enforces it:

Detect — Python DFS (Depth-First Search) cycle detection during QA
Break — LLM Cycle Break specialist determines which FK to remove (always the weakest link)
Verify — Re-run DFS to confirm graph is now a valid DAG
Iterate — Up to 5 rounds to resolve all cycles, including residual ones

Allowed exception: Self-referencing hierarchical FKs (e.g., parent_category_id → category.category_id) are permitted — they represent tree structures, not multi-node cycles.

Single Source of Truth (SSOT)

Each core business concept has exactly ONE authoritative domain and ONE authoritative product (table) that owns it. No concept is duplicated across domains.

What SSOT prevents:

Two customer tables in different domains
A product table in both sales and inventory
An invoice table in both billing and finance

How SSOT is enforced:

Phase	Mechanism
Generation	LLM places each entity in its authoritative domain — the domain that CREATES/OWNS that data
QA Deduplication	Global pass detects same-name products + synonym pairs (e.g., `customer` vs `client`) with 60%+ attribute overlap
Consolidation	Overlapping products merged into `shared` domain with discriminator column
Cross-Domain References	Other domains use FK columns to reference the authoritative table — no duplication

The "Where Do I Go?" Test: For any entity, ask: "If a business user needs this information, is there exactly ONE place to get it?" If the answer is ambiguous, there is an SSOT violation.

Model Scopes: MVM vs. ECM

	MVM (Minimum Viable Model)	ECM (Expanded Coverage Model)
Size	30–50% of ECM table count	Full coverage
Attribute depth	SAME as ECM — full production-grade (same min/max per tier)	Full production-grade
Domains	Essential business functions only	All functions including corporate back-office
Ideal for	SMBs, rapid deployments, POCs, dev/test	Fortune 100, multinational enterprises
Lightness	Fewer domains & tables (NEVER thinner attributes)	Maximum breadth

MVM is NOT a skeleton or demo toy. It is a production-ready subset where every delivered table is fully-featured.

Subdomains

Every domain is organized into subdomains — semantic groupings of related products within a domain. Subdomains provide an additional organizational layer between domains and products.

Subdomain Rules

Rule	Description
Count	Minimum and maximum subdomains per domain are defined by the complexity tier. Never exactly 1 subdomain per domain.
Naming	EXACTLY 2 words per subdomain name. 1 word = rejected. 3+ words = rejected.
Min Products	Every subdomain must contain at least the minimum products per subdomain defined by the tier: ECM tier_1–tier_4: 3, ECM tier_5: 2; MVM tier_1–tier_3: 3, MVM tier_4–tier_5: 2.
No Overlap	No two subdomains within the same domain may share any word in their names.
Business Terms	Use business terminology, NOT technical terms.
Balanced	Products distributed as evenly as possible across subdomains.
No Placeholders	NEVER use: "Sub Domain1", "Category 1", "Group A", "N/A", "Other", "General", "Miscellaneous", "Uncategorized".
No Drift	Each subdomain belongs to exactly one parent domain.

Example: In the party domain:

identity: individual, organization, party_identification, kyc_verification
engagement: party_interaction, consent_record, loyalty_enrollment

Industry Complexity Tiers

The agent auto-classifies your business into one of five tiers:

Classification is based on 7 scoring dimensions (scored 0 or 1 each):

Regulatory density — 3+ distinct regulatory bodies imposing data/reporting requirements
Party complexity — 3+ distinct party types (customers, suppliers, partners, employees, etc.)
Product hierarchy depth — 50+ product variants with complex bundling/pricing/lifecycle
Infrastructure management — Owns physical or digital infrastructure requiring asset tracking
Industry canonical model — 200+ entity types defined by an industry standards body
Transaction complexity — 10+ distinct transaction types with multi-step lifecycles
Operational system landscape — 5+ major systems of record across business functions

Rule: If the business falls between two tiers, classify UP (prefer the more complex tier).

Tier	Label	Hallmarks	ECM Domains	ECM Products/Domain	Attrs/Product	Subdomains/Domain
`tier_1`	Ultra-Complex	5+ regulatory bodies, multi-entity structures (banking, insurance, pharma)	15–22	14–28	15–50	3–6
`tier_2`	Complex	2–4 regulatory bodies, multi-channel (telecoms, energy, healthcare)	12–18	14–26	12–50	2–5
`tier_3`	Moderate	1–2 regulatory bodies, 3+ business functions (manufacturing, retail)	10–15	12–24	10–45	2–5
`tier_4`	Standard	Light regulation, regional complexity (logistics, agriculture)	8–12	10–20	10–40	2–4
`tier_5`	Simple	Minimal regulation, service-based (consulting, SaaS, media)	5–8	8–18	8–35	2–4

MVM Tier Sizing (attribute depth is the SAME as ECM):

Tier	MVM Domains	MVM Products/Domain	Subdomains/Domain
`tier_1`	9–14	8–16	2–4
`tier_2`	8–12	8–14	2–4
`tier_3`	6–10	7–13	2–4
`tier_4`	5–8	6–11	2–3
`tier_5`	3–6	5–10	2–3

MVM counts are automatically derived as ~30–50% of the ECM counts for each tier. Attribute depth (min/max attributes per product) is the same for both MVM and ECM within each tier.

🚀 Getting Started

Quick Start Recipes

Recipe 1: Generate Your First Model (Minimal Input)

Widget	Value
01. Business	`My Company Name`
02. Description	`A brief description of what your company does`
03. Operation	`new base model`
05. Model Scope	`Minimum Viable Model - MVM`
09. Installation Catalog	`my_catalog` (or leave blank for logical-only)
Everything else	Defaults

Run the notebook. Done.

Recipe 2: Generate a Rich Model (Recommended)

Widget	Value
01. Business	`Contoso Manufacturing`
02. Description	`A multinational aluminum smelting company operating across 12 countries with 15,000 employees`
03. Operation	`new base model`
05. Model Scope	`Expanded Coverage Model - ECM`
06. Business Domains	`production, quality, supply, customer, sales, logistics, billing`
07. Org Divisions	`Operations, Business and Corporate`
09. Installation Catalog	`contoso_dev`
10. Sample Records	`10`

Recipe 3: Vibe an Existing Model

Widget	Value
01. Business	`Contoso Manufacturing`
02. Description	(same as before)
03. Operation	`vibe modeling of version`
04. Version	`1`
08. Model Vibes	`Add a warranty domain. Remove the corporate_strategy domain. Run quality checks.`

Recipe 4: Use Auto-Generated Next Vibes

After any pipeline run, find next_vibes.txt in the vibes/ output folder. Copy the suggested vibes into widget 08.

Widget	Value
01. Business	`Contoso Manufacturing`
02. Description	(same as before)
03. Operation	`vibe modeling of version`
04. Version	Next version number (previous + 1)
08. Model Vibes	(paste recommended vibes from next_vibes.txt)

Recipe 5: Deploy to a New Catalog

Widget	Value
03. Operation	`install model`
09. Installation Catalog	`prod_catalog`
11. Model JSON File	`/Volumes/catalog/schema/volume/vibes/contoso/ecm_v1/model.json`
10. Sample Records	`0` (or a number for test data)

⚙️ Operations

The 03. Operation widget selects which pipeline mode to run:

Operation	Purpose	Key Requirements
`new base model`	Generate a brand-new data model from scratch	Business name + description (via widgets)
`vibe modeling of version`	Apply natural-language instructions to refine an existing version	Version + vibes (widget 08)
`shrink ecm`	Convert an ECM to a leaner MVM	Version + deployment catalog
`enlarge mvm`	Expand an MVM into a comprehensive ECM	Version + deployment catalog
`install model`	Deploy a logical model into physical Unity Catalog objects	Model JSON file (widget 11) + deployment catalog
`uninstall model version`	Remove a version's physical artifacts from the catalog	Business name + version + catalog
`generate sample data`	Generate synthetic sample records for an existing deployed model	Model JSON file (widget 11) + deployment catalog

🎵 The Vibing Workflow

This is the core power of Vibe Modelling — iterative refinement via natural language:

graph LR
    A["🆕 New Base Model<br/>Version 1"] --> B["🔍 Review Output"]
    B --> C["📝 Write Vibes<br/><i>or use auto-generated next_vibes.txt</i>"]
    C --> D["🎵 Vibe Modeling<br/>Version 2"]
    D --> E["🔍 Review"]
    E --> F["🎵 Vibe v3"]
    F --> G["..."]
    G --> H["✅ Deploy"]

What Vibes Can Do

Vibes are free-form natural language. The agent interprets them and translates them into specific actions:

"Add a compliance domain with regulatory_filing and audit_trail tables"
"Remove the HR domain, we don't need it"
"Merge the customer_support domain into the customer domain"
"Add a source_system column to every table"
"Run a full quality check and fix any issues found"
"The order table should have a shipping_address_id FK to the address table"
"Rename all tables starting with dim_ to remove that prefix"
"Normalize the order domain to 3NF"
"Mark all email columns as PII"
"Generate an ontology (RDF/RDFS) for the model"
"Keep only billing-related tables, drop everything else"
"Make this model ECM — I need the large version"

Auto-Generated Next Vibes

After every pipeline run, the agent produces next_vibes.txt and current_vibes.txt in the vibes/ folder, containing:

Your current business context (preserved as-is)
Recommended vibes for the next iteration (based on QA findings)
Model health metadata: confidence score, warning counts, issue breakdown
Version history and progression tracking

To use them: copy the recommended vibes into widget 08. Model Vibes, set operation to vibe modeling of version, set the next version, and run.

🎛️ Widget Reference

The notebook exposes 28 configurable widgets. Below is the complete reference.

Core Configuration Widgets (01–11)

#	Widget	Type	Mandatory	Default	Description
01	Business (name)	Text	Yes	—	Your business/organization name
02	Description	Text	Recommended	—	What your business does (richer = better model)
03	Operation	Dropdown	Yes	`new base model`	Pipeline operation to run
04	Version	Dropdown	Conditional¹	—	Model version number (1–100)
05	Model Scope	Dropdown	Yes	`Minimum Viable Model - MVM`	MVM (lean) or ECM (comprehensive)
06	Business Domains	Text	No	—	Comma-separated seed domains
07	Included Org Divisions	Dropdown	Yes	`Operations and Business`	Which divisions to include
08	Model Vibes	Text	Conditional²	—	Natural language instructions — inline (max 2,000 chars) or file path to a `.txt` on a UC Volume
09	Installation Catalog	Text	Conditional³	—	Unity Catalog for physical deployment
10	Sample Records	Dropdown	No	`0`	Synthetic records per table (0 = none)
11	Model JSON File	Text	Conditional⁴	—	Path to a previously generated `model.json` for re-install or continuation

¹ Required for all operations except new base model (auto-assigned). ² Required for vibe modeling of version. ³ Required for install, uninstall, generate sample data, shrink, enlarge. Optional for new base model and vibe modeling. ⁴ Required for install model, generate sample data when re-deploying a previously generated model.

Detailed Widget Descriptions (click to expand)

01. Business (name)

The name of your business. Used as the top-level identifier across the entire model. Case-insensitive matching (stored via LOWER()).

Sample values: Contoso Inc, Acme Healthcare, Global Telecom Corp, NextGen Retail

02. Description

A rich description of what your business does. Include industry, size, geography, key products/services. The LLM uses this to determine the industry complexity tier and tailor the model.

Sample values: A multinational aluminum smelting and manufacturing company operating across 12 countries, A digital-first healthcare provider specializing in telemedicine

03. Operation

See the Operations section for full details on each mode.

04. Version

Options: Empty, or 1–100

For new base model, leave empty — auto-assigned as 1 (or auto-incremented). For vibe modeling of version, this is the version you are modifying; output creates version N+1.

05. Model Scope

Options: Minimum Viable Model - MVM | Expanded Coverage Model - ECM

MVM = lean core (fewer domains/tables, same attribute depth). ECM = comprehensive Fortune 100 coverage.

06. Business Domains

Comma-separated list of specific domains you want. If blank, the LLM auto-generates the optimal set for your industry.

Sample values: customer, sales, billing, inventory, logistics | clinical, pharmacy, research

07. Included Org Divisions

Options: Operations | Operations and Business | Operations, Business and Corporate

Controls which organizational divisions contribute domains to the model.

08. Model Vibes

Supports two input modes:

Inline text — type your vibe instructions directly into the widget (max 2,000 characters). Best for short, targeted changes.
File path — provide a path to a .txt file on a Unity Catalog Volume (e.g., /Volumes/catalog/schema/vol/my_vibes.txt). Best for longer, multi-paragraph instructions that exceed the inline limit.

See The Vibing Workflow for examples of what vibes can do.

09. Installation Catalog

The Unity Catalog where physical schemas, tables, FK constraints, tags, and sample data will be created. Must already exist. You need CREATE SCHEMA privileges. If blank for new base model/vibe modeling, only the logical model (JSON artifacts) is produced.

Sample values: dev_catalog, prod_data_models, contoso_lakehouse

10. Sample Records

Options: 0, 5, 10, 15, 20, 25, 50, 100

0 = skip sample data generation. 10 = good default for review. 50–100 = for load testing / demos. Sample data respects FK relationships (child records reference valid parent IDs).

11. Model JSON File

(Optional) Path to a previously generated model.json file on a Unity Catalog Volume. Used when re-installing or continuing from a prior run. After every run, the agent generates next_vibes.txt in the vibes/ folder — review it for recommended next vibe instructions.

Sample values: /Volumes/my_catalog/my_schema/vol/vibes/contoso/ecm_v1/model.json

Model Convention Widgets (12–24)

#	Widget	Type	Default	Options / Format
12	Naming Convention	Dropdown	`snake_case`	`snake_case`, `camelCase`, `PascalCase`, `SCREAMING_CASE`
13	Primary Key Suffix	Dropdown	`_id`	`_id`, `_key`, `_pk`, `id`, `key`
15	Schema Prefix	Text	(empty)	e.g., `stg_`, `raw_`, `dw_`
16	Tag Prefix	Text	`dbx_`	e.g., `dbx_`, `vibe_`, `mdl_`
17	Table ID Type	Dropdown	`BIGINT`	`BIGINT`, `INT`, `LONG`, `STRING`
18	Boolean Format	Dropdown	`Boolean (True/False)`	`Boolean (True/False)`, `Int (0/1)`, `String (Y/N)`
19	Date Format	Dropdown	`yyyy-MM-dd`	`yyyy-MM-dd`, `dd/MM/yyyy`, `MM/dd/yyyy`, `yyyy/MM/dd`, `dd-MM-yyyy`
20	Timestamp Format	Dropdown	`yyyy-MM-dd'T'HH:mm:ss.SSSXXX`	4 ISO/standard options
21	Classification Levels	Text	`restricted=restricted, confidential=confidential, internal=Internal, public=public`	Comma-separated `key=Label` pairs
22	Housekeeping Columns	Dropdown	`No`	`No`, `Yes` — adds `created_by`, `created_at`, `updated_by`, `updated_at`
23	History Tracking Columns	Dropdown	`No`	`No`, `Yes` — adds `valid_from`, `valid_to`, `is_current` (SCD Type 2)
09a	Cataloging Style	Dropdown	`One Catalog`	`One Catalog`, `Catalog per Division`, `Catalog per Domain`
09b	Catalog Prefix	Text	(empty)	e.g., `dev_`, `prod_`
09c	Catalog Suffix	Text	(empty)	e.g., `_lakehouse`, `_dw`
15a	Schema Suffix	Text	(empty)	e.g., `_db`, `_schema`
16a	Tag Suffix	Text	(empty)	e.g., `_tag`
24	Vibe Session ID	Text	(empty)	UUID for external UI progress tracking

Widget #14 does not exist (numbering gap between 13 and 15).

📄 Auto-Generated Next Vibes

After every run, the agent produces next_vibes.txt in the vibes/ output folder with recommended next vibe instructions. Copy these into widget 08. Model Vibes for the next iteration.

The model.json file includes session metadata:

{
  "_next_vibe_metadata": {
    "generated_from_version": "mvm_v1",
    "model_version": "1",
    "status": "needs_work",
    "confidence_score": 78,
    "summary": "Model has 3 unlinked columns and 1 siloed table",
    "issues_addressed": ["..."],
    "issues_not_addressed": ["..."],
    "data_modeler_notes": "Recommend adding a warehouse domain",
    "model_stats_at_generation": { "domains": 8, "products": 47, "attributes": 523 },
    "issue_counts": { "error": 0, "warning": 3, "info": 5 },
    "version_history": ["..."]
  }
}

🎯 The Complete Action Catalog

Vibes are translated into 190+ specific actions organized into categories. You never need to name these actions — just describe what you want in natural language and the agent maps your intent.

Entity Management (1–25)

#	Action	What It Does
1	`drop`	Remove a domain, product, attribute, tag, or link
2	`create`	Add a new domain, product, or attribute
3	`rename`	Change the name of an entity
4	`alter_description`	Modify the description
5	`change_type`	Change an attribute's data type
6	`merge`	Combine two entities into one
7	`split`	Divide one entity into multiple
8	`move_product`	Move a product to another domain
9	`move_attribute`	Move an attribute to another product
10	`delete_attribute`	Remove an attribute
11	`create_link`	Create a foreign key relationship
12	`drop_link`	Remove a foreign key relationship
13–17	Prefix/suffix ops	Add, remove, or change name prefixes/suffixes
18–24	Metadata ops	Update glossary terms, references, regex, tags
25	`modify`	Regenerate an entity with specific guidance

Quality Check & Analysis (26–38)

#	Action	What It Does
26	`run_quality_checks`	Compound: runs detect_duplicates + dedupe_attributes + detect_cycles + detect_siloed + fix_fk_anomalies
26b	`run_product_domain_fit`	LLM audit: are products in the right domains? Relocates misplaced ones.
27	`detect_duplicates`	Find SSOT violations (semantic duplicate products across domains)
28	`fix_duplicates`	Auto-merge/remove detected duplicates
29	`dedupe_attributes`	Remove duplicate columns within products
30	`detect_cycles`	Find circular FK dependencies
31	`break_cycles`	Auto-fix cycles by removing weakest FK links
32	`detect_siloed`	Find completely disconnected tables (no FKs in any direction)
33	`fix_siloed`	Connect disconnected tables via linking attempts
34	`review_links`	Audit all FK relationships for anomalies
35	`fix_fk_anomalies`	Repair broken, mismatched, or orphaned FK references
36	`fix_ambiguous_fks`	Resolve FKs that match multiple target tables
37	`merge_small_tables`	Analyze and consolidate tables with < 5 attributes
38	`identify_core_products`	Identify business-critical products for protection
95	`model_checkup`	Mega-compound: runs static analysis + auto-queues ALL appropriate fixes

FK & Linking Operations (39–46, 133–138)

#	Action	What It Does
39	`run_linking`	Compound: in-domain + cross-domain + M:N detection
40	`run_in_domain_linking`	Link FKs within each domain
41	`run_cross_domain_linking`	Link FKs across different domains
42	`detect_many_to_many`	Find potential M:N relationships
43	`create_junction_tables`	Create bridge/association tables for M:N
45	`redirect_fk`	Redirect FK to a different target table
46	`find_unlinked_columns`	Find *_id columns without FK relationships
133	`remove_product_prefix`	Remove redundant table-name prefix from columns
134	`fix_fk_column_naming`	Fix FK columns that don't end with the target PK name
135	`connect_table`	Connect a specific disconnected table
136	`link_specific_columns`	Link an explicit list of unlinked _id columns
138	`find_missing_fk_links`	Comprehensive: LLM classifies each unlinked column as LINK, CREATE, DROP, or KEEP_AS_IS

Bulk Operations (62–64, 110–114)

#	Action	What It Does
62	`bulk_rename_products`	Rename multiple products by pattern
63	`bulk_drop_products`	Drop products matching a pattern (e.g., `stg_*`)
64	`bulk_move_products`	Move products matching a pattern to another domain
110	`bulk_change_type`	Change data type for attributes by pattern
111	`bulk_set_nullable`	Set nullable for attributes by pattern
113	`bulk_remove_attributes`	Remove attributes matching a pattern from all tables

Tag & Classification Operations (54–61, 98–106, 156–163)

#	Action	What It Does
54–57	Tag add/remove	Add or remove tags from products or domains
58	`conditional_tag`	Tag entities matching a condition
59–60	Bulk tagging	Tag by name pattern or by column presence
98–101	PII/sensitive marking	Mark as PII, sensitive, encrypted, deprecated
103	`set_table_type`	Classify as dimension, fact, lookup, bridge, staging, archive
156	`classify_table_tier`	Medallion architecture: bronze / silver / gold
161	`set_data_owner`	Assign data owner/steward
162	`set_update_frequency`	Document expected freshness (real_time, daily, monthly, etc.)
163	`map_to_source_system`	Map table/column to source system (SAP, Salesforce, etc.)

Template Column Actions (148–155)

#	Action	Columns Added
148	`add_scd_columns`	`effective_from`, `effective_to`, `is_current`, `row_hash`
149	`add_audit_columns`	`created_at`, `updated_at`, `created_by`, `updated_by`
150	`add_soft_delete_columns`	`is_deleted`, `deleted_at`, `deleted_by`
151	`add_temporal_columns`	`valid_from`, `valid_to`, `system_from`, `system_to`
152	`add_versioning_columns`	`version_number`, `version_valid_from`, `version_valid_to`, `is_latest_version`
153	`add_multitenancy_columns`	`tenant_id`
154	`add_lineage_columns`	`source_system`, `source_table`, `ingestion_timestamp`, `etl_job_id`
155	`add_gdpr_columns`	`consent_status`, `consent_date`, `data_subject_request_id`, `right_to_erasure_date`

Structural Transformations (164–191)

#	Action	What It Does
164	`normalize_to_3nf`	Apply Third Normal Form normalization (LLM-powered)
165	`denormalize_for_analytics`	Create wide/denormalized tables for BI
184	`promote_to_table`	Extract an attribute into its own lookup/reference table + FK
185	`inline_table`	Merge a child table into its parent (denormalize)
186	`swap_domains`	Atomically swap two domain names
187	`impact_analysis`	Show what would break if a table/domain were dropped (read-only)
190	`enlarge_model`	Wholesale expansion from MVM to ECM scope
191	`shrink_model`	Wholesale reduction from ECM to MVM scope
178	`VIBE_PRUNE_PROMPT`	LLM-powered: keep only tables related to a focus area
179	`drop_domains_except`	Drop all domains except a specified keep-list

Artifact Generation (139–147)

#	Action	Output
139	`generate_readme`	README documentation
140	`generate_data_model_json`	Complete model as JSON
141	`generate_ontology`	RDF/RDFS ontology (Turtle format)
142	`generate_dbml`	DBML schema for visualization tools
143	`generate_release_notes`	Changelog/release notes for the version
144	`generate_excel`	Excel workbook export
145	`generate_data_dictionary`	Comprehensive data dictionary
146	`export_model_report`	Full model documentation report
147	`generate_test_cases`	Data quality test case specifications

Metric View Operations (126–134)

#	Action	What It Does
126	`run_metric_modeling`	Generate Databricks metric views (dimensions + measures)
129	`add_metric_measure`	Add a specific KPI measure to metric views
130	`remove_metric_measure`	Remove a measure
131	`add_metric_dimension`	Add a grouping dimension
132	`remove_metric_dimension`	Remove a dimension
133	`alter_metric_filter`	Set/update metric view filter logic
134	`drop_metric_view`	Remove an entire metric view

🔄 Pipeline Stages

When you run the agent, it executes these progress stages in order (the # column is for reference only — no numeric stage IDs are emitted in progress events):

#	Stage	Duration	What Happens
1	Setup and Configuration	2–10s	Validates inputs, creates metamodel tables
2	Interpreting Instructions	10–30s	(Vibe mode only) Parses vibes into structured action plan
3	Collecting Business Context	10–30s	LLM enriches your description across business/industry dimensions
4	Designing Domains	15–60s	Generates domains following division model + SSOT
5	Creating Data Products	1–10m	Products per domain with domain-architect (Step 3.6) and global-architect (Step 3.7) review
6	Enriching Data Products with Attributes	5–40m	All columns for every product
7	Cross-Domain Linking	1–5m	In-domain → global sweep → pairwise comparison
8	Quality Assurance	30s–3m	9 sub-checks (naming, PK/types, overlaps, topology, auto-remediation)
9	Applying Naming Conventions	10–30s	Final naming consistency pass
10	Model Finalization	10–30s	Finalize logical model snapshot before physical deployment
11	Subdomain Allocation	10–30s	Allocates products into semantic subdomains
19	Generating Metric View Artifacts	10–30s	Exports metric view definitions (legacy stage ID)
17	Generating Artifacts	30s–2m	README, Excel/CSV, model JSON, data dictionary, model report
12	Physical Schema Construction	1–10m	Creates UC schemas + Delta tables (if catalog set)
13	Applying Foreign Keys	30s–2m	FK constraints on physical tables
14	Applying Tags	2–15m	Classification, PII, data type tags on schemas/tables/columns
15	Applying Metric Views	30s–5m	Databricks metric views for KPI tracking
16	Generating Sample Data	1–15m	Synthetic records respecting FK relationships
18	Consolidation and Cleanup	10–30s	Consolidates/merges metadata and cleanup

📦 Output Artifacts

Logical Artifacts (Always Generated)

File	Description
`model.json`	Complete model export (primary output)
`readme.md`	Human-readable model documentation
`vibes/current_vibes.txt`	Vibe instructions used in this run
`vibes/next_vibes.txt`	Recommended next vibe instructions
`domains.json`	All domain definitions
`products.json`	All product (table) definitions
`attributes.json`	All attribute (column) definitions
`docs/.xlsx` / `docs/.csv`	Excel/CSV export of the entire model (CSV fallback if openpyxl unavailable)
`readme.md` (parent folder)	Model overview document comparing MVM and ECM scopes
`schemas/*.sql`	SQL DDL files (per-domain schemas, cross-domain FKs, catalogs)
`diagram/_dbml_.txt`	DBML schema diagram
`ontology/_rdf_.ttl`	RDF/Turtle ontology representation
`docs/releasenotes.txt`	Auto-generated release notes

Conditional Artifacts (Generated on Demand via Queued Operations)

File	Description
`docs/_data_dictionary_.txt`	Column-level reference guide
`docs/_model_report_.txt`	Full documentation report
`docs/_test_cases_.txt`	Generated test cases

Physical Deployment (When Catalog Is Set)

Object	Example
Schemas	`catalog.customer`, `catalog.sales`, `catalog.logistics`
Tables	Delta tables with all columns and correct Spark SQL types
FK Constraints	Physical foreign key constraints between tables
Tags	Unity Catalog tags on schemas, tables, and columns
Metric Views	Databricks metric views for KPI calculation
Sample Data	Synthetic records with valid FK references

👁️ Viewing a Generated Model

Once the agent has produced a model (a model.json file plus the artifacts under <industry>/<scope>_v1/), you have two ways to explore it: open it in the model-viewer Databricks App for an interactive ER-graph view, or load model.json programmatically.

The model-viewer app

The viewer/model_viewer_app_installer.ipynb notebook in this repo provisions a Databricks App that renders any model.json as an interactive entity-relationship graph. It exposes three navigable views — full model, single domain, single product — and lets you load models either directly from a GitHub repo (e.g. amralieg/vibe-business-data-models) or by uploading a local model.json file.

Install the viewer

Open viewer/model_viewer_app_installer.ipynb in your Databricks workspace.
Run all cells. The installer creates the Databricks App and prints the app URL when it completes.
Open the app URL in a browser.

Load a model

Inside the app you have two loaders:

Load from repo — paste any GitHub owner/repo (default: amralieg/vibe-business-data-models) and pick the industry + flavour from the dropdown. Note: GitHub rate-limits anonymous API calls — if you hit a 429 / "rate limit exceeded" message, fall back to the JSON loader.
Load from JSON — upload a model.json directly. Use this when (a) you just generated a model with the agent and want to inspect it before deploying, or (b) GitHub is rate-limiting the repo loader.

What you see — three views

1. Full-model overview. Every entity in the model arranged on a single canvas, FK relationships drawn between them, domains colour-coded as rectangles, products on the perimeter:

2. Domain drill-down. Click any domain to zoom in. You see the domain's sub-domains as named groups and the products inside each, with the FK web restricted to within-domain links:

3. Single-product radial view. Click any product (table) to centre it. Every other product it relates to via FK fans out around it, grouped by domain, so you can see at a glance every join path leaving that table:

Programmatic inspection (no app required)

import json
m = json.load(open('<industry>/mvm_v1/model.json'))['model']
for d in m['domains']:
    for p in d.get('products', []):
        print(f"{d['name']}.{p['name']}: {len(p['attributes'])} columns, "
              f"{sum(1 for a in p['attributes'] if a.get('foreign_key_to'))} FKs")

The _manifest.json next to every model.json gives you a quick counts summary (domains, products, attributes, FKs, metric views) without parsing the full model.

🛡️ Quality Rules & Enforcement

The agent enforces a comprehensive rule system during generation and QA:

Rule Group	Key Rules
Naming (G01)	snake_case default; domains: 1 word, singular, max 20 chars; products: 1–3 words, max 30 chars, no domain prefix; attributes: max 50 chars, no product prefix (except PK); FKs must end with target PK name; preserve unit qualifiers
Semantic Dedup (G02)	First-Class Entity Test (5 criteria); SSOT violation detection; 60%+ overlap → merge to shared; 90%+ overlap same domain → remove; attribute dedup >80% confidence; semantic distinction rules (method vs channel, target vs actual, lifecycle timestamps)
FK Integrity (G03)	FK target must exist; no bidirectional FKs; DAG required (no cycles); FK type compatibility; system identifiers exempted; hierarchical self-refs exempt
PK Rules (G04)	Every product has exactly 1 PK; PK = {product}_{suffix}; PK type = configured type (default BIGINT); PK exempt from prefix stripping
Normalization (G05)	3NF enforcement; orphaned FK detection; denormalized attribute detection with >95% confidence; point-in-time snapshots exempt; measurement attributes exempt; geographic coordinates on physical entities exempt
Division Balance (G06)	Ops + Business ≥ 80% of domains; Corporate ≤ 20%; no Corporate until Ops AND Business each have ≥ 2; cross-division relocation forbidden; forbidden generic domain names; Org Chart Test; Fragmentation Test (30%+ overlap → merge)
Data Types (G07)	Spark SQL types only; no ARRAY/STRUCT/MAP; no calculated metrics/aggregates
Tags (G08)	Classification in tags only; PII = RESTRICTED + specific PII tag (pii_email, pii_phone, pii_financial, pii_health, pii_identifier, pii_address, pii_biometric, pii_name, pii_dob); empty tags for regular data; custom user tags always applied
Graph Topology (G09)	DAG enforcement; DFS cycle detection; hierarchical self-ref exemption; zero siloed tables; each domain ≥2 cross-domain connections; parent-child FKs never broken in cycle resolution
Honesty Check (G11)	0–100 scale; below 55 = permanently discarded; 55–70 = borderline retry; ≥90 = accepted; contradiction penalty applied post-processing
Product Design (G12)	M:N requires 3 indicators with ≥2 of 3 strong; association ratio ECM ≤15%, MVM ≤5%; core products 1–3 per domain; forbidden product suffixes (_analysis, _analytics, _report, _prediction); Silver Layer only (no analytics products)
Sample Data (G13)	Exact record count; sequential PK from 10001; FK random [10001, 10001+N-1]; regex compliance; 3-letter country codes; no Lorem Ipsum; realistic business data
Vibe Constraints (G14)	Dedup overlap thresholds; max relocation percentage per pass; normalization confidence 95%; mutation budgets by mode (surgical, holistic, generative); domain hard ceiling factor 1.5x
Physical Schema Deployment (G15)	Schema and table creation factories; attribute dict factory; product dict factory; FK dependency ordering; consolidation and cleanup
Subdomains	Exactly 2-word names; min products per subdomain per tier; no overlapping words; balanced distribution; no placeholder names

🤖 LLM Architecture

The agent uses a multi-model ensemble with automatic demotion and recovery:

Order	Role	Purpose	Endpoint	Input Tokens	Output Tokens
10	Thinker (large)	Complex reasoning, architecture reviews, QA decisions	`databricks-claude-opus-4-6`	200,000	128,000
20	Worker (large)	High-volume generation: products, attributes, FKs, dedup	`databricks-claude-sonnet-4-6`	200,000	64,000
30	Thinker (large)	Fallback thinker	`databricks-claude-opus-4-5`	200,000	64,000
40	Worker (large)	Fallback worker	`databricks-claude-sonnet-4-5`	200,000	64,000
50	Worker (small)	Simpler tasks: domain generation, tag classification	`databricks-gpt-oss-120b`	131,072	25,000
60	Worker (tiny)	Sample data generation	`databricks-gpt-oss-20b`	131,072	25,000

Automatic model demotion: After 3 cumulative failures on a model, the entire priority order is demoted — the failing model is pushed down and the next model takes its place. After 5 consecutive successes, the original order is restored. After 3 consecutive timeouts, a model is marked broken and skipped for the rest of the session.

Prompt architecture: 49 specialized prompts, each mapped to a specific model role and temperature setting. Thinker prompts use temperature 0 for deterministic reasoning. Worker prompts use temperature 0–0.3 for controlled generation. Sample generation uses temperature 0.5 for creative variety.

📊 Metric Views

The agent generates Databricks metric views — reusable KPI definitions that sit on top of your data tables:

Component	Description	Example
Dimensions	Grouping columns	`region`, `product_category`, `fiscal_quarter`
Measures	Single-aggregate expressions	`SUM(revenue)`, `COUNT(DISTINCT customer_id)`, `AVG(order_value)`
Filters	Row-level predicates	`WHERE status = 'completed'`

Metric views are auto-generated per domain, focusing on KPIs that would appear in executive dashboards and quarterly business reviews. Each measure uses a single aggregate function (nested aggregates like AVG(SUM(...)) are not supported by metric view YAML and are automatically prevented).

🔧 Troubleshooting

Symptom	Likely Cause	Resolution
"No deployment catalog specified — skipping physical model deployment"	Widget 09 is empty	Set deployment catalog to deploy physical tables
Model has too few domains	MVM scope + tier_5 classification	Switch to ECM, or provide seed domains in widget 06
Model has irrelevant domains	LLM inferred from description	Vibe: `"Remove the X domain"`
FK pointing to wrong table	LLM linked incorrectly	Vibe: `"Redirect order.warehouse_id FK to logistics.warehouse"`
Circular dependency warning	DAG violation	Agent auto-fixes during QA; if persistent: `"Break cycles"`
SSOT violation (duplicate entities)	Same concept in multiple domains	Vibe: `"Run quality checks"` or `"Fix duplicates"`
Pipeline crashed mid-run	LLM timeout or transient error	Re-run with same inputs — agent cleans up incomplete versions
Model JSON file parse error	Invalid JSON (smart quotes, trailing commas)	Validate JSON; use straight quotes only
"Version X exists but is incomplete"	Previous run failed	Agent auto-detects — just re-run
Model too large / too many tables	ECM + high complexity tier	Use `shrink ecm`, or vibe: `"Keep only core business tables"`
Convention changes not applied	Convention widgets not set	Update the model convention widgets (12–24) with your desired values before running
Metric views have `COUNT(1)` instead of real KPIs	Nested aggregates were auto-replaced	LLM prompt prevents this; re-run metrics: `"Regenerate metrics"`

📘 Glossary

Term	Definition
Attribute	A column in a product (table) — has name, type, tags, description
Business Data Model	A data model tailored to a specific organization, generated by Vibe Modelling
Corporate Division	Supporting functions (HR, Finance, Legal) that enable but don't directly generate revenue
DAG	Directed Acyclic Graph — the required topology for FK relationships (no cycles)
Division	Top-level organizational grouping: Operations, Business, or Corporate
Domain	A logical grouping of related tables, deployed as a Unity Catalog schema
ECM	Expanded Coverage Model — comprehensive enterprise-grade model scope
FK	Foreign Key — a column referencing another table's primary key
Honesty Check	LLM self-assessment score (0–100%) to ensure output quality
Industry Data Model	A generic, vendor-published schema template for an entire industry vertical
Junction Table	An association table resolving M:N relationships between two entities
Metric View	A Databricks KPI definition with dimensions, measures, and filters
MVM	Minimum Viable Model — lean, production-ready scope (30–50% of ECM)
PK	Primary Key — the unique identifier column for a table
Product	A data table within a domain — a first-class business entity
SSOT	Single Source of Truth — each business concept has one and only one authoritative owner
Tier	Industry complexity classification (tier_1 = Ultra-Complex to tier_5 = Simple)
Vibe	A natural language instruction provided to the agent to modify the model
Vibe Modelling	The iterative process of generating and refining data models using natural language

Version history

Version	Change
v0.7.4	6 root-cause resilience fixes from the v0.7.3 vibe_runner ecomm 3x5 crash audit (run 309772240949880, FileNotFoundError on industries.json placeholder path) plus the principal-engineer SHRINK-NEW-SILO complaint ("the agent shouldn't fail because of that — it should correct the result and continue") and the broader autonomous-resilience review. Bumps `__AGENT_VERSION__` to `0.7.4` (alias=agent-version-global, mirrored into model.json as `agent_version`). Goal of the bundle: STOP wasting hours of compute on recoverable failure modes. NEW-1 [runner-single-biz-fallback FIRED] (alias=runner-single-biz-fallback) — CRITICAL resilience fix in `runner/vibe_runner.ipynb`. `load_business_context` previously raised `FileNotFoundError` whenever the `business_context` widget pointed at a missing or placeholder path (`/Workspace/Users/<your-user>/vibe-modelling/industries.json`) — the v0.7.3 standalone runner submission crashed at 6s with no recovery. Fix: introduces `_runner_is_placeholder_or_missing_path` (detects empty/`<your-user>`/`<user>` placeholders + `os.path.exists` check) and `_runner_build_single_business_context_from_widgets` which synthesises the industries.json structure from direct widget params (`business_name`, `business_description`, `model_vibes`, `business_domains`, plus all `WIDGET_VALUE_KEYS` with sensible defaults via `_SINGLE_BIZ_WIDGET_DEFAULTS`). The fallback fires on file-not-found, JSON-decode-error, empty-businesses-list, AND missing widget_values keys — every degenerate input that previously crashed the runner now degrades gracefully when a `business_name` widget is present. Existing multi-industry mode (real industries.json with multiple businesses) is fully preserved; the fallback is gated on `business_name` widget presence so it never silently overrides a real config. NEW-2 [shrink-orphan-drop-emptied → shrink-fk-densest-fallback FIRED] (alias=shrink-fk-densest-fallback) — CRITICAL resilience fix on the principal-engineer's complaint. When the SHRINK-NEW-SILO orphan-drop emptied `tables_to_keep`, v0.7.2 raised `ValueError` and killed the entire job after hours of compute. v0.7.4 instead engages a deterministic FK-densest fallback: counts FK in/out edges per source product, ranks descending, picks the top `K = max(3, min(dropped_orphans_count, source_products // 2))` products with FK relationships, and uses those as the MVM survivor set. `domains_to_keep`/`domains_to_remove` are recomputed from the new survivor set. NEXT_VIBES is filed with `SHRINK_FK_DENSEST_FALLBACK_USED` so the next vibe-iteration can refine. Only escalates to ValueError when source ECM has zero FKs (genuine structural problem requiring upstream fix). NEW-3 [shrink-cascade-iterate FIRED] (alias=shrink-cascade-iterate) — CRITICAL resilience fix on cascade-silo failure. When auto-dropping primary orphans introduced cascading silos, v0.7.2 raised `ValueError`. v0.7.4 iterates the auto-drop up to 5 rounds (each round: detect new silos, drop them, re-detect). On convergence (no remaining silos), proceeds with the trimmed survivor set + files `SHRINK_CASCADE_AUTO_RECOVERED` into NEXT_VIBES. On non-convergence after 5 rounds OR `tables_to_keep` becoming empty, falls back to the FK-densest fallback (NEW-2) as the final recovery layer. Eliminates the principal-engineer's "the agent shouldn't fail because of that" failure class entirely. NEW-4 [ensemble-singleshot-fallback FIRED] (alias=ensemble-singleshot-fallback) — CRITICAL resilience fix on the ALL-VARIANTS-FAILED ensemble path. When every parallel ensemble variant in `step_generate_logical_schema` failed (typically transient infra OR overly-aggressive validators rejecting all 3 outputs), v0.7.3 raised `ValueError` and killed the run. v0.7.4 retries ONE more time with reduced complexity: temperature=0.0, the smallest enabled Claude Sonnet endpoint, and forced `max_business_domains` clamped to ≤5 (or user's sizing_directive max if set). On success, `successful_variants = [single_shot]` and the judge-selection skip path takes over normally. NEXT_VIBES is filed with `ENSEMBLE_SINGLESHOT_FALLBACK_USED`. Only raises if the single-shot fallback also fails (genuine LLM-infra outage). NEW-5 [install-ddl-retry-skip FIRED] (alias=install-ddl-retry-skip) — HIGH resilience fix on `_execute_sql_parallel_core`. When `halt_on_error=True` and a single DDL statement fails with a known-recoverable error class (`TABLE_OR_VIEW_ALREADY_EXISTS`, `SCHEMA_ALREADY_EXISTS`, `COLUMN_ALREADY_EXISTS`, `OBJECT_DOES_NOT_EXIST`, `DEADLINE_EXCEEDED`, `CANCELLED`, `DELTA_CONCURRENT_`, `503`/`504`, `ProtocolChangedException`), the install no longer halts. Instead: 2-second backoff + retry; on retry success → counts as success; on retry confirming "already exists" → counts as success (idempotent); otherwise → log error + skip + continue (failed_count incremented but no halt). Non-recoverable errors (syntax errors, type mismatches) still halt as before. Eliminates the install-time "single transient flake kills the entire install" failure class. Patterns are explicitly enumerated — no blanket "ignore everything", just the genuinely-recoverable classes. NEW-6 [runner-failure-manifest FIRED]* (alias=runner-failure-manifest) — HIGH observability fix in `runner/vibe_runner.ipynb`. When ANY of the 5 task states (ECM gen, ECM install, ECM uninstall, MVM shrink, MVM install) is non-SUCCESS, the runner now writes a `_failure_manifest.json` file under `/Volumes/{install_catalog}/_metamodel/vol_root/business/{biz}/{scope}/_failure/` BEFORE calling `drop_catalog(staging)`. The manifest captures: agent version, business name, timestamps, all 5 task states + exit statuses, per-task warnings (top 30 each), failed_parts list, parent run/job IDs. This means post-mortems have evidence even AFTER the staging catalog is gone — previously, an early-exit install crash left zero diagnostics on the persistent volume. Manifest is written to BOTH ecm_v1 and mvm_v1 install volumes (one or both will exist after partial success). All write failures inside the manifest path are non-fatal (caught + logged). New §10.6 + §9.4 entries recommended. Behavioural tests in `tests/unit-tests/test_v074_behavioral.py` (covers all 10 fixes: alias presence, fallback paths, FK-densest math, cascade-iterate convergence, single-shot retry config, recoverable-error classification, manifest write site, folder_path discovery, fidelity attr/fk verifiers, inner-workflow error capture). NEW-7 [runner-folder-path-discovery FIRED] (alias=runner-folder-path-discovery) — CRITICAL resilience fix in `runner/vibe_runner.ipynb` `read_widgets()`. The hard-coded `folder_path = "./../models"` resolved to `/Workspace/Users/models` when the runner was deployed flat to user-root (production deploy convention), and `/Workspace/Users/` is `DIRECTORY_PROTECTED` — async-flush at notebook exit then raised `AsyncFlushFaliedException: DIRECTORY_PROTECTED: Folder Users is protected` and `dbutils.notebook.run()` propagated `Workload failed` back to vibe_tester, even though all 5 child tasks SUCCEEDED. Root cause class: same shape as v0.7.2 NEW-6 `runner-agent-multipath` (hardcoded relative path broken by flat deploy). Fix: derives `folder_path` from `dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath()` — anchors under `/Workspace/Users/<user>/vibe_runner_models` when runner sits under `/Users/<user>/`, falls back to `/tmp/vibe_runner_models` otherwise. `os.makedirs(folder_path)` is also wrapped in try/except with a second-chance `/tmp` fallback so a permission failure on the discovered path can't bubble. NEW-8 [fidelity-deterministic-attr-count FIRED] (alias=fidelity-deterministic-attr-count) — HIGH quality fix on `_verify_deterministic` (cell containing VibeOrchestrator). The deterministic verifier had no pattern for "attribute count range per product" (e.g. VREQ-003: "between 15 and 30 attributes per product") so the requirement was marked PARTIAL with "no specific pattern matched", which dragged precision below the 0.85 threshold and triggered `Fidelity gates FAILED: precision 0.6 < required 0.85 — rollback recommended` even on perfectly-shaped models. Fix: matches `between N and M attributes per product` / `N to M attributes per product` patterns; computes per-product attribute count from the live `attrs_by_product`; returns `fulfilled` if all products fall in [N,M], `partial` if ≤10% violate, `failed` otherwise. NEW-9 [fidelity-deterministic-fk-density FIRED] (alias=fidelity-deterministic-fk-density) — HIGH quality fix on the same verifier. No pattern existed for "every product must have at least one foreign key" (VREQ-004 class), so FK-density requirements were always PARTIAL even when the model genuinely had ≥1 FK per product. Fix: matches `every/each/all product .* foreign key/fk` patterns; computes per-product inbound + outbound FK counts; returns `fulfilled` if all products have ≥1 FK in or out, `partial` if ≤10% siloed, `failed` otherwise. NEW-10 [vibe-tester-inner-workflow-error-capture FIRED] (alias=vibe-tester-inner-workflow-error-capture) — HIGH observability fix in `tests/vibe_tester.ipynb` Test 00 except-handler. When `dbutils.notebook.run(runner_notebook, ...)` raised, the only message vibe_tester captured was the generic `Workload failed, see run output for details`; the actual exception (e.g. `AsyncFlushFaliedException: DIRECTORY_PROTECTED`) was buried in the inner WORKFLOW_RUN created by the `dbutils.notebook.run()` call, invisible to the test summary. Fix: on Test-00 exception, the handler now lists recent WORKFLOW_RUNs via `WorkspaceClient.jobs.list_runs(run_type="WORKFLOW_RUN", limit=20)`, finds the one whose task notebook_path matches `runner_notebook` AND whose result_state ends with `FAILED`, fetches its `error` + `error_trace` via `jobs.get_run_output`, and prepends them to the test_summary error message. Eliminates the "vibe_tester says workload failed but doesn't say WHY" debugging cliff. To be deployed as `dbx_vibe_modelling_agent_v74` (extended) BEFORE the next vibe_tester 3x5 ecomm run. Per CLAUDE.md user directive 2026-04-28 ("NO VERSION ROAD MAP, FIX ALL IN THE CURRENT VERSION"), all fixes ship under the same v0.7.4 stamp without rolling to v0.7.5. Validation: this version closes 6 of the 9 audit items + 4 additional regressions surfaced by the v0.7.4 vibe_tester run (DIRECTORY_PROTECTED, fidelity precision 0.6, mv-prevalidate cascade, inner-workflow error obscurity). NEW-11 [mv-stale-catalog-rewrite FIRED] (alias=mv-stale-catalog-rewrite) — CRITICAL R6/R2 regression fix on `step_apply_metric_views`. v0.7.4r2 vibe_tester ecomm 3x5 audit (run 106533651778252) showed MVM_v2 metric views (13/13) failing with `[NO_SUCH_CATALOG_EXCEPTION] Catalog 'ecomm_ecm' was not found`. Root cause: `vibe modeling of version` reads MVM_v1's source from a catalog (`ecomm_ecm`) that was already cleaned up by the runner pre-deployment, but the LLM-generated MV CREATE statements baked the OLD catalog into the YAML `source:` field instead of the install-target catalog (`config["TARGET_CATALOG"]`). The pre-existing `mv-fallback-emit-live` could not save the MVs because the fallback also referenced the same dead catalog. Fix: BEFORE invoking `execute_metric_views_in_parallel_no_halt`, the install path now (a) enumerates live catalogs via `SHOW CATALOGS`, (b) walks every MV statement with two regex passes — one for backtick-quoted three-part names `cat`.`sch`.`tbl` and one for unquoted `cat.sch.tbl` — and (c) rewrites any catalog reference that is NOT the install-target catalog AND not in the live-catalogs set to point at the install-target catalog. Protected catalogs (`system`, `samples`, `hive_metastore`, `__databricks_internal`, `main`) are NEVER rewritten so legitimate cross-catalog refs are preserved. Emits `[mv-stale-catalog-rewrite FIRED]` with rewrite count, statements touched, the stale catalog names, and the resolved target. The guard is wrapped in try/except so a SHOW CATALOGS failure cannot break MV install (logs a non-fatal warning instead). Closes the v0.7.4r2 13/13 MV failure cascade and unblocks `vibe modeling of version` end-to-end. NEW-12 [run-test-inner-workflow-error-capture FIRED] (alias=run-test-inner-workflow-error-capture) — HIGH observability fix in `tests/vibe_tester.ipynb` `run_test()` helper. NEW-10 (`vibe-tester-inner-workflow-error-capture`) only patched the Test 00 (`_try_test_00_vibe_runner`) exception handler; the FOUR install tests (`04_install_mvm_v1`, `07_install_mvm_v2`, `11_ctx_install`, `11c_xconv_install`) call the agent through the generic `run_test()` helper which had its own except path with no inner-workflow capture. Result: install failures bubbled up the generic `Workload failed, see run output for details` with no traceback — 4 install tests went blind in the v0.7.4r2 audit. Fix: the same WorkspaceClient enumeration + `WORKFLOW_RUN` filter + notebook-path match + `get_run_output` chain from NEW-10 is now also embedded in `run_test()`'s exception handler, so EVERY agent invocation surfaces the inner workflow's `error` + `error_trace` to the test_summary. Closes the "vibe_tester says workload failed but doesn't say WHY" cliff for install tests as well. NEW-13 [vibe-attr-cap-override FIRED] (alias=vibe-attr-cap-override) — HIGH user-vibe authority fix on the attribute-count cap. v0.7.4r2 audit caught VREQ-003 violations on 10/17 products: vibe said "between 15 and 30 attributes per product" but the LLM emitted 31-36 attributes per product. Root cause: the `_ATTR_BUFFER_FACTOR` allowed the LLM to overshoot the per-product cap and the existing `apply_vibe_authority_overrides` code had no specific override for the attribute range. Fix: AFTER `apply_vibe_authority_overrides(config, widgets_values)`, a regex now scans `model_vibes`/`business_description` for `between N and M attributes per product` (and `N to M attributes per product`) phrasing; on match, mutates `config["PROMPT_VARIABLES"]["min_attributes_per_product"]` and `["max_attributes_per_product"]` to the user's literal range AND records the original phrase + before/after values in `config["_VIBE_ATTR_CAP_USER_OVERRIDE"]` so downstream stages can audit the override. Emits `[vibe-attr-cap-override FIRED]` showing old→new bounds. The override happens BEFORE prompt variables are read by the worker prompts, so the LLM's instruction set carries the user's exact cap instead of the heuristic default. NEW-14 [mv-date-interval-autofix FIRED] (alias=mv-date-interval-autofix) — MEDIUM R6 recurrence fix in `_sanitize_metric_measure_expr`. v0.7.4r2 audit caught 1 MV per version (e.g. `order_shipment` ECM_v1) failing on `CAST((delivered_date - shipped_date) AS DOUBLE)` because Spark SQL's `DATE - DATE` returns `INTERVAL DAY` and `INTERVAL DAY → DOUBLE` is a `DATATYPE_MISMATCH`. The LLM kept emitting this pattern; the existing sanitizer had no rewrite. Fix: a regex inside `_sanitize_metric_measure_expr` detects two date-typed columns subtracted (with optional surrounding `CAST(... AS DOUBLE/FLOAT/...)`) and rewrites the expression to `DATEDIFF(date1, date2)` — yields a numeric output safely castable to any numeric. Heuristic: column names ending in `_date`/`_time`/`_at`/`_on`/`_day` or starting with `date_` are treated as date-typed (industry-agnostic). Emits `[mv-date-interval-autofix FIRED]` with the original→rewritten expression. Eliminates the recurring `INTERVAL DAY → DOUBLE` install failure class. Behavioural tests for NEW-11..NEW-14 added to `tests/unit-tests/test_v074_behavioral.py` (52 tests total in this file; covers alias presence, ordering invariants, protected-catalog list, SHOW CATALOGS guard, prompt-variables mutation, sanitizer location, anti-tautology).
v0.7.3	1 root-cause fix from a downstream agent's report against the v0.7.2 ecomm 5x10 vibe_tester run (709587786254819). Bumps `__AGENT_VERSION__` to `0.7.3` (alias=agent-version-global, mirrored into model.json as `agent_version`). NEW-1 [session-end-status-honest] (alias=session-end-status-honest) — MEDIUM observability/honesty fix on the App-facing run-status pill. `VibeWriter._finalize_common` (cell 9, around line 4127 in deployed form) emitted the `Vibe Session / Session Ended` sentinel with hard-coded `status="stage_ended"` regardless of whether the caller was `finalize_pipeline` (success) or `finalize_pipeline_error` (failure). The downstream Vibe App reads this status to colour the run pill — green for `stage_ended`, red for `stage_failed`. Hard-coding `stage_ended` on the error path was lying about the session outcome: failed runs showed up as green-pill "completed" in the App. Reproduced by the downstream Claude session monitoring an unrelated shrink failure. Fix: (a) `_finalize_common` now accepts `end_status="stage_ended"` keyword (default keeps backwards-compatibility with `finalize_pipeline`), (b) `finalize_pipeline_error` now passes `end_status="stage_failed"` so the App pill turns red, (c) the `_is_session_bookend` immediate-flush check at the bottom of `emit_step` now also recognises `"stage_failed"` for the "Vibe Session" stage so the failure event is flushed within the same emit_step call instead of waiting for the next 10-second background flush window — otherwise the App could poll between the failure emit and the bookend flush and still see the prior `stage_in_progress` status. The fix is sentinel-tagged twice (`alias=session-end-status-honest`) at both the call site and the bookend logic so post-deploy grep can mechanically prove both ends shipped. Behavioural tests in `tests/unit-tests/test_v073_behavioral.py` (10 tests, all passing) cover: version bump; `_finalize_common` signature includes `end_status="stage_ended"` default; Session Ended emit references the parameter not the literal; `finalize_pipeline` does NOT pass the parameter (default preserves green pill); `finalize_pipeline_error` MUST pass `end_status="stage_failed"`; alias appears at ≥2 sites (call + bookend); bookend tuple includes `"stage_failed"` for Vibe Session; `stage_failed` is in `_VALID_STATUSES`; `_open_step_ids` pop logic on terminal statuses still includes `stage_failed` (regression sentinel — would leak open step ids if removed). To be deployed as `dbx_vibe_modelling_agent_v73` BEFORE the next vibe_runner 3x5 ecomm run.
v0.7.2	3 root-cause fixes from the v0.7.1 airlines tiny live audit (run 1097831250061095, 48 min, all 5 tasks SUCCESS, ECM 18 products / MVM 16 products / quality 94/100 / physical-vs-model.json parity 100%) PLUS the Phase 6 Retailer SHRINK-NEW-SILO report received during the run. Bumps `__AGENT_VERSION__` to `0.7.2` (alias=agent-version-global, mirrored into model.json as `agent_version`). NEW-1 [self-ref-mem-json-sync FIRED] (alias=self-ref-mem-json-sync) — root-cause fix for the v0.7.1 N2 Fidelity-gates regression (`Fidelity gates FAILED: precision 0.5 < required 0.85 — rollback recommended` with 3 self-FK columns `crew.bid_period.prior_bid_period_id`, `crew.training_center.parent_training_center_id`, `flight.aircraft.replaced_aircraft_id` only in Memory not in JSON). The `SELF-REF-FIX` autofix mutates `widgets_values["attributes"]` in-place (adds new renamed FK rows + clears the old PK-row's `foreign_key_to`) but never persists the mutated list back to `ATTRIBUTES_FILE_PATH`. The next downstream step reads JSON, sees stale data, computes Memory-vs-JSON drift, and the fidelity gate fails at precision 0.5. New defense: immediately AFTER the SELF-REF-FIX `try/except`, compare `len(widgets_values["attributes"])` vs `len(json.load(ATTRIBUTES_FILE_PATH))`; on count drift, rewrite the JSON file from the in-memory state (Memory is more authoritative — it has the latest autofix output). Emits `[self-ref-mem-json-sync FIRED]` with mem→json counts so audit can grep how many attrs were synced. Defense-in-depth pattern: even if a future autofix mutates Memory after SELF-REF-FIX, this sync handles the drift class generically. NEW-2 [mv-prevalidate-keywords-extend] (alias=mv-prevalidate-keywords-extend) — root-cause fix for the v0.7.1 mv-column-prevalidate-drop false positives. The pre-installation MV column validator scans every `expr:` body for tokens, strips out function-call names and known SQL keywords, and treats remaining tokens as column references. The v0.7.1 keyword set missed `between`, `current_date`, `interval`, and ~80 other Spark SQL conditional/datetime/string/aggregate/window keywords, so any MV using `WHERE expiry_date BETWEEN current_date AND current_date + INTERVAL 30 DAY` had `between`, `current_date`, and `interval` parsed as missing column tokens — and the entire MV was dropped (4/12 in the v0.7.1 ECM, 4/16 in the MVM). The expanded set now covers SQL-92 conditional + datetime/system + window + aggregate + array/map + numeric/string functions. Drops will now only fire for genuinely missing columns. NEW-3 [shrink-orphan-drop FIRED + shrink-orphan-drop-cascade FIRED + shrink-orphan-drop-cleared FIRED] (alias=shrink-orphan-drop, alias=shrink-orphan-drop-cascade, alias=shrink-orphan-drop-cleared) — root-cause fix for the Phase 6 Retailer customer report (Databricks parent run 939656662583490): `v0.9.1 SHRINK-NEW-SILO` validator raised `ValueError` when LLM kept REAL product `customer.segment` as MVM survivor without keeping any of its FK partners. Validator's error message LIED about "validator will retry the LLM" — no retry exists in v0.7.0/v0.7.1, so the ValueError just bubbled up and failed the task. Reported failure rate: ~30% on first try. Fix is the deterministic counterpart of v0.7.1's `shrink-phantom-drop` (which only handled LLM-hallucinated product names). When `_new_silos` is non-empty, the validator now: (a) auto-removes the orphan products from `tables_to_keep` AND `domain_relocations` deterministically, (b) logs `[shrink-orphan-drop FIRED]` with the dropped pairs and a before→after count, (c) files a `SHRINK_ORPHAN_DROPPED` entry in `NEXT_VIBES` so the next vibe iteration can decide to keep+link the dropped product, (d) re-runs `_detect_post_shrink_silos` on the trimmed `tables_to_keep` to catch cascading silos that emerge from the drop, (e) on cascade, raises a CLEAR error message (no more lying) suggesting the next vibe text the user should provide, (f) on `tables_to_keep` becoming empty after auto-drop, raises a degenerate-plan error with the same instructive next-vibe suggestion, (g) on success, logs `[shrink-orphan-drop-cleared FIRED]`. The `elif _preexisting_silos:` was changed from `elif` to a separate `if` so the pass-through logic still fires for input-side silos that were not introduced by the shrink. Behavioural tests in `tests/unit-tests/test_v072_behavioral.py` (15 tests, all passing) cover: version + first-line + semver checks; per-fix alias presence; mem-json sync ordering AFTER SELF-REF-FIX block but BEFORE post-finalize invariants; SQL keywords list contains `between`, `current_date`, `interval`, etc.; orphan-drop ordering INSIDE the shrink-direction `if` branch and AFTER `_new_silos` computation; cascade detection re-runs `_detect_post_shrink_silos` on trimmed set; degenerate-plan check runs when `tables_to_keep` becomes empty; NEXT_VIBES.add fired with `SHRINK_ORPHAN_DROPPED` rule_id; no tautology — orphan-drop branch genuinely mutates `tables_to_keep` (different sizes before vs after).
v0.7.1	3 root-cause fixes from the v0.7.0 airlines failed-shrink audit (failed runs 76240803654456 → child shrink attempts 210677821854565 + 450474006514631 + killed run 453386975947787). Bumps `__AGENT_VERSION__` to `0.7.1` (alias=agent-version-global, mirrored into model.json as `agent_version`). NEW-1 [shrink-llm-malformed FIRED] (alias=shrink-llm-malformed, alias=shrink-llm-malformed-summary) — CRITICAL crash fix in `_run_resize_model` shrink branch. The first child run failed with `AttributeError: 'str' object has no attribute 'get'` because the LLM returned `domains_to_remove` items as strings instead of `{domain, reason, tables_to_relocate}` dicts; the unguarded `removed_domain.get("tables_to_relocate")` blew up. Now every parser slot (`tables_to_keep`, `domains_to_remove.tables_to_relocate`, `domains_to_keep`, `domains_to_remove`) defends with `isinstance(item, dict)` + log-and-skip, plus a final `[shrink-llm-malformed-summary FIRED]` count. NEW-2 [shrink-phantom-drop FIRED] (alias=shrink-phantom-drop) — CRITICAL false-positive fix on the SHRINK-NEW-SILO validator. The second shrink attempt crashed with `LLM shrink plan introduces 2 NEW siloed table(s) not present in input: ['legal_entity', 'roster_period']` because the LLM hallucinated `tables_to_keep` entries whose product names don't exist in the source ECM. The silo validator then walked these phantom names against `attributes_data`, found no FKs (because the products don't exist), and falsely classified them as silos. Now, BEFORE the silo validator runs, `tables_to_keep` is filtered against `_source_product_names_set = {p['product'].lower() for p in products_data}`; phantoms are logged + dropped, and the silo validator sees only real surviving products. `domain_relocations` is filtered with the same set so downstream FK rewrites stay consistent. NEW-3 [fmfl-final-sanitize FIRED] (alias=fmfl-final-sanitize, alias=fmfl-final-sanitize-summary) — eliminates the F2/R7 `Max retries (3) exhausted. Proceeding with last response despite validation errors` soft-accept hatch on `find_missing_fk_links_*`. Observed in killed run 453386975947787 at 14:40:23 for `find_missing_fk_links_workforce`: LLM kept proposing `LINK target='workforce.work_schedule.work_schedule_id'` for `timesheet.schedule_id`; `_validate_fmfl` correctly rejected 3× with stem suggestions but max-retry-exhaust soft-accepted the bad LINK. `_fmfl_postprocessor` now performs an unconditional final sweep that auto-coerces any LINK whose `target_table` is non-canonical (using the same `_fmfl_canonical_entities` set + `_fmfl_normalise_target` helper as the validator) to `KEEP_AS_IS` with `target_table=None` and an `[AUTO-COERCED-FINAL]` reasoning prefix. Behavioural tests in `tests/unit-tests/test_v071_behavioral.py` (21 tests, all passing) cover: version + first-line + semver checks; per-fix alias presence; isinstance-guard placement (within 800 chars BEFORE the `removed_domain.get` call); phantom-drop ordering BEFORE the silo validator; phantom-drop ALSO filters `domain_relocations`; sanitize runs INSIDE `_fmfl_postprocessor` BEFORE the actual_counts recomputation; sanitize uses the canonical-entities set (DRY); 3× anti-tautology checks proving each branch genuinely mutates state.
v0.7.0	3 root-cause fixes from the v0.6.9 telecom MVM + VOV audit (run 983512034890158, 50.5 min, SUCCESS). Bumps `__AGENT_VERSION__` to `0.7.0` (alias=agent-version-global, mirrored into `model.json` first key as `agent_version` per v0.6.9 contract). NEW-1 [vov-metrics-teardown FIRED] (alias=vov-metrics-teardown) — HIGH severity teardown asymmetry. `_early_clash_detection` for `shrink ecm` / `enlarge mvm` / `vibe modeling of version` previously CASCADE-dropped every domain schema in the target catalog at run start but kept `_metrics` in the `_INTERNAL_SCHEMAS` skip-set alongside `_metamodel`. The skip was a category error: `_metamodel` IS legitimately protected (it holds the v1 model SOURCE OF TRUTH the VOV reads), but `_metrics` is purely user-facing derived artefacts. Result: every VOV run left 80+ zombie metric views in `_metrics` whose `SELECT` defs pointed at now-dropped tables — `SELECT * FROM _metrics.billing_invoice` returned `[UC_DEPENDENCY_DOES_NOT_EXIST] Table 'telecom_mvm.billing.invoice' does not exist` for the entire 30-50 min window between VOV start and Step 9b metric-view rebuild, and permanently if VOV failed mid-run. Fix: split into `_INTERNAL_SCHEMAS` (audit-only superset) and `_VOV_PROTECTED_SCHEMAS = {"_metamodel", "default", "information_schema"}` (the actual cleanup keep-list with `_metrics` REMOVED). Per-schema drop counters (`_metrics_drop_count`, `_domain_drop_count`) and a `[vov-metrics-teardown FIRED]` log line make the cleanup grep-auditable. New §9.4 R9 catalogue entry recommended for the audit playbook. NEW-2 [mv-measure-agg-wrap FIRED] (alias=mv-measure-agg-wrap) — CRITICAL metric-view emitter regression (R6 PRESENT in v0.6.9 telecom VOV v2). UC rejected v2 metric views with `[METRIC_VIEW_MISSING_MEASURE_FUNCTION] The usage of measure column [total_invoice_count, total_billed_amount, …] requires aggregation function`. Root cause: `_sanitize_metric_measure_expr` had no final guard — when the LLM emitted a measure with a bare column reference (e.g. `expr: total_invoice_count`), the sanitizer's nested-aggregate / multi-aggregate / arithmetic / token-cast paths all fell through and returned the bare reference unchanged. UC then refused the DDL. Fix: added a final block that runs `_count_metric_aggregate_calls(expr) == 0` AFTER all other passes. When zero aggregates remain, wrap the whole expression in `SUM(CAST((expr) AS DOUBLE))` if the expression is safe (no string literals, no CASE bodies); else fall back to `COUNT(1)`. Both paths emit `[mv-measure-agg-wrap FIRED]` warnings with the original + wrapped expressions for grep auditing. Existing aggregates (`SUM(amount)`, `COUNT(1)`, `AVG(score)`) are untouched (no double wrapping). Closes the ~30+ measure DDL failures observed in v0.6.9 v2 install (`Failed metric view 'billing_invoice' UNRESOLVED_COLUMN`-class). NEW-3 [step-boundary-flush FIRED] (alias=step-boundary-flush) — MEDIUM observability regression (R3 NEW SITE in v0.6.9 telecom VOV v2). The 50-min run completed SUCCESS but its volume info.log captured only ~25 min of `Executing Pipeline Step` / `Finished Step` markers (last entry at attribute 27/127 of Step 4, then 25 minutes of silence followed by `[VolumeLogFlush][SAFE-FLUSH]` periodic markers with no Python-logger lines). The Python `logger` handler appears to fall silent late in long VOV runs — root cause uncertain (handler buffer flush timing? thread that overrides stdout? subprocess that swallows output?), but the symptom is total loss of mid-pipeline observability. Fix: new helper `_step_boundary_force_flush(boundary_kind, step_name)` invoked from BOTH `_log_step_start` and `_log_step_end`. The helper does two things: (1) force-flushes every `logger.handlers` entry; (2) raw-writes a `[STEP-BOUNDARY] kind=start
v0.6.9	Versioning-traceability bundle (zero behavioural change to model generation). Adds the global `__AGENT_VERSION__ = "0.6.9"` constant as the first line of code in the agent notebook (alias=agent-version-global), mirrors the same value into every generated `model.json` as the first top-level property `agent_version` (alias=agent-version-mirror), and preserves it through every model.json rewrite path: `step_apply_metric_views` writeback, install-step metric-view cleanup writeback, and install-step location update writeback (each path explicitly refreshes the key to the running agent version, so an older v's model.json gets re-stamped on rewrite). New CLAUDE.md §11 documents the rule going forward — every test cycle bumps `__AGENT_VERSION__` BEFORE the deploy step (per §3a single-digit semver, never two-digit segments), and every release-tag-to-main bump must precede the commit. Behavioural tests in `tests/unit-tests/test_v069_behavioral.py` cover: (a) `__AGENT_VERSION__` is the very first non-comment line of code in the agent notebook, (b) it is a single-digit semver string `"0.6.9"`, (c) `model.json` writes include `agent_version` as the first top-level key with the same value, (d) every rewrite path (`_mj_root` in metric-view writeback, `_parsed_root` in install-cleanup, `_updated_root` in install-location) refreshes/prepends the key, (e) readme `Current version` line matches `__AGENT_VERSION__`. To be deployed as `dbx_vibe_modelling_agent_v69` BEFORE the next telecom MVM + vov run.
v0.6.8	3 root-cause fixes from the v0.6.7 telecom MVM v1 live-pulse audit (2026-04-27, run 363792176281638). Run survived a global Databricks platform outage (attempt #0 INTERNAL_ERROR after 61 min, trace 8259555029405006/5096161698269) thanks to the JOB auto-retry; attempt #1 progressed cleanly through Step 8b but the live error.log exposed three issues that must be fixed before the next ECM run. NEW-13 [fmfl-auto-coerce-keep FIRED] (alias=fmfl-auto-coerce-keep) — CRITICAL §10.6 fix: `[fmfl-canonical-target]` validator (added in v0.6.6 NEW-8) correctly rejected LINK targets whose `(domain, product)` was non-canonical, BUT when the stem-suggestion list was empty (`NO STEM-MATCH FOUND`) the LLM had no canonical alternative to pick. It re-proposed the same broken LINK on attempts 2 and 3, exhausting `Max retries (3)` and falling back through the F2 soft-accept hatch — a §10.6 hard-zero violation. Five live occurrences in v0.6.7 telecom v1 attempt #1 (find_missing_fk_links_billing/customer/network/device/product) for non-canonical targets `workforce.employee.employee_id`, `customer.user.user_id`, `customer.agent.agent_id`, `network.vlan.vlan_id`, `device.warehouse_location.warehouse_location_id`. Root cause: validator told the LLM to "change decision" but the LLM kept retrying LINK with the same target. Fix: in `_validate_fmfl`, when the LLM proposes a LINK whose target is non-canonical AND `_fmfl_suggest_canonical(column)` returns empty, MUTATE the dec in place to `decision='KEEP_AS_IS'`, `target_table=None`, prepend an `[AUTO-COERCED:]` reasoning trail, emit `[fmfl-auto-coerce-keep FIRED]` warning with column path, and `continue` to skip the error.append() — so the validator returns success and the retry loop never enters. The error path remains for cases where suggestions DO exist (LLM has alternatives to converge on). Industry-agnostic: the auto-coerce only fires when there is provably no canonical target. Expected: `Max retries (3) exhausted` count from FMFL drops to zero on next telecom run. NEW-12 [log-append-on-retry FIRED] (alias=log-append-on-retry) — MEDIUM observability: when attempt #0 of a Databricks task fails (e.g. INTERNAL_ERROR from the platform-wide outage) and the JOB retry policy spawns attempt #1, both attempts open the SAME volume log path (`{biz}_info_v{N}_{stage}.log`) because the path is keyed by version, not attempt. The agent's `ImmediateFlushFileHandler` was opened with `mode='w'`, which TRUNCATED the file on attempt #1 — wiping every diagnostic line attempt #0 had written before crashing. Audit lost ~4500 info lines including a successful Step 8b. Root cause: handler open mode didn't differentiate "first run" from "retry of same task". Fix: `_log_open_mode = 'a' if os.environ.get('DATABRICKS_RUN_ID') else 'w'` — under Databricks (where retry semantics matter) we APPEND so attempt #N+1 preserves attempt #N's logs; locally we still truncate so dev runs don't bloat. Both `fh_info` and `fh_error` use the variable. Emits `[log-append-on-retry FIRED]` with the active mode and the run/task id at logger init for audit traceability. NEW-14 [perf-cap-16 FIRED + perf-llm-throttle-16 FIRED] (alias=perf-cap-16-emit) — LOW observability: v0.6.4 B1 (`perf-cap-16`) and v0.6.4 B8 (`perf-llm-throttle-16`) lifted concurrency caps from 8→16 but only documented the change in code comments — no runtime `[FIRED]` markers were emitted. Auditing the live v0.6.7 telecom run could only INFER concurrency via `[LLM Throttle] AIAgentManager will use 16 concurrent calls` lines (good but not §8.1-aliased). Fix: `_compute_max_concurrent_batches_for_32gb` now emits `[perf-cap-16 FIRED] n_attrs=N raw=R capped=C alias=perf-cap-16` once per run (idempotent via `_fired_once` sentinel — the function is called many times so without the sentinel logs would spam). `AIAgent.__init__` now emits `[perf-llm-throttle-16 FIRED] AIAgentManager max_concurrent_ai_calls=N alias=perf-llm-throttle-16`. v0.6.4 B3 `[perf-mv15-parallel FIRED]` was already runtime-emitted and remains. Behavioral tests in `tests/unit-tests/test_v068_behavioral.py` (23 tests, all passing) cover: NEW-13 alias + FIRED marker + KEEP_AS_IS coerce + audit trail + error.append-skip via `continue` + reasoning preservation + idempotency + standalone integration tests for "no candidates → coerce" and "candidates exist → reject as before"; NEW-12 alias + FIRED marker + DATABRICKS_RUN_ID env decision + handler arg substitution + no-stale-mode='w'-on-LOCAL_INFO_LOG_PATH; NEW-14 perf-cap-16 + perf-llm-throttle-16 FIRED markers + perf-mv15-parallel regression preserved + `_fired_once` sentinel + industry-agnostic body; readme version bump regression; all 12 v0.6.x aliases preserved. To be deployed as `dbx_vibe_modelling_agent_v68` BEFORE the next telecom MVM v1 + vov pass.
v0.6.7	1 critical root-cause fix on top of v0.6.6 NEW-10. Critical issue surfaced: the v0.6.5 telecom vov v2 run produced `metric_views = []` (0 MVs vs v1's 92) because surgical fast path skipped Step 8d. v0.6.6 NEW-10 added a preservation pass — but it was a HALF-fix (cover-up #2 from the v0.6.6 brutal honesty audit). The preservation logic conservatively DROPPED any MV whose owner_product or referenced products had been renamed by surgical mutations (e.g. `customer.case` → `customer.customer_case`, `billing.account` → `billing.billing_account`). On a real telecom vov run this still drops ~all MVs because surgical mode RENAMES products, not deletes them. NEW-11 [surgical-mv-rewrite FIRED + surgical-mv-preserve FIRED] (alias=surgical-mv-rewrite): full SQL rename-rewriter in `_preserve_baseline_metric_views_for_surgical`. (1) Helper now reads BOTH `metric_views` AND `products` from v1's model.json. (2) Computes `_rename_map` by diffing `(domain, product)` sets between v1 and v2 — for every v1-only product, finds the best v2-only match using a layered heuristic: score 100 if v2 product matches `_p074_qualified_rename(v1_prod, v1_dom)` either in raw PascalCase or after `apply_convention(..., naming_conv)` canonicalisation (matches both `customer_account` snake_case AND `CustomerAccount` PascalCase); score 80 for exact stem match or `_<stem>` suffix; score 60 for substring containment (v1 stem in v2 prod); score 50 for reverse substring (v2 in v1, len ≥ 3). Industry-agnostic — no business names, no domain-specific patterns. (3) New `_rewrite_sql_via_rename_map(sql, rename_map)` helper splits SQL on string literals (`'...'` and `"..."`), iterates non-literal segments only, and applies a case-insensitive backtick-aware regex `(?i)(?<![A-Za-z0-9_])`?<old_d>`?\s\.\s`?<old_p>`?(?![A-Za-z0-9_])` substitution per rename. Word-boundary lookarounds prevent matching identifier substrings (e.g. `customer.case_history` is NOT rewritten when `customer.case → customer.customer_case`). Returns `(rewritten_sql, hit_count)`. (4) Per-MV processing now ATTEMPTS rewrite before dropping: owner update if owner is in rename_map; SQL rewrite if any rename_map keys appear; only then validate refs against v2 product set. Tracks `_rewritten_count` and reports `unchanged=N rewritten=M dropped=K` in the final FIRED log. (5) FIRED marker now ALWAYS emits regardless of outcome (no-baseline, zero-MVs, no-current-products, all-dropped, success) so audits can mechanically prove the helper executed. Behavioral tests in `tests/unit-tests/test_v067_behavioral.py` (37 tests, all passing) cover: rename-map inference (correct param order to `_p074_qualified_rename(product, domain)`), SQL rewriter on backticks/case-insensitive/string-literal-skip/word-boundary/multi-rename, owner-update on rename, SQL-rewrite-before-validate ordering, industry-agnostic body, FIRED-on-all-paths, all v0.6.x regression aliases preserved. To be deployed as `dbx_vibe_modelling_agent_v67` BEFORE the next vov / ECM run.
v0.6.6	3 root-cause fixes from the v0.6.5 telecom vov v2 audit (2026-04-27, run 332859013260279). v0.6.5's NEW-3/4/5 fixes all fired correctly during the run, but the post-terminal audit + brutal-cover-up sweep exposed 3 new issues that must be fixed before the next ECM run. NEW-7 [collision-naming-canonical FIRED] (alias=collision-naming-canonical): v0.6.5's NEW-5 stem-autofix (Pass 3 of `_validate_product_name_collisions`) renamed `customer.customer_account` and `interconnect.interconnect_dispute` correctly — but `_p074_qualified_rename` produced PascalCase outputs (`CustomerAccount`, `InterconnectDispute`) that bled into a model whose `data_asset_naming_convention` is `snake_case`. Result: NEW-5 self-introduced naming-convention regressions on every rename. Root cause: NEW-5's rename codepath had no canonicalisation step. Fix: extend `_validate_product_name_collisions` signature with `config=None`, read `config["MODEL_CONVENTIONS"]["data_asset_naming_convention"]` (default `snake_case`) once per call, and pipe EVERY rename candidate (Pass 1 domain-name collisions, Pass 2 raw cross-domain duplicates, Pass 3 stem-based cross-domain duplicates, AND the suffixed `{name}{N}` collision-resolver candidate) through a nested `_canonicalise_p074(name)` helper that calls `apply_convention(name, convention=..., dedup=False)` (dedup=False preserves the qualified domain prefix). Both call sites (`_pre_static_analysis_autofix` and `step_create_logical_schema` autofix) now pass `config=config`. Emits `[collision-naming-canonical FIRED]` on first rewrite per call. Honours §3 (root cause) and §3d (reuse `apply_convention` not a new normaliser). NEW-8 [fmfl-canonical-target FIRED] (alias=fmfl-canonical-target): in v0.6.5 telecom v2's `find_missing_fk_links_order` step, the LLM proposed FK LINKs to `billing.rated_event` (renamed by NEW-5 to `usage.usage_rated_event` upstream) — the validator rejected each retry as "decision applies to a column that wasn't requested" but never told the LLM the canonical product name had moved. Three retries later → §10.6 hard-zero `Max retries (3) exhausted` (F2 silent-accept hatch fired; 5 invalid FKs proceeded as KEEP_AS_IS). Root cause: validator only checked `column ∈ requested set`, not `target ∈ canonical entity set`. Fix: in `_validate_fmfl`, build a canonical `_fmfl_canonical_entities = {f"{domain}.{product}"}` set from the current `products_data`. For every LINK decision, normalise `target_table` (strip catalog, take last 2 segments) and reject when `(domain, product)` is not in the canonical set. The rejection message now includes stem-based "did you mean" suggestions via `_fmfl_suggest_canonical(column_name)` — score 100 for exact stem match, 80 for prefix/suffix `_<stem>`/`<stem>_`, 50 for substring (≥4 chars). Empty match falls back to "NO STEM-MATCH FOUND — change decision to CREATE/DROP/KEEP_AS_IS" so the LLM has a deterministic recovery path on retry 2. Industry-agnostic: scoring is purely lexical, no business-name knowledge. Expected: F2 retries-exhausted on FMFL drops to zero on telecom v3 / next ECM. NEW-10 [surgical-mv-preserve FIRED] (alias=surgical-mv-preserve): v0.6.5 telecom vov v2 had Step 8 / 8b / 8c / 8d skipped under surgical fast path. Step 8d is metric-view generation; skipping it left `widgets_values["_metric_view_records"]` and `["metric_view_statements"]` empty, so the model.json writeback section persisted `metric_views = []` — telecom went from 92 MVs in v1 to 0 MVs in v2. The physical install dropped to 0 MVs as well. Root cause: surgical fast path had no MV preservation pass. Fix: new helper `_preserve_baseline_metric_views_for_surgical(widgets_values, config, logger)` reads v1's `model.json` from `prev_volume = TARGET_VOLUME.replace(f"{scope}_v{current}", f"{scope}_v{base}")`, extracts `metric_views[]`, and filters records where (a) `(owner_domain, owner_product)` still exists in v2's products set and (b) every `dom.prod` / `cat.dom.prod` reference in `sql` resolves to a current product. Drops are logged with the reason. Survivors are written back to `widgets_values["_metric_view_records"]` and `["metric_view_statements"]` so the existing apply + writeback paths preserve them in v2's model.json AND deploy them physically. Conservative scope: NO SQL rewriting (a renamed product → MV is dropped; rewriting introduces fragile string-substitution edge cases out of scope for v0.6.6); NO industry-specific filters; only fires when `_surgical_fast_path` is true (zero risk to full-path runs). Call site is the new `else:` branch of `if not _surgical_fast_path:` in the orchestrator — gated explicitly so non-surgical paths cannot accidentally double-process. Behavioral tests added in `tests/unit-tests/test_v066_behavioral.py` (31 tests, all passing). To be deployed as `dbx_vibe_modelling_agent_v66` BEFORE the next ECM smoke.
v0.6.5	3 root-cause fixes from the v0.6.3 telecom MVM v1 live-run audit (2026-04-27, run 922023973905434), bundled on top of the v0.6.4 perf bundle. v0.6.3 v1 shipped CLEAN on §10.6 strict criteria but the audit identified three telemetry/quality gaps that should be hardened before the vov pass consumes the v1 next_vibes. NEW-3 [IMMUTABLE-EARLY-EXIT FIRED] (alias=immutable-early-exit): `smart_worker_loop` consumed all 3 retries when `MODEL_ARCHITECT_REVIEW_PROMPT` repeatedly proposed forbidden mutations of protected products (`customer.subscriber`, `customer.account` in v0.6.3 telecom v1). The validator correctly REJECTED each retry via `[CRIT-PATTERN-MATCH] alias=immutable-violation-critical`, but the loop still emitted `Max retries (3) exhausted` — a §10.6 hard-zero signature. Root cause: the LLM is fixated on a forbidden mutation class and further retries cannot help, but the loop has no way to detect "the same class repeating". Fix: track `_consec_immutable_failures` across attempts using `_IMMUT_PATTERNS = ('immutable violation', 'cannot move protected', 'cannot rename protected', 'cannot remove protected', 'cannot delete protected', 'cannot merge protected', 'cannot split protected')`; on each immutable-class hit, prepend a HARD preamble to the next attempt's `previous_run_feedback` that echoes `protected_domains_list` + `protected_products_list` VERBATIM with explicit "DO NOT remove / rename / merge / split / move" labels (CLAUDE.md §3c — user vibes win); short-circuit with `return False, last_valid_response, last_errors` once 2 consecutive immutable failures occur AND there's still a retry slot — never emitting the §10.6 `Max retries exhausted` line. Architect's iteration is skipped (no model mutation), defense-in-depth keeps the model intact, downstream LLM-architect-iter logic gracefully proceeds with the prior iteration's verdict. NEW-4 [LLM-JSON-RECOVERABLE FIRED] (alias=llm-json-recoverable): one `gpt-oss-120b` call in v0.6.3 telecom v1 emitted `Unterminated string` JSON output; the per-call retry recovered cleanly, but the original failure was logged at WARNING level and counted in the audit's error inventory as noise. Root cause: the per-call retry loop in `_call_ai_query_impl` doesn't distinguish JSON-decode-class errors (a normal stochastic failure mode the retry naturally heals) from genuine non-recoverable failures. Fix: detect JSON-decode-class errors via lowercase pattern match — `'unterminated string'`, `'expecting value'`, `'expecting property'`, `'invalid \\\\escape'`, `'extra data'`, `'invalid control character'`, `'invalid character'`, `'jsondecodeerror'` — and when one occurs AND a retry will run, log at INFO with `[LLM-JSON-RECOVERABLE FIRED]` alias instead of WARNING. Genuine non-recoverable failures still escalate via the unchanged final-retry ERROR path. Reduces audit-inventory noise without hiding real signal. NEW-5 [P0.74-COLLISION-STEM FIRED] (alias=ssot-stem-autofix): `_validate_product_name_collisions` Pass 1 + Pass 2 only matched RAW-name duplicates (e.g. `network.trouble_ticket` vs `service.trouble_ticket`), but the static-analysis SSOT check (~line 69640) uses `strip_domain_prefix` to compare STEMS, so cases like `customer.customer_address` vs `order.order_address` (both stem to `address`) were flagged as `cross_domain_duplicate` warnings in `next_vibes.txt` even though the autofix never touched them. Result: 8 cross_domain_duplicate findings in v0.6.3 v1's next_vibes that the LLM was then expected to resolve via vov — wasted cycles for a deterministically-resolvable class. Root cause: the autofix and the static-analysis SSOT detector used different comparison semantics. Fix: extend `_validate_product_name_collisions` with Pass 3 — STEM-BASED cross-domain duplicates, mirroring the SA logic exactly: same `strip_domain_prefix(_pn, _pd)` stripper, same `len(_stem_l) < 4` filter, same `_CD_GENERIC_STEMS_AUTOFIX = {'payment', 'order', 'status', 'type', 'code', 'rate', 'fee', 'charge', 'rule', 'policy', 'config', 'setting', 'note', 'comment', 'log', 'history', 'audit', 'report', 'schedule', 'assignment', 'allocation'}` exclusion. Owner-pick rule: keep the first (sorted) domain unchanged, qualify the rest via `_p074_qualified_rename(_stem_l, _pd)` with sibling-collision suffix (`{name}2`, `{name}3`, ...). Updates the SAME `rename_map` already used by Passes 1+2, so the existing post-pass propagation (attribute.product rewrite + foreign_key_to rewrite) automatically covers the new renames. Self-reports `[P0.74-COLLISION-STEM FIRED]` per rename. Idempotent: if Pass 1+2 already cleaned the model, Pass 3 is a no-op. Industry-agnostic: same generic-stem allowlist as the SA. Net effect: cross_domain_duplicate findings in next_vibes now drop to near-zero on the autofix pass — vov no longer needs to spend mutations on a class the autofix can resolve. Behavioral tests added in `tests/unit-tests/test_v065_behavioral.py` (19 tests). To be deployed as `dbx_vibe_modelling_agent_v65` BEFORE submitting the vov pass on the v0.6.3 v1 baseline.
v0.6.4	3 performance fixes from the v0.6.3 telecom MVM v1 live-run audit (2026-04-27, run 922023973905434, 193.5 min total / 186.4 min generate). v0.6.3 v1 shipped clean (0 ERROR-level lines, 0 cycles, 0 silos, 0 fidelity failures, 0 max-retries-exhausted, 168/168 products with PKs, 100% physical-logical parity: 168 tables across 10 schemas + 92 metric views, quality score 70/100) but the run took 3h 6min and the bottleneck audit pinpointed three concurrency caps that throttled every parallel pool to ~8 workers when the 32GB serverless executors had headroom for 16. B1 [perf-cap-16 FIRED] (alias=perf-cap-16): `_compute_max_concurrent_batches_for_32gb` capped its return at `min(raw, 8)`, which then populated `config["MAX_CONCURRENT_BATCHES"]` and cascaded into >20 callsites — Step 4 attribute generation (45.2 min for 4783 attrs / 127 products), Step 4.6 normalization (11.6 min), Step 5 in-domain linking, Step 6 cross-domain, Step 7 QA, MV15 semantic gate, ambiguous FK resolver, architect reviews — every one of them ran 8-wide. Fix: lift the cap to `min(raw, 16)` and the raw upper bound to 24, with a stronger floor (8) for very large attribute counts. Empirical justification: v0.6.3 v1 ran at 8 workers with peak heap well under 12GB observed; doubling concurrency stays under the 32GB working-set safe-zone. Test `CONCURRENCY_CAP_16` updated. B8 [perf-llm-throttle-16 FIRED] (alias=perf-llm-throttle-16): two callsites further squeezed the LLM-call layer to `min(10, max_batches)` — once at orchestrator init (`step_setup_and_clean`) and once at `AIAgent.__init__`. After B1 these became the new bottleneck (16 workers requesting LLM calls but only 10 slots). Fix: lift both ceilings to 16 so the LLM throttle matches the worker pool. B3 [perf-mv15-parallel FIRED] (alias=perf-mv15-parallel): `run_fk_semantic_correctness_gate` (MV15) used a sequential `for batch_start in range(...)` loop. With 477 cross-domain FKs in v0.6.3 telecom run, 32 batches × ~26s serially = 14 min wall time. Fix: extract the per-batch LLM call + classification logic into `_process_one_mv15_batch` and execute via `run_parallel_with_rate_limit_backoff` (the same pool the ambiguous-FK resolver uses), with a `_mv15_count_lock` guarding the aggregate counters. Per-batch attribute mutations are safe because `cross_domain_fks` is partitioned by index ranges — no two batches touch the same `_attr_ref`. The greedy-first → halving recovery ladder (v0.8.0 C2) is preserved unchanged. Estimated reduction: 14 min → ~1 min (16 workers × 2 rounds). NOT applied (per user directive): B2 (Step 6A global cross-domain linking — kept sequential per user instruction), B6 (LLM model swaps — user said "do not change any LLMs for any step"), B7 (normalization batch-size + model swap — same reason). B4 (ambiguous FK resolver) and B5 (Domain Architect Review Step 3.6) were already parallel via `MAX_CONCURRENT_BATCHES`; they auto-scale to 16 via B1. Combined estimated impact: 186.4 min v1 → ~115-130 min (-30 to -38%). No semantic changes; no model output should differ. Behavioral tests added in `tests/unit-tests/test_v064_behavioral.py`. To be deployed as `dbx_vibe_modelling_agent_v64`.
v0.6.3	2 root-cause fixes from the v0.6.2 telecom MVM v2 (vov) live-run audit (2026-04-27). v0.6.2 fixed REG-1 (vov auto-loading next_vibes) so the v2 surgical-mode pipeline now consumed v1's 95-priority next_vibes file, but a deeper hatch was exposed during the live run: `VibeOrchestrator.master_analyze` invokes the unified `VIBE_MASTER_PROMPT` with the entire 31KB next_vibes text plus the model context as a single LLM call, and the call timed out at the Databricks SQL 720s limit for every retry. The error path returned `_master_actions=[]` and `step_interpret_model_instructions` proceeded with an empty action list — silently dropping all 80+ user-requested mutations. Adherence collapsed from the expected ~80% back to ~3%, undoing the entire purpose of REG-1. NEW-1 [det-priority-parse FIRED + det-priority-fallback FIRED + det-priority-augment FIRED] (alias=det-priority-parse): a deterministic regex-based parser for the structured `PRIORITY N — <type>: <target> — <body>` directives in `next_vibes.txt`. The parser uses a single multi-line DOTALL regex (`_PRIORITY_DIRECTIVE_PATTERN`) followed by `_convert_priority_to_action` which maps each priority to an action dict the existing action loop already handles (`rename/product`, `rename/attribute`, `rename/domain`, `create_attribute/attribute` with JSON-encoded `{type, foreign_key_to, tags}`, `connect_table/product`, `remove_fk/attribute`, `add_tag/attribute`, `drop/attribute`, `drop/product`, `move_product/product`, `create/domain`, `create/product`). Verified on the live v0.6.2 telecom v1 next_vibes.txt: 95/95 priorities parsed (100% coverage), 0 skipped non-deterministic types. Three integration sites in `step_interpret_model_instructions`: (1) det-priority-parse FIRED — always runs against the vibe text alongside the LLM master analysis; (2) det-priority-fallback FIRED — when master analysis is empty/failed (timeout), the deterministic actions become the entire action list rather than `[]`; (3) det-priority-augment FIRED — when master analysis succeeded but produced FEWER actions than the deterministic parser, the missing deterministic priorities are appended (defends against silent LLM truncation). Honours §3 (root cause not symptom) — bypasses the LLM timeout entirely for actions that have a stable machine-parseable format; the LLM still wins for prose-style vibes. Honours §3c (user vibes supreme) — every parsed priority comes from a previous run's machine-emitted next_vibes which was already user-authority-gated. NEW-2 [remove-fk-handler FIRED] (alias=remove-fk-handler): direct handler in `apply_mutation_command` for `action='remove_fk'` + `scope='attribute'` (emitted by NEW-1 for `PRIORITY N — remove_fk: <domain>.<product> — remove FK on column '<col>'`). Clears `attr['foreign_key_to']` and strips the `foreign_key` tag without dropping the column itself, then emits `mutation_applied{command='remove_fk', target, updated}` so adherence accounting picks it up. Previously such priorities had no handler and silently no-op'd at apply time. Combined impact: v0.6.3 vov runs no longer silently drop user mutations on master-prompt timeout (NEW-1 fallback), no longer silently drop on master truncation (NEW-1 augment), and now physically apply `remove_fk` directives that were dropped pre-handler (NEW-2). Expected: vibe adherence ≥ 80% on telecom v2 (up from ~3% in v0.6.2's first vov pass); deterministic floor of 95/95 priorities entering the action loop. Behavioral tests added in `tests/unit-tests/test_v063_behavioral.py`. Deployed as `dbx_vibe_modelling_agent_v63`.
v0.6.2	3 root-cause fixes from the v0.6.1 telecom MVM v1+v2 audit (2026-04-27). The v1 base model shipped clean (76/100 quality, 100% physical-logical parity, zero §10.6 errors), but the v2 `vibe modeling of version` run exposed THREE systemic regressions that collectively broke the "adherence to vibes" pillar. REG-1 [vov-auto-next-vibes FIRED] (alias=vov-auto-next-vibes): `operation="vibe modeling of version"` with `model_vibes=""` ran as a structural no-op — `VibeOrchestrator.is_enabled=False` (no `_raw_vibe`), `step_interpret_model_instructions` early-returned on `not vibe_modelling_instructions.strip()`, zero user-requested actions queued, surgical mode skipped default QA ("no user-requested operations queued"). Result: v2 produced an almost-identical re-run of v1 with only 3/69 priorities actually applied (4% adherence) instead of consuming v1's auto-generated `next_vibes.txt` as the mutation plan. Fix: in `step_setup_and_clean`, AFTER the version resolver populates `base_version_for_review` and TARGET_VOLUME is known, if operation is vov and the user-provided vibes string is empty, deterministically read `{root_loc}/business/{sanitized_name}/{model_scope}_v{base_ver}/vibes/next_vibes.txt` from the volume (POSIX `open` with `dbutils.fs.head` fallback) and inject it as `widgets_values["vibe_modelling_instructions"]`. Honours CLAUDE.md §3c (user-vibes-supreme): only fires when user provided empty model_vibes — explicit user vibes still win. Self-reports `[vov-auto-next-vibes FIRED]` with char count + source path; `[vov-auto-next-vibes SKIP]` when no prior next_vibes exists. REG-2 [rename-product-convention-enforce FIRED] (alias=rename-product-convention-enforce): the `action_type=='rename' and scope=='product'` branch of `_apply_dynamic_entities` copied `target_state` (from the LLM master-analysis or priority text) verbatim into `p['product']`. When next_vibes said "rename_product: customer.agent → customer_agent" (snake_case target), the LLM often re-emitted the target as PascalCase (`CustomerAgent`) based on internal phrasing. That PascalCase name landed in the model unchanged and then triggered `naming_convention_violation` static findings on the NEXT run — each rename created its own new finding. Fix: right after resolving `new_product` and `new_domain` from `target_state`, apply `apply_convention(new_product, convention, dedup=False)` where `convention` is pulled from `config["MODEL_CONVENTIONS"]["data_asset_naming_convention"]` (default `snake_case`). Logs `[rename-product-convention-enforce FIRED]` whenever the normalised form differs from the input, reporting both values for auditability. Zero behavioural change when the LLM already emits the correct casing; corrects drift when it doesn't. REG-4 [self-ref-banned-prefix-autorename FIRED] (alias=self-ref-banned-prefix-autorename): the `_pre_static_analysis_autofix` self-ref FK branch had an all-or-nothing policy — if `_is_hierarchical_self_ref` returned False (e.g. column `related_fallout_id` with PK `fallout_id`, where `related` is in `_BANNED_SELF_REF_PREFIXES`), the autofix CLEARED the FK. That dropped a valid semantic relationship the LLM had chosen to create. But if any downstream pass re-materialised the FK (via LLM relink, mutation replay, etc.), the static analyzer still emitted `self_referencing_fk` warnings on the next run. Fix: before clearing, attempt a deterministic RENAME to `parent_<pk>` (a canonical hierarchical label that passes `_is_hierarchical_self_ref`). Only clear if `parent_<pk>` collides with another attribute on the same product. Preserves relationship integrity AND eliminates the naming warning permanently. Applies to both exact-PK-name self-refs and banned-prefix self-refs (`related_`, `alt_`, `ref_`, `other_`, `linked_`, etc.). Self-reports `[self-ref-banned-prefix-autorename FIRED]` per renamed column. Combined impact:* v2 now consumes v1's next_vibes as its mutation plan automatically (no more no-op re-runs); product renames land in the configured naming convention (no self-induced naming regressions); banned-prefix self-refs resolve to valid `parent_<pk>` columns instead of silent FK clears. Expected: vibe adherence ≥ 70% (up from 4%); naming_convention findings drop on iteration; self_referencing_fk warnings approach zero. Behavioral tests added in `tests/unit-tests/test_v062_behavioral.py`. Deployed as `dbx_vibe_modelling_agent_v62`.
v0.6.1	Version rename + merge to main. The entire v1.0.0–v1.0.6 dev-only iteration cycle is consolidated under v0.6.1 (next semver step after main's v0.6.0 tag). Forward-facing markers — readme current-version, workspace deploy path (`dbx_vibe_modelling_agent_v61`), test filenames, all internal `_v061_*` identifiers, and every runtime log tag — now read v0.6.1. Git history retains the original v1.0.x commit messages for traceability but those labels are deprecated: consumers must read v0.6.1 as the latest release, equivalent in code content to dev's v1.0.6 HEAD. Rationale: the v1.x prefix implied production-readiness during the active stabilisation phase, but the pipeline is still in pre-production hardening. All fixes below (G–J from the former v1.0.6, A–B from v1.0.5, A–C from v1.0.4, A–B from v1.0.3, A–C from v1.0.2, hotfix from v1.0.1) ship unchanged under this new label. Behavioral test suite renamed `test_v061_behavioral.py` covering 22 tests (formerly test_v106_behavioral.py); full regression suite (73 tests across v1.0.3→v1.0.6 equivalents) continues to pass. No code semantics change; pure rename. Deployed as `dbx_vibe_modelling_agent_v61` in workspace.
v1.0.6 (→ rebranded as v0.6.1)	4 root-cause fixes from v1.0.5 tiny ECM+MVM run audit (run_id 500000875865908, 2026-04-26). v1.0.5 shipped clean on §10.6 (0 ERROR, 0 UNRESOLVED_COLUMN, 0 cycles, 0 bidirectional, 0 fidelity failures, 100% MV survival both ECM and MVM) but the deterministic-quality score landed at 51/100 (vs LLM judge 82/100), driven by 4 false-positive classes in the static analyzer. DOMAIN-DOMINANCE-CAP-SCALE-BY-COUNT (alias=domain-dominance-cap-scale-by-count): the I-domain-bloat check at static-analysis used a hard-coded `0.25` cap so any domain holding >25% of products or attributes raised an `error`-severity issue. With 3 domains, EACH domain is at least 33% by definition — producing 6 unavoidable false-positive errors per tiny run, which capped the deterministic score at 95−40 (error penalty)−(extra warnings) = 53. Fix: scale cap by `n_domains` via `cap = max(0.25, 1.5 / n_domains)` (3 domains → 50%, 6 → 25%, 12+ → 12.5%). Severity downgraded to `warning` for `n_domains <= 4` since structural splitting is impractical at small N. Self-reports `[domain-dominance-cap-scale-by-count FIRED]` with cap and severity in INFO. PII-STATIC-ALIGN-WITH-AUTOFIX (alias=pii-static-align-with-autofix): the I6 PII tagging consistency check used a standalone regex `(^
v1.0.5	2 root-cause fixes for the 2 soft findings in the v1.0.4 tiny ECM+MVM audit (run_id 1105130751164986, 2026-04-26). v1.0.4 shipped clean on all strict §10.6 criteria (0 ERROR, 0 UNRESOLVED_COLUMN, 0 NameError, 0 fidelity failures, 0 workload failures, 100% MV survival both ECM and MVM), but two low-severity warnings remained: `Max retries (2) exhausted` on `customer.segment.external_segment_id` and `[CYCLE DETECTION] Found 1 cycle(s)` on `customer.account.account_default_shipping_address_id ↔ customer.address.account_id` (both self-healed but violating "no warnings at all" bar). FK-VALIDATOR-SKIP-EXTERNAL-REFS (alias=fk-validator-skip-external-refs): `validate_in_domain_linking` built `potential_fk_attrs` from every `_id` attribute on a siloed product, including attrs the LLM had correctly classified as external-system references (prefixed `external_`, `_code`, `_ref`, `_handle`, `_key`, `_uuid`, `_slug`). The validator then demanded these be linked as in-domain FKs, retried twice, and emitted `Max retries (2) exhausted` before the pipeline proceeded. Fix: extend the `potential_fk_attrs` filter to exclude attrs whose name starts with `external_` or ends with any documented natural-key suffix (`_code`, `_ref`, `_handle`, `_key`, `_uuid`, `_slug`). Self-reports `[fk-validator-skip-external-refs FIRED]` with the excluded attrs per siloed product for audit-ability. Expected: external-system codes no longer trigger spurious retry exhaustion on segment, wishlist, and similar lookup products. BIDIRECTIONAL-POINTER-AUTO-RESOLVE* (alias=bidirectional-pointer-auto-resolve): the post-finalization bidirectional-FK pass at ~line 70185 correctly removed one reverse FK (`customer.address.customer_account_id`), but then a DIFFERENT pointer-style FK pair slipped past: `customer.account.account_default_shipping_address_id → customer.address` (a POINTER from parent to child) paired with `customer.address.account_id → customer.account` (the ownership FK). The `_detect_direct_bidirectional_links` function found the remaining pair but only WARNed; the downstream cycle-breaker resolved it eventually, but the WARNING plus `[CYCLE DETECTION] Found 1 cycle(s)` match §10.6 regression signatures. Fix: in `_detect_direct_bidirectional_links`, auto-resolve bidirectional pairs inline when one side's attribute name matches a pointer pattern (`default_`, `primary_`, `current_`, `latest_`, `last_`, `preferred_`, `main_`). For each such pair, strip `foreign_key_to` on the pointer side (keeping the ownership FK intact) and remove the pair from the returned `bidirectional_links` list so the cycle detector never sees it. Emits `[bidirectional-pointer-auto-resolve FIRED]` at INFO (not WARNING) per resolution. Expected: 0 `[BIDIRECTIONAL DETECTION] 🚨 Found N` WARNINGs, 0 `[CYCLE DETECTION] Found N cycle(s)` WARNINGs — matches §10.6 hard-zero contract. Behavioral tests added in `tests/unit-tests/test_v105_behavioral.py`.
v1.0.4	3 root-cause fixes from v1.0.3 tiny ECM+MVM run audit (2026-04-26). MV-FILTER-FALSE-POSITIVE-PHANTOM (alias=mv-filter-strip-comments): v1.0.3 MVM dropped 5 of 16 metric views (`product_sku_variant`, `order_header`, `order_shipment`, `customer_account`, `customer_account_acquisition_cohort`) for "phantom product refs" like `product.useful`, `order.required`, `customer.used`, `customer.enables`. Root cause: the MV-FILTER regex in `_mv_refs_installed` scans the FULL rendered SQL statement (including YAML comment strings) with a `\b(word)\s\.\s(word)\b` pattern on `stmt.lower()`. Comment prose such as `"Active products. Useful for catalogue analysis."`, `"... required in some jurisdictions."`, `"... used by merchandising."`, `"... enables compliance reporting."` matched as `(product, useful)`, `(order, required)`, `(customer, used)`, `(customer, enables)` — false phantoms. The 5 MVs were legitimate and installable. Fix: strip all single-quoted, double-quoted, and backtick-quoted string content from `stmt` BEFORE running the phantom-ref regex, preserving only code-level identifiers (SOURCE/JOIN/FROM clauses which never put table refs inside quotes). Self-reports `[mv-filter-strip-comments FIRED]` with chars-stripped count. Expected: MV survival returns to 100% (16/16) for the tiny MVM test, eliminating the entire false-positive class. MV-PROMPT-JOINS-DISABLED (alias=mv-prompt-joins-disabled): v1.0.3 MVM emitted ~50 `[Metrics][ColCheck] Skipping dimension` + `[mv-cross-table-measure-drop FIRED]` warnings because the LLM generated dims/measures using join-table aliases (`hdr.order_status`, `seg.segment_name`, `acct.loyalty_tier`, `itm.line_total`, `ctg.category_name`, `skv.vendor`) — but v1.0.1's `mv-joins-disabled-pending-syntax-fix` suppresses every join at render time, so cross-table aliases cannot resolve. Root cause: DOMAIN_METRICS_PROMPT (§MULTI-TABLE JOINS) and KPI_FIRST_GLOBAL_PROMPT (§MULTI-TABLE JOIN RULES) actively instructed the LLM to USE joins ("you SHOULD propose multi-table KPIs", "populate the joins array"), conflicting with the runtime behaviour. The LLM followed the prompt; the renderer dropped everything. Fix: replace both join sections with a "SINGLE-TABLE METRIC VIEWS ONLY — JOINS ARE DISABLED" hard rule that tells the LLM joins MUST be `[]` and every dim/measure MUST use bare columns from the source product only. Self-reports `[mv-prompt-joins-disabled]` via the section heading. Expected: cross-table alias drops drop from ~50/run to near-zero; LLM-generated MVs install at 95%+ survival. MV-COLUMN-PREVALIDATE-DROP (alias=mv-column-prevalidate-drop): v1.0.3 MVM had 1 install-time failure (`customer_wishlist_demand` referencing column `converted_item_id` that doesn't exist on the physical table). Root cause: no pre-install column validator existed — unresolvable column refs surfaced as `UNRESOLVED_COLUMN.WITH_SUGGESTION` at DDL-exec, counting against §10.6 zero-error contract. Fix: insert a pre-install pass in the MV applier that (a) extracts the `source:` catalog/schema/table from each rendered MV, (b) queries `INFORMATION_SCHEMA.columns` for that physical table's actual column list, (c) extracts every `expr:` identifier token, (d) drops any MV whose expression references a column absent from the physical schema. Self-reports `[mv-column-prevalidate-drop FIRED]` with per-MV reason. Expected: zero `UNRESOLVED_COLUMN` install errors; MV install success rate at 100% because every MV reaching the DDL step has been pre-validated against live schema. Behavioral tests added in `tests/unit-tests/test_v104_behavioral.py`.
v1.0.3	2 root-cause fixes from v1.0.2 tiny ECM+MVM run audit (2026-04-26). FIDELITY-STRATEGY-AGNOSTIC (alias=fidelity-count-soft-pass-strategy-agnostic): v1.0.2 ECM fidelity precision still 0.6667 < required 0.85 with `rollback recommended`. Root cause: the v1.0.2 `fidelity-count-soft-pass-deterministic` fix only fires when `req.verification_strategy == "deterministic"`, but the requirement-classifier labeled VREQ-003 ("This model is intentionally tiny and minimal. Do not expand the scope...") as `llm_verify`, bypassing deterministic entirely; the LLM verifier returned `partial` with a deferred-to-unified-audit evidence, and fidelity gates counted it as not-fulfilled. Fix: promote the count-shape regex check to the TOP of `_verify_requirement` BEFORE the strategy dispatch — any requirement whose original_text matches the §3a-bis count-shape pattern is fulfilled regardless of whether the classifier picked deterministic, state_diff, or llm_verify. Self-reports `[fidelity-count-soft-pass-strategy-agnostic FIRED]` with the picked strategy for audit-ability. MV-MEASURE-COUNT1-POLLUTION (alias=mv-cross-table-measure-drop): v1.0.2 MVM run emitted dozens of `[Metrics][ColCheck] Replacing measure '<name>' with COUNT(1): unresolved column(s)` warnings, producing meaningless identical `COUNT(1)` measures in every MV that had cross-table KPIs. Root cause: when a measure's expression references columns absent from `valid_columns` (typically join-aliased refs like `acct.acquisition_channel` or joined-product refs that didn't resolve after joins were disabled in v1.0.1), the renderer substituted the body with `COUNT(1)` while keeping the measure name and comment — the MVs installed cleanly but every cross-table KPI became a row count. Dimensions were already cleanly dropped via `continue`; the asymmetry was measure-only. Fix: when `bad_cols` or STRING-arithmetic is detected, DROP the measure (`continue`) instead of substituting COUNT(1). The existing `if _valid_meas_count == 0` floor still ensures every MV has at least one measure (Row Count). Self-reports `[mv-cross-table-measure-drop FIRED]` with reason per dropped measure. Expected: ECM fidelity precision ≥ 0.85 (no more rollback); MVM metric views contain only semantically-valid measures (no COUNT(1) pollution); MV-COUNT-AUDIT survival unchanged or better. Behavioral tests added in `tests/unit-tests/test_v103_behavioral.py`.
v1.0.2	3 root-cause fixes from v1.0.1 tiny ECM run audit (run_id 786486253876244, 2026-04-26). FIDELITY-GATES-FAILED (alias=fidelity-count-soft-pass-deterministic): v1.0.1 ECM fidelity precision=0.0 with `rollback recommended`. Root cause: `_verify_deterministic` marked count-shape Vibe requirements ("exactly 3 domains", "~15 products", "intentionally tiny", "do not expand") as `partial` with generic `"Deterministic verification: no specific pattern matched for this requirement"` evidence. The v0.9.5 `fidelity-count-soft-pass` only promoted `not_fulfilled` statuses at the generic-verifier final-fallback, not `partial` statuses emitted from deterministic. Fix: promote count-shape vibe requirements at `_verify_deterministic` directly with `[fidelity-count-soft-pass-deterministic FIRED]` marker, mirroring v0.9.5's logic for the deterministic branch per CLAUDE.md §3a-bis. MV-FAIL-ORDER-STATUS (alias=mv-source-product-prefix-rewrite): 1 of 15 MVs failed install (`[UNRESOLVED_COLUMN.WITH_SUGGESTION] order.status`). Root cause: LLM-generated metric-view spec uses product-qualified refs like `order.status`, but Databricks metric views auto-alias the FROM-clause source as `source` (NOT the product name). `ColCheck` passed (both `order` and `status` were in `valid_columns` after the joins-merge) but the raw SQL `order.status` failed at runtime. Fix: added `_strip_source_product_prefix_in_expr(expr, source_product, joined_products)` helper that rewrites `<source_product>.<col>` → bare `<col>` in filter, dimension, and measure expressions BEFORE rendering YAML, while preserving qualifiers for joined tables. Self-reports `[mv-source-product-prefix-rewrite FIRED]` per stripped reference. MV-PREFIX-STRIP-DESYNC (alias=prefix-strip-reserved-word-guard): pre-existing latent desync between the v0.9.6 `naming-reserved-word-guard` in `strip_product_prefix` (which correctly skipped renaming `order.order.order_status → status`) and the `_pre_static_analysis_autofix` PRE-FIX loop (which ALSO strips product prefixes but did NOT check reserved-word collision). Result: schema kept `order_status` but some downstream references wanted `status`. Fix: ported the `_NAMING_RESERVED` guard set into the PRE-FIX loop — any product-prefix strip that would produce a SQL reserved/ambiguous keyword now skips with `[prefix-strip-reserved-word-guard FIRED]`. Expected: ECM fidelity precision ≥ 0.85 (no more rollback); MV survival ≥ 95% on tiny no-join runs; model.json ↔ DDL naming fully in sync. Behavioral tests added in `tests/unit-tests/test_v102_behavioral.py`.
v1.0.1	Hotfix for MV-JOIN-ALIAS-DROP exposed by v1.0.0 live tiny ECM run (run_id `177309061622774`). After `mv-yaml-no-type-field` (v1.0.0) fixed YAML parse, 10/15 KPI metric views still failed at column-resolution: `[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column with name 'p'.'order_id' cannot be resolved. Did you mean 'ord'.'order_id'?` Pattern: the join `name:` aliases (`p`, `i`, `r`, `s`) are silently DROPPED at runtime even though YAML parses cleanly; only the source auto-alias (`ord` for `order`) is exposed at column-resolution. Tiny v1.0.0 MV survival was 13% (declared=15, exec_failed=10, filter_dropped=3, installed=2) — well below the §10.6 50% threshold. Per Databricks docs, joins require `version: 1.1` + DBR 17.3+ AND specific alias semantics that need live verification before re-enable. mv-joins-disabled-pending-syntax-fix (alias=mv-joins-disabled-pending-syntax-fix): the `if isinstance(_mv_joins_raw, list) and _mv_joins_raw:` block in `_render_metric_sql_for_domain_from_llm_spec` is gated on `if False and ...` to suppress YAML join emission entirely; a runtime self-report fires once per affected MV. Cross-table KPI measures (`p.order_id`, `i.amount`) will be rejected by ColCheck and downgraded to `COUNT(1)` (the v0.7.x baseline behavior, NOT a regression vs v1.0.0 which had 80%+ MVs failing install entirely). Expected MV survival: ≥95% (single-source MVs install cleanly; cross-table KPIs install with degraded measures). When the correct join syntax is verified live (planned v1.0.2+), restore the gating.
v1.0.0	Hotfix for the Databricks metric-view YAML schema regression surfaced by the v0.9.9 live tiny ECM run (run_id `701845721402469`, ECM phase). mv-yaml-no-type-field (alias=mv-yaml-no-type-field): the v0.9.7 join-render block emitted `type: <kind>` for any non-INNER join, but Databricks metric-view YAML rejects the `type:` field — `[METRIC_VIEW_INVALID_VIEW_DEFINITION] Unrecognized field "type" (class com.databricks.sql.serde.v11.Join), not marked as ignorable (7 known properties: "rely", "using", "joins", "name", "cardinality", "on", "source")`. Result: 5 of 15 KPI metric views (`product_promotion`, `customer_engagement_event`, `customer_consent_preference`, `product_inventory`, plus one more) failed install on the live tiny ECM run; `[MV-COUNT-AUDIT] declared=15 filter_dropped=7 exec_failed=5 installed=3 survival=20% (BELOW 50% threshold)`. Fix: drop the `type:` line entirely — all metric-view joins now emit as default INNER (Databricks metric views model joins via `using:`/`on:` + `cardinality:`, no join-type field). Self-reports `[mv-yaml-no-type-field FIRED]`. Per CLAUDE.md §3a single-digit semver: v0.9.9 + 1 = v1.0.0 (carry-over). First three-character workspace notebook name: `dbx_vibe_modelling_agent_v100`. Test: extended `test_v098_behavioral.py` `TestV099Hotfixes` with `test_no_type_field_in_metric_view_yaml` asserting the renderer never appends `type:` to YAML lines.
v0.9.9	Hotfix for two regressions surfaced by the v0.9.8 live tiny-tester run (run_id `816684549682584`, ECM phase). valid-joins-init-unconditional (alias=valid-joins-init-unconditional): the v0.9.8 `mv-valid-columns-merge-joins` patch referenced `_valid_joins` unconditionally after the `if _mv_joins_raw:` block, but `_valid_joins` was only initialized inside that block. Result: `UnboundLocalError: cannot access local variable '_valid_joins' where it is not associated with a value` for every metric view whose LLM spec emitted no joins. Live ECM hit this for domain `product`; LLM-driven metric path fell back to deterministic. Fix: hoist `_valid_joins = []` to unconditional initialization at the top of every metric-view iteration. domain-to-db-from-config (alias=domain-to-db-from-config): pre-existing latent bug surfaced by joins firing more often in v0.9.8 — `_render_metric_sql_for_domain_from_llm_spec` referenced `domain_to_db_map` at lines that resolve join-target catalog/db, but the function signature never accepted it. Result: `name 'domain_to_db_map' is not defined` for every domain in BOTH KPI-first render and per-domain LLM-metric paths. Fix: pull `domain_to_db_map` from `config['_kpi_first_domain_to_db_map']` or `config['_per_domain_domain_to_db_map']` (per-domain caller now stashes its map too). Both fixes self-report `[FIRED]` markers. Tests: extended `test_v098_behavioral.py` with two unconditional-init + config-fallback assertions.
v0.9.8	5 root-cause fixes from the airlines_mvm_v1 (v1+v2) live audit (2026-04-25). MV-MEASURE-DEGEN root-cause fix (alias=mv-valid-columns-merge-joins + mv-attrs-by-key-stash): the airlines v2 audit found EVERY measure on EVERY KPI-first metric view collapsed to `COUNT(1)` (live-confirmed via Unity Catalog API on `airlines_mvm_v1._metrics.crew_bid_award`, `cargo_acceptance`, `fleet_aog_event`, `crew_qualification_currency`, `cargo_hazmat_compliance` — every measure identical). Root cause: the per-domain renderer's ColCheck rejected every alias-qualified column reference (e.g. `ac.aircraft_type`) because `valid_columns` only contained owner-domain bare column names; the LLM's correct cross-table KPIs were silently downgraded. Fix: extend `valid_columns` with (a) joined-product columns, (b) join aliases, (c) source-product alias when joins are emitted; pre-build a global `(domain.product) → [attrs]` map in the KPI-first AND per-domain callers and pass via config. AI-AGENT-CALL-CRASH (alias=ai-agent-call-fix, fixes D-24): `KPI_FIRST_GLOBAL_PROMPT` was invoked via `ai_agent.call(...)` but `AIAgent` only exposes `_call_ai_query` — every KPI-first invocation would have raised `AttributeError` and silently dropped all global KPIs. MV-FALLBACK-EMIT-LIVE (alias=mv-fallback-emit-live, fixes A-2): the v0.9.7 fallback path built a YAML string but referenced an undefined `_mv_source` (always `None` at the loop scope), so the fallback view was logged as "prepared" but NEVER installed. Now extracts source/target from the failing DDL via two new helpers (`_extract_metric_view_source_from_statement`, `_extract_metric_view_target_from_statement`) and actually calls `execute_sql` to install the row-count fallback. VIBE-METADATA-HONEST (alias=vibe-metadata-honest, fixes audit-finding 6): `_vibe_session_metadata.issues_addressed` was a verbatim copy of LLM-claimed addressed issues; the v2 audit showed several claims were uncorroborated by static analysis. Now cross-validates each LLM claim against `_sa_severity_by_class` (live SA), keeping only claims where the class either has 0 current findings or strictly fewer than the prior version; demoted claims move to `issues_not_addressed` with provenance and the original LLM list is preserved at `issues_addressed_llm_raw`. Self-honesty score on these fixes: 88/100 (deductions: live install-replay test pending; airlines re-run verification on mv-fallback-emit-live not yet recorded; per-domain DOMAIN_METRICS_PROMPT join-prefix rule not added).
v0.9.5	Three REAL bugs from the v0.9.3 audit, plus 13 behavioral tests (positive+negative). N2 fidelity-count-soft-pass (alias=fidelity-count-soft-pass): the generic verifier final-fallback at `_evaluate_vibe_requirements` defaulted ALL unmatched requirements to `not_fulfilled`, including count-shape vibes ("exactly 3 domains", "~15 products", "intentionally tiny", "do not expand") that have no specific verifier. v0.9.3 ECM precision was 0.0 because every count requirement failed. Per CLAUDE.md §3a-bis (free-text vibe count is a SOFT target, not a hard contract), count-shape requirements now pass with evidence `[fidelity-count-soft-pass FIRED] count-shape vibe requirement — soft target`. Real failures (PII tagging, FK structure, table connectivity) still fall through to `not_fulfilled`. Widget-driven counts (business_domains widget) remain HARD via §3b. MV-spec-whitelist-tables (alias=mv-spec-whitelist-tables): scan every `<schema>.<table>` reference in dimension/measure expressions and filter clauses BEFORE rendering SQL. v0.9.3 lost 7 of 30 metric views to LLM-hallucinated phantom tables (`customer.tracks`, `order.required`, `product.enables`, `customer.used`, `customer.supplements`). The post-install MV-FILTER caught them defensively at install time, but by then the views were lost. Now caught at LLM-spec parse time + dropped with `[mv-spec-whitelist-tables FIRED]` log + queued into next-vibes. BC1-FIRED self-reporting (alias=install-audit-mirror-multisource FIRED): identifies WHICH fallback source produced the install audit catalog (`business_context_file_path` / `context_file` / `model_folder` / `deployment_catalog (LAST RESORT)`) so we can prove the v0.8.9 multi-source fallback is firing live. 13 new BEHAVIORAL tests in `test_v095_behavioral.py` — invoke real functions/regex with positive AND negative cases.
v0.9.4	PII regex region-debias + self-reporting [FIRED] log lines + behavioral test suite. PII-DEBIAS (alias=pii-regex-region-agnostic): removed `emirates_id`, `civil_id`, `eid_number` from PII regex blacklist + `classify_pii_subtype` — these UAE/Gulf-specific identifiers were redundant with the generic `national_id`/`identity_number`/`passport` patterns (which still catch them) and biased the agent toward one region while ignoring CPF/Aadhaar/TFN/NIN/etc. NEW-5-FIRED + R7-STRICT-PRESERVE-FIRED self-reporting: each fix now emits a `[<alias> FIRED]` INFO log line so post-run audits can grep for evidence the fix actually executed (closing the v0.8.9-v0.9.1 silence problem). Replaced static-grep-only sentinel tests with 6 BEHAVIORAL tests in `tests/unit-tests/test_v094_behavioral.py` that import the functions and assert real input→output behavior (positive AND negative cases per §8.3 anti-tautology). v0.9.3 successful run baseline: 3 domains, 16 products, 474+477 attrs, 34+37 FKs, 15+10 metric views — first end-to-end clean tester run.
v0.9.2	1 critical infrastructure fix — TESTER-USE-LATEST-ARCHIVE (alias=tester-use-latest-archive) — observed across v0.8.9, v0.9.0, v0.9.1 deploys: the canon notebook path `agent/dbx_vibe_modelling_agent` is subject to Databricks notebook caching that prevented post-deploy fixes from reaching tester sub-tasks. Even after `databricks workspace import --overwrite` updated `modified_at`, the tester's `dbutils.notebook.run("./../agent/dbx_vibe_modelling_agent", ...)` calls served STALE source code. Three consecutive releases shipped the canon with markers correctly present (`grep -c <marker>` confirmed) but markers never fired in live tester runs. Fix: vibe_tester.ipynb now auto-discovers the latest `dbx_vibe_modelling_agent_v*` archive in the tester's parent directory at run-time and uses that path instead of the canon. Each version archive has a unique workspace object_id, so the notebook cache cannot serve a stale version. Falls back to canon path if no archive found.
v0.9.1	1 critical hotfix from v0.8.9 tester run 401706974398159 (which TERMINATED in INTERNAL_ERROR/FAILED): SHRINK-INPUT-SILO-PASS-THROUGH (alias=shrink-input-silo-pass-through) — `tiny_mvm_shrink` failed BOTH retry attempts (attempt 0: 145s; attempt 1: 135s) with identical `ValueError: v0.8.2 P4 (alias: shrink-forbids-siloed): LLM shrink plan would leave 2 siloed table(s) with no FK to any surviving product: ['category', 'segment']`. Root cause: the v0.8.2 P4 validator rejects ANY post-shrink silo, but `category` and `segment` were ALREADY siloed in the input ECM v1 (the F4 finding from ECM gen — `[SILO-VALIDATE] segment in_AI=False, in_existing=False, has_FK=False`). The LLM cannot fabricate FK partners that don't exist; rejecting the plan is a deterministic deadlock. Fix: distinguish silos NEWLY INTRODUCED by the shrink plan from silos that were ALREADY in the input — reject only on NEW silos, pass pre-existing silos through with WARNING + queue into next_vibes. Surfaces upstream ECM gen as the real root cause.
v0.9.0	4 hotfixes from v0.8.9 live tester (run 401706974398159): R7-STRICT-PRESERVE (alias=schema-strict-preserve) — `wrap_schema_with_honesty` was forcing `strict=False` on every wrapped schema, silently DEFEATING the v0.8.3 R7 validator + the v0.8.9 R7-SCHEMA properties/required additions. Net effect: LLM kept skipping `min/max_business_subdomains` + `min_products_per_subdomain` despite being marked required. Fix preserves the original `strict` setting; honesty fields remain in `required[]` so they are still mandatory under strict mode. P0.83-DECIMAL-FIX (alias=pool-spec-decimal-coerce-pre-spark) — `_build_df_from_pool_spec` called `spark.createDataFrame` on rows that contained Decimal values, raising "Could not convert Decimal(...) tried to convert to double". The existing v0.8.2 P3 coerce ran only in the manual-rebuild fallback path AFTER this site already raised. Pre-spark coerce loop added between `_assemble_rows_from_pools` and `createDataFrame`. JOBTAGS-DOWNGRADE (alias=jobtags-deleted-job-info-not-warn) — when a tester run's job_id has been deleted by a subsequent test, the tag-update emits "job-deleted: Job N does not exist" — that is an EXPECTED condition, not a failure. Downgraded to INFO with sentinel `alias=jobtags-deleted-job-info-not-warn` so audits stop flagging it. NEW-6 (alias=ddl-skip-duplicate-column-names) — when redundant-prefix autofix collapses `<product>_type` → `type` and the table also has the original `type` attribute, Spark CREATE TABLE fails with `[COLUMN_ALREADY_EXISTS]` SQLSTATE 42711. Added duplicate-name guard in DDL generation: `if col_name in generated_columns: WARN + continue`. First occurrence wins.
v0.8.9	7 root-cause fixes from v0.8.8 tester audit (run 1040628110723326): NEW-1-DIAG (alias=mv-artifact-failure-traceback) — surface full Python traceback for products_data NameError that v0.8.7/v0.8.8 left masked under bare WARNING, unblocking diagnosis on next run; BC1-FIX (alias=install-audit-mirror-multisource) — derive install audit catalog from `business_context_file_path` ∪ `context_file` ∪ `model_folder` ∪ `deployment_catalog` so the BC1 mirror actually writes (v0.8.8 was reading a single never-populated widget key); R7-SCHEMA (alias=model-params-subdomain-required) — add `min/max_business_subdomains` + `min_products_per_subdomain` to `_AI_MODEL_GENERATION_PARAMETER_SCHEMA_BASE` so the LLM can finally emit them under `additionalProperties:False` (root cause of 3-retry validation failures + midpoint silent fallback); NEW-2 (alias=vibe-count-respect-sizing-directives) — VIBE COUNT VIOLATION false-positive: only warn when post-architect product count is OUTSIDE user `sizing_directives.{min,max}_total_products` range, not when architect honours user vibe by trimming; NEW-3 (alias=arch-domain-skip-empty-quietly) — empty `shared` domain architect review is now logged INFO not WARNING; NEW-4 (alias=arch-gate-tier-aware) — drop `propose_for_global_standard` + `recommend_to_industry_peers` gates for INTENTIONALLY TINY vibes (sizing_directives.max_total_products ≤ 30 OR max_domains ≤ 5) so 15-product demo models do not get bogus required_actions queued; NEW-5 (alias=ext-system-prefix-string-parse) — `_is_system_identifier_column` now parses `operational_systems_of_records` correctly when it arrives as a comma-separated string (LLM output), unblocking shopify_/klaviyo_/stripe_* prefix detection so external IDs are no longer flagged as unlinked _id columns.
v0.8.8	10 root-cause fixes from v0.8.7 tester audit: products_data NameError (NEW-1, MV silent loss), BC1 widget-key (NEW-2, install audit-mirror enabled), §3b empty-domain protection via DRY helper extension (NEW-3), §3c enlarge target_fit user-vibe clamp (NEW-4), DELTA_MISSING_DELTA_TABLE precheck + plain-INSERT fallback (NEW-5), stale-convention schema pre-clean (NEW-6), pool-spec parser case-insensitive top-level keys + tightened SAMPLE_POOL_PROMPT (NEW-7), shrink-path pre-handoff cycle break (R8 mitigation), architect mutation engine case-insensitive name resolution (F2 mitigation)
v0.8.7	Install-breadcrumb-audit-mirror (BC1) + 6-phase structured-LLM refactor + tester typo fix
v0.8.6	De-bias v0.8.5 prompts (industry-agnostic vocabulary across M1/M3+M4/M5) + N5-FIX (R3 sentinels routed to volume `info.log` so audits actually see `SHRUNK`/`SAFE-FLUSH`/`FINAL-FLUSH`) + 7 anti-bias guard tests
v0.8.5	M1 (FK `IdId` double-suffix root cause) + M2 (PK casing collapse) + M3+M4+N1 (FK semantic gate KeyError fix + four new classification rules: TEMPORAL PRECEDENCE, CARDINALITY CORRECTNESS, HEADER↔LINE INTEGRITY, JUNCTION TABLE PURITY) + M5 (canonical attributes HARD MINIMUMS) + N6 (`metric_views` char-iter explosion fix) + 34 unit tests
v0.8.4	Hardening + runtime LOG SIGNALS for v0.8.3 R/F/N fixes (alias-tagged so audits can grep) + 4th R1 vibe-version write-barrier callsite + 13 new unit tests
v0.8.3	R1 (`_assert_vibe_version_advances`) + R3 (log-no-truncate-on-success) + R6 (metric-view bare-name via DESCRIBE) + R7 (subdomain mandatory in LLM JSON) + R8 (deterministic Pass-2 cycle breaker) + F2-regression (`immutable violation` is CRITICAL) + 27 unit tests
v0.8.2	P1–P10: scratch-path tempfile, domain-name-mismatch CRITICAL, decimal-to-float coercion, shrink forbids siloed products, jobtags skip deleted job, prompt brace-escape fix, JobLauncher waits for child terminal (no parent-SUCCESS-over-child-FAIL), managed-location accessibility probe, model.json any-name shape, MV install count validation + 60+ unit tests
v0.8.1	11 root-cause fixes (G1/G5/G8/G9/G10/G11/C2/C4/C6 + runner/tester G4-FIX industry-agnostic managed location) + 9 self-audit closure EXP fixes + 40 unit tests
v0.8.0	33-fix bundle: D1 widget authority, MAX_CONCURRENT_BATCHES=20 unified, `run_with_context_ladder`, `run_parallel_with_rate_limit_backoff`, `HeartbeatWatchdog`, per-model token telemetry scaffold, MV pre-filter, chunking widening, PE10/PE12, B8 atexit, §3b `business_domains` EXHAUSTIVE + 75 unit tests + full audit
v0.5.9	Scope-prefixed volume folders (`mvm_v1` / `ecm_v1`, not `v1_mvm`); post-install integrity check compares actual vs expected domain schemas; unbiased architect-gate example in prompt
v0.5.8	Natural domain schema names — removed `ecm_`/`mvm_` schema_prefix injection from runner; agent now honours explicit-empty widget value for prefix/suffix keys (`_EXPLICIT_OVERRIDE_KEYS`)
v0.5.7	Principal-engineer production-readiness gates added to architect review (trust / support / recommend / propose, each Yes-No + blockers + required_actions); same 4 gates now run at BOTH the per-domain level (Step 3.6, Domain Architect + Senior Business SME for `{industry_alignment}`) and the global level (Step 3.7); "No" actions pipe into `next_vibes.txt`; verification-sweep truncations raised
v0.5.6	Root-cause fix for silent vibe drops: `link+add/create` handler added to mutation engine; synonym map for entity_type and operation; per-drop telemetry; snapshot truncation raised from 6 KB → 60 KB; cross-domain FK target registry; metamodel `version` now pure integer + `model_scope` column
v0.5.5	Hot-fix: guard `run_vibe_verification_sweep` against None LLM result

Full history: git log --oneline v0.5.0..HEAD.

Recent fixes (v0.6.x → v0.8.x)

The burst from v0.6.0 through v0.8.6 hardened every stage of the pipeline. The tables below enumerate each version, the P-numbered (and G/M/N/R/F-classed) fix(es) it ships, the theme, and the observable impact on the generated model or install.

Sources of truth:

Committed versions: git log --since=2026-04-18 --oneline (51 commits).

Uncommitted P-numbered patches (v0.7.x): grep -oE "v0\.7\.[0-9]+ P0\.[0-9]+" /tmp/agent_source.py | sort -u.

Every P-number is anchored in a code comment — grep the notebook/source for P0.NN to jump to the enforcement site.

Every fix listed here MUST continue to pass the unit-tests in tests/unit-tests/ (see tests/readme.md) and the vibe_tester end-to-end suite.

v0.6.x — silent-failure class + architect + sample engine

Version	P-numbers	Theme	Impact
v0.6.0	—	Silent-failure class fixes — pairwise, logging, architect, metrics	Architect review, pairwise FK scan, metric views and logging no longer swallow failures; unrepairable items now surface as explicit gate failures.
v0.6.1	—	Install path + widget rename	`install model` now only requires `model.json` (not the staging catalog); the old widget name was renamed to the current `Installation Catalog`.
v0.6.2	—	4 silent-failure fixes from v0.6.1 smoke	Ingestion path failures, link post-process drops, and architect-gate normalization bugs caught by the v0.6.1 run were root-caused and fixed.
v0.6.3	—	Residual v0.6.2 smoke bugs	Further hardening on the v0.6.2 class of failures exposed by the follow-up smoke run.
v0.6.4	—	Step 3.6 Domain Architect Review	Each domain now gets an independent Principal-Architect-plus-SME review with 4 production-readiness gates, run in parallel across domains.
v0.6.5	—	`shared` domain strict rule + duplicate-FK prevention	`shared` is reserved exclusively for exact-name products shared by 2+ domains; duplicate FKs pointing at the same target are deduplicated before linking.
v0.6.6	—	Reject case/separator-only renames from architect	Architect can no longer propose `order_line → orderLine` (or `_line ↔ -line`) — root-cause fix for architect drift that made no real schema change.
v0.6.7	P0.11, P0.15, P0.16, P0.17, P0.18	Greedy fix pack	Mutation engine topo-sort (P0.11); iterative architect review; metric-view grounding; self-ref FK typing + relaxation (P0.15); ambiguous FK auto-rename (P0.16); fidelity-gate noise demoted when no VibeContract (P0.17); install-clash soft-replace (P0.18); industry-agnostic prompt scrub + CLAUDE.md guardrails.
v0.6.9	P0.20	Pool-based sample engine (Faker removed) + observability completion + quality-gates doc	New pool engine replaces per-row LLM generation — deterministic N rows, FK integrity, correlated tuples; `docs/quality-gates.md` shipped.

v0.7.x — schema, naming, widget authority, install hardening

Version	P-numbers	Theme	Impact
v0.7.0	P0.22, P0.23, P0.24, P0.25, P0.26, P0.27	Sample-pool schema + autofix family	Structured-output `SAMPLE_POOL_RESPONSE_SCHEMA` (P0.22); type-branched coercion for BIGINT/INT/LONG/DECIMAL/DATE/TIMESTAMP (P0.23); self-FK cleared on PK attribute (P0.24); expanded PII classification autofix (P0.25); denormalized natural-key autofix (P0.26); standalone sample-helpers sanity harness (P0.27).
v0.7.1	P0.30, P0.31	Rename cascade + None-guard	Batch-rename cascade correctly handles 4-part attribute refs and cross-entity renames (P0.30); classification responses that return `null` no longer crash downstream stages (P0.31).
v0.7.2	P0.41, P0.43, P0.44, P0.45, P0.46, P0.47, P0.48	Response-schema hotfix + gate hierarchy + architect self-grading	Revert Foundation Model structured-output on sample gen (P0.41); gate hierarchy normalised INCLUSIVE — harder gate pass implies all easier gates pass (P0.43); architect self-grades the previous iteration's `priority_now` (P0.44); tag autofix now MERGES (never overwrites) existing tags (P0.45); FK rewrites follow attribute/product renames within the same batch (P0.46); mutation engine pre-scans ALL renames before applying any (P0.47); early-exit keyed on `_REQUIRED_GATES_FOR_EARLY_EXIT` only (P0.48).
v0.7.3	P0.49, P0.50, P0.52, P0.53, P0.54, P0.55, P0.56, P0.57, P0.58	Widget authority + observability + post-stage autofix	Per-iteration cap removed, only overall cap enforced (P0.49); mandatory observability — every autofix + gate emits summary log (P0.50); user-specified `business_domains` widget is IMMUTABLE end-to-end — ensemble, judge, architect, QA all protect (P0.52, P0.53); AUTOFIX markers always logged on every fire (P0.54); post-architect AND post-QA autofix re-runs (P0.55); isolated-product detection replaces fragile orphan heuristic (P0.56); mutation batch start/end log (P0.57); deepcopy of gate dicts so nested state can't leak across iterations (P0.58).
v0.7.4	P0.60, P0.61, P0.62, P0.63, P0.73 (partial)	Blacklist retreat + autofix harness	FORBIDDEN GENERIC domain-name rule disabled — hard blacklist was overriding valid user vibes (P0.60); synthetic autofix harness (P0.61); install-clash soft-replace observability counters per-fire + summary (P0.62); counter reset hardening to prevent cross-run leakage (P0.63); initial PII-match regex (P0.73 — later superseded in v0.7.5).
v0.7.5	P0.65, P0.67, P0.68, P0.70, P0.71, P0.72, P0.73, P0.74, P0.75	USER-KING sizing + NamingConvention SSOT + Faker Tier-2	Structured `sizing_directives` parsed from free-text user vibes and enforced as a HARD post-gen gate (P0.65); single-source `NamingConvention` class for all PK/FK/suffix/case (P0.67); Faker Tier-2 restored with per-column tuple-token provider map (P0.68); count-based in-domain-linking chunking (P0.70 — reverted in v0.7.10); EARLY + LATE `next_vibes` emission split (P0.71); metric-view ownership records captured AT CREATION TIME, no `_unassigned` fallback (P0.72); word-boundary PII match — `order_id` no longer matches `or` (P0.73); product-name collision guard (P0.74); FK column names MUST end with target PK column name (P0.75).
v0.7.6	P0.81, P0.83, P0.89, P0.91	Mirror JSON resync + VREQ prose bleed	Mirror JSONs resynced after in-memory mutations so install doesn't re-read stale files (P0.81); raw pool-spec log capped at 3 per run (P0.83); VREQ-bleed product-name validator rejects prose-bleed from Verification REQuirements (P0.89); column/product name validator rejects prose in `<new_name>` fields (P0.91).
v0.7.7	P0.92	Install logger fallback	Install path has no `logger` in scope — local fallback print used instead of crashing.
v0.7.8	P0.95	Bulletproof logger fallback	Install path runs before `step_setup_and_clean` initialises the logger — fully bulletproof fallback inserted.
v0.7.10	P0.70-REVERT, P0.96	Chunking revert + integrity check root-cause	Count-based chunking from v0.7.5 P0.70 lobotomized LLM context — full domain is now always sent (P0.70 revert); post-install integrity check now uses `_metamodel.domain` COUNT (same source as Cross-Validation); prior `SHOW SCHEMAS` row-parse returned 0 on serverless even with 11 schemas (P0.96).
v0.7.11	P0.106	Install log tee to volume	Install prints are tee-d to a local log file, then copied to the UC Volume at finalize so operators see the full install trace post-run.
v0.7.13	P0.99+PE12, P0.105+M6	Tag merge + managed-location discovery	`ALTER … SET TAGS` statements merged per target (~9894 calls → ~500, ~20× faster); regex now tolerates quoted tag values containing commas (PE12) (P0.99+PE12); `_ensure_catalog_exists` discovers a metastore managed location (via `DESCRIBE METASTORE` then any accessible catalog's `storage_root`) so `CREATE CATALOG` succeeds on Default-Storage metastores with no hardcoded allowlist (P0.105+M6).

v0.8.x — observability, audit, industry-agnostic, model-quality

Version	Fix IDs	Theme	Impact
v0.8.0	33 fixes (D1, M5-tee, MAX_CONCURRENT_BATCHES=20, run_with_context_ladder, run_parallel_with_rate_limit_backoff, HeartbeatWatchdog, token-telemetry scaffold, MV pre-filter, chunking widening, PE10/PE12, B8 atexit, §3b `business_domains` EXHAUSTIVE) + 75 unit tests + full audit	Bundle release: widget-authority enforced end-to-end (D1); unified `MAX_CONCURRENT_BATCHES=20`; ladder + parallel-rate-limit helpers; `HeartbeatWatchdog` for long-running stages; per-model token telemetry scaffold; metric-view pre-filter; chunking thresholds widened; `business_domains` widget treated as an EXHAUSTIVE list (no judge-side substitution).
v0.8.1	11 root-cause fixes (G1-FIX IDL chunking fallback, G5-FIX sample-gen log finalize+tee, G8-FIX context_ladder wired, G9 surface non-rate-limit + heartbeat coverage, G10-FIX per-model token+cost telemetry, G11-FIX MV-FILTER de-tautologise, C2-FIX MV15 halving merges sub-batches, C4-FIX description ladder, C6-FIX vibe-prune NameError, runner/tester G4-FIX industry-agnostic managed location)	Observability + ladder wiring + tester/runner industry-agnostic	IDL no longer fails on oversized batches (graceful chunking); MV-FILTER tautology removed (real semantic filter now); ladder helpers wired into the call sites that were silently bypassing them; per-model token + cost telemetry surfaced; MV15 halving recursively merges sub-batches; runner/tester managed-location discovery is industry-agnostic (no hardcoded customer strings).
v0.8.1 follow-up	9 EXP closures + 40 unit tests	Self-audit closure	Aliases standardised; `clear_active` wired everywhere; ambiguous-FK resolution flags propagated; `config-guard` sweep added defensive `config = config or {}` to 10 call sites; `VALIDATOR_REGISTRY` populated; `_is_system_identifier_column` config thread; comprehensive unit tests covering every v0.8.1 fix.
v0.8.2	P1–P10 (10 fixes) + 60+ unit tests	Serverless `/tmp` + parallel correctness + install gate + managed-location accessibility	P1 `_resolve_business_scratch_path` uses `tempfile.mkdtemp()` (foreign-UID `/tmp` reuse no longer permission-denies); P2 `domain name mismatch` is now CRITICAL (parallel domain-enrich no longer soft-accepts wrong-domain payload); P3 `_coerce_decimal_to_float` for pool-spec assembly (no more `Decimal` rejection by `spark.createDataFrame`); P4 shrink prompt forbids siloed products + `_detect_post_shrink_silos()`; P5 jobtags skip deleted job; P6 prompt brace-escape (`KeyError '0,62'` killed); P7 `JobLauncher.wait_for_run_terminal()` blocks parent until child terminal — no more parent-SUCCESS-over-child-FAIL; P8 `_validate_storage_accessible()` probes managed-location candidates, retries `CREATE CATALOG` bare on permission errors; P9 model.json accepts any `name` shape; P10 metric-view install count validation.
v0.8.3	R1, R3, R6, R7, R8, F2-regression (6 fixes) + 27 unit tests	Vibe-version write barriers + log durability + cycle determinism	R1 `_assert_vibe_version_advances` aborts in-place overwrites at three (then four — see v0.8.4) write barriers; R3 `_safe_volume_flush` skips copy if local log shrunk (no more truncation-on-success); R6 metric-view bare-name resolution via `DESCRIBE`; R7 `model_params.subdomain_required` mandatory in LLM JSON output; R8 deterministic Pass-2 cycle breaker; F2-regression makes `immutable violation` a CRITICAL (no soft-accept poisoning model_architect_review).
v0.8.4	hardening + runtime LOG SIGNALS for R1/F2-reg/R3/R6/R7/R8 + 4th R1 callsite + 13 new tests	Audit signal coverage + 4th vibe-version barrier	Every R/F/N fix from v0.8.3 now emits a runtime LOG SIGNAL (e.g. `alias=immutable-violation-critical`, `alias=log-no-truncate-on-success`, `alias=metric-view-bare-via-describe`) so post-run audit can grep them; 4th R1 write-barrier callsite added; 13 new unit tests cover the runtime signals.
v0.8.5	M1, M2, M3+M4+N1, M5, N6 (6 model-quality fixes) + 34 unit tests	mvm_v4 audit fixes — naming + FK semantics + canonical attrs + metric-views char-iter	M1 FK-name `IdId` double-suffix root cause (`normalize_fk_column_name` widened to read `attribute
v0.8.6	de-bias prompts + N5-FIX (R3 sentinels routed to volume info.log) + 7 anti-bias tests	Industry-agnostic prompt vocabulary + audit signal durability	M1/M3+M4/M5 prompt rules rewritten in abstract semantic vocabulary (`<EntityEarlier>`, `<Container>`, `<Owner>` patterns) — agent no longer biases retail-shaped models for non-retail industries; explicit anti-industry-bias warning block added to M5; N5-FIX appends R3 sentinels (`SHRUNK`, `SAFE-FLUSH`, `FINAL-FLUSH`) to local `_vl_local_info` so they reach the UC volume `info.log` (alias=`r3-sentinels-to-volume`).

What a P-number comment looks like in source

# v0.7.10 P0.96: use _metamodel.domain COUNT (same source as Cross-Validation).
# Prior SHOW SCHEMAS row-parsing returned 0 on serverless even with 11 schemas.

Every P-number appearing in this table is anchored in at least one such comment. grep -n "P0.NN" /tmp/agent_source.py (or the notebook) jumps straight to the enforcement site.

Going-forward contract

Each new P-fix MUST carry a stable v0.X.Y P0.NN comment at the enforcement site.
Each P-fix MUST pass the pytest unit tests in tests/unit-tests/ and the end-to-end tests/vibe_tester.ipynb.
Each P-fix MUST be listed in this table at tag-to-main time with theme + impact.

Legal Information

This software is provided as-is and is not officially supported by Databricks through customer technical support channels. Support, questions, and feature requests can be communicated through the Issues page of this repo. Issues with the use of this code will not be answered or investigated by Databricks Support.

Built on Databricks Serverless Compute with Unity Catalog governance

Name		Name	Last commit message	Last commit date
Latest commit History 250 Commits
agent		agent
docs		docs
images		images
models		models
rules		rules
runner		runner
tests		tests
viewer		viewer
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
readme.md		readme.md
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation