Skip to content

amralieg/vibe-modelling-agent

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

250 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Databricks Vibe Modelling Agent

Generate production-grade enterprise data models from natural language

Databricks Serverless Unity Catalog LLM Powered

Describe your business. Get a data model. Vibe it until it's perfect.


Concepts · Getting Started · Widget Reference · Vibing Workflow · Action Catalog · Troubleshooting


Table of Contents


📚 Documentation

Document Description
docs/ Documentation index — whitepaper, design guide, integration guide
docs/design-guide.md Technical design reference
docs/integration-guide.md UI/consumer integration protocol
docs/whitepaper.md Philosophy and complete rules catalog
runner/readme.md Pipeline orchestrator guide
tests/readme.md Test suite reference

What Is Vibe Modelling?

flowchart LR
    IN["Business description<br/>+ natural-language vibes"] --> AG["Vibe Modelling Agent<br/>(multi-model LLM ensemble)"]
    AG --> MM["Metamodel<br/>(_business, _domain,<br/>_product, _attribute)"]
    AG --> PS["Physical schemas<br/>(UC catalogs, schemas,<br/>tables, FKs, tags)"]
    AG --> SM["Sample data<br/>(CSV per table)"]
    AG --> DC["Docs + artifacts<br/>(README, Excel, DBML,<br/>ontology, dictionary)"]
    AG --> NV["Next Vibes<br/>(self-suggested<br/>improvements)"]
Loading

Vibe Modelling is a Databricks-native, LLM-powered approach to generating enterprise data models from natural language. Instead of manually drawing ER diagrams, writing DDL, or importing pre-built industry templates, you describe your business in plain English and the agent builds a complete, production-grade data model — domains, tables, columns, foreign keys, tags, sample data, and documentation — end to end.

The name "Vibe" reflects the core workflow:

Generate a base model → review → vibe it with natural language → repeat → deploy

Each iteration produces a new version. The agent carries forward your context so nothing is lost between runs. You are never locked into a static template — the model evolves with your business.

Current version: v0.7.7 — see Version history and Recent fixes below.

Note on numbering: the dev branch tracked four internal iterations labelled v0.7.1, v0.7.2, v0.7.3, v0.7.4 between the v0.7.0 and v0.7.1 release tags. The v0.7.1 release tag on main consolidates all of those iterations into a single public release, so __AGENT_VERSION__ = "0.7.1" is stamped into every model.json produced from this point forward. The earlier per-iteration version-history rows below are kept verbatim as the engineering trace.


🚀 Three Ways to Run

1. Quick mode (5-minute start)

Fill only these 4 widgets and run the agent notebook:

Widget Value
business_name e.g. Airlines
business_description 2–3 sentences describing the business
data_model_scopes Minimum Viable Model - MVM (lean) or Expanded Coverage Model - ECM (full)
deployment_catalog Unity Catalog target, e.g. industry_mvm_v1

Everything else auto-fills. You get a complete metamodel + physical schemas + sample data + documentation + next_vibes.txt for iteration.

2. Runner mode (both ECM + MVM with test install)

When you want both ECM and MVM produced and installed in one go — plus a staging round-trip to prove the install works before landing in the permanent catalogs — use the vibe_runner notebook instead of the agent directly.

Runner launches a 4-task Databricks Job:

  1. ECM generate — builds the full ECM into a staging catalog
  2. ECM install — deploys the ECM into {business}_ecm_v1
  3. MVM shrink — derives MVM from ECM in a staging catalog (reuses the ECM so you don't regenerate from scratch)
  4. MVM install — deploys the MVM into {business}_mvm_v1

Tasks 2 and 3 run in parallel once Task 1 is done; Task 4 waits for Task 3.

Inputs: a single my-industries.json file listing one or more businesses (name + description + optional vibes) plus the standard widget defaults. Runner reads the file, auto-discovers the latest dbx_vibe_modelling_agent_v* notebook in the same directory, and orchestrates the job.

See runner/readme.md for the exact file format and submit recipe.

3. Full mode (every widget)

Use the Widget Reference section below when you need fine control over naming conventions, tag prefixes, sample data volume, or installation behavior.

Widget semantics note (v0.5.8+): for schema_prefix, schema_suffix, catalog_prefix, catalog_suffix, tag_prefix, tag_suffix — an explicitly empty widget value overrides whatever is in the model.json. For all other widgets, empty widget = "use file value if present".


📖 Concepts

Industry Data Models vs. Business Data Models

What is an Industry Data Model?

An industry data model is a generic, one-size-fits-all template designed for an entire vertical — retail, banking, healthcare, telecoms, etc. Organizations like the TM Forum (telecoms), ARTS (retail), ACORD (insurance), and HL7 (healthcare) publish canonical schemas that attempt to cover every conceivable entity and relationship across the industry.

The problem: When you adopt an industry model, you inherit everything — including the 60–80% of tables that your business will never use. You then spend months trimming, renaming, and reshaping the model to fit your actual processes.

What is a Business Data Model?

A business data model is tailored, contextualized, and specific to YOUR organization. It reflects your actual business processes, your product lines, your org structure, your regulatory environment, your terminology, and your governance requirements.

Vibe Modelling generates business data models. The LLM understands your industry deeply (via the complexity tier system), but the model it produces is shaped entirely by your context.

Aspect Industry Data Model Business Data Model (Vibe)
Scope Entire industry vertical Your specific business
Customization Post-delivery (manual pruning) Built-in (LLM-driven from your context)
Relevance 20–40% directly applicable 90–100% directly applicable
Time to production Months of adaptation work Hours with iterative vibing
Naming Committee-standard naming Your business terminology and conventions
Evolution New version = re-adoption Vibe the next version from the previous one
Cost License fees + adaptation labor Compute cost of LLM generation

The Vibe Philosophy

1. GENERATE  →  Describe your business, get a base model
2. VIBE IT   →  Review output, provide natural-language refinements
3. REPEAT    →  Each iteration = new version; agent auto-suggests next vibes
4. DEPLOY    →  Physical Unity Catalog schemas, tables, FKs, tags, sample data

The Four-Level Hierarchy: Divisions → Domains → Products → Attributes

Every Vibe model follows a strict four-level hierarchy:

graph TD
    B["🏢 BUSINESS<br/><i>e.g., Contoso Manufacturing</i>"]
    B --> D1["⚙️ Operations Division"]
    B --> D2["💼 Business Division"]
    B --> D3["🏛️ Corporate Division"]
    
    D1 --> DOM1["📦 logistics"]
    D1 --> DOM2["🏭 production"]
    D1 --> DOM3["📋 inventory"]
    
    D2 --> DOM4["👤 customer"]
    D2 --> DOM5["💰 sales"]
    D2 --> DOM6["🧾 billing"]
    
    D3 --> DOM7["👥 hr"]
    D3 --> DOM8["📊 finance"]
    
    DOM5 --> P1["🗂️ order"]
    DOM5 --> P2["🗂️ order_item"]
    DOM5 --> P3["🗂️ quote"]
    
    P1 --> A1["🔑 order_id <i>(PK, BIGINT)</i>"]
    P1 --> A2["📅 order_date <i>(DATE)</i>"]
    P1 --> A3["🔗 customer_id <i>(FK → customer.customer)</i>"]
    P1 --> A4["💲 total_amount <i>(DECIMAL)</i>"]
    P1 --> A5["📌 status <i>(STRING)</i>"]
Loading

Level 1: Divisions

Divisions are the top-level organizational grouping. Every domain belongs to exactly one division.

Division Purpose Typical Domains
Operations Core operational backbone — mechanisms, infrastructure, processes Logistics, production, inventory, service delivery, supply chain, quality control
Business Revenue-generating and customer-facing functions Customer/party, billing/revenue, product catalog, sales, subscriptions
Corporate Supporting functions for governance (NOT directly revenue-generating) HR, finance, legal/compliance, marketing, procurement, IT

Division Balance Rules:

  • Operations + Business MUST be >= 80% of all domains
  • Corporate is capped at <= 20% of total domains
  • No Corporate domain allowed until Operations AND Business each have >= 2 domains

Level 2: Domains

A domain is a logical grouping of related data products (tables). When deployed, each domain maps 1:1 to a Unity Catalog schema.

  • Named in snake_case, exactly one word (e.g., customer, fulfillment, logistics)
  • Count depends on model scope (MVM vs ECM) and industry complexity tier
  • The shared domain is reserved — the pipeline auto-creates it during SSOT consolidation; never create it manually

Level 3: Products (Tables)

A product is a data table within a domain — a first-class business entity with its own identity, lifecycle, and attributes.

  • Every product gets a PK column: <product_name>_<pk_suffix> (default: <product_name>_id)
  • Classified as CORE (entities stakeholders query directly) or HELPER (supporting entities)
  • Tagged by data type: master_data | transactional_data | reference_data | association_table

The First-Class Entity Test: A product must have its own identity, its own lifecycle, and at least 5 unique business attributes. Anything less should be an attribute on another table or merged.

Level 4: Attributes (Columns)

Each attribute has:

Property Example Purpose
attribute customer_email Column name (snake_case)
type STRING Apache Spark SQL data type
tags restricted,pii_email Classification + PII tags
value_regex ^[a-zA-Z0-9._%+-]+@ Validation pattern or enum
business_glossary_term Customer Email Address Human-readable business name
description Primary contact email for the customer What this attribute represents
reference GDPR Article 6 Regulatory or standard reference
foreign_key_to customer.customer.customer_id FK target (domain.product.pk), or empty

DAG Enforcement — No Circular Dependencies

FK relationships MUST form a Directed Acyclic Graph (DAG):

✅ VALID:   order → customer → address
❌ INVALID: order → customer → address → order  (cycle!)

Why it matters: Circular dependencies prevent clean data loading order, break ETL pipelines, and indicate modeling errors.

How the agent enforces it:

  1. Detect — Python DFS (Depth-First Search) cycle detection during QA
  2. Break — LLM Cycle Break specialist determines which FK to remove (always the weakest link)
  3. Verify — Re-run DFS to confirm graph is now a valid DAG
  4. Iterate — Up to 5 rounds to resolve all cycles, including residual ones

Allowed exception: Self-referencing hierarchical FKs (e.g., parent_category_id → category.category_id) are permitted — they represent tree structures, not multi-node cycles.


Single Source of Truth (SSOT)

Each core business concept has exactly ONE authoritative domain and ONE authoritative product (table) that owns it. No concept is duplicated across domains.

What SSOT prevents:

  • Two customer tables in different domains
  • A product table in both sales and inventory
  • An invoice table in both billing and finance

How SSOT is enforced:

Phase Mechanism
Generation LLM places each entity in its authoritative domain — the domain that CREATES/OWNS that data
QA Deduplication Global pass detects same-name products + synonym pairs (e.g., customer vs client) with 60%+ attribute overlap
Consolidation Overlapping products merged into shared domain with discriminator column
Cross-Domain References Other domains use FK columns to reference the authoritative table — no duplication

The "Where Do I Go?" Test: For any entity, ask: "If a business user needs this information, is there exactly ONE place to get it?" If the answer is ambiguous, there is an SSOT violation.


Model Scopes: MVM vs. ECM

MVM (Minimum Viable Model) ECM (Expanded Coverage Model)
Size 30–50% of ECM table count Full coverage
Attribute depth SAME as ECM — full production-grade (same min/max per tier) Full production-grade
Domains Essential business functions only All functions including corporate back-office
Ideal for SMBs, rapid deployments, POCs, dev/test Fortune 100, multinational enterprises
Lightness Fewer domains & tables (NEVER thinner attributes) Maximum breadth

MVM is NOT a skeleton or demo toy. It is a production-ready subset where every delivered table is fully-featured.

Subdomains

Every domain is organized into subdomains — semantic groupings of related products within a domain. Subdomains provide an additional organizational layer between domains and products.

Subdomain Rules

Rule Description
Count Minimum and maximum subdomains per domain are defined by the complexity tier. Never exactly 1 subdomain per domain.
Naming EXACTLY 2 words per subdomain name. 1 word = rejected. 3+ words = rejected.
Min Products Every subdomain must contain at least the minimum products per subdomain defined by the tier: ECM tier_1–tier_4: 3, ECM tier_5: 2; MVM tier_1–tier_3: 3, MVM tier_4–tier_5: 2.
No Overlap No two subdomains within the same domain may share any word in their names.
Business Terms Use business terminology, NOT technical terms.
Balanced Products distributed as evenly as possible across subdomains.
No Placeholders NEVER use: "Sub Domain1", "Category 1", "Group A", "N/A", "Other", "General", "Miscellaneous", "Uncategorized".
No Drift Each subdomain belongs to exactly one parent domain.

Example: In the party domain:

  • identity: individual, organization, party_identification, kyc_verification
  • engagement: party_interaction, consent_record, loyalty_enrollment

Industry Complexity Tiers

The agent auto-classifies your business into one of five tiers:

Classification is based on 7 scoring dimensions (scored 0 or 1 each):

  1. Regulatory density — 3+ distinct regulatory bodies imposing data/reporting requirements
  2. Party complexity — 3+ distinct party types (customers, suppliers, partners, employees, etc.)
  3. Product hierarchy depth — 50+ product variants with complex bundling/pricing/lifecycle
  4. Infrastructure management — Owns physical or digital infrastructure requiring asset tracking
  5. Industry canonical model — 200+ entity types defined by an industry standards body
  6. Transaction complexity — 10+ distinct transaction types with multi-step lifecycles
  7. Operational system landscape — 5+ major systems of record across business functions

Rule: If the business falls between two tiers, classify UP (prefer the more complex tier).

Tier Label Hallmarks ECM Domains ECM Products/Domain Attrs/Product Subdomains/Domain
tier_1 Ultra-Complex 5+ regulatory bodies, multi-entity structures (banking, insurance, pharma) 15–22 14–28 15–50 3–6
tier_2 Complex 2–4 regulatory bodies, multi-channel (telecoms, energy, healthcare) 12–18 14–26 12–50 2–5
tier_3 Moderate 1–2 regulatory bodies, 3+ business functions (manufacturing, retail) 10–15 12–24 10–45 2–5
tier_4 Standard Light regulation, regional complexity (logistics, agriculture) 8–12 10–20 10–40 2–4
tier_5 Simple Minimal regulation, service-based (consulting, SaaS, media) 5–8 8–18 8–35 2–4

MVM Tier Sizing (attribute depth is the SAME as ECM):

Tier MVM Domains MVM Products/Domain Subdomains/Domain
tier_1 9–14 8–16 2–4
tier_2 8–12 8–14 2–4
tier_3 6–10 7–13 2–4
tier_4 5–8 6–11 2–3
tier_5 3–6 5–10 2–3

MVM counts are automatically derived as ~30–50% of the ECM counts for each tier. Attribute depth (min/max attributes per product) is the same for both MVM and ECM within each tier.


🚀 Getting Started

Quick Start Recipes

Recipe 1: Generate Your First Model (Minimal Input)
Widget Value
01. Business My Company Name
02. Description A brief description of what your company does
03. Operation new base model
05. Model Scope Minimum Viable Model - MVM
09. Installation Catalog my_catalog (or leave blank for logical-only)
Everything else Defaults

Run the notebook. Done.

Recipe 2: Generate a Rich Model (Recommended)
Widget Value
01. Business Contoso Manufacturing
02. Description A multinational aluminum smelting company operating across 12 countries with 15,000 employees
03. Operation new base model
05. Model Scope Expanded Coverage Model - ECM
06. Business Domains production, quality, supply, customer, sales, logistics, billing
07. Org Divisions Operations, Business and Corporate
09. Installation Catalog contoso_dev
10. Sample Records 10
Recipe 3: Vibe an Existing Model
Widget Value
01. Business Contoso Manufacturing
02. Description (same as before)
03. Operation vibe modeling of version
04. Version 1
08. Model Vibes Add a warranty domain. Remove the corporate_strategy domain. Run quality checks.
Recipe 4: Use Auto-Generated Next Vibes

After any pipeline run, find next_vibes.txt in the vibes/ output folder. Copy the suggested vibes into widget 08.

Widget Value
01. Business Contoso Manufacturing
02. Description (same as before)
03. Operation vibe modeling of version
04. Version Next version number (previous + 1)
08. Model Vibes (paste recommended vibes from next_vibes.txt)
Recipe 5: Deploy to a New Catalog
Widget Value
03. Operation install model
09. Installation Catalog prod_catalog
11. Model JSON File /Volumes/catalog/schema/volume/vibes/contoso/ecm_v1/model.json
10. Sample Records 0 (or a number for test data)

⚙️ Operations

The 03. Operation widget selects which pipeline mode to run:

Operation Purpose Key Requirements
new base model Generate a brand-new data model from scratch Business name + description (via widgets)
vibe modeling of version Apply natural-language instructions to refine an existing version Version + vibes (widget 08)
shrink ecm Convert an ECM to a leaner MVM Version + deployment catalog
enlarge mvm Expand an MVM into a comprehensive ECM Version + deployment catalog
install model Deploy a logical model into physical Unity Catalog objects Model JSON file (widget 11) + deployment catalog
uninstall model version Remove a version's physical artifacts from the catalog Business name + version + catalog
generate sample data Generate synthetic sample records for an existing deployed model Model JSON file (widget 11) + deployment catalog

🎵 The Vibing Workflow

This is the core power of Vibe Modelling — iterative refinement via natural language:

graph LR
    A["🆕 New Base Model<br/>Version 1"] --> B["🔍 Review Output"]
    B --> C["📝 Write Vibes<br/><i>or use auto-generated next_vibes.txt</i>"]
    C --> D["🎵 Vibe Modeling<br/>Version 2"]
    D --> E["🔍 Review"]
    E --> F["🎵 Vibe v3"]
    F --> G["..."]
    G --> H["✅ Deploy"]
Loading

What Vibes Can Do

Vibes are free-form natural language. The agent interprets them and translates them into specific actions:

"Add a compliance domain with regulatory_filing and audit_trail tables"
"Remove the HR domain, we don't need it"
"Merge the customer_support domain into the customer domain"
"Add a source_system column to every table"
"Run a full quality check and fix any issues found"
"The order table should have a shipping_address_id FK to the address table"
"Rename all tables starting with dim_ to remove that prefix"
"Normalize the order domain to 3NF"
"Mark all email columns as PII"
"Generate an ontology (RDF/RDFS) for the model"
"Keep only billing-related tables, drop everything else"
"Make this model ECM — I need the large version"

Auto-Generated Next Vibes

After every pipeline run, the agent produces next_vibes.txt and current_vibes.txt in the vibes/ folder, containing:

  • Your current business context (preserved as-is)
  • Recommended vibes for the next iteration (based on QA findings)
  • Model health metadata: confidence score, warning counts, issue breakdown
  • Version history and progression tracking

To use them: copy the recommended vibes into widget 08. Model Vibes, set operation to vibe modeling of version, set the next version, and run.


🎛️ Widget Reference

The notebook exposes 28 configurable widgets. Below is the complete reference.

Core Configuration Widgets (01–11)

# Widget Type Mandatory Default Description
01 Business (name) Text Yes Your business/organization name
02 Description Text Recommended What your business does (richer = better model)
03 Operation Dropdown Yes new base model Pipeline operation to run
04 Version Dropdown Conditional1 Model version number (1–100)
05 Model Scope Dropdown Yes Minimum Viable Model - MVM MVM (lean) or ECM (comprehensive)
06 Business Domains Text No Comma-separated seed domains
07 Included Org Divisions Dropdown Yes Operations and Business Which divisions to include
08 Model Vibes Text Conditional2 Natural language instructions — inline (max 2,000 chars) or file path to a .txt on a UC Volume
09 Installation Catalog Text Conditional3 Unity Catalog for physical deployment
10 Sample Records Dropdown No 0 Synthetic records per table (0 = none)
11 Model JSON File Text Conditional4 Path to a previously generated model.json for re-install or continuation

1 Required for all operations except new base model (auto-assigned). 2 Required for vibe modeling of version. 3 Required for install, uninstall, generate sample data, shrink, enlarge. Optional for new base model and vibe modeling. 4 Required for install model, generate sample data when re-deploying a previously generated model.

Detailed Widget Descriptions (click to expand)

01. Business (name)

The name of your business. Used as the top-level identifier across the entire model. Case-insensitive matching (stored via LOWER()).

Sample values: Contoso Inc, Acme Healthcare, Global Telecom Corp, NextGen Retail

02. Description

A rich description of what your business does. Include industry, size, geography, key products/services. The LLM uses this to determine the industry complexity tier and tailor the model.

Sample values: A multinational aluminum smelting and manufacturing company operating across 12 countries, A digital-first healthcare provider specializing in telemedicine

03. Operation

Options: new base model | vibe modeling of version | shrink ecm | enlarge mvm | install model | uninstall model version | generate sample data

See the Operations section for full details on each mode.

04. Version

Options: Empty, or 1100

For new base model, leave empty — auto-assigned as 1 (or auto-incremented). For vibe modeling of version, this is the version you are modifying; output creates version N+1.

05. Model Scope

Options: Minimum Viable Model - MVM | Expanded Coverage Model - ECM

MVM = lean core (fewer domains/tables, same attribute depth). ECM = comprehensive Fortune 100 coverage.

06. Business Domains

Comma-separated list of specific domains you want. If blank, the LLM auto-generates the optimal set for your industry.

Sample values: customer, sales, billing, inventory, logistics | clinical, pharmacy, research

07. Included Org Divisions

Options: Operations | Operations and Business | Operations, Business and Corporate

Controls which organizational divisions contribute domains to the model.

08. Model Vibes

Supports two input modes:

  • Inline text — type your vibe instructions directly into the widget (max 2,000 characters). Best for short, targeted changes.
  • File path — provide a path to a .txt file on a Unity Catalog Volume (e.g., /Volumes/catalog/schema/vol/my_vibes.txt). Best for longer, multi-paragraph instructions that exceed the inline limit.

See The Vibing Workflow for examples of what vibes can do.

09. Installation Catalog

The Unity Catalog where physical schemas, tables, FK constraints, tags, and sample data will be created. Must already exist. You need CREATE SCHEMA privileges. If blank for new base model/vibe modeling, only the logical model (JSON artifacts) is produced.

Sample values: dev_catalog, prod_data_models, contoso_lakehouse

10. Sample Records

Options: 0, 5, 10, 15, 20, 25, 50, 100

0 = skip sample data generation. 10 = good default for review. 50100 = for load testing / demos. Sample data respects FK relationships (child records reference valid parent IDs).

11. Model JSON File

(Optional) Path to a previously generated model.json file on a Unity Catalog Volume. Used when re-installing or continuing from a prior run. After every run, the agent generates next_vibes.txt in the vibes/ folder — review it for recommended next vibe instructions.

Sample values: /Volumes/my_catalog/my_schema/vol/vibes/contoso/ecm_v1/model.json

Model Convention Widgets (12–24)

# Widget Type Default Options / Format
12 Naming Convention Dropdown snake_case snake_case, camelCase, PascalCase, SCREAMING_CASE
13 Primary Key Suffix Dropdown _id _id, _key, _pk, id, key
15 Schema Prefix Text (empty) e.g., stg_, raw_, dw_
16 Tag Prefix Text dbx_ e.g., dbx_, vibe_, mdl_
17 Table ID Type Dropdown BIGINT BIGINT, INT, LONG, STRING
18 Boolean Format Dropdown Boolean (True/False) Boolean (True/False), Int (0/1), String (Y/N)
19 Date Format Dropdown yyyy-MM-dd yyyy-MM-dd, dd/MM/yyyy, MM/dd/yyyy, yyyy/MM/dd, dd-MM-yyyy
20 Timestamp Format Dropdown yyyy-MM-dd'T'HH:mm:ss.SSSXXX 4 ISO/standard options
21 Classification Levels Text restricted=restricted, confidential=confidential, internal=Internal, public=public Comma-separated key=Label pairs
22 Housekeeping Columns Dropdown No No, Yes — adds created_by, created_at, updated_by, updated_at
23 History Tracking Columns Dropdown No No, Yes — adds valid_from, valid_to, is_current (SCD Type 2)
09a Cataloging Style Dropdown One Catalog One Catalog, Catalog per Division, Catalog per Domain
09b Catalog Prefix Text (empty) e.g., dev_, prod_
09c Catalog Suffix Text (empty) e.g., _lakehouse, _dw
15a Schema Suffix Text (empty) e.g., _db, _schema
16a Tag Suffix Text (empty) e.g., _tag
24 Vibe Session ID Text (empty) UUID for external UI progress tracking

Widget #14 does not exist (numbering gap between 13 and 15).


📄 Auto-Generated Next Vibes

After every run, the agent produces next_vibes.txt in the vibes/ output folder with recommended next vibe instructions. Copy these into widget 08. Model Vibes for the next iteration.

The model.json file includes session metadata:

{
  "_next_vibe_metadata": {
    "generated_from_version": "mvm_v1",
    "model_version": "1",
    "status": "needs_work",
    "confidence_score": 78,
    "summary": "Model has 3 unlinked columns and 1 siloed table",
    "issues_addressed": ["..."],
    "issues_not_addressed": ["..."],
    "data_modeler_notes": "Recommend adding a warehouse domain",
    "model_stats_at_generation": { "domains": 8, "products": 47, "attributes": 523 },
    "issue_counts": { "error": 0, "warning": 3, "info": 5 },
    "version_history": ["..."]
  }
}

🎯 The Complete Action Catalog

Vibes are translated into 190+ specific actions organized into categories. You never need to name these actions — just describe what you want in natural language and the agent maps your intent.

Entity Management (1–25)
# Action What It Does
1 drop Remove a domain, product, attribute, tag, or link
2 create Add a new domain, product, or attribute
3 rename Change the name of an entity
4 alter_description Modify the description
5 change_type Change an attribute's data type
6 merge Combine two entities into one
7 split Divide one entity into multiple
8 move_product Move a product to another domain
9 move_attribute Move an attribute to another product
10 delete_attribute Remove an attribute
11 create_link Create a foreign key relationship
12 drop_link Remove a foreign key relationship
13–17 Prefix/suffix ops Add, remove, or change name prefixes/suffixes
18–24 Metadata ops Update glossary terms, references, regex, tags
25 modify Regenerate an entity with specific guidance
Quality Check & Analysis (26–38)
# Action What It Does
26 run_quality_checks Compound: runs detect_duplicates + dedupe_attributes + detect_cycles + detect_siloed + fix_fk_anomalies
26b run_product_domain_fit LLM audit: are products in the right domains? Relocates misplaced ones.
27 detect_duplicates Find SSOT violations (semantic duplicate products across domains)
28 fix_duplicates Auto-merge/remove detected duplicates
29 dedupe_attributes Remove duplicate columns within products
30 detect_cycles Find circular FK dependencies
31 break_cycles Auto-fix cycles by removing weakest FK links
32 detect_siloed Find completely disconnected tables (no FKs in any direction)
33 fix_siloed Connect disconnected tables via linking attempts
34 review_links Audit all FK relationships for anomalies
35 fix_fk_anomalies Repair broken, mismatched, or orphaned FK references
36 fix_ambiguous_fks Resolve FKs that match multiple target tables
37 merge_small_tables Analyze and consolidate tables with < 5 attributes
38 identify_core_products Identify business-critical products for protection
95 model_checkup Mega-compound: runs static analysis + auto-queues ALL appropriate fixes
FK & Linking Operations (39–46, 133–138)
# Action What It Does
39 run_linking Compound: in-domain + cross-domain + M:N detection
40 run_in_domain_linking Link FKs within each domain
41 run_cross_domain_linking Link FKs across different domains
42 detect_many_to_many Find potential M:N relationships
43 create_junction_tables Create bridge/association tables for M:N
45 redirect_fk Redirect FK to a different target table
46 find_unlinked_columns Find *_id columns without FK relationships
133 remove_product_prefix Remove redundant table-name prefix from columns
134 fix_fk_column_naming Fix FK columns that don't end with the target PK name
135 connect_table Connect a specific disconnected table
136 link_specific_columns Link an explicit list of unlinked _id columns
138 find_missing_fk_links Comprehensive: LLM classifies each unlinked column as LINK, CREATE, DROP, or KEEP_AS_IS
Bulk Operations (62–64, 110–114)
# Action What It Does
62 bulk_rename_products Rename multiple products by pattern
63 bulk_drop_products Drop products matching a pattern (e.g., stg_*)
64 bulk_move_products Move products matching a pattern to another domain
110 bulk_change_type Change data type for attributes by pattern
111 bulk_set_nullable Set nullable for attributes by pattern
113 bulk_remove_attributes Remove attributes matching a pattern from all tables
Tag & Classification Operations (54–61, 98–106, 156–163)
# Action What It Does
54–57 Tag add/remove Add or remove tags from products or domains
58 conditional_tag Tag entities matching a condition
59–60 Bulk tagging Tag by name pattern or by column presence
98–101 PII/sensitive marking Mark as PII, sensitive, encrypted, deprecated
103 set_table_type Classify as dimension, fact, lookup, bridge, staging, archive
156 classify_table_tier Medallion architecture: bronze / silver / gold
161 set_data_owner Assign data owner/steward
162 set_update_frequency Document expected freshness (real_time, daily, monthly, etc.)
163 map_to_source_system Map table/column to source system (SAP, Salesforce, etc.)
Template Column Actions (148–155)
# Action Columns Added
148 add_scd_columns effective_from, effective_to, is_current, row_hash
149 add_audit_columns created_at, updated_at, created_by, updated_by
150 add_soft_delete_columns is_deleted, deleted_at, deleted_by
151 add_temporal_columns valid_from, valid_to, system_from, system_to
152 add_versioning_columns version_number, version_valid_from, version_valid_to, is_latest_version
153 add_multitenancy_columns tenant_id
154 add_lineage_columns source_system, source_table, ingestion_timestamp, etl_job_id
155 add_gdpr_columns consent_status, consent_date, data_subject_request_id, right_to_erasure_date
Structural Transformations (164–191)
# Action What It Does
164 normalize_to_3nf Apply Third Normal Form normalization (LLM-powered)
165 denormalize_for_analytics Create wide/denormalized tables for BI
184 promote_to_table Extract an attribute into its own lookup/reference table + FK
185 inline_table Merge a child table into its parent (denormalize)
186 swap_domains Atomically swap two domain names
187 impact_analysis Show what would break if a table/domain were dropped (read-only)
190 enlarge_model Wholesale expansion from MVM to ECM scope
191 shrink_model Wholesale reduction from ECM to MVM scope
178 VIBE_PRUNE_PROMPT LLM-powered: keep only tables related to a focus area
179 drop_domains_except Drop all domains except a specified keep-list
Artifact Generation (139–147)
# Action Output
139 generate_readme README documentation
140 generate_data_model_json Complete model as JSON
141 generate_ontology RDF/RDFS ontology (Turtle format)
142 generate_dbml DBML schema for visualization tools
143 generate_release_notes Changelog/release notes for the version
144 generate_excel Excel workbook export
145 generate_data_dictionary Comprehensive data dictionary
146 export_model_report Full model documentation report
147 generate_test_cases Data quality test case specifications
Metric View Operations (126–134)
# Action What It Does
126 run_metric_modeling Generate Databricks metric views (dimensions + measures)
129 add_metric_measure Add a specific KPI measure to metric views
130 remove_metric_measure Remove a measure
131 add_metric_dimension Add a grouping dimension
132 remove_metric_dimension Remove a dimension
133 alter_metric_filter Set/update metric view filter logic
134 drop_metric_view Remove an entire metric view

🔄 Pipeline Stages

When you run the agent, it executes these progress stages in order (the # column is for reference only — no numeric stage IDs are emitted in progress events):

# Stage Duration What Happens
1 Setup and Configuration 2–10s Validates inputs, creates metamodel tables
2 Interpreting Instructions 10–30s (Vibe mode only) Parses vibes into structured action plan
3 Collecting Business Context 10–30s LLM enriches your description across business/industry dimensions
4 Designing Domains 15–60s Generates domains following division model + SSOT
5 Creating Data Products 1–10m Products per domain with domain-architect (Step 3.6) and global-architect (Step 3.7) review
6 Enriching Data Products with Attributes 5–40m All columns for every product
7 Cross-Domain Linking 1–5m In-domain → global sweep → pairwise comparison
8 Quality Assurance 30s–3m 9 sub-checks (naming, PK/types, overlaps, topology, auto-remediation)
9 Applying Naming Conventions 10–30s Final naming consistency pass
10 Model Finalization 10–30s Finalize logical model snapshot before physical deployment
11 Subdomain Allocation 10–30s Allocates products into semantic subdomains
19 Generating Metric View Artifacts 10–30s Exports metric view definitions (legacy stage ID)
17 Generating Artifacts 30s–2m README, Excel/CSV, model JSON, data dictionary, model report
12 Physical Schema Construction 1–10m Creates UC schemas + Delta tables (if catalog set)
13 Applying Foreign Keys 30s–2m FK constraints on physical tables
14 Applying Tags 2–15m Classification, PII, data type tags on schemas/tables/columns
15 Applying Metric Views 30s–5m Databricks metric views for KPI tracking
16 Generating Sample Data 1–15m Synthetic records respecting FK relationships
18 Consolidation and Cleanup 10–30s Consolidates/merges metadata and cleanup

📦 Output Artifacts

Logical Artifacts (Always Generated)

File Description
model.json Complete model export (primary output)
readme.md Human-readable model documentation
vibes/current_vibes.txt Vibe instructions used in this run
vibes/next_vibes.txt Recommended next vibe instructions
domains.json All domain definitions
products.json All product (table) definitions
attributes.json All attribute (column) definitions
docs/*.xlsx / docs/*.csv Excel/CSV export of the entire model (CSV fallback if openpyxl unavailable)
readme.md (parent folder) Model overview document comparing MVM and ECM scopes
schemas/*.sql SQL DDL files (per-domain schemas, cross-domain FKs, catalogs)
diagram/*_dbml_*.txt DBML schema diagram
ontology/*_rdf_*.ttl RDF/Turtle ontology representation
docs/releasenotes.txt Auto-generated release notes

Conditional Artifacts (Generated on Demand via Queued Operations)

File Description
docs/*_data_dictionary_*.txt Column-level reference guide
docs/*_model_report_*.txt Full documentation report
docs/*_test_cases_*.txt Generated test cases

Physical Deployment (When Catalog Is Set)

Object Example
Schemas catalog.customer, catalog.sales, catalog.logistics
Tables Delta tables with all columns and correct Spark SQL types
FK Constraints Physical foreign key constraints between tables
Tags Unity Catalog tags on schemas, tables, and columns
Metric Views Databricks metric views for KPI calculation
Sample Data Synthetic records with valid FK references

👁️ Viewing a Generated Model

Once the agent has produced a model (a model.json file plus the artifacts under <industry>/<scope>_v1/), you have two ways to explore it: open it in the model-viewer Databricks App for an interactive ER-graph view, or load model.json programmatically.

The model-viewer app

The viewer/model_viewer_app_installer.ipynb notebook in this repo provisions a Databricks App that renders any model.json as an interactive entity-relationship graph. It exposes three navigable views — full model, single domain, single product — and lets you load models either directly from a GitHub repo (e.g. amralieg/vibe-business-data-models) or by uploading a local model.json file.

Install the viewer

  1. Open viewer/model_viewer_app_installer.ipynb in your Databricks workspace.
  2. Run all cells. The installer creates the Databricks App and prints the app URL when it completes.
  3. Open the app URL in a browser.

Load a model

Inside the app you have two loaders:

  • Load from repo — paste any GitHub owner/repo (default: amralieg/vibe-business-data-models) and pick the industry + flavour from the dropdown. Note: GitHub rate-limits anonymous API calls — if you hit a 429 / "rate limit exceeded" message, fall back to the JSON loader.
  • Load from JSON — upload a model.json directly. Use this when (a) you just generated a model with the agent and want to inspect it before deploying, or (b) GitHub is rate-limiting the repo loader.

What you see — three views

1. Full-model overview. Every entity in the model arranged on a single canvas, FK relationships drawn between them, domains colour-coded as rectangles, products on the perimeter:

Full-model overview — Retail MVM

2. Domain drill-down. Click any domain to zoom in. You see the domain's sub-domains as named groups and the products inside each, with the FK web restricted to within-domain links:

Domain drill-down — order domain in Retail MVM

3. Single-product radial view. Click any product (table) to centre it. Every other product it relates to via FK fans out around it, grouped by domain, so you can see at a glance every join path leaving that table:

Single-product radial view — order.order_line in Retail MVM

Programmatic inspection (no app required)

import json
m = json.load(open('<industry>/mvm_v1/model.json'))['model']
for d in m['domains']:
    for p in d.get('products', []):
        print(f"{d['name']}.{p['name']}: {len(p['attributes'])} columns, "
              f"{sum(1 for a in p['attributes'] if a.get('foreign_key_to'))} FKs")

The _manifest.json next to every model.json gives you a quick counts summary (domains, products, attributes, FKs, metric views) without parsing the full model.


🛡️ Quality Rules & Enforcement

The agent enforces a comprehensive rule system during generation and QA:

Rule Group Key Rules
Naming (G01) snake_case default; domains: 1 word, singular, max 20 chars; products: 1–3 words, max 30 chars, no domain prefix; attributes: max 50 chars, no product prefix (except PK); FKs must end with target PK name; preserve unit qualifiers
Semantic Dedup (G02) First-Class Entity Test (5 criteria); SSOT violation detection; 60%+ overlap → merge to shared; 90%+ overlap same domain → remove; attribute dedup >80% confidence; semantic distinction rules (method vs channel, target vs actual, lifecycle timestamps)
FK Integrity (G03) FK target must exist; no bidirectional FKs; DAG required (no cycles); FK type compatibility; system identifiers exempted; hierarchical self-refs exempt
PK Rules (G04) Every product has exactly 1 PK; PK = {product}_{suffix}; PK type = configured type (default BIGINT); PK exempt from prefix stripping
Normalization (G05) 3NF enforcement; orphaned FK detection; denormalized attribute detection with >95% confidence; point-in-time snapshots exempt; measurement attributes exempt; geographic coordinates on physical entities exempt
Division Balance (G06) Ops + Business ≥ 80% of domains; Corporate ≤ 20%; no Corporate until Ops AND Business each have ≥ 2; cross-division relocation forbidden; forbidden generic domain names; Org Chart Test; Fragmentation Test (30%+ overlap → merge)
Data Types (G07) Spark SQL types only; no ARRAY/STRUCT/MAP; no calculated metrics/aggregates
Tags (G08) Classification in tags only; PII = RESTRICTED + specific PII tag (pii_email, pii_phone, pii_financial, pii_health, pii_identifier, pii_address, pii_biometric, pii_name, pii_dob); empty tags for regular data; custom user tags always applied
Graph Topology (G09) DAG enforcement; DFS cycle detection; hierarchical self-ref exemption; zero siloed tables; each domain ≥2 cross-domain connections; parent-child FKs never broken in cycle resolution
Honesty Check (G11) 0–100 scale; below 55 = permanently discarded; 55–70 = borderline retry; ≥90 = accepted; contradiction penalty applied post-processing
Product Design (G12) M:N requires 3 indicators with ≥2 of 3 strong; association ratio ECM ≤15%, MVM ≤5%; core products 1–3 per domain; forbidden product suffixes (_analysis, _analytics, _report, _prediction); Silver Layer only (no analytics products)
Sample Data (G13) Exact record count; sequential PK from 10001; FK random [10001, 10001+N-1]; regex compliance; 3-letter country codes; no Lorem Ipsum; realistic business data
Vibe Constraints (G14) Dedup overlap thresholds; max relocation percentage per pass; normalization confidence 95%; mutation budgets by mode (surgical, holistic, generative); domain hard ceiling factor 1.5x
Physical Schema Deployment (G15) Schema and table creation factories; attribute dict factory; product dict factory; FK dependency ordering; consolidation and cleanup
Subdomains Exactly 2-word names; min products per subdomain per tier; no overlapping words; balanced distribution; no placeholder names

🤖 LLM Architecture

The agent uses a multi-model ensemble with automatic demotion and recovery:

Order Role Purpose Endpoint Input Tokens Output Tokens
10 Thinker (large) Complex reasoning, architecture reviews, QA decisions databricks-claude-opus-4-6 200,000 128,000
20 Worker (large) High-volume generation: products, attributes, FKs, dedup databricks-claude-sonnet-4-6 200,000 64,000
30 Thinker (large) Fallback thinker databricks-claude-opus-4-5 200,000 64,000
40 Worker (large) Fallback worker databricks-claude-sonnet-4-5 200,000 64,000
50 Worker (small) Simpler tasks: domain generation, tag classification databricks-gpt-oss-120b 131,072 25,000
60 Worker (tiny) Sample data generation databricks-gpt-oss-20b 131,072 25,000

Automatic model demotion: After 3 cumulative failures on a model, the entire priority order is demoted — the failing model is pushed down and the next model takes its place. After 5 consecutive successes, the original order is restored. After 3 consecutive timeouts, a model is marked broken and skipped for the rest of the session.

Prompt architecture: 49 specialized prompts, each mapped to a specific model role and temperature setting. Thinker prompts use temperature 0 for deterministic reasoning. Worker prompts use temperature 0–0.3 for controlled generation. Sample generation uses temperature 0.5 for creative variety.


📊 Metric Views

The agent generates Databricks metric views — reusable KPI definitions that sit on top of your data tables:

Component Description Example
Dimensions Grouping columns region, product_category, fiscal_quarter
Measures Single-aggregate expressions SUM(revenue), COUNT(DISTINCT customer_id), AVG(order_value)
Filters Row-level predicates WHERE status = 'completed'

Metric views are auto-generated per domain, focusing on KPIs that would appear in executive dashboards and quarterly business reviews. Each measure uses a single aggregate function (nested aggregates like AVG(SUM(...)) are not supported by metric view YAML and are automatically prevented).


🔧 Troubleshooting

Symptom Likely Cause Resolution
"No deployment catalog specified — skipping physical model deployment" Widget 09 is empty Set deployment catalog to deploy physical tables
Model has too few domains MVM scope + tier_5 classification Switch to ECM, or provide seed domains in widget 06
Model has irrelevant domains LLM inferred from description Vibe: "Remove the X domain"
FK pointing to wrong table LLM linked incorrectly Vibe: "Redirect order.warehouse_id FK to logistics.warehouse"
Circular dependency warning DAG violation Agent auto-fixes during QA; if persistent: "Break cycles"
SSOT violation (duplicate entities) Same concept in multiple domains Vibe: "Run quality checks" or "Fix duplicates"
Pipeline crashed mid-run LLM timeout or transient error Re-run with same inputs — agent cleans up incomplete versions
Model JSON file parse error Invalid JSON (smart quotes, trailing commas) Validate JSON; use straight quotes only
"Version X exists but is incomplete" Previous run failed Agent auto-detects — just re-run
Model too large / too many tables ECM + high complexity tier Use shrink ecm, or vibe: "Keep only core business tables"
Convention changes not applied Convention widgets not set Update the model convention widgets (12–24) with your desired values before running
Metric views have COUNT(1) instead of real KPIs Nested aggregates were auto-replaced LLM prompt prevents this; re-run metrics: "Regenerate metrics"

📘 Glossary

Term Definition
Attribute A column in a product (table) — has name, type, tags, description
Business Data Model A data model tailored to a specific organization, generated by Vibe Modelling
Corporate Division Supporting functions (HR, Finance, Legal) that enable but don't directly generate revenue
DAG Directed Acyclic Graph — the required topology for FK relationships (no cycles)
Division Top-level organizational grouping: Operations, Business, or Corporate
Domain A logical grouping of related tables, deployed as a Unity Catalog schema
ECM Expanded Coverage Model — comprehensive enterprise-grade model scope
FK Foreign Key — a column referencing another table's primary key
Honesty Check LLM self-assessment score (0–100%) to ensure output quality
Industry Data Model A generic, vendor-published schema template for an entire industry vertical
Junction Table An association table resolving M:N relationships between two entities
Metric View A Databricks KPI definition with dimensions, measures, and filters
MVM Minimum Viable Model — lean, production-ready scope (30–50% of ECM)
PK Primary Key — the unique identifier column for a table
Product A data table within a domain — a first-class business entity
SSOT Single Source of Truth — each business concept has one and only one authoritative owner
Tier Industry complexity classification (tier_1 = Ultra-Complex to tier_5 = Simple)
Vibe A natural language instruction provided to the agent to modify the model
Vibe Modelling The iterative process of generating and refining data models using natural language

Version history

Version Change
v0.7.4 6 root-cause resilience fixes from the v0.7.3 vibe_runner ecomm 3x5 crash audit (run 309772240949880, FileNotFoundError on industries.json placeholder path) plus the principal-engineer SHRINK-NEW-SILO complaint ("the agent shouldn't fail because of that — it should correct the result and continue") and the broader autonomous-resilience review. Bumps __AGENT_VERSION__ to 0.7.4 (alias=agent-version-global, mirrored into model.json as agent_version). Goal of the bundle: STOP wasting hours of compute on recoverable failure modes. NEW-1 [runner-single-biz-fallback FIRED] (alias=runner-single-biz-fallback) — CRITICAL resilience fix in runner/vibe_runner.ipynb. load_business_context previously raised FileNotFoundError whenever the business_context widget pointed at a missing or placeholder path (/Workspace/Users/<your-user>/vibe-modelling/industries.json) — the v0.7.3 standalone runner submission crashed at 6s with no recovery. Fix: introduces _runner_is_placeholder_or_missing_path (detects empty/<your-user>/<user> placeholders + os.path.exists check) and _runner_build_single_business_context_from_widgets which synthesises the industries.json structure from direct widget params (business_name, business_description, model_vibes, business_domains, plus all WIDGET_VALUE_KEYS with sensible defaults via _SINGLE_BIZ_WIDGET_DEFAULTS). The fallback fires on file-not-found, JSON-decode-error, empty-businesses-list, AND missing widget_values keys — every degenerate input that previously crashed the runner now degrades gracefully when a business_name widget is present. Existing multi-industry mode (real industries.json with multiple businesses) is fully preserved; the fallback is gated on business_name widget presence so it never silently overrides a real config. NEW-2 [shrink-orphan-drop-emptied → shrink-fk-densest-fallback FIRED] (alias=shrink-fk-densest-fallback) — CRITICAL resilience fix on the principal-engineer's complaint. When the SHRINK-NEW-SILO orphan-drop emptied tables_to_keep, v0.7.2 raised ValueError and killed the entire job after hours of compute. v0.7.4 instead engages a deterministic FK-densest fallback: counts FK in/out edges per source product, ranks descending, picks the top K = max(3, min(dropped_orphans_count, source_products // 2)) products with FK relationships, and uses those as the MVM survivor set. domains_to_keep/domains_to_remove are recomputed from the new survivor set. NEXT_VIBES is filed with SHRINK_FK_DENSEST_FALLBACK_USED so the next vibe-iteration can refine. Only escalates to ValueError when source ECM has zero FKs (genuine structural problem requiring upstream fix). NEW-3 [shrink-cascade-iterate FIRED] (alias=shrink-cascade-iterate) — CRITICAL resilience fix on cascade-silo failure. When auto-dropping primary orphans introduced cascading silos, v0.7.2 raised ValueError. v0.7.4 iterates the auto-drop up to 5 rounds (each round: detect new silos, drop them, re-detect). On convergence (no remaining silos), proceeds with the trimmed survivor set + files SHRINK_CASCADE_AUTO_RECOVERED into NEXT_VIBES. On non-convergence after 5 rounds OR tables_to_keep becoming empty, falls back to the FK-densest fallback (NEW-2) as the final recovery layer. Eliminates the principal-engineer's "the agent shouldn't fail because of that" failure class entirely. NEW-4 [ensemble-singleshot-fallback FIRED] (alias=ensemble-singleshot-fallback) — CRITICAL resilience fix on the ALL-VARIANTS-FAILED ensemble path. When every parallel ensemble variant in step_generate_logical_schema failed (typically transient infra OR overly-aggressive validators rejecting all 3 outputs), v0.7.3 raised ValueError and killed the run. v0.7.4 retries ONE more time with reduced complexity: temperature=0.0, the smallest enabled Claude Sonnet endpoint, and forced max_business_domains clamped to ≤5 (or user's sizing_directive max if set). On success, successful_variants = [single_shot] and the judge-selection skip path takes over normally. NEXT_VIBES is filed with ENSEMBLE_SINGLESHOT_FALLBACK_USED. Only raises if the single-shot fallback also fails (genuine LLM-infra outage). NEW-5 [install-ddl-retry-skip FIRED] (alias=install-ddl-retry-skip) — HIGH resilience fix on _execute_sql_parallel_core. When halt_on_error=True and a single DDL statement fails with a known-recoverable error class (TABLE_OR_VIEW_ALREADY_EXISTS, SCHEMA_ALREADY_EXISTS, COLUMN_ALREADY_EXISTS, OBJECT_DOES_NOT_EXIST, DEADLINE_EXCEEDED, CANCELLED, DELTA_CONCURRENT_*, 503/504, ProtocolChangedException), the install no longer halts. Instead: 2-second backoff + retry; on retry success → counts as success; on retry confirming "already exists" → counts as success (idempotent); otherwise → log error + skip + continue (failed_count incremented but no halt). Non-recoverable errors (syntax errors, type mismatches) still halt as before. Eliminates the install-time "single transient flake kills the entire install" failure class. Patterns are explicitly enumerated — no blanket "ignore everything", just the genuinely-recoverable classes. NEW-6 [runner-failure-manifest FIRED] (alias=runner-failure-manifest) — HIGH observability fix in runner/vibe_runner.ipynb. When ANY of the 5 task states (ECM gen, ECM install, ECM uninstall, MVM shrink, MVM install) is non-SUCCESS, the runner now writes a _failure_manifest.json file under /Volumes/{install_catalog}/_metamodel/vol_root/business/{biz}/{scope}/_failure/ BEFORE calling drop_catalog(staging). The manifest captures: agent version, business name, timestamps, all 5 task states + exit statuses, per-task warnings (top 30 each), failed_parts list, parent run/job IDs. This means post-mortems have evidence even AFTER the staging catalog is gone — previously, an early-exit install crash left zero diagnostics on the persistent volume. Manifest is written to BOTH ecm_v1 and mvm_v1 install volumes (one or both will exist after partial success). All write failures inside the manifest path are non-fatal (caught + logged). New §10.6 + §9.4 entries recommended. Behavioural tests in tests/unit-tests/test_v074_behavioral.py (covers all 10 fixes: alias presence, fallback paths, FK-densest math, cascade-iterate convergence, single-shot retry config, recoverable-error classification, manifest write site, folder_path discovery, fidelity attr/fk verifiers, inner-workflow error capture). NEW-7 [runner-folder-path-discovery FIRED] (alias=runner-folder-path-discovery) — CRITICAL resilience fix in runner/vibe_runner.ipynb read_widgets(). The hard-coded folder_path = "./../models" resolved to /Workspace/Users/models when the runner was deployed flat to user-root (production deploy convention), and /Workspace/Users/ is DIRECTORY_PROTECTED — async-flush at notebook exit then raised AsyncFlushFaliedException: DIRECTORY_PROTECTED: Folder Users is protected and dbutils.notebook.run() propagated Workload failed back to vibe_tester, even though all 5 child tasks SUCCEEDED. Root cause class: same shape as v0.7.2 NEW-6 runner-agent-multipath (hardcoded relative path broken by flat deploy). Fix: derives folder_path from dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath() — anchors under /Workspace/Users/<user>/vibe_runner_models when runner sits under /Users/<user>/, falls back to /tmp/vibe_runner_models otherwise. os.makedirs(folder_path) is also wrapped in try/except with a second-chance /tmp fallback so a permission failure on the discovered path can't bubble. NEW-8 [fidelity-deterministic-attr-count FIRED] (alias=fidelity-deterministic-attr-count) — HIGH quality fix on _verify_deterministic (cell containing VibeOrchestrator). The deterministic verifier had no pattern for "attribute count range per product" (e.g. VREQ-003: "between 15 and 30 attributes per product") so the requirement was marked PARTIAL with "no specific pattern matched", which dragged precision below the 0.85 threshold and triggered Fidelity gates FAILED: precision 0.6 < required 0.85 — rollback recommended even on perfectly-shaped models. Fix: matches between N and M attributes per product / N to M attributes per product patterns; computes per-product attribute count from the live attrs_by_product; returns fulfilled if all products fall in [N,M], partial if ≤10% violate, failed otherwise. NEW-9 [fidelity-deterministic-fk-density FIRED] (alias=fidelity-deterministic-fk-density) — HIGH quality fix on the same verifier. No pattern existed for "every product must have at least one foreign key" (VREQ-004 class), so FK-density requirements were always PARTIAL even when the model genuinely had ≥1 FK per product. Fix: matches every/each/all product .* foreign key/fk patterns; computes per-product inbound + outbound FK counts; returns fulfilled if all products have ≥1 FK in or out, partial if ≤10% siloed, failed otherwise. NEW-10 [vibe-tester-inner-workflow-error-capture FIRED] (alias=vibe-tester-inner-workflow-error-capture) — HIGH observability fix in tests/vibe_tester.ipynb Test 00 except-handler. When dbutils.notebook.run(runner_notebook, ...) raised, the only message vibe_tester captured was the generic Workload failed, see run output for details; the actual exception (e.g. AsyncFlushFaliedException: DIRECTORY_PROTECTED) was buried in the inner WORKFLOW_RUN created by the dbutils.notebook.run() call, invisible to the test summary. Fix: on Test-00 exception, the handler now lists recent WORKFLOW_RUNs via WorkspaceClient.jobs.list_runs(run_type="WORKFLOW_RUN", limit=20), finds the one whose task notebook_path matches runner_notebook AND whose result_state ends with FAILED, fetches its error + error_trace via jobs.get_run_output, and prepends them to the test_summary error message. Eliminates the "vibe_tester says workload failed but doesn't say WHY" debugging cliff. To be deployed as dbx_vibe_modelling_agent_v74 (extended) BEFORE the next vibe_tester 3x5 ecomm run. Per CLAUDE.md user directive 2026-04-28 ("NO VERSION ROAD MAP, FIX ALL IN THE CURRENT VERSION"), all fixes ship under the same v0.7.4 stamp without rolling to v0.7.5. Validation: this version closes 6 of the 9 audit items + 4 additional regressions surfaced by the v0.7.4 vibe_tester run (DIRECTORY_PROTECTED, fidelity precision 0.6, mv-prevalidate cascade, inner-workflow error obscurity). NEW-11 [mv-stale-catalog-rewrite FIRED] (alias=mv-stale-catalog-rewrite) — CRITICAL R6/R2 regression fix on step_apply_metric_views. v0.7.4r2 vibe_tester ecomm 3x5 audit (run 106533651778252) showed MVM_v2 metric views (13/13) failing with [NO_SUCH_CATALOG_EXCEPTION] Catalog 'ecomm_ecm' was not found. Root cause: vibe modeling of version reads MVM_v1's source from a catalog (ecomm_ecm) that was already cleaned up by the runner pre-deployment, but the LLM-generated MV CREATE statements baked the OLD catalog into the YAML source: field instead of the install-target catalog (config["TARGET_CATALOG"]). The pre-existing mv-fallback-emit-live could not save the MVs because the fallback also referenced the same dead catalog. Fix: BEFORE invoking execute_metric_views_in_parallel_no_halt, the install path now (a) enumerates live catalogs via SHOW CATALOGS, (b) walks every MV statement with two regex passes — one for backtick-quoted three-part names `cat`.`sch`.`tbl` and one for unquoted cat.sch.tbl — and (c) rewrites any catalog reference that is NOT the install-target catalog AND not in the live-catalogs set to point at the install-target catalog. Protected catalogs (system, samples, hive_metastore, __databricks_internal, main) are NEVER rewritten so legitimate cross-catalog refs are preserved. Emits [mv-stale-catalog-rewrite FIRED] with rewrite count, statements touched, the stale catalog names, and the resolved target. The guard is wrapped in try/except so a SHOW CATALOGS failure cannot break MV install (logs a non-fatal warning instead). Closes the v0.7.4r2 13/13 MV failure cascade and unblocks vibe modeling of version end-to-end. NEW-12 [run-test-inner-workflow-error-capture FIRED] (alias=run-test-inner-workflow-error-capture) — HIGH observability fix in tests/vibe_tester.ipynb run_test() helper. NEW-10 (vibe-tester-inner-workflow-error-capture) only patched the Test 00 (_try_test_00_vibe_runner) exception handler; the FOUR install tests (04_install_mvm_v1, 07_install_mvm_v2, 11_ctx_install, 11c_xconv_install) call the agent through the generic run_test() helper which had its own except path with no inner-workflow capture. Result: install failures bubbled up the generic Workload failed, see run output for details with no traceback — 4 install tests went blind in the v0.7.4r2 audit. Fix: the same WorkspaceClient enumeration + WORKFLOW_RUN filter + notebook-path match + get_run_output chain from NEW-10 is now also embedded in run_test()'s exception handler, so EVERY agent invocation surfaces the inner workflow's error + error_trace to the test_summary. Closes the "vibe_tester says workload failed but doesn't say WHY" cliff for install tests as well. NEW-13 [vibe-attr-cap-override FIRED] (alias=vibe-attr-cap-override) — HIGH user-vibe authority fix on the attribute-count cap. v0.7.4r2 audit caught VREQ-003 violations on 10/17 products: vibe said "between 15 and 30 attributes per product" but the LLM emitted 31-36 attributes per product. Root cause: the _ATTR_BUFFER_FACTOR allowed the LLM to overshoot the per-product cap and the existing apply_vibe_authority_overrides code had no specific override for the attribute range. Fix: AFTER apply_vibe_authority_overrides(config, widgets_values), a regex now scans model_vibes/business_description for between N and M attributes per product (and N to M attributes per product) phrasing; on match, mutates config["PROMPT_VARIABLES"]["min_attributes_per_product"] and ["max_attributes_per_product"] to the user's literal range AND records the original phrase + before/after values in config["_VIBE_ATTR_CAP_USER_OVERRIDE"] so downstream stages can audit the override. Emits [vibe-attr-cap-override FIRED] showing old→new bounds. The override happens BEFORE prompt variables are read by the worker prompts, so the LLM's instruction set carries the user's exact cap instead of the heuristic default. NEW-14 [mv-date-interval-autofix FIRED] (alias=mv-date-interval-autofix) — MEDIUM R6 recurrence fix in _sanitize_metric_measure_expr. v0.7.4r2 audit caught 1 MV per version (e.g. order_shipment ECM_v1) failing on CAST((delivered_date - shipped_date) AS DOUBLE) because Spark SQL's DATE - DATE returns INTERVAL DAY and INTERVAL DAY → DOUBLE is a DATATYPE_MISMATCH. The LLM kept emitting this pattern; the existing sanitizer had no rewrite. Fix: a regex inside _sanitize_metric_measure_expr detects two date-typed columns subtracted (with optional surrounding CAST(... AS DOUBLE/FLOAT/...)) and rewrites the expression to DATEDIFF(date1, date2) — yields a numeric output safely castable to any numeric. Heuristic: column names ending in _date/_time/_at/_on/_day or starting with date_ are treated as date-typed (industry-agnostic). Emits [mv-date-interval-autofix FIRED] with the original→rewritten expression. Eliminates the recurring INTERVAL DAY → DOUBLE install failure class. Behavioural tests for NEW-11..NEW-14 added to tests/unit-tests/test_v074_behavioral.py (52 tests total in this file; covers alias presence, ordering invariants, protected-catalog list, SHOW CATALOGS guard, prompt-variables mutation, sanitizer location, anti-tautology).
v0.7.3 1 root-cause fix from a downstream agent's report against the v0.7.2 ecomm 5x10 vibe_tester run (709587786254819). Bumps __AGENT_VERSION__ to 0.7.3 (alias=agent-version-global, mirrored into model.json as agent_version). NEW-1 [session-end-status-honest] (alias=session-end-status-honest) — MEDIUM observability/honesty fix on the App-facing run-status pill. VibeWriter._finalize_common (cell 9, around line 4127 in deployed form) emitted the Vibe Session / Session Ended sentinel with hard-coded status="stage_ended" regardless of whether the caller was finalize_pipeline (success) or finalize_pipeline_error (failure). The downstream Vibe App reads this status to colour the run pill — green for stage_ended, red for stage_failed. Hard-coding stage_ended on the error path was lying about the session outcome: failed runs showed up as green-pill "completed" in the App. Reproduced by the downstream Claude session monitoring an unrelated shrink failure. Fix: (a) _finalize_common now accepts end_status="stage_ended" keyword (default keeps backwards-compatibility with finalize_pipeline), (b) finalize_pipeline_error now passes end_status="stage_failed" so the App pill turns red, (c) the _is_session_bookend immediate-flush check at the bottom of emit_step now also recognises "stage_failed" for the "Vibe Session" stage so the failure event is flushed within the same emit_step call instead of waiting for the next 10-second background flush window — otherwise the App could poll between the failure emit and the bookend flush and still see the prior stage_in_progress status. The fix is sentinel-tagged twice (alias=session-end-status-honest) at both the call site and the bookend logic so post-deploy grep can mechanically prove both ends shipped. Behavioural tests in tests/unit-tests/test_v073_behavioral.py (10 tests, all passing) cover: version bump; _finalize_common signature includes end_status="stage_ended" default; Session Ended emit references the parameter not the literal; finalize_pipeline does NOT pass the parameter (default preserves green pill); finalize_pipeline_error MUST pass end_status="stage_failed"; alias appears at ≥2 sites (call + bookend); bookend tuple includes "stage_failed" for Vibe Session; stage_failed is in _VALID_STATUSES; _open_step_ids pop logic on terminal statuses still includes stage_failed (regression sentinel — would leak open step ids if removed). To be deployed as dbx_vibe_modelling_agent_v73 BEFORE the next vibe_runner 3x5 ecomm run.
v0.7.2 3 root-cause fixes from the v0.7.1 airlines tiny live audit (run 1097831250061095, 48 min, all 5 tasks SUCCESS, ECM 18 products / MVM 16 products / quality 94/100 / physical-vs-model.json parity 100%) PLUS the Phase 6 Retailer SHRINK-NEW-SILO report received during the run. Bumps __AGENT_VERSION__ to 0.7.2 (alias=agent-version-global, mirrored into model.json as agent_version). NEW-1 [self-ref-mem-json-sync FIRED] (alias=self-ref-mem-json-sync) — root-cause fix for the v0.7.1 N2 Fidelity-gates regression (Fidelity gates FAILED: precision 0.5 < required 0.85 — rollback recommended with 3 self-FK columns crew.bid_period.prior_bid_period_id, crew.training_center.parent_training_center_id, flight.aircraft.replaced_aircraft_id only in Memory not in JSON). The SELF-REF-FIX autofix mutates widgets_values["attributes"] in-place (adds new renamed FK rows + clears the old PK-row's foreign_key_to) but never persists the mutated list back to ATTRIBUTES_FILE_PATH. The next downstream step reads JSON, sees stale data, computes Memory-vs-JSON drift, and the fidelity gate fails at precision 0.5. New defense: immediately AFTER the SELF-REF-FIX try/except, compare len(widgets_values["attributes"]) vs len(json.load(ATTRIBUTES_FILE_PATH)); on count drift, rewrite the JSON file from the in-memory state (Memory is more authoritative — it has the latest autofix output). Emits [self-ref-mem-json-sync FIRED] with mem→json counts so audit can grep how many attrs were synced. Defense-in-depth pattern: even if a future autofix mutates Memory after SELF-REF-FIX, this sync handles the drift class generically. NEW-2 [mv-prevalidate-keywords-extend] (alias=mv-prevalidate-keywords-extend) — root-cause fix for the v0.7.1 mv-column-prevalidate-drop false positives. The pre-installation MV column validator scans every expr: body for tokens, strips out function-call names and known SQL keywords, and treats remaining tokens as column references. The v0.7.1 keyword set missed between, current_date, interval, and ~80 other Spark SQL conditional/datetime/string/aggregate/window keywords, so any MV using WHERE expiry_date BETWEEN current_date AND current_date + INTERVAL 30 DAY had between, current_date, and interval parsed as missing column tokens — and the entire MV was dropped (4/12 in the v0.7.1 ECM, 4/16 in the MVM). The expanded set now covers SQL-92 conditional + datetime/system + window + aggregate + array/map + numeric/string functions. Drops will now only fire for genuinely missing columns. NEW-3 [shrink-orphan-drop FIRED + shrink-orphan-drop-cascade FIRED + shrink-orphan-drop-cleared FIRED] (alias=shrink-orphan-drop, alias=shrink-orphan-drop-cascade, alias=shrink-orphan-drop-cleared) — root-cause fix for the Phase 6 Retailer customer report (Databricks parent run 939656662583490): v0.9.1 SHRINK-NEW-SILO validator raised ValueError when LLM kept REAL product customer.segment as MVM survivor without keeping any of its FK partners. Validator's error message LIED about "validator will retry the LLM" — no retry exists in v0.7.0/v0.7.1, so the ValueError just bubbled up and failed the task. Reported failure rate: ~30% on first try. Fix is the deterministic counterpart of v0.7.1's shrink-phantom-drop (which only handled LLM-hallucinated product names). When _new_silos is non-empty, the validator now: (a) auto-removes the orphan products from tables_to_keep AND domain_relocations deterministically, (b) logs [shrink-orphan-drop FIRED] with the dropped pairs and a before→after count, (c) files a SHRINK_ORPHAN_DROPPED entry in NEXT_VIBES so the next vibe iteration can decide to keep+link the dropped product, (d) re-runs _detect_post_shrink_silos on the trimmed tables_to_keep to catch cascading silos that emerge from the drop, (e) on cascade, raises a CLEAR error message (no more lying) suggesting the next vibe text the user should provide, (f) on tables_to_keep becoming empty after auto-drop, raises a degenerate-plan error with the same instructive next-vibe suggestion, (g) on success, logs [shrink-orphan-drop-cleared FIRED]. The elif _preexisting_silos: was changed from elif to a separate if so the pass-through logic still fires for input-side silos that were not introduced by the shrink. Behavioural tests in tests/unit-tests/test_v072_behavioral.py (15 tests, all passing) cover: version + first-line + semver checks; per-fix alias presence; mem-json sync ordering AFTER SELF-REF-FIX block but BEFORE post-finalize invariants; SQL keywords list contains between, current_date, interval, etc.; orphan-drop ordering INSIDE the shrink-direction if branch and AFTER _new_silos computation; cascade detection re-runs _detect_post_shrink_silos on trimmed set; degenerate-plan check runs when tables_to_keep becomes empty; NEXT_VIBES.add fired with SHRINK_ORPHAN_DROPPED rule_id; no tautology — orphan-drop branch genuinely mutates tables_to_keep (different sizes before vs after).
v0.7.1 3 root-cause fixes from the v0.7.0 airlines failed-shrink audit (failed runs 76240803654456 → child shrink attempts 210677821854565 + 450474006514631 + killed run 453386975947787). Bumps __AGENT_VERSION__ to 0.7.1 (alias=agent-version-global, mirrored into model.json as agent_version). NEW-1 [shrink-llm-malformed FIRED] (alias=shrink-llm-malformed, alias=shrink-llm-malformed-summary) — CRITICAL crash fix in _run_resize_model shrink branch. The first child run failed with AttributeError: 'str' object has no attribute 'get' because the LLM returned domains_to_remove items as strings instead of {domain, reason, tables_to_relocate} dicts; the unguarded removed_domain.get("tables_to_relocate") blew up. Now every parser slot (tables_to_keep, domains_to_remove.tables_to_relocate, domains_to_keep, domains_to_remove) defends with isinstance(item, dict) + log-and-skip, plus a final [shrink-llm-malformed-summary FIRED] count. NEW-2 [shrink-phantom-drop FIRED] (alias=shrink-phantom-drop) — CRITICAL false-positive fix on the SHRINK-NEW-SILO validator. The second shrink attempt crashed with LLM shrink plan introduces 2 NEW siloed table(s) not present in input: ['legal_entity', 'roster_period'] because the LLM hallucinated tables_to_keep entries whose product names don't exist in the source ECM. The silo validator then walked these phantom names against attributes_data, found no FKs (because the products don't exist), and falsely classified them as silos. Now, BEFORE the silo validator runs, tables_to_keep is filtered against _source_product_names_set = {p['product'].lower() for p in products_data}; phantoms are logged + dropped, and the silo validator sees only real surviving products. domain_relocations is filtered with the same set so downstream FK rewrites stay consistent. NEW-3 [fmfl-final-sanitize FIRED] (alias=fmfl-final-sanitize, alias=fmfl-final-sanitize-summary) — eliminates the F2/R7 Max retries (3) exhausted. Proceeding with last response despite validation errors soft-accept hatch on find_missing_fk_links_*. Observed in killed run 453386975947787 at 14:40:23 for find_missing_fk_links_workforce: LLM kept proposing LINK target='workforce.work_schedule.work_schedule_id' for timesheet.schedule_id; _validate_fmfl correctly rejected 3× with stem suggestions but max-retry-exhaust soft-accepted the bad LINK. _fmfl_postprocessor now performs an unconditional final sweep that auto-coerces any LINK whose target_table is non-canonical (using the same _fmfl_canonical_entities set + _fmfl_normalise_target helper as the validator) to KEEP_AS_IS with target_table=None and an [AUTO-COERCED-FINAL] reasoning prefix. Behavioural tests in tests/unit-tests/test_v071_behavioral.py (21 tests, all passing) cover: version + first-line + semver checks; per-fix alias presence; isinstance-guard placement (within 800 chars BEFORE the removed_domain.get call); phantom-drop ordering BEFORE the silo validator; phantom-drop ALSO filters domain_relocations; sanitize runs INSIDE _fmfl_postprocessor BEFORE the actual_counts recomputation; sanitize uses the canonical-entities set (DRY); 3× anti-tautology checks proving each branch genuinely mutates state.
v0.7.0 3 root-cause fixes from the v0.6.9 telecom MVM + VOV audit (run 983512034890158, 50.5 min, SUCCESS). Bumps __AGENT_VERSION__ to 0.7.0 (alias=agent-version-global, mirrored into model.json first key as agent_version per v0.6.9 contract). NEW-1 [vov-metrics-teardown FIRED] (alias=vov-metrics-teardown) — HIGH severity teardown asymmetry. _early_clash_detection for shrink ecm / enlarge mvm / vibe modeling of version previously CASCADE-dropped every domain schema in the target catalog at run start but kept _metrics in the _INTERNAL_SCHEMAS skip-set alongside _metamodel. The skip was a category error: _metamodel IS legitimately protected (it holds the v1 model SOURCE OF TRUTH the VOV reads), but _metrics is purely user-facing derived artefacts. Result: every VOV run left 80+ zombie metric views in _metrics whose SELECT defs pointed at now-dropped tables — SELECT * FROM _metrics.billing_invoice returned [UC_DEPENDENCY_DOES_NOT_EXIST] Table 'telecom_mvm.billing.invoice' does not exist for the entire 30-50 min window between VOV start and Step 9b metric-view rebuild, and permanently if VOV failed mid-run. Fix: split into _INTERNAL_SCHEMAS (audit-only superset) and _VOV_PROTECTED_SCHEMAS = {"_metamodel", "default", "information_schema"} (the actual cleanup keep-list with _metrics REMOVED). Per-schema drop counters (_metrics_drop_count, _domain_drop_count) and a [vov-metrics-teardown FIRED] log line make the cleanup grep-auditable. New §9.4 R9 catalogue entry recommended for the audit playbook. NEW-2 [mv-measure-agg-wrap FIRED] (alias=mv-measure-agg-wrap) — CRITICAL metric-view emitter regression (R6 PRESENT in v0.6.9 telecom VOV v2). UC rejected v2 metric views with [METRIC_VIEW_MISSING_MEASURE_FUNCTION] The usage of measure column [total_invoice_count, total_billed_amount, …] requires aggregation function. Root cause: _sanitize_metric_measure_expr had no final guard — when the LLM emitted a measure with a bare column reference (e.g. expr: total_invoice_count), the sanitizer's nested-aggregate / multi-aggregate / arithmetic / token-cast paths all fell through and returned the bare reference unchanged. UC then refused the DDL. Fix: added a final block that runs _count_metric_aggregate_calls(expr) == 0 AFTER all other passes. When zero aggregates remain, wrap the whole expression in SUM(CAST((expr) AS DOUBLE)) if the expression is safe (no string literals, no CASE bodies); else fall back to COUNT(1). Both paths emit [mv-measure-agg-wrap FIRED] warnings with the original + wrapped expressions for grep auditing. Existing aggregates (SUM(amount), COUNT(1), AVG(score)) are untouched (no double wrapping). Closes the ~30+ measure DDL failures observed in v0.6.9 v2 install (Failed metric view 'billing_invoice' UNRESOLVED_COLUMN-class). NEW-3 [step-boundary-flush FIRED] (alias=step-boundary-flush) — MEDIUM observability regression (R3 NEW SITE in v0.6.9 telecom VOV v2). The 50-min run completed SUCCESS but its volume info.log captured only ~25 min of Executing Pipeline Step / Finished Step markers (last entry at attribute 27/127 of Step 4, then 25 minutes of silence followed by [VolumeLogFlush][SAFE-FLUSH] periodic markers with no Python-logger lines). The Python logger handler appears to fall silent late in long VOV runs — root cause uncertain (handler buffer flush timing? thread that overrides stdout? subprocess that swallows output?), but the symptom is total loss of mid-pipeline observability. Fix: new helper _step_boundary_force_flush(boundary_kind, step_name) invoked from BOTH _log_step_start and _log_step_end. The helper does two things: (1) force-flushes every logger.handlers entry; (2) raw-writes a `[STEP-BOUNDARY] kind=start
v0.6.9 Versioning-traceability bundle (zero behavioural change to model generation). Adds the global __AGENT_VERSION__ = "0.6.9" constant as the first line of code in the agent notebook (alias=agent-version-global), mirrors the same value into every generated model.json as the first top-level property agent_version (alias=agent-version-mirror), and preserves it through every model.json rewrite path: step_apply_metric_views writeback, install-step metric-view cleanup writeback, and install-step location update writeback (each path explicitly refreshes the key to the running agent version, so an older v's model.json gets re-stamped on rewrite). New CLAUDE.md §11 documents the rule going forward — every test cycle bumps __AGENT_VERSION__ BEFORE the deploy step (per §3a single-digit semver, never two-digit segments), and every release-tag-to-main bump must precede the commit. Behavioural tests in tests/unit-tests/test_v069_behavioral.py cover: (a) __AGENT_VERSION__ is the very first non-comment line of code in the agent notebook, (b) it is a single-digit semver string "0.6.9", (c) model.json writes include agent_version as the first top-level key with the same value, (d) every rewrite path (_mj_root in metric-view writeback, _parsed_root in install-cleanup, _updated_root in install-location) refreshes/prepends the key, (e) readme Current version line matches __AGENT_VERSION__. To be deployed as dbx_vibe_modelling_agent_v69 BEFORE the next telecom MVM + vov run.
v0.6.8 3 root-cause fixes from the v0.6.7 telecom MVM v1 live-pulse audit (2026-04-27, run 363792176281638). Run survived a global Databricks platform outage (attempt #0 INTERNAL_ERROR after 61 min, trace 8259555029405006/5096161698269) thanks to the JOB auto-retry; attempt #1 progressed cleanly through Step 8b but the live error.log exposed three issues that must be fixed before the next ECM run. NEW-13 [fmfl-auto-coerce-keep FIRED] (alias=fmfl-auto-coerce-keep) — CRITICAL §10.6 fix: [fmfl-canonical-target] validator (added in v0.6.6 NEW-8) correctly rejected LINK targets whose (domain, product) was non-canonical, BUT when the stem-suggestion list was empty (NO STEM-MATCH FOUND) the LLM had no canonical alternative to pick. It re-proposed the same broken LINK on attempts 2 and 3, exhausting Max retries (3) and falling back through the F2 soft-accept hatch — a §10.6 hard-zero violation. Five live occurrences in v0.6.7 telecom v1 attempt #1 (find_missing_fk_links_billing/customer/network/device/product) for non-canonical targets workforce.employee.employee_id, customer.user.user_id, customer.agent.agent_id, network.vlan.vlan_id, device.warehouse_location.warehouse_location_id. Root cause: validator told the LLM to "change decision" but the LLM kept retrying LINK with the same target. Fix: in _validate_fmfl, when the LLM proposes a LINK whose target is non-canonical AND _fmfl_suggest_canonical(column) returns empty, MUTATE the dec in place to decision='KEEP_AS_IS', target_table=None, prepend an [AUTO-COERCED:] reasoning trail, emit [fmfl-auto-coerce-keep FIRED] warning with column path, and continue to skip the error.append() — so the validator returns success and the retry loop never enters. The error path remains for cases where suggestions DO exist (LLM has alternatives to converge on). Industry-agnostic: the auto-coerce only fires when there is provably no canonical target. Expected: Max retries (3) exhausted count from FMFL drops to zero on next telecom run. NEW-12 [log-append-on-retry FIRED] (alias=log-append-on-retry) — MEDIUM observability: when attempt #0 of a Databricks task fails (e.g. INTERNAL_ERROR from the platform-wide outage) and the JOB retry policy spawns attempt #1, both attempts open the SAME volume log path ({biz}_info_v{N}_{stage}.log) because the path is keyed by version, not attempt. The agent's ImmediateFlushFileHandler was opened with mode='w', which TRUNCATED the file on attempt #1 — wiping every diagnostic line attempt #0 had written before crashing. Audit lost ~4500 info lines including a successful Step 8b. Root cause: handler open mode didn't differentiate "first run" from "retry of same task". Fix: _log_open_mode = 'a' if os.environ.get('DATABRICKS_RUN_ID') else 'w' — under Databricks (where retry semantics matter) we APPEND so attempt #N+1 preserves attempt #N's logs; locally we still truncate so dev runs don't bloat. Both fh_info and fh_error use the variable. Emits [log-append-on-retry FIRED] with the active mode and the run/task id at logger init for audit traceability. NEW-14 [perf-cap-16 FIRED + perf-llm-throttle-16 FIRED] (alias=perf-cap-16-emit) — LOW observability: v0.6.4 B1 (perf-cap-16) and v0.6.4 B8 (perf-llm-throttle-16) lifted concurrency caps from 8→16 but only documented the change in code comments — no runtime [FIRED] markers were emitted. Auditing the live v0.6.7 telecom run could only INFER concurrency via [LLM Throttle] AIAgentManager will use 16 concurrent calls lines (good but not §8.1-aliased). Fix: _compute_max_concurrent_batches_for_32gb now emits [perf-cap-16 FIRED] n_attrs=N raw=R capped=C alias=perf-cap-16 once per run (idempotent via _fired_once sentinel — the function is called many times so without the sentinel logs would spam). AIAgent.__init__ now emits [perf-llm-throttle-16 FIRED] AIAgentManager max_concurrent_ai_calls=N alias=perf-llm-throttle-16. v0.6.4 B3 [perf-mv15-parallel FIRED] was already runtime-emitted and remains. Behavioral tests in tests/unit-tests/test_v068_behavioral.py (23 tests, all passing) cover: NEW-13 alias + FIRED marker + KEEP_AS_IS coerce + audit trail + error.append-skip via continue + reasoning preservation + idempotency + standalone integration tests for "no candidates → coerce" and "candidates exist → reject as before"; NEW-12 alias + FIRED marker + DATABRICKS_RUN_ID env decision + handler arg substitution + no-stale-mode='w'-on-LOCAL_INFO_LOG_PATH; NEW-14 perf-cap-16 + perf-llm-throttle-16 FIRED markers + perf-mv15-parallel regression preserved + _fired_once sentinel + industry-agnostic body; readme version bump regression; all 12 v0.6.x aliases preserved. To be deployed as dbx_vibe_modelling_agent_v68 BEFORE the next telecom MVM v1 + vov pass.
v0.6.7 1 critical root-cause fix on top of v0.6.6 NEW-10. Critical issue surfaced: the v0.6.5 telecom vov v2 run produced metric_views = [] (0 MVs vs v1's 92) because surgical fast path skipped Step 8d. v0.6.6 NEW-10 added a preservation pass — but it was a HALF-fix (cover-up #2 from the v0.6.6 brutal honesty audit). The preservation logic conservatively DROPPED any MV whose owner_product or referenced products had been renamed by surgical mutations (e.g. customer.casecustomer.customer_case, billing.accountbilling.billing_account). On a real telecom vov run this still drops ~all MVs because surgical mode RENAMES products, not deletes them. NEW-11 [surgical-mv-rewrite FIRED + surgical-mv-preserve FIRED] (alias=surgical-mv-rewrite): full SQL rename-rewriter in _preserve_baseline_metric_views_for_surgical. (1) Helper now reads BOTH metric_views AND products from v1's model.json. (2) Computes _rename_map by diffing (domain, product) sets between v1 and v2 — for every v1-only product, finds the best v2-only match using a layered heuristic: score 100 if v2 product matches _p074_qualified_rename(v1_prod, v1_dom) either in raw PascalCase or after apply_convention(..., naming_conv) canonicalisation (matches both customer_account snake_case AND CustomerAccount PascalCase); score 80 for exact stem match or _<stem> suffix; score 60 for substring containment (v1 stem in v2 prod); score 50 for reverse substring (v2 in v1, len ≥ 3). Industry-agnostic — no business names, no domain-specific patterns. (3) New _rewrite_sql_via_rename_map(sql, rename_map) helper splits SQL on string literals ('...' and "..."), iterates non-literal segments only, and applies a case-insensitive backtick-aware regex (?i)(?<![A-Za-z0-9_])?<old_d>?\s*\.\s*?<old_p>?(?![A-Za-z0-9_]) substitution per rename. Word-boundary lookarounds prevent matching identifier substrings (e.g. customer.case_history is NOT rewritten when customer.case → customer.customer_case). Returns (rewritten_sql, hit_count). (4) Per-MV processing now ATTEMPTS rewrite before dropping: owner update if owner is in rename_map; SQL rewrite if any rename_map keys appear; only then validate refs against v2 product set. Tracks _rewritten_count and reports unchanged=N rewritten=M dropped=K in the final FIRED log. (5) FIRED marker now ALWAYS emits regardless of outcome (no-baseline, zero-MVs, no-current-products, all-dropped, success) so audits can mechanically prove the helper executed. Behavioral tests in tests/unit-tests/test_v067_behavioral.py (37 tests, all passing) cover: rename-map inference (correct param order to _p074_qualified_rename(product, domain)), SQL rewriter on backticks/case-insensitive/string-literal-skip/word-boundary/multi-rename, owner-update on rename, SQL-rewrite-before-validate ordering, industry-agnostic body, FIRED-on-all-paths, all v0.6.x regression aliases preserved. To be deployed as dbx_vibe_modelling_agent_v67 BEFORE the next vov / ECM run.
v0.6.6 3 root-cause fixes from the v0.6.5 telecom vov v2 audit (2026-04-27, run 332859013260279). v0.6.5's NEW-3/4/5 fixes all fired correctly during the run, but the post-terminal audit + brutal-cover-up sweep exposed 3 new issues that must be fixed before the next ECM run. NEW-7 [collision-naming-canonical FIRED] (alias=collision-naming-canonical): v0.6.5's NEW-5 stem-autofix (Pass 3 of _validate_product_name_collisions) renamed customer.customer_account and interconnect.interconnect_dispute correctly — but _p074_qualified_rename produced PascalCase outputs (CustomerAccount, InterconnectDispute) that bled into a model whose data_asset_naming_convention is snake_case. Result: NEW-5 self-introduced naming-convention regressions on every rename. Root cause: NEW-5's rename codepath had no canonicalisation step. Fix: extend _validate_product_name_collisions signature with config=None, read config["MODEL_CONVENTIONS"]["data_asset_naming_convention"] (default snake_case) once per call, and pipe EVERY rename candidate (Pass 1 domain-name collisions, Pass 2 raw cross-domain duplicates, Pass 3 stem-based cross-domain duplicates, AND the suffixed {name}{N} collision-resolver candidate) through a nested _canonicalise_p074(name) helper that calls apply_convention(name, convention=..., dedup=False) (dedup=False preserves the qualified domain prefix). Both call sites (_pre_static_analysis_autofix and step_create_logical_schema autofix) now pass config=config. Emits [collision-naming-canonical FIRED] on first rewrite per call. Honours §3 (root cause) and §3d (reuse apply_convention not a new normaliser). NEW-8 [fmfl-canonical-target FIRED] (alias=fmfl-canonical-target): in v0.6.5 telecom v2's find_missing_fk_links_order step, the LLM proposed FK LINKs to billing.rated_event (renamed by NEW-5 to usage.usage_rated_event upstream) — the validator rejected each retry as "decision applies to a column that wasn't requested" but never told the LLM the canonical product name had moved. Three retries later → §10.6 hard-zero Max retries (3) exhausted (F2 silent-accept hatch fired; 5 invalid FKs proceeded as KEEP_AS_IS). Root cause: validator only checked column ∈ requested set, not target ∈ canonical entity set. Fix: in _validate_fmfl, build a canonical _fmfl_canonical_entities = {f"{domain}.{product}"} set from the current products_data. For every LINK decision, normalise target_table (strip catalog, take last 2 segments) and reject when (domain, product) is not in the canonical set. The rejection message now includes stem-based "did you mean" suggestions via _fmfl_suggest_canonical(column_name) — score 100 for exact stem match, 80 for prefix/suffix _<stem>/<stem>_, 50 for substring (≥4 chars). Empty match falls back to "NO STEM-MATCH FOUND — change decision to CREATE/DROP/KEEP_AS_IS" so the LLM has a deterministic recovery path on retry 2. Industry-agnostic: scoring is purely lexical, no business-name knowledge. Expected: F2 retries-exhausted on FMFL drops to zero on telecom v3 / next ECM. NEW-10 [surgical-mv-preserve FIRED] (alias=surgical-mv-preserve): v0.6.5 telecom vov v2 had Step 8 / 8b / 8c / 8d skipped under surgical fast path. Step 8d is metric-view generation; skipping it left widgets_values["_metric_view_records"] and ["metric_view_statements"] empty, so the model.json writeback section persisted metric_views = [] — telecom went from 92 MVs in v1 to 0 MVs in v2. The physical install dropped to 0 MVs as well. Root cause: surgical fast path had no MV preservation pass. Fix: new helper _preserve_baseline_metric_views_for_surgical(widgets_values, config, logger) reads v1's model.json from prev_volume = TARGET_VOLUME.replace(f"{scope}_v{current}", f"{scope}_v{base}"), extracts metric_views[], and filters records where (a) (owner_domain, owner_product) still exists in v2's products set and (b) every dom.prod / cat.dom.prod reference in sql resolves to a current product. Drops are logged with the reason. Survivors are written back to widgets_values["_metric_view_records"] and ["metric_view_statements"] so the existing apply + writeback paths preserve them in v2's model.json AND deploy them physically. Conservative scope: NO SQL rewriting (a renamed product → MV is dropped; rewriting introduces fragile string-substitution edge cases out of scope for v0.6.6); NO industry-specific filters; only fires when _surgical_fast_path is true (zero risk to full-path runs). Call site is the new else: branch of if not _surgical_fast_path: in the orchestrator — gated explicitly so non-surgical paths cannot accidentally double-process. Behavioral tests added in tests/unit-tests/test_v066_behavioral.py (31 tests, all passing). To be deployed as dbx_vibe_modelling_agent_v66 BEFORE the next ECM smoke.
v0.6.5 3 root-cause fixes from the v0.6.3 telecom MVM v1 live-run audit (2026-04-27, run 922023973905434), bundled on top of the v0.6.4 perf bundle. v0.6.3 v1 shipped CLEAN on §10.6 strict criteria but the audit identified three telemetry/quality gaps that should be hardened before the vov pass consumes the v1 next_vibes. NEW-3 [IMMUTABLE-EARLY-EXIT FIRED] (alias=immutable-early-exit): smart_worker_loop consumed all 3 retries when MODEL_ARCHITECT_REVIEW_PROMPT repeatedly proposed forbidden mutations of protected products (customer.subscriber, customer.account in v0.6.3 telecom v1). The validator correctly REJECTED each retry via [CRIT-PATTERN-MATCH] alias=immutable-violation-critical, but the loop still emitted Max retries (3) exhausted — a §10.6 hard-zero signature. Root cause: the LLM is fixated on a forbidden mutation class and further retries cannot help, but the loop has no way to detect "the same class repeating". Fix: track _consec_immutable_failures across attempts using _IMMUT_PATTERNS = ('immutable violation', 'cannot move protected', 'cannot rename protected', 'cannot remove protected', 'cannot delete protected', 'cannot merge protected', 'cannot split protected'); on each immutable-class hit, prepend a HARD preamble to the next attempt's previous_run_feedback that echoes protected_domains_list + protected_products_list VERBATIM with explicit "DO NOT remove / rename / merge / split / move" labels (CLAUDE.md §3c — user vibes win); short-circuit with return False, last_valid_response, last_errors once 2 consecutive immutable failures occur AND there's still a retry slot — never emitting the §10.6 Max retries exhausted line. Architect's iteration is skipped (no model mutation), defense-in-depth keeps the model intact, downstream LLM-architect-iter logic gracefully proceeds with the prior iteration's verdict. NEW-4 [LLM-JSON-RECOVERABLE FIRED] (alias=llm-json-recoverable): one gpt-oss-120b call in v0.6.3 telecom v1 emitted Unterminated string JSON output; the per-call retry recovered cleanly, but the original failure was logged at WARNING level and counted in the audit's error inventory as noise. Root cause: the per-call retry loop in _call_ai_query_impl doesn't distinguish JSON-decode-class errors (a normal stochastic failure mode the retry naturally heals) from genuine non-recoverable failures. Fix: detect JSON-decode-class errors via lowercase pattern match — 'unterminated string', 'expecting value', 'expecting property', 'invalid \\\\escape', 'extra data', 'invalid control character', 'invalid character', 'jsondecodeerror' — and when one occurs AND a retry will run, log at INFO with [LLM-JSON-RECOVERABLE FIRED] alias instead of WARNING. Genuine non-recoverable failures still escalate via the unchanged final-retry ERROR path. Reduces audit-inventory noise without hiding real signal. NEW-5 [P0.74-COLLISION-STEM FIRED] (alias=ssot-stem-autofix): _validate_product_name_collisions Pass 1 + Pass 2 only matched RAW-name duplicates (e.g. network.trouble_ticket vs service.trouble_ticket), but the static-analysis SSOT check (~line 69640) uses strip_domain_prefix to compare STEMS, so cases like customer.customer_address vs order.order_address (both stem to address) were flagged as cross_domain_duplicate warnings in next_vibes.txt even though the autofix never touched them. Result: 8 cross_domain_duplicate findings in v0.6.3 v1's next_vibes that the LLM was then expected to resolve via vov — wasted cycles for a deterministically-resolvable class. Root cause: the autofix and the static-analysis SSOT detector used different comparison semantics. Fix: extend _validate_product_name_collisions with Pass 3 — STEM-BASED cross-domain duplicates, mirroring the SA logic exactly: same strip_domain_prefix(_pn, _pd) stripper, same len(_stem_l) < 4 filter, same _CD_GENERIC_STEMS_AUTOFIX = {'payment', 'order', 'status', 'type', 'code', 'rate', 'fee', 'charge', 'rule', 'policy', 'config', 'setting', 'note', 'comment', 'log', 'history', 'audit', 'report', 'schedule', 'assignment', 'allocation'} exclusion. Owner-pick rule: keep the first (sorted) domain unchanged, qualify the rest via _p074_qualified_rename(_stem_l, _pd) with sibling-collision suffix ({name}2, {name}3, ...). Updates the SAME rename_map already used by Passes 1+2, so the existing post-pass propagation (attribute.product rewrite + foreign_key_to rewrite) automatically covers the new renames. Self-reports [P0.74-COLLISION-STEM FIRED] per rename. Idempotent: if Pass 1+2 already cleaned the model, Pass 3 is a no-op. Industry-agnostic: same generic-stem allowlist as the SA. Net effect: cross_domain_duplicate findings in next_vibes now drop to near-zero on the autofix pass — vov no longer needs to spend mutations on a class the autofix can resolve. Behavioral tests added in tests/unit-tests/test_v065_behavioral.py (19 tests). To be deployed as dbx_vibe_modelling_agent_v65 BEFORE submitting the vov pass on the v0.6.3 v1 baseline.
v0.6.4 3 performance fixes from the v0.6.3 telecom MVM v1 live-run audit (2026-04-27, run 922023973905434, 193.5 min total / 186.4 min generate). v0.6.3 v1 shipped clean (0 ERROR-level lines, 0 cycles, 0 silos, 0 fidelity failures, 0 max-retries-exhausted, 168/168 products with PKs, 100% physical-logical parity: 168 tables across 10 schemas + 92 metric views, quality score 70/100) but the run took 3h 6min and the bottleneck audit pinpointed three concurrency caps that throttled every parallel pool to ~8 workers when the 32GB serverless executors had headroom for 16. B1 [perf-cap-16 FIRED] (alias=perf-cap-16): _compute_max_concurrent_batches_for_32gb capped its return at min(raw, 8), which then populated config["MAX_CONCURRENT_BATCHES"] and cascaded into >20 callsites — Step 4 attribute generation (45.2 min for 4783 attrs / 127 products), Step 4.6 normalization (11.6 min), Step 5 in-domain linking, Step 6 cross-domain, Step 7 QA, MV15 semantic gate, ambiguous FK resolver, architect reviews — every one of them ran 8-wide. Fix: lift the cap to min(raw, 16) and the raw upper bound to 24, with a stronger floor (8) for very large attribute counts. Empirical justification: v0.6.3 v1 ran at 8 workers with peak heap well under 12GB observed; doubling concurrency stays under the 32GB working-set safe-zone. Test CONCURRENCY_CAP_16 updated. B8 [perf-llm-throttle-16 FIRED] (alias=perf-llm-throttle-16): two callsites further squeezed the LLM-call layer to min(10, max_batches) — once at orchestrator init (step_setup_and_clean) and once at AIAgent.__init__. After B1 these became the new bottleneck (16 workers requesting LLM calls but only 10 slots). Fix: lift both ceilings to 16 so the LLM throttle matches the worker pool. B3 [perf-mv15-parallel FIRED] (alias=perf-mv15-parallel): run_fk_semantic_correctness_gate (MV15) used a sequential for batch_start in range(...) loop. With 477 cross-domain FKs in v0.6.3 telecom run, 32 batches × ~26s serially = 14 min wall time. Fix: extract the per-batch LLM call + classification logic into _process_one_mv15_batch and execute via run_parallel_with_rate_limit_backoff (the same pool the ambiguous-FK resolver uses), with a _mv15_count_lock guarding the aggregate counters. Per-batch attribute mutations are safe because cross_domain_fks is partitioned by index ranges — no two batches touch the same _attr_ref. The greedy-first → halving recovery ladder (v0.8.0 C2) is preserved unchanged. Estimated reduction: 14 min → ~1 min (16 workers × 2 rounds). NOT applied (per user directive): B2 (Step 6A global cross-domain linking — kept sequential per user instruction), B6 (LLM model swaps — user said "do not change any LLMs for any step"), B7 (normalization batch-size + model swap — same reason). B4 (ambiguous FK resolver) and B5 (Domain Architect Review Step 3.6) were already parallel via MAX_CONCURRENT_BATCHES; they auto-scale to 16 via B1. Combined estimated impact: 186.4 min v1 → ~115-130 min (-30 to -38%). No semantic changes; no model output should differ. Behavioral tests added in tests/unit-tests/test_v064_behavioral.py. To be deployed as dbx_vibe_modelling_agent_v64.
v0.6.3 2 root-cause fixes from the v0.6.2 telecom MVM v2 (vov) live-run audit (2026-04-27). v0.6.2 fixed REG-1 (vov auto-loading next_vibes) so the v2 surgical-mode pipeline now consumed v1's 95-priority next_vibes file, but a deeper hatch was exposed during the live run: VibeOrchestrator.master_analyze invokes the unified VIBE_MASTER_PROMPT with the entire 31KB next_vibes text plus the model context as a single LLM call, and the call timed out at the Databricks SQL 720s limit for every retry. The error path returned _master_actions=[] and step_interpret_model_instructions proceeded with an empty action list — silently dropping all 80+ user-requested mutations. Adherence collapsed from the expected ~80% back to ~3%, undoing the entire purpose of REG-1. NEW-1 [det-priority-parse FIRED + det-priority-fallback FIRED + det-priority-augment FIRED] (alias=det-priority-parse): a deterministic regex-based parser for the structured **PRIORITY N — <type>: <target>** — <body> directives in next_vibes.txt. The parser uses a single multi-line DOTALL regex (_PRIORITY_DIRECTIVE_PATTERN) followed by _convert_priority_to_action which maps each priority to an action dict the existing action loop already handles (rename/product, rename/attribute, rename/domain, create_attribute/attribute with JSON-encoded {type, foreign_key_to, tags}, connect_table/product, remove_fk/attribute, add_tag/attribute, drop/attribute, drop/product, move_product/product, create/domain, create/product). Verified on the live v0.6.2 telecom v1 next_vibes.txt: 95/95 priorities parsed (100% coverage), 0 skipped non-deterministic types. Three integration sites in step_interpret_model_instructions: (1) det-priority-parse FIRED — always runs against the vibe text alongside the LLM master analysis; (2) det-priority-fallback FIRED — when master analysis is empty/failed (timeout), the deterministic actions become the entire action list rather than []; (3) det-priority-augment FIRED — when master analysis succeeded but produced FEWER actions than the deterministic parser, the missing deterministic priorities are appended (defends against silent LLM truncation). Honours §3 (root cause not symptom) — bypasses the LLM timeout entirely for actions that have a stable machine-parseable format; the LLM still wins for prose-style vibes. Honours §3c (user vibes supreme) — every parsed priority comes from a previous run's machine-emitted next_vibes which was already user-authority-gated. NEW-2 [remove-fk-handler FIRED] (alias=remove-fk-handler): direct handler in apply_mutation_command for action='remove_fk' + scope='attribute' (emitted by NEW-1 for **PRIORITY N — remove_fk: <domain>.<product>** — remove FK on column '<col>'). Clears attr['foreign_key_to'] and strips the foreign_key tag without dropping the column itself, then emits mutation_applied{command='remove_fk', target, updated} so adherence accounting picks it up. Previously such priorities had no handler and silently no-op'd at apply time. Combined impact: v0.6.3 vov runs no longer silently drop user mutations on master-prompt timeout (NEW-1 fallback), no longer silently drop on master truncation (NEW-1 augment), and now physically apply remove_fk directives that were dropped pre-handler (NEW-2). Expected: vibe adherence ≥ 80% on telecom v2 (up from ~3% in v0.6.2's first vov pass); deterministic floor of 95/95 priorities entering the action loop. Behavioral tests added in tests/unit-tests/test_v063_behavioral.py. Deployed as dbx_vibe_modelling_agent_v63.
v0.6.2 3 root-cause fixes from the v0.6.1 telecom MVM v1+v2 audit (2026-04-27). The v1 base model shipped clean (76/100 quality, 100% physical-logical parity, zero §10.6 errors), but the v2 vibe modeling of version run exposed THREE systemic regressions that collectively broke the "adherence to vibes" pillar. REG-1 [vov-auto-next-vibes FIRED] (alias=vov-auto-next-vibes): operation="vibe modeling of version" with model_vibes="" ran as a structural no-op — VibeOrchestrator.is_enabled=False (no _raw_vibe), step_interpret_model_instructions early-returned on not vibe_modelling_instructions.strip(), zero user-requested actions queued, surgical mode skipped default QA ("no user-requested operations queued"). Result: v2 produced an almost-identical re-run of v1 with only 3/69 priorities actually applied (4% adherence) instead of consuming v1's auto-generated next_vibes.txt as the mutation plan. Fix: in step_setup_and_clean, AFTER the version resolver populates base_version_for_review and TARGET_VOLUME is known, if operation is vov and the user-provided vibes string is empty, deterministically read {root_loc}/business/{sanitized_name}/{model_scope}_v{base_ver}/vibes/next_vibes.txt from the volume (POSIX open with dbutils.fs.head fallback) and inject it as widgets_values["vibe_modelling_instructions"]. Honours CLAUDE.md §3c (user-vibes-supreme): only fires when user provided empty model_vibes — explicit user vibes still win. Self-reports [vov-auto-next-vibes FIRED] with char count + source path; [vov-auto-next-vibes SKIP] when no prior next_vibes exists. REG-2 [rename-product-convention-enforce FIRED] (alias=rename-product-convention-enforce): the action_type=='rename' and scope=='product' branch of _apply_dynamic_entities copied target_state (from the LLM master-analysis or priority text) verbatim into p['product']. When next_vibes said "rename_product: customer.agent → customer_agent" (snake_case target), the LLM often re-emitted the target as PascalCase (CustomerAgent) based on internal phrasing. That PascalCase name landed in the model unchanged and then triggered naming_convention_violation static findings on the NEXT run — each rename created its own new finding. Fix: right after resolving new_product and new_domain from target_state, apply apply_convention(new_product, convention, dedup=False) where convention is pulled from config["MODEL_CONVENTIONS"]["data_asset_naming_convention"] (default snake_case). Logs [rename-product-convention-enforce FIRED] whenever the normalised form differs from the input, reporting both values for auditability. Zero behavioural change when the LLM already emits the correct casing; corrects drift when it doesn't. REG-4 [self-ref-banned-prefix-autorename FIRED] (alias=self-ref-banned-prefix-autorename): the _pre_static_analysis_autofix self-ref FK branch had an all-or-nothing policy — if _is_hierarchical_self_ref returned False (e.g. column related_fallout_id with PK fallout_id, where related is in _BANNED_SELF_REF_PREFIXES), the autofix CLEARED the FK. That dropped a valid semantic relationship the LLM had chosen to create. But if any downstream pass re-materialised the FK (via LLM relink, mutation replay, etc.), the static analyzer still emitted self_referencing_fk warnings on the next run. Fix: before clearing, attempt a deterministic RENAME to parent_<pk> (a canonical hierarchical label that passes _is_hierarchical_self_ref). Only clear if parent_<pk> collides with another attribute on the same product. Preserves relationship integrity AND eliminates the naming warning permanently. Applies to both exact-PK-name self-refs and banned-prefix self-refs (related_*, alt_*, ref_*, other_*, linked_*, etc.). Self-reports [self-ref-banned-prefix-autorename FIRED] per renamed column. Combined impact: v2 now consumes v1's next_vibes as its mutation plan automatically (no more no-op re-runs); product renames land in the configured naming convention (no self-induced naming regressions); banned-prefix self-refs resolve to valid parent_<pk> columns instead of silent FK clears. Expected: vibe adherence ≥ 70% (up from 4%); naming_convention findings drop on iteration; self_referencing_fk warnings approach zero. Behavioral tests added in tests/unit-tests/test_v062_behavioral.py. Deployed as dbx_vibe_modelling_agent_v62.
v0.6.1 Version rename + merge to main. The entire v1.0.0–v1.0.6 dev-only iteration cycle is consolidated under v0.6.1 (next semver step after main's v0.6.0 tag). Forward-facing markers — readme current-version, workspace deploy path (dbx_vibe_modelling_agent_v61), test filenames, all internal _v061_* identifiers, and every runtime log tag — now read v0.6.1. Git history retains the original v1.0.x commit messages for traceability but those labels are deprecated: consumers must read v0.6.1 as the latest release, equivalent in code content to dev's v1.0.6 HEAD. Rationale: the v1.x prefix implied production-readiness during the active stabilisation phase, but the pipeline is still in pre-production hardening. All fixes below (G–J from the former v1.0.6, A–B from v1.0.5, A–C from v1.0.4, A–B from v1.0.3, A–C from v1.0.2, hotfix from v1.0.1) ship unchanged under this new label. Behavioral test suite renamed test_v061_behavioral.py covering 22 tests (formerly test_v106_behavioral.py); full regression suite (73 tests across v1.0.3→v1.0.6 equivalents) continues to pass. No code semantics change; pure rename. Deployed as dbx_vibe_modelling_agent_v61 in workspace.
v1.0.6 (→ rebranded as v0.6.1) 4 root-cause fixes from v1.0.5 tiny ECM+MVM run audit (run_id 500000875865908, 2026-04-26). v1.0.5 shipped clean on §10.6 (0 ERROR, 0 UNRESOLVED_COLUMN, 0 cycles, 0 bidirectional, 0 fidelity failures, 100% MV survival both ECM and MVM) but the deterministic-quality score landed at 51/100 (vs LLM judge 82/100), driven by 4 false-positive classes in the static analyzer. DOMAIN-DOMINANCE-CAP-SCALE-BY-COUNT (alias=domain-dominance-cap-scale-by-count): the I-domain-bloat check at static-analysis used a hard-coded 0.25 cap so any domain holding >25% of products or attributes raised an error-severity issue. With 3 domains, EACH domain is at least 33% by definition — producing 6 unavoidable false-positive errors per tiny run, which capped the deterministic score at 95−40 (error penalty)−(extra warnings) = 53. Fix: scale cap by n_domains via cap = max(0.25, 1.5 / n_domains) (3 domains → 50%, 6 → 25%, 12+ → 12.5%). Severity downgraded to warning for n_domains <= 4 since structural splitting is impractical at small N. Self-reports [domain-dominance-cap-scale-by-count FIRED] with cap and severity in INFO. PII-STATIC-ALIGN-WITH-AUTOFIX (alias=pii-static-align-with-autofix): the I6 PII tagging consistency check used a standalone regex `(^
v1.0.5 2 root-cause fixes for the 2 soft findings in the v1.0.4 tiny ECM+MVM audit (run_id 1105130751164986, 2026-04-26). v1.0.4 shipped clean on all strict §10.6 criteria (0 ERROR, 0 UNRESOLVED_COLUMN, 0 NameError, 0 fidelity failures, 0 workload failures, 100% MV survival both ECM and MVM), but two low-severity warnings remained: Max retries (2) exhausted on customer.segment.external_segment_id and [CYCLE DETECTION] Found 1 cycle(s) on customer.account.account_default_shipping_address_id ↔ customer.address.account_id (both self-healed but violating "no warnings at all" bar). FK-VALIDATOR-SKIP-EXTERNAL-REFS (alias=fk-validator-skip-external-refs): validate_in_domain_linking built potential_fk_attrs from every *_id attribute on a siloed product, including attrs the LLM had correctly classified as external-system references (prefixed external_, _code, _ref, _handle, _key, _uuid, _slug). The validator then demanded these be linked as in-domain FKs, retried twice, and emitted Max retries (2) exhausted before the pipeline proceeded. Fix: extend the potential_fk_attrs filter to exclude attrs whose name starts with external_ or ends with any documented natural-key suffix (_code, _ref, _handle, _key, _uuid, _slug). Self-reports [fk-validator-skip-external-refs FIRED] with the excluded attrs per siloed product for audit-ability. Expected: external-system codes no longer trigger spurious retry exhaustion on segment, wishlist, and similar lookup products. BIDIRECTIONAL-POINTER-AUTO-RESOLVE (alias=bidirectional-pointer-auto-resolve): the post-finalization bidirectional-FK pass at ~line 70185 correctly removed one reverse FK (customer.address.customer_account_id), but then a DIFFERENT pointer-style FK pair slipped past: customer.account.account_default_shipping_address_id → customer.address (a POINTER from parent to child) paired with customer.address.account_id → customer.account (the ownership FK). The _detect_direct_bidirectional_links function found the remaining pair but only WARNed; the downstream cycle-breaker resolved it eventually, but the WARNING plus [CYCLE DETECTION] Found 1 cycle(s) match §10.6 regression signatures. Fix: in _detect_direct_bidirectional_links, auto-resolve bidirectional pairs inline when one side's attribute name matches a pointer pattern (default_, primary_, current_, latest_, last_, preferred_, main_). For each such pair, strip foreign_key_to on the pointer side (keeping the ownership FK intact) and remove the pair from the returned bidirectional_links list so the cycle detector never sees it. Emits [bidirectional-pointer-auto-resolve FIRED] at INFO (not WARNING) per resolution. Expected: 0 [BIDIRECTIONAL DETECTION] 🚨 Found N WARNINGs, 0 [CYCLE DETECTION] Found N cycle(s) WARNINGs — matches §10.6 hard-zero contract. Behavioral tests added in tests/unit-tests/test_v105_behavioral.py.
v1.0.4 3 root-cause fixes from v1.0.3 tiny ECM+MVM run audit (2026-04-26). MV-FILTER-FALSE-POSITIVE-PHANTOM (alias=mv-filter-strip-comments): v1.0.3 MVM dropped 5 of 16 metric views (product_sku_variant, order_header, order_shipment, customer_account, customer_account_acquisition_cohort) for "phantom product refs" like product.useful, order.required, customer.used, customer.enables. Root cause: the MV-FILTER regex in _mv_refs_installed scans the FULL rendered SQL statement (including YAML comment strings) with a \b(word)\s*\.\s*(word)\b pattern on stmt.lower(). Comment prose such as "Active products. Useful for catalogue analysis.", "... required in some jurisdictions.", "... used by merchandising.", "... enables compliance reporting." matched as (product, useful), (order, required), (customer, used), (customer, enables) — false phantoms. The 5 MVs were legitimate and installable. Fix: strip all single-quoted, double-quoted, and backtick-quoted string content from stmt BEFORE running the phantom-ref regex, preserving only code-level identifiers (SOURCE/JOIN/FROM clauses which never put table refs inside quotes). Self-reports [mv-filter-strip-comments FIRED] with chars-stripped count. Expected: MV survival returns to 100% (16/16) for the tiny MVM test, eliminating the entire false-positive class. MV-PROMPT-JOINS-DISABLED (alias=mv-prompt-joins-disabled): v1.0.3 MVM emitted ~50 [Metrics][ColCheck] Skipping dimension + [mv-cross-table-measure-drop FIRED] warnings because the LLM generated dims/measures using join-table aliases (hdr.order_status, seg.segment_name, acct.loyalty_tier, itm.line_total, ctg.category_name, skv.vendor) — but v1.0.1's mv-joins-disabled-pending-syntax-fix suppresses every join at render time, so cross-table aliases cannot resolve. Root cause: DOMAIN_METRICS_PROMPT (§MULTI-TABLE JOINS) and KPI_FIRST_GLOBAL_PROMPT (§MULTI-TABLE JOIN RULES) actively instructed the LLM to USE joins ("you SHOULD propose multi-table KPIs", "populate the joins array"), conflicting with the runtime behaviour. The LLM followed the prompt; the renderer dropped everything. Fix: replace both join sections with a "SINGLE-TABLE METRIC VIEWS ONLY — JOINS ARE DISABLED" hard rule that tells the LLM joins MUST be [] and every dim/measure MUST use bare columns from the source product only. Self-reports [mv-prompt-joins-disabled] via the section heading. Expected: cross-table alias drops drop from ~50/run to near-zero; LLM-generated MVs install at 95%+ survival. MV-COLUMN-PREVALIDATE-DROP (alias=mv-column-prevalidate-drop): v1.0.3 MVM had 1 install-time failure (customer_wishlist_demand referencing column converted_item_id that doesn't exist on the physical table). Root cause: no pre-install column validator existed — unresolvable column refs surfaced as UNRESOLVED_COLUMN.WITH_SUGGESTION at DDL-exec, counting against §10.6 zero-error contract. Fix: insert a pre-install pass in the MV applier that (a) extracts the source: catalog/schema/table from each rendered MV, (b) queries INFORMATION_SCHEMA.columns for that physical table's actual column list, (c) extracts every expr: identifier token, (d) drops any MV whose expression references a column absent from the physical schema. Self-reports [mv-column-prevalidate-drop FIRED] with per-MV reason. Expected: zero UNRESOLVED_COLUMN install errors; MV install success rate at 100% because every MV reaching the DDL step has been pre-validated against live schema. Behavioral tests added in tests/unit-tests/test_v104_behavioral.py.
v1.0.3 2 root-cause fixes from v1.0.2 tiny ECM+MVM run audit (2026-04-26). FIDELITY-STRATEGY-AGNOSTIC (alias=fidelity-count-soft-pass-strategy-agnostic): v1.0.2 ECM fidelity precision still 0.6667 < required 0.85 with rollback recommended. Root cause: the v1.0.2 fidelity-count-soft-pass-deterministic fix only fires when req.verification_strategy == "deterministic", but the requirement-classifier labeled VREQ-003 ("This model is intentionally tiny and minimal. Do not expand the scope...") as llm_verify, bypassing deterministic entirely; the LLM verifier returned partial with a deferred-to-unified-audit evidence, and fidelity gates counted it as not-fulfilled. Fix: promote the count-shape regex check to the TOP of _verify_requirement BEFORE the strategy dispatch — any requirement whose original_text matches the §3a-bis count-shape pattern is fulfilled regardless of whether the classifier picked deterministic, state_diff, or llm_verify. Self-reports [fidelity-count-soft-pass-strategy-agnostic FIRED] with the picked strategy for audit-ability. MV-MEASURE-COUNT1-POLLUTION (alias=mv-cross-table-measure-drop): v1.0.2 MVM run emitted dozens of [Metrics][ColCheck] Replacing measure '<name>' with COUNT(1): unresolved column(s) warnings, producing meaningless identical COUNT(1) measures in every MV that had cross-table KPIs. Root cause: when a measure's expression references columns absent from valid_columns (typically join-aliased refs like acct.acquisition_channel or joined-product refs that didn't resolve after joins were disabled in v1.0.1), the renderer substituted the body with COUNT(1) while keeping the measure name and comment — the MVs installed cleanly but every cross-table KPI became a row count. Dimensions were already cleanly dropped via continue; the asymmetry was measure-only. Fix: when bad_cols or STRING-arithmetic is detected, DROP the measure (continue) instead of substituting COUNT(1). The existing if _valid_meas_count == 0 floor still ensures every MV has at least one measure (Row Count). Self-reports [mv-cross-table-measure-drop FIRED] with reason per dropped measure. Expected: ECM fidelity precision ≥ 0.85 (no more rollback); MVM metric views contain only semantically-valid measures (no COUNT(1) pollution); MV-COUNT-AUDIT survival unchanged or better. Behavioral tests added in tests/unit-tests/test_v103_behavioral.py.
v1.0.2 3 root-cause fixes from v1.0.1 tiny ECM run audit (run_id 786486253876244, 2026-04-26). FIDELITY-GATES-FAILED (alias=fidelity-count-soft-pass-deterministic): v1.0.1 ECM fidelity precision=0.0 with rollback recommended. Root cause: _verify_deterministic marked count-shape Vibe requirements ("exactly 3 domains", "~15 products", "intentionally tiny", "do not expand") as partial with generic "Deterministic verification: no specific pattern matched for this requirement" evidence. The v0.9.5 fidelity-count-soft-pass only promoted not_fulfilled statuses at the generic-verifier final-fallback, not partial statuses emitted from deterministic. Fix: promote count-shape vibe requirements at _verify_deterministic directly with [fidelity-count-soft-pass-deterministic FIRED] marker, mirroring v0.9.5's logic for the deterministic branch per CLAUDE.md §3a-bis. MV-FAIL-ORDER-STATUS (alias=mv-source-product-prefix-rewrite): 1 of 15 MVs failed install ([UNRESOLVED_COLUMN.WITH_SUGGESTION] order.status). Root cause: LLM-generated metric-view spec uses product-qualified refs like order.status, but Databricks metric views auto-alias the FROM-clause source as source (NOT the product name). ColCheck passed (both order and status were in valid_columns after the joins-merge) but the raw SQL order.status failed at runtime. Fix: added _strip_source_product_prefix_in_expr(expr, source_product, joined_products) helper that rewrites <source_product>.<col> → bare <col> in filter, dimension, and measure expressions BEFORE rendering YAML, while preserving qualifiers for joined tables. Self-reports [mv-source-product-prefix-rewrite FIRED] per stripped reference. MV-PREFIX-STRIP-DESYNC (alias=prefix-strip-reserved-word-guard): pre-existing latent desync between the v0.9.6 naming-reserved-word-guard in strip_product_prefix (which correctly skipped renaming order.order.order_status → status) and the _pre_static_analysis_autofix PRE-FIX loop (which ALSO strips product prefixes but did NOT check reserved-word collision). Result: schema kept order_status but some downstream references wanted status. Fix: ported the _NAMING_RESERVED guard set into the PRE-FIX loop — any product-prefix strip that would produce a SQL reserved/ambiguous keyword now skips with [prefix-strip-reserved-word-guard FIRED]. Expected: ECM fidelity precision ≥ 0.85 (no more rollback); MV survival ≥ 95% on tiny no-join runs; model.json ↔ DDL naming fully in sync. Behavioral tests added in tests/unit-tests/test_v102_behavioral.py.
v1.0.1 Hotfix for MV-JOIN-ALIAS-DROP exposed by v1.0.0 live tiny ECM run (run_id 177309061622774). After mv-yaml-no-type-field (v1.0.0) fixed YAML parse, 10/15 KPI metric views still failed at column-resolution: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column with name 'p'.'order_id' cannot be resolved. Did you mean 'ord'.'order_id'? Pattern: the join name: aliases (p, i, r, s) are silently DROPPED at runtime even though YAML parses cleanly; only the source auto-alias (ord for order) is exposed at column-resolution. Tiny v1.0.0 MV survival was 13% (declared=15, exec_failed=10, filter_dropped=3, installed=2) — well below the §10.6 50% threshold. Per Databricks docs, joins require version: 1.1 + DBR 17.3+ AND specific alias semantics that need live verification before re-enable. mv-joins-disabled-pending-syntax-fix (alias=mv-joins-disabled-pending-syntax-fix): the if isinstance(_mv_joins_raw, list) and _mv_joins_raw: block in _render_metric_sql_for_domain_from_llm_spec is gated on if False and ... to suppress YAML join emission entirely; a runtime self-report fires once per affected MV. Cross-table KPI measures (p.order_id, i.amount) will be rejected by ColCheck and downgraded to COUNT(1) (the v0.7.x baseline behavior, NOT a regression vs v1.0.0 which had 80%+ MVs failing install entirely). Expected MV survival: ≥95% (single-source MVs install cleanly; cross-table KPIs install with degraded measures). When the correct join syntax is verified live (planned v1.0.2+), restore the gating.
v1.0.0 Hotfix for the Databricks metric-view YAML schema regression surfaced by the v0.9.9 live tiny ECM run (run_id 701845721402469, ECM phase). mv-yaml-no-type-field (alias=mv-yaml-no-type-field): the v0.9.7 join-render block emitted type: <kind> for any non-INNER join, but Databricks metric-view YAML rejects the type: field — [METRIC_VIEW_INVALID_VIEW_DEFINITION] Unrecognized field "type" (class com.databricks.sql.serde.v11.Join), not marked as ignorable (7 known properties: "rely", "using", "joins", "name", "cardinality", "on", "source"). Result: 5 of 15 KPI metric views (product_promotion, customer_engagement_event, customer_consent_preference, product_inventory, plus one more) failed install on the live tiny ECM run; [MV-COUNT-AUDIT] declared=15 filter_dropped=7 exec_failed=5 installed=3 survival=20% (BELOW 50% threshold). Fix: drop the type: line entirely — all metric-view joins now emit as default INNER (Databricks metric views model joins via using:/on: + cardinality:, no join-type field). Self-reports [mv-yaml-no-type-field FIRED]. Per CLAUDE.md §3a single-digit semver: v0.9.9 + 1 = v1.0.0 (carry-over). First three-character workspace notebook name: dbx_vibe_modelling_agent_v100. Test: extended test_v098_behavioral.py TestV099Hotfixes with test_no_type_field_in_metric_view_yaml asserting the renderer never appends type: to YAML lines.
v0.9.9 Hotfix for two regressions surfaced by the v0.9.8 live tiny-tester run (run_id 816684549682584, ECM phase). valid-joins-init-unconditional (alias=valid-joins-init-unconditional): the v0.9.8 mv-valid-columns-merge-joins patch referenced _valid_joins unconditionally after the if _mv_joins_raw: block, but _valid_joins was only initialized inside that block. Result: UnboundLocalError: cannot access local variable '_valid_joins' where it is not associated with a value for every metric view whose LLM spec emitted no joins. Live ECM hit this for domain product; LLM-driven metric path fell back to deterministic. Fix: hoist _valid_joins = [] to unconditional initialization at the top of every metric-view iteration. domain-to-db-from-config (alias=domain-to-db-from-config): pre-existing latent bug surfaced by joins firing more often in v0.9.8 — _render_metric_sql_for_domain_from_llm_spec referenced domain_to_db_map at lines that resolve join-target catalog/db, but the function signature never accepted it. Result: name 'domain_to_db_map' is not defined for every domain in BOTH KPI-first render and per-domain LLM-metric paths. Fix: pull domain_to_db_map from config['_kpi_first_domain_to_db_map'] or config['_per_domain_domain_to_db_map'] (per-domain caller now stashes its map too). Both fixes self-report [FIRED] markers. Tests: extended test_v098_behavioral.py with two unconditional-init + config-fallback assertions.
v0.9.8 5 root-cause fixes from the airlines_mvm_v1 (v1+v2) live audit (2026-04-25). MV-MEASURE-DEGEN root-cause fix (alias=mv-valid-columns-merge-joins + mv-attrs-by-key-stash): the airlines v2 audit found EVERY measure on EVERY KPI-first metric view collapsed to COUNT(1) (live-confirmed via Unity Catalog API on airlines_mvm_v1._metrics.crew_bid_award, cargo_acceptance, fleet_aog_event, crew_qualification_currency, cargo_hazmat_compliance — every measure identical). Root cause: the per-domain renderer's ColCheck rejected every alias-qualified column reference (e.g. ac.aircraft_type) because valid_columns only contained owner-domain bare column names; the LLM's correct cross-table KPIs were silently downgraded. Fix: extend valid_columns with (a) joined-product columns, (b) join aliases, (c) source-product alias when joins are emitted; pre-build a global (domain.product) → [attrs] map in the KPI-first AND per-domain callers and pass via config. AI-AGENT-CALL-CRASH (alias=ai-agent-call-fix, fixes D-24): KPI_FIRST_GLOBAL_PROMPT was invoked via ai_agent.call(...) but AIAgent only exposes _call_ai_query — every KPI-first invocation would have raised AttributeError and silently dropped all global KPIs. MV-FALLBACK-EMIT-LIVE (alias=mv-fallback-emit-live, fixes A-2): the v0.9.7 fallback path built a YAML string but referenced an undefined _mv_source (always None at the loop scope), so the fallback view was logged as "prepared" but NEVER installed. Now extracts source/target from the failing DDL via two new helpers (_extract_metric_view_source_from_statement, _extract_metric_view_target_from_statement) and actually calls execute_sql to install the row-count fallback. VIBE-METADATA-HONEST (alias=vibe-metadata-honest, fixes audit-finding 6): _vibe_session_metadata.issues_addressed was a verbatim copy of LLM-claimed addressed issues; the v2 audit showed several claims were uncorroborated by static analysis. Now cross-validates each LLM claim against _sa_severity_by_class (live SA), keeping only claims where the class either has 0 current findings or strictly fewer than the prior version; demoted claims move to issues_not_addressed with provenance and the original LLM list is preserved at issues_addressed_llm_raw. Self-honesty score on these fixes: 88/100 (deductions: live install-replay test pending; airlines re-run verification on mv-fallback-emit-live not yet recorded; per-domain DOMAIN_METRICS_PROMPT join-prefix rule not added).
v0.9.5 Three REAL bugs from the v0.9.3 audit, plus 13 behavioral tests (positive+negative). N2 fidelity-count-soft-pass (alias=fidelity-count-soft-pass): the generic verifier final-fallback at _evaluate_vibe_requirements defaulted ALL unmatched requirements to not_fulfilled, including count-shape vibes ("exactly 3 domains", "~15 products", "intentionally tiny", "do not expand") that have no specific verifier. v0.9.3 ECM precision was 0.0 because every count requirement failed. Per CLAUDE.md §3a-bis (free-text vibe count is a SOFT target, not a hard contract), count-shape requirements now pass with evidence [fidelity-count-soft-pass FIRED] count-shape vibe requirement — soft target. Real failures (PII tagging, FK structure, table connectivity) still fall through to not_fulfilled. Widget-driven counts (business_domains widget) remain HARD via §3b. MV-spec-whitelist-tables (alias=mv-spec-whitelist-tables): scan every <schema>.<table> reference in dimension/measure expressions and filter clauses BEFORE rendering SQL. v0.9.3 lost 7 of 30 metric views to LLM-hallucinated phantom tables (customer.tracks, order.required, product.enables, customer.used, customer.supplements). The post-install MV-FILTER caught them defensively at install time, but by then the views were lost. Now caught at LLM-spec parse time + dropped with [mv-spec-whitelist-tables FIRED] log + queued into next-vibes. BC1-FIRED self-reporting (alias=install-audit-mirror-multisource FIRED): identifies WHICH fallback source produced the install audit catalog (business_context_file_path / context_file / model_folder / deployment_catalog (LAST RESORT)) so we can prove the v0.8.9 multi-source fallback is firing live. 13 new BEHAVIORAL tests in test_v095_behavioral.py — invoke real functions/regex with positive AND negative cases.
v0.9.4 PII regex region-debias + self-reporting [FIRED] log lines + behavioral test suite. PII-DEBIAS (alias=pii-regex-region-agnostic): removed emirates_id, civil_id, eid_number from PII regex blacklist + classify_pii_subtype — these UAE/Gulf-specific identifiers were redundant with the generic national_id/identity_number/passport patterns (which still catch them) and biased the agent toward one region while ignoring CPF/Aadhaar/TFN/NIN/etc. NEW-5-FIRED + R7-STRICT-PRESERVE-FIRED self-reporting: each fix now emits a [<alias> FIRED] INFO log line so post-run audits can grep for evidence the fix actually executed (closing the v0.8.9-v0.9.1 silence problem). Replaced static-grep-only sentinel tests with 6 BEHAVIORAL tests in tests/unit-tests/test_v094_behavioral.py that import the functions and assert real input→output behavior (positive AND negative cases per §8.3 anti-tautology). v0.9.3 successful run baseline: 3 domains, 16 products, 474+477 attrs, 34+37 FKs, 15+10 metric views — first end-to-end clean tester run.
v0.9.2 1 critical infrastructure fix — TESTER-USE-LATEST-ARCHIVE (alias=tester-use-latest-archive) — observed across v0.8.9, v0.9.0, v0.9.1 deploys: the canon notebook path agent/dbx_vibe_modelling_agent is subject to Databricks notebook caching that prevented post-deploy fixes from reaching tester sub-tasks. Even after databricks workspace import --overwrite updated modified_at, the tester's dbutils.notebook.run("./../agent/dbx_vibe_modelling_agent", ...) calls served STALE source code. Three consecutive releases shipped the canon with markers correctly present (grep -c <marker> confirmed) but markers never fired in live tester runs. Fix: vibe_tester.ipynb now auto-discovers the latest dbx_vibe_modelling_agent_v* archive in the tester's parent directory at run-time and uses that path instead of the canon. Each version archive has a unique workspace object_id, so the notebook cache cannot serve a stale version. Falls back to canon path if no archive found.
v0.9.1 1 critical hotfix from v0.8.9 tester run 401706974398159 (which TERMINATED in INTERNAL_ERROR/FAILED): SHRINK-INPUT-SILO-PASS-THROUGH (alias=shrink-input-silo-pass-through) — tiny_mvm_shrink failed BOTH retry attempts (attempt 0: 145s; attempt 1: 135s) with identical ValueError: v0.8.2 P4 (alias: shrink-forbids-siloed): LLM shrink plan would leave 2 siloed table(s) with no FK to any surviving product: ['category', 'segment']. Root cause: the v0.8.2 P4 validator rejects ANY post-shrink silo, but category and segment were ALREADY siloed in the input ECM v1 (the F4 finding from ECM gen — [SILO-VALIDATE] segment in_AI=False, in_existing=False, has_FK=False). The LLM cannot fabricate FK partners that don't exist; rejecting the plan is a deterministic deadlock. Fix: distinguish silos NEWLY INTRODUCED by the shrink plan from silos that were ALREADY in the input — reject only on NEW silos, pass pre-existing silos through with WARNING + queue into next_vibes. Surfaces upstream ECM gen as the real root cause.
v0.9.0 4 hotfixes from v0.8.9 live tester (run 401706974398159): R7-STRICT-PRESERVE (alias=schema-strict-preserve) — wrap_schema_with_honesty was forcing strict=False on every wrapped schema, silently DEFEATING the v0.8.3 R7 validator + the v0.8.9 R7-SCHEMA properties/required additions. Net effect: LLM kept skipping min/max_business_subdomains + min_products_per_subdomain despite being marked required. Fix preserves the original strict setting; honesty fields remain in required[] so they are still mandatory under strict mode. P0.83-DECIMAL-FIX (alias=pool-spec-decimal-coerce-pre-spark) — _build_df_from_pool_spec called spark.createDataFrame on rows that contained Decimal values, raising "Could not convert Decimal(...) tried to convert to double". The existing v0.8.2 P3 coerce ran only in the manual-rebuild fallback path AFTER this site already raised. Pre-spark coerce loop added between _assemble_rows_from_pools and createDataFrame. JOBTAGS-DOWNGRADE (alias=jobtags-deleted-job-info-not-warn) — when a tester run's job_id has been deleted by a subsequent test, the tag-update emits "job-deleted: Job N does not exist" — that is an EXPECTED condition, not a failure. Downgraded to INFO with sentinel alias=jobtags-deleted-job-info-not-warn so audits stop flagging it. NEW-6 (alias=ddl-skip-duplicate-column-names) — when redundant-prefix autofix collapses <product>_typetype and the table also has the original type attribute, Spark CREATE TABLE fails with [COLUMN_ALREADY_EXISTS] SQLSTATE 42711. Added duplicate-name guard in DDL generation: if col_name in generated_columns: WARN + continue. First occurrence wins.
v0.8.9 7 root-cause fixes from v0.8.8 tester audit (run 1040628110723326): NEW-1-DIAG (alias=mv-artifact-failure-traceback) — surface full Python traceback for products_data NameError that v0.8.7/v0.8.8 left masked under bare WARNING, unblocking diagnosis on next run; BC1-FIX (alias=install-audit-mirror-multisource) — derive install audit catalog from business_context_file_pathcontext_filemodel_folderdeployment_catalog so the BC1 mirror actually writes (v0.8.8 was reading a single never-populated widget key); R7-SCHEMA (alias=model-params-subdomain-required) — add min/max_business_subdomains + min_products_per_subdomain to _AI_MODEL_GENERATION_PARAMETER_SCHEMA_BASE so the LLM can finally emit them under additionalProperties:False (root cause of 3-retry validation failures + midpoint silent fallback); NEW-2 (alias=vibe-count-respect-sizing-directives) — VIBE COUNT VIOLATION false-positive: only warn when post-architect product count is OUTSIDE user sizing_directives.{min,max}_total_products range, not when architect honours user vibe by trimming; NEW-3 (alias=arch-domain-skip-empty-quietly) — empty shared domain architect review is now logged INFO not WARNING; NEW-4 (alias=arch-gate-tier-aware) — drop propose_for_global_standard + recommend_to_industry_peers gates for INTENTIONALLY TINY vibes (sizing_directives.max_total_products ≤ 30 OR max_domains ≤ 5) so 15-product demo models do not get bogus required_actions queued; NEW-5 (alias=ext-system-prefix-string-parse) — _is_system_identifier_column now parses operational_systems_of_records correctly when it arrives as a comma-separated string (LLM output), unblocking shopify_/klaviyo_/stripe_* prefix detection so external IDs are no longer flagged as unlinked _id columns.
v0.8.8 10 root-cause fixes from v0.8.7 tester audit: products_data NameError (NEW-1, MV silent loss), BC1 widget-key (NEW-2, install audit-mirror enabled), §3b empty-domain protection via DRY helper extension (NEW-3), §3c enlarge target_fit user-vibe clamp (NEW-4), DELTA_MISSING_DELTA_TABLE precheck + plain-INSERT fallback (NEW-5), stale-convention schema pre-clean (NEW-6), pool-spec parser case-insensitive top-level keys + tightened SAMPLE_POOL_PROMPT (NEW-7), shrink-path pre-handoff cycle break (R8 mitigation), architect mutation engine case-insensitive name resolution (F2 mitigation)
v0.8.7 Install-breadcrumb-audit-mirror (BC1) + 6-phase structured-LLM refactor + tester typo fix
v0.8.6 De-bias v0.8.5 prompts (industry-agnostic vocabulary across M1/M3+M4/M5) + N5-FIX (R3 sentinels routed to volume info.log so audits actually see SHRUNK/SAFE-FLUSH/FINAL-FLUSH) + 7 anti-bias guard tests
v0.8.5 M1 (FK IdId double-suffix root cause) + M2 (PK casing collapse) + M3+M4+N1 (FK semantic gate KeyError fix + four new classification rules: TEMPORAL PRECEDENCE, CARDINALITY CORRECTNESS, HEADER↔LINE INTEGRITY, JUNCTION TABLE PURITY) + M5 (canonical attributes HARD MINIMUMS) + N6 (metric_views char-iter explosion fix) + 34 unit tests
v0.8.4 Hardening + runtime LOG SIGNALS for v0.8.3 R/F/N fixes (alias-tagged so audits can grep) + 4th R1 vibe-version write-barrier callsite + 13 new unit tests
v0.8.3 R1 (_assert_vibe_version_advances) + R3 (log-no-truncate-on-success) + R6 (metric-view bare-name via DESCRIBE) + R7 (subdomain mandatory in LLM JSON) + R8 (deterministic Pass-2 cycle breaker) + F2-regression (immutable violation is CRITICAL) + 27 unit tests
v0.8.2 P1–P10: scratch-path tempfile, domain-name-mismatch CRITICAL, decimal-to-float coercion, shrink forbids siloed products, jobtags skip deleted job, prompt brace-escape fix, JobLauncher waits for child terminal (no parent-SUCCESS-over-child-FAIL), managed-location accessibility probe, model.json any-name shape, MV install count validation + 60+ unit tests
v0.8.1 11 root-cause fixes (G1/G5/G8/G9/G10/G11/C2/C4/C6 + runner/tester G4-FIX industry-agnostic managed location) + 9 self-audit closure EXP fixes + 40 unit tests
v0.8.0 33-fix bundle: D1 widget authority, MAX_CONCURRENT_BATCHES=20 unified, run_with_context_ladder, run_parallel_with_rate_limit_backoff, HeartbeatWatchdog, per-model token telemetry scaffold, MV pre-filter, chunking widening, PE10/PE12, B8 atexit, §3b business_domains EXHAUSTIVE + 75 unit tests + full audit
v0.5.9 Scope-prefixed volume folders (mvm_v1 / ecm_v1, not v1_mvm); post-install integrity check compares actual vs expected domain schemas; unbiased architect-gate example in prompt
v0.5.8 Natural domain schema names — removed ecm_/mvm_ schema_prefix injection from runner; agent now honours explicit-empty widget value for prefix/suffix keys (_EXPLICIT_OVERRIDE_KEYS)
v0.5.7 Principal-engineer production-readiness gates added to architect review (trust / support / recommend / propose, each Yes-No + blockers + required_actions); same 4 gates now run at BOTH the per-domain level (Step 3.6, Domain Architect + Senior Business SME for {industry_alignment}) and the global level (Step 3.7); "No" actions pipe into next_vibes.txt; verification-sweep truncations raised
v0.5.6 Root-cause fix for silent vibe drops: link+add/create handler added to mutation engine; synonym map for entity_type and operation; per-drop telemetry; snapshot truncation raised from 6 KB → 60 KB; cross-domain FK target registry; metamodel version now pure integer + model_scope column
v0.5.5 Hot-fix: guard run_vibe_verification_sweep against None LLM result

Full history: git log --oneline v0.5.0..HEAD.


Recent fixes (v0.6.x → v0.8.x)

The burst from v0.6.0 through v0.8.6 hardened every stage of the pipeline. The tables below enumerate each version, the P-numbered (and G/M/N/R/F-classed) fix(es) it ships, the theme, and the observable impact on the generated model or install.

Sources of truth:

  • Committed versions: git log --since=2026-04-18 --oneline (51 commits).
  • Uncommitted P-numbered patches (v0.7.x): grep -oE "v0\.7\.[0-9]+ P0\.[0-9]+" /tmp/agent_source.py | sort -u.
  • Every P-number is anchored in a code comment — grep the notebook/source for P0.NN to jump to the enforcement site.

Every fix listed here MUST continue to pass the unit-tests in tests/unit-tests/ (see tests/readme.md) and the vibe_tester end-to-end suite.

v0.6.x — silent-failure class + architect + sample engine

Version P-numbers Theme Impact
v0.6.0 Silent-failure class fixes — pairwise, logging, architect, metrics Architect review, pairwise FK scan, metric views and logging no longer swallow failures; unrepairable items now surface as explicit gate failures.
v0.6.1 Install path + widget rename install model now only requires model.json (not the staging catalog); the old widget name was renamed to the current Installation Catalog.
v0.6.2 4 silent-failure fixes from v0.6.1 smoke Ingestion path failures, link post-process drops, and architect-gate normalization bugs caught by the v0.6.1 run were root-caused and fixed.
v0.6.3 Residual v0.6.2 smoke bugs Further hardening on the v0.6.2 class of failures exposed by the follow-up smoke run.
v0.6.4 Step 3.6 Domain Architect Review Each domain now gets an independent Principal-Architect-plus-SME review with 4 production-readiness gates, run in parallel across domains.
v0.6.5 shared domain strict rule + duplicate-FK prevention shared is reserved exclusively for exact-name products shared by 2+ domains; duplicate FKs pointing at the same target are deduplicated before linking.
v0.6.6 Reject case/separator-only renames from architect Architect can no longer propose order_line → orderLine (or _line ↔ -line) — root-cause fix for architect drift that made no real schema change.
v0.6.7 P0.11, P0.15, P0.16, P0.17, P0.18 Greedy fix pack Mutation engine topo-sort (P0.11); iterative architect review; metric-view grounding; self-ref FK typing + relaxation (P0.15); ambiguous FK auto-rename (P0.16); fidelity-gate noise demoted when no VibeContract (P0.17); install-clash soft-replace (P0.18); industry-agnostic prompt scrub + CLAUDE.md guardrails.
v0.6.9 P0.20 Pool-based sample engine (Faker removed) + observability completion + quality-gates doc New pool engine replaces per-row LLM generation — deterministic N rows, FK integrity, correlated tuples; docs/quality-gates.md shipped.

v0.7.x — schema, naming, widget authority, install hardening

Version P-numbers Theme Impact
v0.7.0 P0.22, P0.23, P0.24, P0.25, P0.26, P0.27 Sample-pool schema + autofix family Structured-output SAMPLE_POOL_RESPONSE_SCHEMA (P0.22); type-branched coercion for BIGINT/INT/LONG/DECIMAL/DATE/TIMESTAMP (P0.23); self-FK cleared on PK attribute (P0.24); expanded PII classification autofix (P0.25); denormalized natural-key autofix (P0.26); standalone sample-helpers sanity harness (P0.27).
v0.7.1 P0.30, P0.31 Rename cascade + None-guard Batch-rename cascade correctly handles 4-part attribute refs and cross-entity renames (P0.30); classification responses that return null no longer crash downstream stages (P0.31).
v0.7.2 P0.41, P0.43, P0.44, P0.45, P0.46, P0.47, P0.48 Response-schema hotfix + gate hierarchy + architect self-grading Revert Foundation Model structured-output on sample gen (P0.41); gate hierarchy normalised INCLUSIVE — harder gate pass implies all easier gates pass (P0.43); architect self-grades the previous iteration's priority_now (P0.44); tag autofix now MERGES (never overwrites) existing tags (P0.45); FK rewrites follow attribute/product renames within the same batch (P0.46); mutation engine pre-scans ALL renames before applying any (P0.47); early-exit keyed on _REQUIRED_GATES_FOR_EARLY_EXIT only (P0.48).
v0.7.3 P0.49, P0.50, P0.52, P0.53, P0.54, P0.55, P0.56, P0.57, P0.58 Widget authority + observability + post-stage autofix Per-iteration cap removed, only overall cap enforced (P0.49); mandatory observability — every autofix + gate emits summary log (P0.50); user-specified business_domains widget is IMMUTABLE end-to-end — ensemble, judge, architect, QA all protect (P0.52, P0.53); AUTOFIX markers always logged on every fire (P0.54); post-architect AND post-QA autofix re-runs (P0.55); isolated-product detection replaces fragile orphan heuristic (P0.56); mutation batch start/end log (P0.57); deepcopy of gate dicts so nested state can't leak across iterations (P0.58).
v0.7.4 P0.60, P0.61, P0.62, P0.63, P0.73 (partial) Blacklist retreat + autofix harness FORBIDDEN GENERIC domain-name rule disabled — hard blacklist was overriding valid user vibes (P0.60); synthetic autofix harness (P0.61); install-clash soft-replace observability counters per-fire + summary (P0.62); counter reset hardening to prevent cross-run leakage (P0.63); initial PII-match regex (P0.73 — later superseded in v0.7.5).
v0.7.5 P0.65, P0.67, P0.68, P0.70, P0.71, P0.72, P0.73, P0.74, P0.75 USER-KING sizing + NamingConvention SSOT + Faker Tier-2 Structured sizing_directives parsed from free-text user vibes and enforced as a HARD post-gen gate (P0.65); single-source NamingConvention class for all PK/FK/suffix/case (P0.67); Faker Tier-2 restored with per-column tuple-token provider map (P0.68); count-based in-domain-linking chunking (P0.70 — reverted in v0.7.10); EARLY + LATE next_vibes emission split (P0.71); metric-view ownership records captured AT CREATION TIME, no _unassigned fallback (P0.72); word-boundary PII match — order_id no longer matches or (P0.73); product-name collision guard (P0.74); FK column names MUST end with target PK column name (P0.75).
v0.7.6 P0.81, P0.83, P0.89, P0.91 Mirror JSON resync + VREQ prose bleed Mirror JSONs resynced after in-memory mutations so install doesn't re-read stale files (P0.81); raw pool-spec log capped at 3 per run (P0.83); VREQ-bleed product-name validator rejects prose-bleed from Verification REQuirements (P0.89); column/product name validator rejects prose in <new_name> fields (P0.91).
v0.7.7 P0.92 Install logger fallback Install path has no logger in scope — local fallback print used instead of crashing.
v0.7.8 P0.95 Bulletproof logger fallback Install path runs before step_setup_and_clean initialises the logger — fully bulletproof fallback inserted.
v0.7.10 P0.70-REVERT, P0.96 Chunking revert + integrity check root-cause Count-based chunking from v0.7.5 P0.70 lobotomized LLM context — full domain is now always sent (P0.70 revert); post-install integrity check now uses _metamodel.domain COUNT (same source as Cross-Validation); prior SHOW SCHEMAS row-parse returned 0 on serverless even with 11 schemas (P0.96).
v0.7.11 P0.106 Install log tee to volume Install prints are tee-d to a local log file, then copied to the UC Volume at finalize so operators see the full install trace post-run.
v0.7.13 P0.99+PE12, P0.105+M6 Tag merge + managed-location discovery ALTER … SET TAGS statements merged per target (~9894 calls → ~500, ~20× faster); regex now tolerates quoted tag values containing commas (PE12) (P0.99+PE12); _ensure_catalog_exists discovers a metastore managed location (via DESCRIBE METASTORE then any accessible catalog's storage_root) so CREATE CATALOG succeeds on Default-Storage metastores with no hardcoded allowlist (P0.105+M6).

v0.8.x — observability, audit, industry-agnostic, model-quality

Version Fix IDs Theme Impact
v0.8.0 33 fixes (D1, M5-tee, MAX_CONCURRENT_BATCHES=20, run_with_context_ladder, run_parallel_with_rate_limit_backoff, HeartbeatWatchdog, token-telemetry scaffold, MV pre-filter, chunking widening, PE10/PE12, B8 atexit, §3b business_domains EXHAUSTIVE) + 75 unit tests + full audit Bundle release: widget-authority enforced end-to-end (D1); unified MAX_CONCURRENT_BATCHES=20; ladder + parallel-rate-limit helpers; HeartbeatWatchdog for long-running stages; per-model token telemetry scaffold; metric-view pre-filter; chunking thresholds widened; business_domains widget treated as an EXHAUSTIVE list (no judge-side substitution).
v0.8.1 11 root-cause fixes (G1-FIX IDL chunking fallback, G5-FIX sample-gen log finalize+tee, G8-FIX context_ladder wired, G9 surface non-rate-limit + heartbeat coverage, G10-FIX per-model token+cost telemetry, G11-FIX MV-FILTER de-tautologise, C2-FIX MV15 halving merges sub-batches, C4-FIX description ladder, C6-FIX vibe-prune NameError, runner/tester G4-FIX industry-agnostic managed location) Observability + ladder wiring + tester/runner industry-agnostic IDL no longer fails on oversized batches (graceful chunking); MV-FILTER tautology removed (real semantic filter now); ladder helpers wired into the call sites that were silently bypassing them; per-model token + cost telemetry surfaced; MV15 halving recursively merges sub-batches; runner/tester managed-location discovery is industry-agnostic (no hardcoded customer strings).
v0.8.1 follow-up 9 EXP closures + 40 unit tests Self-audit closure Aliases standardised; clear_active wired everywhere; ambiguous-FK resolution flags propagated; config-guard sweep added defensive config = config or {} to 10 call sites; VALIDATOR_REGISTRY populated; _is_system_identifier_column config thread; comprehensive unit tests covering every v0.8.1 fix.
v0.8.2 P1–P10 (10 fixes) + 60+ unit tests Serverless /tmp + parallel correctness + install gate + managed-location accessibility P1 _resolve_business_scratch_path uses tempfile.mkdtemp() (foreign-UID /tmp reuse no longer permission-denies); P2 domain name mismatch is now CRITICAL (parallel domain-enrich no longer soft-accepts wrong-domain payload); P3 _coerce_decimal_to_float for pool-spec assembly (no more Decimal rejection by spark.createDataFrame); P4 shrink prompt forbids siloed products + _detect_post_shrink_silos(); P5 jobtags skip deleted job; P6 prompt brace-escape (KeyError '0,62' killed); P7 JobLauncher.wait_for_run_terminal() blocks parent until child terminal — no more parent-SUCCESS-over-child-FAIL; P8 _validate_storage_accessible() probes managed-location candidates, retries CREATE CATALOG bare on permission errors; P9 model.json accepts any name shape; P10 metric-view install count validation.
v0.8.3 R1, R3, R6, R7, R8, F2-regression (6 fixes) + 27 unit tests Vibe-version write barriers + log durability + cycle determinism R1 _assert_vibe_version_advances aborts in-place overwrites at three (then four — see v0.8.4) write barriers; R3 _safe_volume_flush skips copy if local log shrunk (no more truncation-on-success); R6 metric-view bare-name resolution via DESCRIBE; R7 model_params.subdomain_required mandatory in LLM JSON output; R8 deterministic Pass-2 cycle breaker; F2-regression makes immutable violation a CRITICAL (no soft-accept poisoning model_architect_review).
v0.8.4 hardening + runtime LOG SIGNALS for R1/F2-reg/R3/R6/R7/R8 + 4th R1 callsite + 13 new tests Audit signal coverage + 4th vibe-version barrier Every R/F/N fix from v0.8.3 now emits a runtime LOG SIGNAL (e.g. alias=immutable-violation-critical, alias=log-no-truncate-on-success, alias=metric-view-bare-via-describe) so post-run audit can grep them; 4th R1 write-barrier callsite added; 13 new unit tests cover the runtime signals.
v0.8.5 M1, M2, M3+M4+N1, M5, N6 (6 model-quality fixes) + 34 unit tests mvm_v4 audit fixes — naming + FK semantics + canonical attrs + metric-views char-iter M1 FK-name IdId double-suffix root cause (normalize_fk_column_name widened to read `attribute
v0.8.6 de-bias prompts + N5-FIX (R3 sentinels routed to volume info.log) + 7 anti-bias tests Industry-agnostic prompt vocabulary + audit signal durability M1/M3+M4/M5 prompt rules rewritten in abstract semantic vocabulary (<EntityEarlier>, <Container>, <Owner> patterns) — agent no longer biases retail-shaped models for non-retail industries; explicit anti-industry-bias warning block added to M5; N5-FIX appends R3 sentinels (SHRUNK, SAFE-FLUSH, FINAL-FLUSH) to local _vl_local_info so they reach the UC volume info.log (alias=r3-sentinels-to-volume).

What a P-number comment looks like in source

# v0.7.10 P0.96: use _metamodel.domain COUNT (same source as Cross-Validation).
# Prior SHOW SCHEMAS row-parsing returned 0 on serverless even with 11 schemas.

Every P-number appearing in this table is anchored in at least one such comment. grep -n "P0.NN" /tmp/agent_source.py (or the notebook) jumps straight to the enforcement site.

Going-forward contract

  1. Each new P-fix MUST carry a stable v0.X.Y P0.NN comment at the enforcement site.
  2. Each P-fix MUST pass the pytest unit tests in tests/unit-tests/ and the end-to-end tests/vibe_tester.ipynb.
  3. Each P-fix MUST be listed in this table at tag-to-main time with theme + impact.

Legal Information

This software is provided as-is and is not officially supported by Databricks through customer technical support channels. Support, questions, and feature requests can be communicated through the Issues page of this repo. Issues with the use of this code will not be answered or investigated by Databricks Support.


Built on Databricks Serverless Compute with Unity Catalog governance