Compliance DSL Reference

Overview

The Compliance DSL (Domain-Specific Language) provides a powerful, flexible way to define compliance rules that can be evaluated against various entities in your Databricks environment and application.

Syntax

Basic Structure

MATCH (entity:Type)
WHERE entity.property OP value [AND/OR ...]
ASSERT condition
ON_PASS action [action ...]
ON_FAIL action [action ...]

Components

MATCH Clause

Specifies the entity type to evaluate:

MATCH (obj:Object)        # All entity types
MATCH (tbl:table)         # Only tables
MATCH (prod:data_product) # Only data products

Supported Entity Types:

Unity Catalog: catalog, schema, table, view, function, volume
Application: data_product, data_contract, domain, glossary_term, review
Generic: Object (matches all types)

WHERE Clause (Optional)

Filters entities before evaluation:

WHERE obj.type IN ['table', 'view']
WHERE obj.owner != 'unknown' AND obj.status = 'active'

ASSERT Clause

Defines the compliance condition:

ASSERT obj.name MATCHES '^[a-z][a-z0-9_]*$'
ASSERT obj.encryption = 'AES256'
ASSERT HAS_TAG('data-product')

Action Clauses (Optional)

Define what happens on success or failure:

ON_PASS PASS
ON_FAIL FAIL 'Name must be lowercase with underscores'
ON_FAIL ASSIGN_TAG compliance_issue: 'naming_violation'
ON_FAIL NOTIFY 'compliance-team@example.com'

Operators

Comparison Operators

= - Equality
!= - Inequality
> - Greater than
< - Less than
>= - Greater than or equal
<= - Less than or equal
MATCHES - Regular expression match
IN - List membership
CONTAINS - Substring/element containment

Boolean Operators

AND - Logical AND
OR - Logical OR
NOT - Logical NOT

Examples

# Equality
obj.status = 'active'

# Regex match
obj.name MATCHES '^[a-z][a-z0-9_]*$'

# List membership
obj.type IN ['table', 'view']

# Boolean combination
obj.owner != 'unknown' AND obj.status = 'active'

# Negation
NOT obj.deprecated

Functions

Tag Functions

HAS_TAG(key) - Check if entity has a tag
TAG(key) - Get tag value

ASSERT HAS_TAG('data-product')
ASSERT TAG('domain') = 'finance'

String Functions

UPPER(str) - Convert to uppercase
LOWER(str) - Convert to lowercase
LENGTH(str) - Get string length

ASSERT LENGTH(obj.name) <= 64
ASSERT LOWER(obj.name) = obj.name  # Enforce lowercase

CASE Expressions

Conditional logic for complex rules:

ASSERT
  CASE obj.type
    WHEN 'catalog' THEN obj.name MATCHES '^[a-z][a-z0-9_]*$'
    WHEN 'schema' THEN obj.name MATCHES '^[a-z][a-z0-9_]*$'
    WHEN 'table' THEN obj.name MATCHES '^[a-z][a-z0-9_]*$'
    WHEN 'view' THEN obj.name MATCHES '^v_[a-z][a-z0-9_]*$'
    ELSE true
  END

Actions

PASS

Mark check as successful (default):

ON_PASS PASS

FAIL

Mark check as failed with custom message:

ON_FAIL FAIL 'Name must start with lowercase letter'

ASSIGN_TAG

Add or update a tag on the entity:

ON_FAIL ASSIGN_TAG compliance_status: 'violation'
ON_PASS ASSIGN_TAG last_checked: '2025-01-15'

REMOVE_TAG

Remove a tag from the entity:

ON_PASS REMOVE_TAG compliance_issue

NOTIFY

Trigger notification to recipients:

ON_FAIL NOTIFY 'security-team@example.com'
ON_FAIL NOTIFY 'compliance-alerts@example.com,data-governance@example.com'

Complete Examples

1. Naming Convention Rule

MATCH (obj:Object)
WHERE obj.type IN ['catalog', 'schema', 'table', 'view']
ASSERT
  CASE obj.type
    WHEN 'view' THEN obj.name MATCHES '^v_[a-z][a-z0-9_]*$'
    ELSE obj.name MATCHES '^[a-z][a-z0-9_]*$'
  END
ON_FAIL FAIL 'Objects must follow naming conventions'
ON_FAIL ASSIGN_TAG compliance_issue: 'naming_violation'

2. PII Encryption Rule

MATCH (tbl:table)
WHERE HAS_TAG('contains_pii') AND TAG('contains_pii') = 'true'
ASSERT HAS_TAG('encryption') AND TAG('encryption') = 'AES256'
ON_FAIL FAIL 'PII data must be encrypted with AES256'
ON_FAIL NOTIFY 'security-team@example.com'
ON_FAIL ASSIGN_TAG security_risk: 'high'

3. Data Product Ownership Rule

MATCH (prod:data_product)
WHERE prod.status IN ['active', 'published']
ASSERT prod.owner != 'unknown' AND LENGTH(prod.owner) > 0
ON_FAIL FAIL 'Active data products must have a valid owner'
ON_FAIL ASSIGN_TAG needs_attention: 'missing_owner'
ON_PASS REMOVE_TAG needs_attention

4. Tag Requirement Rule

MATCH (obj:Object)
WHERE obj.type IN ['table', 'view']
ASSERT HAS_TAG('data-product') OR HAS_TAG('excluded')
ON_FAIL FAIL 'All tables and views must be tagged with a data product'
ON_FAIL ASSIGN_TAG compliance_status: 'untagged'

5. Data Quality Threshold Rule

MATCH (contract:data_contract)
WHERE contract.status = 'active'
ASSERT
  HAS_TAG('quality_score') AND
  TAG('quality_score') >= '95'
ON_FAIL FAIL 'Data quality score must be at least 95%'
ON_FAIL NOTIFY 'data-quality-team@example.com'

6. Multi-Action Failure Handling

MATCH (obj:table)
ASSERT obj.owner != 'unknown'
ON_FAIL FAIL 'Table must have a valid owner'
ON_FAIL ASSIGN_TAG compliance_issue: 'missing_owner'
ON_FAIL ASSIGN_TAG last_failed: '2025-01-15'
ON_FAIL NOTIFY 'data-governance@example.com'

Property Access

Entities have different properties based on their type:

Unity Catalog Entities

type - Entity type (catalog, schema, table, view, etc.)
name - Entity name
id - Entity ID
full_name - Full qualified name
catalog - Parent catalog (for schemas, tables)
schema - Parent schema (for tables, views)
owner - Owner email
comment - Description
created_at - Creation timestamp
updated_at - Last update timestamp
storage_location - Storage path (for tables, volumes)
table_type - TABLE or VIEW

Application Entities

type - Entity type
id - Entity ID
name - Entity name
description - Description
status - Status (draft, active, etc.)
version - Version number
owner - Owner email
domain - Domain assignment
tags - Tag dictionary
created_at - Creation timestamp
updated_at - Last update timestamp

Best Practices

Use WHERE to Filter: Narrow down entities before ASSERT evaluation for performance
Specific Error Messages: Provide clear, actionable error messages in FAIL actions
Combine Actions: Use multiple actions for comprehensive failure handling
Tag for Tracking: Use tags to track compliance status over time
Notify Appropriately: Only notify on critical failures to avoid alert fatigue
Test Incrementally: Test rules on small subsets before running on all entities

Error Handling

The DSL parser provides detailed error messages with line and column information:

Lexer error at line 2, column 15: Unterminated string
Parser error at line 3, column 8: Expected ')', got 'AND'
Evaluation error: Unknown function: INVALID_FUNC

Extending the DSL

Custom Actions

from src.common.compliance_actions import Action, ActionResult, register_action

class CustomAction(Action):
    def execute(self, context):
        # Your custom logic here
        return ActionResult(
            success=True,
            action_type='CUSTOM',
            message='Custom action executed'
        )

register_action('CUSTOM', CustomAction)

Custom Functions

Extend the evaluator to support custom functions by modifying evaluate_function in compliance_dsl.py.

Performance Considerations

Batch Processing: Entity loading is batched for efficiency
Lazy Evaluation: Entities are loaded on-demand as iterators
WHERE Filtering: Applied before ASSERT to reduce evaluations
Limits: Use the limit parameter when testing rules

Integration

The DSL is integrated into the ComplianceManager:

from src.controller.compliance_manager import ComplianceManager

manager = ComplianceManager()
run = manager.run_policy_inline(db, policy=policy, limit=100)

Results are stored in the database with:

Success/failure counts
Overall compliance score
Per-entity results with messages
Action execution audit trail

FilesExpand file tree

compliance-dsl-reference.md

Latest commit

History