The Compliance DSL (Domain-Specific Language) provides a powerful, flexible way to define compliance rules that can be evaluated against various entities in your Databricks environment and application.
MATCH (entity:Type)
WHERE entity.property OP value [AND/OR ...]
ASSERT condition
ON_PASS action [action ...]
ON_FAIL action [action ...]
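To make the clause order concrete, the sketch below splits a rule's text into its clauses by leading keyword. It is an illustration only: `split_clauses` is a hypothetical helper, not the DSL's actual lexer or parser, and it assumes every clause starts on its own line.

```python
CLAUSE_KEYWORDS = ("MATCH", "WHERE", "ASSERT", "ON_PASS", "ON_FAIL")

def split_clauses(rule_text):
    """Group the lines of a rule by the clause keyword that starts them.

    Hypothetical helper for illustration. Lines that do not start with a
    keyword are treated as a continuation of the previous clause.
    """
    clauses = {keyword: [] for keyword in CLAUSE_KEYWORDS}
    current = None
    for raw_line in rule_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        head, _, rest = line.partition(" ")
        if head in CLAUSE_KEYWORDS:
            current = head
            clauses[current].append(rest.strip())
        elif current is not None:
            # Continuation line, e.g. the body of a multi-line ASSERT.
            clauses[current][-1] = (clauses[current][-1] + " " + line).strip()
    return clauses

rule = """
MATCH (prod:data_product)
WHERE prod.status IN ['active', 'published']
ASSERT prod.owner != 'unknown' AND LENGTH(prod.owner) > 0
ON_FAIL FAIL 'Active data products must have a valid owner'
ON_FAIL ASSIGN_TAG needs_attention: 'missing_owner'
ON_PASS REMOVE_TAG needs_attention
"""
clauses = split_clauses(rule)
for keyword in CLAUSE_KEYWORDS:
    print(keyword, clauses[keyword])
```

A rule has one MATCH, at most one WHERE and ASSERT, and any number of ON_PASS and ON_FAIL lines, which is why the sketch keeps a list per keyword.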
The MATCH clause specifies the entity type to evaluate:
MATCH (obj:Object) # All entity types
MATCH (tbl:table) # Only tables
MATCH (prod:data_product) # Only data products
Supported Entity Types:
- Unity Catalog: catalog, schema, table, view, function, volume
- Application: data_product, data_contract, domain, glossary_term, review
- Generic: Object (matches all types)
The WHERE clause filters entities before evaluation:
WHERE obj.type IN ['table', 'view']
WHERE obj.owner != 'unknown' AND obj.status = 'active'
The ASSERT clause defines the compliance condition:
ASSERT obj.name MATCHES '^[a-z][a-z0-9_]*$'
ASSERT obj.encryption = 'AES256'
ASSERT HAS_TAG('data-product')
The ON_PASS and ON_FAIL clauses define what happens on success or failure:
ON_PASS PASS
ON_FAIL FAIL 'Name must be lowercase with underscores'
ON_FAIL ASSIGN_TAG compliance_issue: 'naming_violation'
ON_FAIL NOTIFY 'compliance-team@example.com'
Comparison operators:
- =: Equality
- !=: Inequality
- >: Greater than
- <: Less than
- >=: Greater than or equal
- <=: Less than or equal
- MATCHES: Regular expression match
- IN: List membership
- CONTAINS: Substring/element containment
Logical operators:
- AND: Logical AND
- OR: Logical OR
- NOT: Logical NOT
# Equality
obj.status = 'active'
# Regex match
obj.name MATCHES '^[a-z][a-z0-9_]*$'
# List membership
obj.type IN ['table', 'view']
# Boolean combination
obj.owner != 'unknown' AND obj.status = 'active'
# Negation
NOT obj.deprecated
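For readers more familiar with Python, the comparisons above roughly correspond to the expressions below, evaluated against a sample entity. This is an analogy, not the evaluator's implementation. In particular, treating MATCHES like `re.search` is an assumption; the example pattern carries its own `^` and `$` anchors, so it behaves the same under a full-match interpretation.

```python
import re

# Sample entity: field names come from the examples above, values are made up.
obj = {
    "name": "sales_orders",
    "type": "table",
    "owner": "alice@example.com",
    "status": "active",
    "deprecated": False,
}

checks = {
    "obj.status = 'active'": obj["status"] == "active",
    "obj.name MATCHES '^[a-z][a-z0-9_]*$'":
        re.search(r"^[a-z][a-z0-9_]*$", obj["name"]) is not None,
    "obj.type IN ['table', 'view']": obj["type"] in ["table", "view"],
    "obj.owner != 'unknown' AND obj.status = 'active'":
        obj["owner"] != "unknown" and obj["status"] == "active",
    "NOT obj.deprecated": not obj["deprecated"],
}

for expression, result in checks.items():
    print(f"{expression:<55} -> {result}")
```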
Tag functions:
- HAS_TAG(key): Check if entity has a tag
- TAG(key): Get tag value
ASSERT HAS_TAG('data-product')
ASSERT TAG('domain') = 'finance'
String functions:
- UPPER(str): Convert to uppercase
- LOWER(str): Convert to lowercase
- LENGTH(str): Get string length
ASSERT LENGTH(obj.name) <= 64
ASSERT LOWER(obj.name) = obj.name # Enforce lowercase
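A rough Python picture of these helpers, assuming an entity's tags behave like a dictionary of strings. What TAG returns for a missing key is not stated here, so returning `None` is an assumption, and `has_tag` and `tag` are hypothetical names.

```python
def has_tag(tags, key):
    """HAS_TAG(key): true when the entity carries the tag."""
    return key in tags

def tag(tags, key):
    """TAG(key): the tag's value. Returning None for a missing tag is an assumption."""
    return tags.get(key)

tags = {"data-product": "sales", "domain": "finance"}
name = "sales_orders"

print(has_tag(tags, "data-product"))     # ASSERT HAS_TAG('data-product')
print(tag(tags, "domain") == "finance")  # ASSERT TAG('domain') = 'finance'
print(len(name) <= 64)                   # ASSERT LENGTH(obj.name) <= 64
print(name.lower() == name)              # ASSERT LOWER(obj.name) = obj.name
```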
CASE expressions provide conditional logic for complex rules:
ASSERT
  CASE obj.type
    WHEN 'catalog' THEN obj.name MATCHES '^[a-z][a-z0-9_]*$'
    WHEN 'schema' THEN obj.name MATCHES '^[a-z][a-z0-9_]*$'
    WHEN 'table' THEN obj.name MATCHES '^[a-z][a-z0-9_]*$'
    WHEN 'view' THEN obj.name MATCHES '^v_[a-z][a-z0-9_]*$'
    ELSE true
  END
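The same branching can be written as a plain Python function, which makes it easy to try candidate names before committing to a rule. This mirrors the CASE expression above under the assumption that MATCHES behaves like `re.search`; `name_is_compliant` is a hypothetical helper.

```python
import re

BASE_PATTERN = r"^[a-z][a-z0-9_]*$"
VIEW_PATTERN = r"^v_[a-z][a-z0-9_]*$"

def name_is_compliant(obj_type, name):
    """Mirror of the CASE expression: pick a pattern by type, ELSE true."""
    if obj_type in ("catalog", "schema", "table"):
        return re.search(BASE_PATTERN, name) is not None
    if obj_type == "view":
        return re.search(VIEW_PATTERN, name) is not None
    return True  # ELSE branch: other types are not constrained

for obj_type, name in [("table", "orders"), ("view", "orders"),
                       ("view", "v_orders"), ("volume", "Raw-Files")]:
    print(obj_type, name, name_is_compliant(obj_type, name))
```

Note that the ELSE branch lets every other entity type pass, so a volume named `Raw-Files` is not flagged by this rule.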
PASS marks the check as successful (default):
ON_PASS PASS
FAIL marks the check as failed with a custom message:
ON_FAIL FAIL 'Name must start with lowercase letter'
ASSIGN_TAG adds or updates a tag on the entity:
ON_FAIL ASSIGN_TAG compliance_status: 'violation'
ON_PASS ASSIGN_TAG last_checked: '2025-01-15'
REMOVE_TAG removes a tag from the entity:
ON_PASS REMOVE_TAG compliance_issue
NOTIFY triggers a notification to one or more recipients:
ON_FAIL NOTIFY 'security-team@example.com'
ON_FAIL NOTIFY 'compliance-alerts@example.com,data-governance@example.com'
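The effect of the tag and notification actions can be pictured with a small simulation. Everything here is a stand-in: `apply_action` is hypothetical, tags are modeled as a plain dictionary, and splitting the NOTIFY argument on commas is inferred from the second NOTIFY example rather than confirmed behavior.

```python
def apply_action(tags, outbox, action, argument):
    """Apply one simulated action to an entity's tags or to a notification outbox."""
    if action == "ASSIGN_TAG":
        key, value = argument
        tags[key] = value             # add the tag or overwrite its value
    elif action == "REMOVE_TAG":
        tags.pop(argument, None)      # removing an absent tag is a no-op here
    elif action == "NOTIFY":
        outbox.extend(r.strip() for r in argument.split(",") if r.strip())
    else:
        raise ValueError(f"Unknown action: {action}")

tags, outbox = {"compliance_issue": "naming_violation"}, []
apply_action(tags, outbox, "ASSIGN_TAG", ("compliance_status", "violation"))
apply_action(tags, outbox, "REMOVE_TAG", "compliance_issue")
apply_action(tags, outbox, "NOTIFY",
             "compliance-alerts@example.com,data-governance@example.com")
print(tags)
print(outbox)
```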
MATCH (obj:Object)
WHERE obj.type IN ['catalog', 'schema', 'table', 'view']
ASSERT
  CASE obj.type
    WHEN 'view' THEN obj.name MATCHES '^v_[a-z][a-z0-9_]*$'
    ELSE obj.name MATCHES '^[a-z][a-z0-9_]*$'
  END
ON_FAIL FAIL 'Objects must follow naming conventions'
ON_FAIL ASSIGN_TAG compliance_issue: 'naming_violation'
MATCH (tbl:table)
WHERE HAS_TAG('contains_pii') AND TAG('contains_pii') = 'true'
ASSERT HAS_TAG('encryption') AND TAG('encryption') = 'AES256'
ON_FAIL FAIL 'PII data must be encrypted with AES256'
ON_FAIL NOTIFY 'security-team@example.com'
ON_FAIL ASSIGN_TAG security_risk: 'high'
MATCH (prod:data_product)
WHERE prod.status IN ['active', 'published']
ASSERT prod.owner != 'unknown' AND LENGTH(prod.owner) > 0
ON_FAIL FAIL 'Active data products must have a valid owner'
ON_FAIL ASSIGN_TAG needs_attention: 'missing_owner'
ON_PASS REMOVE_TAG needs_attention
MATCH (obj:Object)
WHERE obj.type IN ['table', 'view']
ASSERT HAS_TAG('data-product') OR HAS_TAG('excluded')
ON_FAIL FAIL 'All tables and views must be tagged with a data product'
ON_FAIL ASSIGN_TAG compliance_status: 'untagged'
MATCH (contract:data_contract)
WHERE contract.status = 'active'
ASSERT
  HAS_TAG('quality_score') AND
  TAG('quality_score') >= '95'
ON_FAIL FAIL 'Data quality score must be at least 95%'
ON_FAIL NOTIFY 'data-quality-team@example.com'
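One thing worth checking in a rule like this: the threshold is written as the string '95', and the tag values in these examples are strings too. Whether the DSL converts both sides to numbers before comparing is not stated here. If the comparison were done on strings, as in plain Python below, a score of '100' would fail the check, so it may be worth testing the rule against a three-digit score.

```python
score, threshold = "100", "95"

# String comparison is lexicographic: "1" sorts before "9".
print(score >= threshold)

# Comparing after an explicit numeric conversion gives the intended result.
print(float(score) >= float(threshold))
```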
MATCH (obj:table)
ASSERT obj.owner != 'unknown'
ON_FAIL FAIL 'Table must have a valid owner'
ON_FAIL ASSIGN_TAG compliance_issue: 'missing_owner'
ON_FAIL ASSIGN_TAG last_failed: '2025-01-15'
ON_FAIL NOTIFY 'data-governance@example.com'
Entities have different properties based on their type:
Unity Catalog entities:
- type: Entity type (catalog, schema, table, view, etc.)
- name: Entity name
- id: Entity ID
- full_name: Fully qualified name
- catalog: Parent catalog (for schemas, tables)
- schema: Parent schema (for tables, views)
- owner: Owner email
- comment: Description
- created_at: Creation timestamp
- updated_at: Last update timestamp
- storage_location: Storage path (for tables, volumes)
- table_type: TABLE or VIEW
Application entities:
- type: Entity type
- id: Entity ID
- name: Entity name
- description: Description
- status: Status (draft, active, etc.)
- version: Version number
- owner: Owner email
- domain: Domain assignment
- tags: Tag dictionary
- created_at: Creation timestamp
- updated_at: Last update timestamp
- Use WHERE to Filter: Narrow down entities before ASSERT evaluation for performance
- Specific Error Messages: Provide clear, actionable error messages in FAIL actions
- Combine Actions: Use multiple actions for comprehensive failure handling
- Tag for Tracking: Use tags to track compliance status over time
- Notify Appropriately: Only notify on critical failures to avoid alert fatigue
- Test Incrementally: Test rules on small subsets before running on all entities
The DSL parser provides detailed error messages with line and column information:
Lexer error at line 2, column 15: Unterminated string
Parser error at line 3, column 8: Expected ')', got 'AND'
Evaluation error: Unknown function: INVALID_FUNC
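When rules are validated from a script or CI job, it can help to turn these messages into structured data. The pattern below is derived only from the three sample messages above, so other message formats may exist and would return `None`; `parse_error` is a hypothetical helper, not part of the DSL.

```python
import re

ERROR_PATTERN = re.compile(
    r"^(?P<stage>\w+) error"
    r"(?: at line (?P<line>\d+), column (?P<column>\d+))?"
    r": (?P<message>.*)$"
)

def parse_error(text):
    """Split an error message into stage, optional position, and message."""
    match = ERROR_PATTERN.match(text)
    if match is None:
        return None
    info = match.groupdict()
    for key in ("line", "column"):
        if info[key] is not None:
            info[key] = int(info[key])
    return info

print(parse_error("Lexer error at line 2, column 15: Unterminated string"))
print(parse_error("Parser error at line 3, column 8: Expected ')', got 'AND'"))
print(parse_error("Evaluation error: Unknown function: INVALID_FUNC"))
```

In the samples, only lexer and parser errors carry a position; the evaluation error does not, which is why `line` and `column` are optional in the pattern.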
Register custom actions in Python:
from src.common.compliance_actions import Action, ActionResult, register_action

class CustomAction(Action):
    def execute(self, context):
        # Your custom logic here
        return ActionResult(
            success=True,
            action_type='CUSTOM',
            message='Custom action executed'
        )

register_action('CUSTOM', CustomAction)

Extend the evaluator to support custom functions by modifying evaluate_function in compliance_dsl.py.
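The exact shape of evaluate_function is not shown here, so the following is only a standalone sketch of the general idea: a table of named functions that the evaluator consults, raising an error for unknown names. `FUNCTIONS`, `EvaluationError`, the signature used for `evaluate_function`, and the `STARTS_WITH` function are all hypothetical and may not match compliance_dsl.py.

```python
class EvaluationError(Exception):
    """Stand-in for whatever error type the real evaluator raises."""

# String functions named in this document, plus one hypothetical addition.
FUNCTIONS = {
    "UPPER": lambda s: s.upper(),
    "LOWER": lambda s: s.lower(),
    "LENGTH": lambda s: len(s),
    "STARTS_WITH": lambda s, prefix: s.startswith(prefix),  # hypothetical custom function
}

def evaluate_function(name, args):
    """Look up a function by name and call it with already-evaluated arguments."""
    try:
        func = FUNCTIONS[name]
    except KeyError:
        raise EvaluationError(f"Unknown function: {name}") from None
    return func(*args)

print(evaluate_function("STARTS_WITH", ["v_orders", "v_"]))
try:
    evaluate_function("INVALID_FUNC", [])
except EvaluationError as exc:
    print(f"Evaluation error: {exc}")
```

With a lookup table like this, adding a function is a one-line change, and a misspelled name in a rule surfaces as an unknown-function error instead of silently evaluating to false.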
- Batch Processing: Entity loading is batched for efficiency
- Lazy Evaluation: Entities are loaded on-demand as iterators
- WHERE Filtering: Applied before ASSERT to reduce evaluations
- Limits: Use the limit parameter when testing rules
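The points above can be sketched with Python generators. This is a toy model, not the application's loader: `load_entities` fabricates entities in batches, `run_rule` is hypothetical, and applying the limit to entities that have already passed WHERE is an assumption about where the cap takes effect.

```python
from itertools import islice

def load_entities(total, batch_size=3):
    """Yield fake entities batch by batch; stands in for a real catalog loader."""
    for start in range(0, total, batch_size):
        batch = [
            {"name": f"table_{i}",
             "owner": "unknown" if i % 2 else "alice@example.com"}
            for i in range(start, min(start + batch_size, total))
        ]
        yield from batch  # entities leave the generator one at a time

def run_rule(entities, where, assertion, limit=None):
    """Apply WHERE first, evaluate ASSERT only on survivors, stop after `limit`."""
    selected = (entity for entity in entities if where(entity))
    return [(entity["name"], assertion(entity))
            for entity in islice(selected, limit)]

results = run_rule(
    load_entities(total=1000),
    where=lambda e: e["name"].startswith("table_"),
    assertion=lambda e: e["owner"] != "unknown",
    limit=4,
)
print(results)
```

Because the loader is a generator, stopping at the limit also stops loading: batches beyond the ones needed to reach four entities are never built.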
The DSL is integrated into the ComplianceManager:
from src.controller.compliance_manager import ComplianceManager

# db and policy are assumed to be provided by the caller
manager = ComplianceManager()
run = manager.run_policy_inline(db, policy=policy, limit=100)

Results are stored in the database with:
- Success/failure counts
- Overall compliance score
- Per-entity results with messages
- Action execution audit trail
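How the overall compliance score is derived is not specified here. The sketch below assumes the simplest reading, the percentage of evaluated entities that passed, and uses a hypothetical `EntityResult` record instead of the application's real result model.

```python
from dataclasses import dataclass

@dataclass
class EntityResult:
    """Hypothetical per-entity result; the stored schema may differ."""
    entity: str
    passed: bool
    message: str = ""

def summarize(results):
    """Count passes and failures; the score formula is an assumption (passed / total)."""
    passed = sum(1 for result in results if result.passed)
    failed = len(results) - passed
    # Treating an empty run as fully compliant is also an assumption.
    score = 100.0 * passed / len(results) if results else 100.0
    return {"passed": passed, "failed": failed, "score": score}

summary = summarize([
    EntityResult("main.sales.orders", True),
    EntityResult("main.sales.customers", True),
    EntityResult("main.sales.tmp_export", False, "Table must have a valid owner"),
    EntityResult("main.hr.employees", True),
])
print(summary)
```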