# agentic-data-contracts

PyPI version CI Python 3.12+ License: MIT

Stop your AI agents from running wild on your data.

agentic-data-contracts lets data engineers define governance contracts in YAML — what tables an agent may query, which operations are forbidden, what resource limits apply — and enforces them automatically at query time via SQL validation powered by sqlglot.

Why? AI agents querying databases face two problems: resource runaway (unbounded compute, endless retries, cost overruns) and semantic inconsistency (wrong tables, missing filters, ad-hoc metric definitions). This library addresses both with a single YAML contract.

Works with: Claude Agent SDK (primary target), or any Python agent framework. Optionally integrates with ai-agent-contracts for formal resource governance.

## How It Works

```text
Agent: "SELECT * FROM analytics.orders"
  -> BLOCKED (no SELECT * — specify explicit columns)

Agent: "SELECT order_id, amount FROM analytics.orders"
  -> BLOCKED (missing required filter: tenant_id)

Agent: "SELECT order_id, amount FROM analytics.orders WHERE tenant_id = 'acme'"
  -> PASSED + WARN (consider using semantic revenue definition)

Agent: "DELETE FROM analytics.orders WHERE id = 1"
  -> BLOCKED (forbidden operation: DELETE)
```

The contract defines the rules. The library enforces them — before the query ever reaches the database.

## Installation

```bash
uv add agentic-data-contracts
# or
pip install agentic-data-contracts
```

With optional database adapters:

```bash
uv add "agentic-data-contracts[duckdb]"      # DuckDB
uv add "agentic-data-contracts[bigquery]"    # BigQuery
uv add "agentic-data-contracts[snowflake]"   # Snowflake
uv add "agentic-data-contracts[postgres]"    # PostgreSQL
uv add "agentic-data-contracts[agent-sdk]"   # Claude Agent SDK integration
```

## Quick Start

### 1. Write a YAML contract

```yaml
# contract.yml
version: "1.0"
name: revenue-analysis

semantic:
  source:
    type: yaml
    path: "./semantic.yml"
  allowed_tables:
    - schema: analytics
      tables: ["*"]          # all tables in schema (discovered from database)
    - schema: marketing
      tables: [campaigns]    # or list specific tables
  forbidden_operations: [DELETE, DROP, TRUNCATE, UPDATE, INSERT]
  rules:
    - name: tenant_isolation
      description: "All queries must filter by tenant_id"
      enforcement: block
      query_check:
        required_filter: tenant_id
    - name: no_select_star
      description: "Must specify explicit columns"
      enforcement: block
      query_check:
        no_select_star: true

resources:
  cost_limit_usd: 5.00
  max_retries: 3
  token_budget: 50000

temporal:
  max_duration_seconds: 300
```

### 2. Load the contract and create tools

```python
from agentic_data_contracts import DataContract, create_tools
from agentic_data_contracts.adapters.duckdb import DuckDBAdapter

dc = DataContract.from_yaml("contract.yml")
adapter = DuckDBAdapter("analytics.duckdb")

# Semantic source is auto-loaded from contract config (source.type + source.path)
tools = create_tools(dc, adapter=adapter)
```

### 3. Use with the Claude Agent SDK (requires `claude-agent-sdk>=0.1.52`)

```python
import asyncio
from agentic_data_contracts import create_sdk_mcp_server
from claude_agent_sdk import (
    ClaudeAgentOptions,
    AssistantMessage,
    TextBlock,
    query,
)

# One-liner: wraps all 10 tools and bundles into an SDK MCP server
server = create_sdk_mcp_server(dc, adapter=adapter)

options = ClaudeAgentOptions(
    model="claude-sonnet-4-6",
    system_prompt=f"You are a revenue analytics assistant.\n\n{dc.to_system_prompt()}",
    mcp_servers={"dc": server},
    **dc.to_sdk_config(),  # token_budget → task_budget, max_retries → max_turns
)

async def run(prompt: str) -> None:
    async for message in query(prompt=prompt, options=options):
        if isinstance(message, AssistantMessage):
            for block in message.content:
                if isinstance(block, TextBlock):
                    print(block.text)

asyncio.run(run("What was total revenue by region in Q1 2025?"))
```

### 4. Or use the tools directly (no SDK required)

```python
import asyncio

async def demo() -> None:
    # Validate a query without executing
    validate = next(t for t in tools if t.name == "validate_query")
    result = await validate.callable(
        {"sql": "SELECT id, amount FROM analytics.orders WHERE tenant_id = 'acme'"}
    )
    print(result["content"][0]["text"])
    # VALID — Query passed all checks.

    # Blocked query
    result = await validate.callable({"sql": "SELECT * FROM analytics.orders"})
    print(result["content"][0]["text"])
    # BLOCKED — Violations:
    # - SELECT * is not allowed — specify explicit columns

asyncio.run(demo())
```

## The 10 Tools

| Tool | Description |
| --- | --- |
| `list_schemas` | List all allowed database schemas from the contract |
| `list_tables` | List allowed tables, optionally filtered by schema |
| `describe_table` | Get full column details for an allowed table |
| `preview_table` | Preview sample rows from an allowed table |
| `list_metrics` | List metric definitions, optionally filtered by domain |
| `lookup_metric` | Get a metric definition; falls back to fuzzy search when there is no exact match |
| `validate_query` | Validate a SQL query against contract rules without executing it |
| `query_cost_estimate` | Estimate cost and row count via EXPLAIN |
| `run_query` | Validate and execute a SQL query, returning results |
| `get_contract_info` | Get the full contract: rules, limits, and session status |

## Contract Rules

Rules are enforced at three levels:

- `block` — query is rejected and an error is returned to the agent
- `warn` — query proceeds but a warning is included in the response
- `log` — violation is recorded but not surfaced to the agent

Each rule carries a `query_check` (pre-execution) or `result_check` (post-execution) block. Rules with neither are advisory: they appear in the system prompt but don't enforce anything. Every rule can be scoped to a specific table or applied globally.
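
The three levels amount to a small dispatch at validation time. A hypothetical sketch (the function and its error types are illustrative, not the library's internals):

```python
import logging

def apply_enforcement(level: str, violation: str, warnings: list[str]) -> None:
    """Dispatch a rule violation according to its enforcement level."""
    if level == "block":
        # Rejected outright; the error text goes back to the agent.
        raise PermissionError(violation)
    if level == "warn":
        # Query proceeds; the warning rides along in the tool response.
        warnings.append(violation)
        return
    # "log": recorded for audit, invisible to the agent.
    logging.getLogger("contracts").info(violation)
```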

Built-in query checks (pre-execution, validated against the SQL AST):

| Check | Description |
| --- | --- |
| `required_filter` | Require a column in the WHERE clause (e.g., `tenant_id`) |
| `no_select_star` | Forbid `SELECT *` — require explicit columns |
| `blocked_columns` | Forbid specific columns in SELECT (e.g., PII) |
| `require_limit` | Require a LIMIT clause |
| `max_joins` | Cap the number of JOINs |

Built-in result checks (post-execution, validated against query output):

| Check | Description |
| --- | --- |
| `min_value` / `max_value` | Numeric bounds on a column's values |
| `not_null` | Column must not contain nulls |
| `min_rows` / `max_rows` | Row-count bounds on the result set |

Example with table scoping and both check types:

```yaml
rules:
  - name: tenant_isolation
    description: "Orders must filter by tenant_id"
    enforcement: block
    table: "analytics.orders"      # only applies to this table
    query_check:
      required_filter: tenant_id

  - name: hide_pii
    description: "Do not select PII columns from customers"
    enforcement: block
    table: "analytics.customers"
    query_check:
      blocked_columns: [ssn, email, phone]

  - name: wau_sanity
    description: "WAU should not exceed world population"
    enforcement: warn
    table: "analytics.user_metrics"
    result_check:
      column: wau
      max_value: 8_000_000_000

  - name: no_negative_revenue
    description: "Revenue must not be negative"
    enforcement: block
    result_check:
      column: revenue
      min_value: 0
```
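
Result checks run after execution against the returned rows. A minimal sketch of how `min_value`, `max_value`, and `not_null` could be evaluated (illustrative only; the library's actual evaluator may differ):

```python
def check_results(rows: list[dict], column: str,
                  min_value=None, max_value=None, not_null=False) -> list[str]:
    """Evaluate post-execution bounds on one column of a result set."""
    violations = []
    for i, row in enumerate(rows):
        value = row.get(column)
        if value is None:
            if not_null:
                violations.append(f"row {i}: {column} is null")
            continue
        if min_value is not None and value < min_value:
            violations.append(f"row {i}: {column}={value} below min {min_value}")
        if max_value is not None and value > max_value:
            violations.append(f"row {i}: {column}={value} above max {max_value}")
    return violations
```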

## Semantic Sources

A semantic source provides metric, table schema, and relationship metadata to the agent. Paths are resolved relative to the contract file's directory (not the process CWD).

YAML (built-in):

```yaml
# semantic.yml
metrics:
  - name: total_revenue
    description: "Total revenue from completed orders"
    sql_expression: "SUM(amount) FILTER (WHERE status = 'completed')"
    source_model: analytics.orders

tables:
  - schema: analytics
    table: orders
    columns:
      - name: id
        type: INTEGER
      - name: amount
        type: DECIMAL
      - name: tenant_id
        type: VARCHAR
```

dbt — point to a `manifest.json`:

```yaml
semantic:
  source:
    type: dbt
    path: "./dbt/manifest.json"
```

Cube — point to a Cube schema file:

```yaml
semantic:
  source:
    type: cube
    path: "./cube/schema.yml"
```

## Table Relationships

Define join paths so the agent knows how to combine tables correctly:

```yaml
# semantic.yml
relationships:
  - from: analytics.orders.customer_id
    to: analytics.customers.id
    type: many_to_one
    description: >
      Join orders to customers for region-level breakdowns.
      Every order has exactly one customer.

  - from: analytics.bdg_attribution.contact_id
    to: analytics.contacts.contact_id
    type: many_to_one
    description: "Bridge table — filter to avoid fan-out from multiple attribution records."
    required_filter: "attribution_model = 'last_touch_attribution'"
```

| Field | Required | Description |
| --- | --- | --- |
| `from` / `to` | Yes | Fully qualified column references (`schema.table.column`) |
| `type` | No | Cardinality: `many_to_one` (default), `one_to_one`, `many_to_many` |
| `description` | No | Free-text context for the agent (join guidance, caveats, data quality notes) |
| `required_filter` | No | SQL condition that must be applied when using this join (e.g., bridge-table disambiguation) |

The agent sees these in its system prompt and uses them to write correct JOINs instead of guessing from column names.
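
For illustration, here is roughly how one relationship entry translates into JOIN guidance. The `join_clause` helper is hypothetical, not part of the library's API:

```python
def join_clause(rel: dict) -> str:
    """Render a relationship entry as a SQL JOIN fragment."""
    _, from_table, from_col = rel["from"].split(".")
    to_schema, to_table, to_col = rel["to"].split(".")
    clause = (f"JOIN {to_schema}.{to_table} "
              f"ON {from_table}.{from_col} = {to_table}.{to_col}")
    if rel.get("required_filter"):
        # Bridge-table disambiguation rides along with the join condition.
        clause += f" AND {rel['required_filter']}"
    return clause

rel = {
    "from": "analytics.orders.customer_id",
    "to": "analytics.customers.id",
    "type": "many_to_one",
}
print(join_clause(rel))
# JOIN analytics.customers ON orders.customer_id = customers.id
```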

## Custom Prompt Rendering

The system prompt is generated by a `PromptRenderer`. The default `ClaudePromptRenderer` produces XML-structured output optimized for Claude models:

```python
dc = DataContract.from_yaml("contract.yml")
print(dc.to_system_prompt())  # XML output, optimized for Claude
```

For other models (GPT-4, Gemini, Llama), implement the `PromptRenderer` protocol:

```python
from agentic_data_contracts import PromptRenderer, DataContract

class MarkdownRenderer:
    def render(self, contract, semantic_source=None):
        tables = "\n".join(f"- {t}" for t in contract.allowed_table_names())
        return f"## {contract.name}\n\nAllowed tables:\n{tables}"

dc = DataContract.from_yaml("contract.yml")
print(dc.to_system_prompt(renderer=MarkdownRenderer()))
```

## Scalable Metric Discovery

For large data lakes with hundreds of KPIs, group metrics by domain and let the agent discover them efficiently:

```yaml
semantic:
  domains:
    acquisition: [CAC, CPA, CPL, click_through_rate]
    retention: [churn_rate, LTV, retention_30d]
    attribution: [ROAS, first_touch_revenue]
```

The system prompt gets a compact index (names + descriptions grouped by domain). The agent uses lookup_metric for full SQL definitions — with fuzzy fallback when it doesn't know the exact name:

```text
lookup_metric("CAC")                → exact match, full definition
lookup_metric("acquisition cost")   → fuzzy match, returns [CAC, CPA] as candidates
list_metrics(domain="retention")    → only retention metrics
```
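
The fallback behaviour can be approximated with the standard library's `difflib` (the library itself uses `thefuzz`; the metric index and definitions below are made up for illustration):

```python
import difflib

# Hypothetical in-memory metric index for illustration.
METRICS = {
    "CAC": "SUM(marketing_spend) / COUNT(DISTINCT new_customer_id)",
    "CPA": "SUM(spend) / COUNT(conversion_id)",
    "churn_rate": "1.0 * churned_users / active_users_at_start",
}

def lookup_metric(name: str) -> dict:
    """Exact match first; otherwise return fuzzy candidates."""
    if name in METRICS:
        return {"match": name, "definition": METRICS[name]}
    lowered = {m.lower(): m for m in METRICS}
    hits = difflib.get_close_matches(name.lower(), lowered, n=3, cutoff=0.4)
    return {"candidates": [lowered[h] for h in hits]}
```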

## Scaling to Large Organizations

Tested with 200+ tables, 300+ metrics, and 50+ relationships across multiple schemas.

| Concern | How it scales |
| --- | --- |
| System prompt size | With >20 metrics, auto-switches to compact domain counts (`acquisition (45)`) instead of listing every metric |
| Table discovery | `list_tables` is paginated (default 50, with offset); use the schema filter for targeted browsing |
| Wildcard schemas | `tables: ["*"]` discovers tables from the database; resolution is cached, so there are no repeated queries |
| Metric lookup | Fuzzy search via `thefuzz` (C++ backed); sub-millisecond even with 1000+ metrics |
| SQL validation | Set-based allowlist check: O(1) per table reference regardless of allowlist size |

## Resource Limits

```yaml
resources:
  cost_limit_usd: 5.00           # max estimated query cost
  max_retries: 3                 # max blocked queries per session
  token_budget: 50000            # max tokens consumed
  max_query_time_seconds: 30     # max wall-clock query time
  max_rows_scanned: 1000000      # max rows an EXPLAIN may estimate
```
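
How `cost_limit_usd` and `max_retries` could be tracked over a session, as an illustrative sketch (the library's actual accounting is not shown here and may differ):

```python
from dataclasses import dataclass

@dataclass
class SessionBudget:
    # Limits mirror the `resources` block above.
    cost_limit_usd: float = 5.00
    max_retries: int = 3
    spent_usd: float = 0.0
    blocked_queries: int = 0

    def charge(self, estimated_cost_usd: float) -> None:
        """Reserve the estimated cost before running a query."""
        if self.spent_usd + estimated_cost_usd > self.cost_limit_usd:
            raise RuntimeError("cost_limit_usd exceeded for this session")
        self.spent_usd += estimated_cost_usd

    def record_block(self) -> None:
        """Count a blocked query toward the retry budget."""
        self.blocked_queries += 1
        if self.blocked_queries > self.max_retries:
            raise RuntimeError("max_retries exceeded for this session")
```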

## Optional Dependencies

| Extra | Package | Purpose |
| --- | --- | --- |
| `duckdb` | `duckdb` | DuckDB adapter |
| `bigquery` | `google-cloud-bigquery` | BigQuery adapter |
| `snowflake` | `snowflake-connector-python` | Snowflake adapter |
| `postgres` | `psycopg2-binary` | PostgreSQL adapter |
| `agent-sdk` | `claude-agent-sdk` | Claude Agent SDK integration |
| `agent-contracts` | `ai-agent-contracts>=0.2.0` | ai-agent-contracts bridge |

## Optional: Formal Governance with ai-agent-contracts

The library works standalone with lightweight enforcement. Install `ai-agent-contracts` to upgrade to the formal governance framework:

```bash
pip install "agentic-data-contracts[agent-contracts]"
```

```python
from agentic_data_contracts.bridge.compiler import compile_to_contract

contract = compile_to_contract(dc)  # YAML → formal 7-tuple Contract
```

What you get with the bridge:

| Concern | Standalone | With ai-agent-contracts |
| --- | --- | --- |
| Resource tracking | Manual counters | Formal `ResourceConstraints` with auto-enforcement |
| Rule violations | Exception + retry | `TerminationCondition` with contract state machine |
| Success evaluation | Log-based | Weighted `SuccessCriterion` scoring, LLM judge support |
| Contract lifecycle | None | DRAFTED → ACTIVE → FULFILLED / VIOLATED / TERMINATED |
| Framework support | Claude Agent SDK | + LiteLLM, LangChain, LangGraph, Google ADK |
| Multi-agent | Single agent | Coordination patterns (sequential, parallel, hierarchical) |

When to use it: formal audit trails, success scoring, multi-agent coordination, or integration with non-Claude agent frameworks.

## Example

See `examples/revenue_agent/` for a complete working example with a DuckDB database, YAML semantic source, and Claude Agent SDK integration.

```bash
uv run python examples/revenue_agent/setup_db.py
uv run python examples/revenue_agent/agent.py "What was Q1 revenue by region?"
```

## Architecture

See `docs/architecture.md` for the full design spec covering the layered architecture, YAML schema, validation pipeline, tool design, semantic sources, database adapters, and the optional ai-agent-contracts bridge.

## License

MIT
