25 changes: 25 additions & 0 deletions API.md
@@ -103,6 +103,31 @@ curl -X POST "http://localhost:4242/export?format=sql" \
--data-binary @datacontract.yaml
```

## Diff Two Data Contracts

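Compare two ODCS data contracts and receive a diff report. POST a JSON body with `v1` (source/before) and `v2` (target/after), each containing the full contract YAML as a string. Use `?format=text` (default) or `?format=html`.

The YAML contains newlines and quotes, so it must be JSON-escaped; pasting it into the body with `$(cat ...)` produces invalid JSON. The examples below build the body with `jq` (1.6 or later, for `--rawfile`):

```bash
jq -n --rawfile v1 v1.odcs.yaml --rawfile v2 v2.odcs.yaml '{v1: $v1, v2: $v2}' \
  | curl -X POST "http://localhost:4242/diff?format=text" \
      -H "Content-Type: application/json" \
      --data-binary @-
```

To get an HTML report:

```bash
jq -n --rawfile v1 v1.odcs.yaml --rawfile v2 v2.odcs.yaml '{v1: $v1, v2: $v2}' \
  | curl -X POST "http://localhost:4242/diff?format=html" \
      -H "Content-Type: application/json" \
      --data-binary @- \
      -o diff_report.html
```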
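From Python, the standard library is enough to build the same request, and `json.dumps` takes care of escaping the YAML text. This sketch assumes the server listens on `http://localhost:4242` as in the curl examples and that no API key is required; `build_diff_request` is a hypothetical helper, not part of the Data Contract CLI.

```python
import json
import urllib.parse
import urllib.request


def build_diff_request(
    v1_yaml: str,
    v2_yaml: str,
    fmt: str = "text",
    base_url: str = "http://localhost:4242",
) -> urllib.request.Request:
    """Build a POST /diff request (hypothetical helper, not part of datacontract)."""
    if fmt not in ("text", "html"):
        raise ValueError(f"unsupported format: {fmt!r}")
    # json.dumps escapes the newlines and quotes contained in the YAML documents.
    payload = json.dumps({"v1": v1_yaml, "v2": v2_yaml}).encode("utf-8")
    # The report format travels as the `format` query parameter.
    url = f"{base_url}/diff?{urllib.parse.urlencode({'format': fmt})}"
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Send it with `urllib.request.urlopen(req)` and read the response body: plain text for `fmt="text"`, an HTML document for `fmt="html"`.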

## Try it out

You can also use the Swagger UI to execute the commands directly.
23 changes: 10 additions & 13 deletions CHANGELOG.md
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added
- Added `ci` command for CI/CD-optimized test runs: multi-file support, GitHub Actions annotations and step summary, Azure DevOps annotations, `--fail-on` flag, `--json` output
- Added data contract semantic diff command and API endpoint

### Fixed
- Fix SQL export generating multiple PRIMARY KEY constraints for composite keys (#1026)
@@ -25,20 +26,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Fix SQL importer type mappings: binary types, datetime/time, uuid now map to correct ODCS logicalType and format (#790)

### Added
- Added support for MySQL for data contract tests (#1101)
- Support additional PyArrow types in Parquet importer (#1091)
- Populate `logicalTypeOptions.format` for SQL import from binary and uuid types (#790)
- Snowflake DDL import with tags, descriptions, and template variable handling (#790)

## [0.11.6] - 2026-03-17

### Fixed
- Fix parser error for CSV / Parquet table names containing special characters (#1066)
- Fix BigQuery export failing with "Unsupported type" for parameterized physicalType like `NUMERIC(18, 4)` (#1083)

### Added
- Added JSON output format for test results (`--output-format json`)
- Added Azure AD / Entra ID authentication support for SQL Server and Microsoft Fabric
- Added Azure AD / Entra ID authentication support for SQL Server (`ActiveDirectoryPassword`, `ActiveDirectoryServicePrincipal`, `ActiveDirectoryInteractive`)

## [0.11.5] - 2026-02-19

@@ -53,7 +47,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Fix mkdir TOCTOU race causing silent JUnit write failure (#1050)
- Fix validation failure for field names with special chars on Databricks (#1049)
- Add Azure support for field name quoting in schema checks (#1025)

## [0.11.4] - 2026-01-19

### Changed
6 changes: 6 additions & 0 deletions CLAUDE.md
@@ -89,6 +89,10 @@ datacontract export --format html datacontract.yaml --output datacontract.html

# Import from a different format
datacontract import --format sql --source my-ddl.sql --dialect postgres --output datacontract.yaml

# Find differences between data contracts
datacontract diff datacontract-v1.yaml datacontract-v2.yaml

```

## Project Architecture
@@ -111,6 +115,8 @@ The Data Contract CLI is an open-source command-line tool for working with data

5. **Linting (`datacontract/lint/`)**: Tools for validating data contract files against schema and best practices.

6. **Semantic Diff (`datacontract/reports/diff/`)**: Semantic diff engine with HTML and text report renderers.
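As a rough illustration of what a semantic (structure-aware) diff does: it compares the parsed contract rather than its text, so reordered keys or reformatted whitespace produce no change. The sketch below works on plain dicts and lists, such as the output of a YAML parser; `semantic_diff` is hypothetical and does not describe the algorithm actually used in `datacontract/reports/diff/`.

```python
from typing import Any, Iterator, Tuple

Change = Tuple[str, str, Any, Any]  # (path, kind, old value, new value)


def semantic_diff(old: Any, new: Any, path: str = "") -> Iterator[Change]:
    """Yield one (path, kind, old, new) tuple per difference between two parsed documents."""
    if isinstance(old, dict) and isinstance(new, dict):
        # Key order is irrelevant: iterate over the union of keys in a stable order.
        for key in sorted(old.keys() | new.keys(), key=str):
            child = f"{path}.{key}" if path else str(key)
            if key not in new:
                yield (child, "removed", old[key], None)
            elif key not in old:
                yield (child, "added", None, new[key])
            else:
                yield from semantic_diff(old[key], new[key], child)
    elif isinstance(old, list) and isinstance(new, list):
        # Compared by position here; matching items by a key such as `name`
        # would be an alternative design.
        for i in range(max(len(old), len(new))):
            child = f"{path}[{i}]"
            if i >= len(new):
                yield (child, "removed", old[i], None)
            elif i >= len(old):
                yield (child, "added", None, new[i])
            else:
                yield from semantic_diff(old[i], new[i], child)
    elif old != new:
        # Scalars (or mismatched container types) that differ.
        yield (path, "changed", old, new)
```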

### Extension Pattern

The project uses factory patterns for extensibility:
39 changes: 38 additions & 1 deletion README.md
@@ -117,6 +117,12 @@ $ datacontract init odcs.yaml
# lint the odcs.yaml
$ datacontract lint odcs.yaml

# show a semantic diff between two data contracts (plain-text by default)
$ datacontract diff v1.odcs.yaml v2.odcs.yaml

# show a diff as a self-contained HTML report
$ datacontract diff v1.odcs.yaml v2.odcs.yaml --format html --output diff.html

# execute schema and quality checks (define credentials as environment variables)
$ datacontract test odcs.yaml

@@ -260,6 +266,7 @@ Commands

- [init](#init)
- [lint](#lint)
- [diff](#diff)
- [test](#test)
- [ci](#ci)
- [export](#export)
@@ -318,10 +325,40 @@ Commands

```

### diff
```

Usage: datacontract diff [OPTIONS] V1 V2

Show a diff between two data contracts.

╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮
│ * v1 TEXT The location (path) of the source (before) data contract YAML. [required] │
│ * v2 TEXT The location (path) of the target (after) data contract YAML. [required] │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
│ --format [text|html] The output format for the diff report. [default: text] │
│ --output PATH Specify the file path where the diff report will be │
│ saved. If no path is provided, the output will be printed │
│ to stdout. │
│ --debug --no-debug Enable debug logging │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

```

```bash
# Plain-text diff to stdout
$ datacontract diff v1.odcs.yaml v2.odcs.yaml

# HTML diff saved to a file
$ datacontract diff v1.odcs.yaml v2.odcs.yaml --format html --output diff.html
```

### test
```

Usage: datacontract test [OPTIONS] [LOCATION]

Run schema and quality tests on configured servers.

69 changes: 67 additions & 2 deletions datacontract/api.py
@@ -1,12 +1,16 @@
import logging
import os
import tempfile
from typing import Annotated, Literal, Optional

import pydantic
import typer
import yaml
from fastapi import Body, Depends, FastAPI, HTTPException, Query, status
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import HTMLResponse, PlainTextResponse
from fastapi.security.api_key import APIKeyHeader
from pydantic import BaseModel

from datacontract.data_contract import DataContract, ExportFormat
from datacontract.model.run import Run
@@ -223,6 +227,13 @@
            "url": "https://cli.datacontract.com/#lint",
        },
    },
    {
        "name": "diff",
        "externalDocs": {
            "description": "Documentation",
            "url": "https://cli.datacontract.com/#diff",
        },
    },
    {
        "name": "export",
        "externalDocs": {
@@ -358,6 +369,60 @@ async def lint(
    return {"result": lint_result.result, "checks": lint_result.checks}


class DiffRequest(BaseModel):
    v1: str = DATA_CONTRACT_EXAMPLE_PAYLOAD
    v2: str = DATA_CONTRACT_EXAMPLE_PAYLOAD


@app.post(
    "/diff",
    tags=["diff"],
    summary="Show a diff between two data contracts.",
    description="""
Compare two ODCS data contract YAMLs and return a diff report.
POST a JSON body with `v1` (source/before) and `v2` (target/after) as YAML strings.
Use the `format` query parameter to choose between `text` (default) and `html` output.
""",
)
async def diff(
    body: DiffRequest,
    api_key: Annotated[str | None, Depends(api_key_header)] = None,
    format: Annotated[
        Literal["text", "html"],
        Query(description="Output format: 'text' (default) or 'html'."),
    ] = "text",
):
    check_api_key(api_key)
    from datacontract.reports.diff.contract_diff_report import ContractDiffReport
    from datacontract.reports.diff.diff import ContractDiff
    from datacontract.reports.diff.html_contract_diff_renderer import HtmlContractDiffRenderer
    from datacontract.reports.diff.text_contract_diff_renderer import TextContractDiffRenderer

    # ContractDiff works on file paths, so persist both payloads to temporary files.
    # Each path is recorded before writing so the finally block can always remove it.
    v1_path = v2_path = None
    try:
        with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", encoding="utf-8", delete=False) as f1:
            v1_path = f1.name
            f1.write(body.v1)
        with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", encoding="utf-8", delete=False) as f2:
            v2_path = f2.name
            f2.write(body.v2)

        contracts_diff = ContractDiff().generate(v1_path, v2_path)
        report_data = ContractDiffReport().build_report_data(contracts_diff, source_label="v1", target_label="v2")
        if format == "html":
            content = HtmlContractDiffRenderer(report_data=report_data).render()
            return HTMLResponse(content=content)
        else:
            content = TextContractDiffRenderer(report_data=report_data).render()
            return PlainTextResponse(content=content)
    except yaml.YAMLError as e:
        raise HTTPException(status_code=422, detail=f"Invalid YAML: {e}") from e
    except pydantic.ValidationError as e:
        raise HTTPException(status_code=422, detail=f"Invalid data contract: {e}") from e
    finally:
        for path in (v1_path, v2_path):
            if path is not None:
                os.unlink(path)


@app.post(
    "/export",
    tags=["export"],
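`ContractDiff().generate` takes file paths, which is why the `/diff` endpoint writes each YAML payload to a temporary file and deletes it afterwards. A minimal sketch of keeping creation and cleanup in one place with a context manager, using only the standard library; `temp_yaml_files` is a hypothetical helper and not part of the codebase.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator, List


@contextmanager
def temp_yaml_files(*contents: str) -> Iterator[List[str]]:
    """Write each string to its own temporary .yaml file and yield the paths.

    Hypothetical helper: every file created so far is removed on exit,
    even if a later write or the body of the with-block raises.
    """
    paths: List[str] = []
    try:
        for text in contents:
            fd, path = tempfile.mkstemp(suffix=".yaml")
            paths.append(path)  # register before writing so cleanup always sees it
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                f.write(text)
        yield paths
    finally:
        for path in paths:
            try:
                os.unlink(path)
            except FileNotFoundError:
                pass  # already gone; nothing to clean up
```

In the endpoint this would read `with temp_yaml_files(body.v1, body.v2) as (v1_path, v2_path):` around the call to `ContractDiff().generate(v1_path, v2_path)`, so no explicit `os.unlink` calls are needed.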
43 changes: 43 additions & 0 deletions datacontract/cli.py
@@ -1,6 +1,7 @@
import logging
import os
import sys
from enum import Enum
from importlib import metadata
from pathlib import Path
from typing import Iterable, List, Optional
@@ -127,6 +128,48 @@ def enable_debug_logging(debug: bool):
)


class DiffOutputFormat(str, Enum):
    text = "text"
    html = "html"


@app.command(name="diff")
def diff(
    v1: Annotated[
        str,
        typer.Argument(help="The location (path) of the source (before) data contract YAML."),
    ],
    v2: Annotated[
        str,
        typer.Argument(help="The location (path) of the target (after) data contract YAML."),
    ],
    format: Annotated[
        DiffOutputFormat,
        typer.Option(help="The output format for the diff report."),
    ] = DiffOutputFormat.text,
    output: Annotated[
        Optional[Path],
        typer.Option(
            help="Specify the file path where the diff report will be saved. If no path is provided, the output will be printed to stdout."
        ),
    ] = None,
    debug: debug_option = None,
):
    """
    Show a diff between two data contracts.
    """
    enable_debug_logging(debug)

    from datacontract.reports.diff.contract_diff_report import ContractDiffReport

    ContractDiffReport().generate(
        v1_path=v1,
        v2_path=v2,
        fmt=format.value,
        output_path=str(output) if output is not None else None,
    )


@app.command(name="test")
def test(
    location: Annotated[
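The `--output` option of `datacontract diff` either saves the rendered report to a file or prints it to stdout. This diff does not show how `ContractDiffReport().generate` implements that branch, so the following is only a hypothetical sketch of the behaviour the help text describes; `emit_report` is an illustrative name, and creating missing parent directories is an assumption.

```python
import sys
from pathlib import Path
from typing import Optional, TextIO


def emit_report(content: str, output_path: Optional[str] = None, stream: Optional[TextIO] = None) -> None:
    """Hypothetical: write a rendered diff report to output_path, or print it when no path is given."""
    if output_path is None:
        # Resolve sys.stdout at call time so redirection (e.g. in tests) is honoured.
        target = stream if stream is not None else sys.stdout
        target.write(content)
        return
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # assumption: create missing directories
    path.write_text(content, encoding="utf-8")
```

The CLI passes `str(output) if output is not None else None`, which maps directly onto the `output_path` parameter of a helper shaped like this.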
Empty file.
Contributor

I am not a big fan of "common" packages. Mostly there is not much in common. Please move the logic to the actual reports.

Contributor Author

OK. The intent was not explicit here: I am laying the foundation for the next PR, which will focus on breaking changes.

Both diff and breaking will share that rendering logic, report layout, and UX, which keeps them aligned on these aspects. That is why `common` is being introduced here.

Collaborator (jschoedl, Apr 2, 2026)

For now, I think they fit better in the reports directory directly. This makes it similarly obvious that they are not specific to either diff or breaking. 2 shared files are not really a reason for a common package.

Also, many functions are only called from one of html_contract_diff_renderer or text_contract_diff_renderer (and from the tests). Please move those to where they are called, as @jochenchrist said. Only once they are actually used by the breaking logic should they be moved into a third file.

Empty file.