Added changelog command and API endpoint #1118

Open
davidb-tada wants to merge 4 commits into datacontract:main from davidb-tada:feat/semantic-diff-writer

davidb-tada commented Apr 3, 2026

feat(changelog): semantic changelog for ODCS data contracts with text report

Note

This PR replaces #1112, and implements change requests from the original PR review.

It has a narrower scope, and does not implement the text and html renderers.
The functionality is exposed as changelog rather than diff. The changelog output and associated code are more closely aligned to the existing code base.

Summary

This PR reintroduces the much-loved changelog functionality with wider capability, including a semantic diff at field-level granularity. It adds a datacontract changelog command that produces a semantic changelog between two data contract YAML files, rendered as a plain-text report.

Note: Breaking change classification (which changes are breaking vs non-breaking) is out of scope for this PR and will follow in a subsequent one.

# Plain-text changelog to stdout
datacontract changelog v1.odcs.yaml v2.odcs.yaml

A POST /changelog endpoint is also added to the API server, accepting the two contracts as YAML strings in the request body.

Motivation

When evolving a data contract, understanding what changed between two versions is a prerequisite to assessing impact. This provides a structured, human-readable changelog rather than a raw YAML text diff, with changes grouped by field path and presented at both a summary (rolled-up) and detail (leaf) level.

How it works

The diff engine normalises both contracts before diffing — converting named lists (e.g. schema[], customProperties[], authoritativeDefinitions[]) from positional arrays to dicts keyed by their natural identifier.

This eliminates false positives from list reordering and produces stable, meaningful field paths in the output, e.g.
schema.orders.properties.order_id.logicalType Changed
rather than
schema[0].properties[1].logicalType Changed
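
A minimal sketch of the normalisation idea described above, for illustration only. It is not the code in normalize.py: the NATURAL_KEYS table, the function names, and the sample contracts are all assumptions made for this example.

```python
from typing import Any, Dict, List, Optional

# Hypothetical natural-key table: list name -> field used to key its items.
NATURAL_KEYS = {"schema": "name", "properties": "name", "customProperties": "property"}

_MISSING = object()  # sentinel so that a missing path differs from a None value


def normalize(node: Any, list_name: Optional[str] = None) -> Any:
    """Recursively convert named lists into dicts keyed by their natural key."""
    if isinstance(node, dict):
        return {k: normalize(v, k) for k, v in node.items()}
    if isinstance(node, list):
        key = NATURAL_KEYS.get(list_name)
        # Only re-key when every item is a dict carrying the key;
        # otherwise keep the list positional (the fallback case).
        if key and all(isinstance(item, dict) and key in item for item in node):
            return {item[key]: normalize(item) for item in node}
        return [normalize(item) for item in node]
    return node


def flatten(node: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into {dotted.path: leaf_value}."""
    if isinstance(node, dict):
        out: Dict[str, Any] = {}
        for k, v in node.items():
            out.update(flatten(v, f"{prefix}.{k}" if prefix else str(k)))
        return out
    return {prefix: node}


def changed_paths(old: dict, new: dict) -> List[str]:
    """Return the sorted field paths whose values differ after normalisation."""
    a, b = flatten(normalize(old)), flatten(normalize(new))
    return sorted(p for p in a.keys() | b.keys() if a.get(p, _MISSING) != b.get(p, _MISSING))


v1 = {"schema": [{"name": "orders", "properties": [
    {"name": "order_id", "logicalType": "string"},
    {"name": "total", "logicalType": "number"},
]}]}
# Same contract with the properties reordered and one logicalType changed.
v2 = {"schema": [{"name": "orders", "properties": [
    {"name": "total", "logicalType": "number"},
    {"name": "order_id", "logicalType": "integer"},
]}]}

print(changed_paths(v1, v2))
```

Because the lists are keyed by name before comparison, the reordering of the two properties produces no entry, and the single reported path is the name-based one rather than a positional index.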

Natural key limitation and upstream path - Future Considerations

This section provides context around the current semantic / natural key implementation and envisaged improvements.

The natural key for each named list — the field used to key items when converting arrays to dicts before diffing — is currently hardcoded in normalize.py. Each entry in the table below was determined by inspecting the required array in the ODCS v3 JSON Schema and selecting the field that acts as the stable semantic identifier:

| Entity | Natural key | Source |
| --- | --- | --- |
| SchemaObject | name | required: [name] in JSON schema |
| SchemaProperty | name | required: [name] in JSON schema |
| Server | server | required: [server, type]; server chosen as identifier |
| SLAProperty | property | required: [property, value]; property chosen as identifier |
| CustomProperty | property | required: [property, value]; property chosen as identifier |
| Role | role | required: [role] in JSON schema |
| SupportItem | channel | required: [channel] in JSON schema |
| TeamMember | username | required: [username] in JSON schema |
| DataQuality | name | no required in JSON schema; name inferred, positional fallback if absent |

The DataQuality case illustrates the root problem: the JSON schema does not declare any field as required for that object, so there is no machine-readable source of truth from which the natural key can be derived. The key must instead be inferred from domain knowledge and hardcoded with a fallback.

The correct long-term fix is for the natural key to be declared in the spec itself — specifically, by ensuring that each named-list entity type in odcs-json-schema-v3.1.0.json has a required array whose first entry is unambiguously its stable identifier. That metadata would then flow through to the Pydantic model package as a non-optional field, allowing normalize.py to derive the natural key table dynamically by reflecting over model_fields rather than maintaining it by hand.
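
To make the envisaged derivation concrete, here is a hedged sketch that reads the natural key from the first entry of each entity's required array. It works directly on JSON-schema-style dicts rather than reflecting over Pydantic model_fields, so it is a stand-in for the approach described above, not the proposed implementation. The SCHEMA_DEFS fragments are hypothetical and heavily trimmed, and the function names are invented for this example.

```python
from typing import Any, Dict, Optional

# Hypothetical, trimmed fragments in the style of JSON Schema definitions.
SCHEMA_DEFS: Dict[str, Dict[str, Any]] = {
    "Server": {"type": "object", "required": ["server", "type"]},
    "Role": {"type": "object", "required": ["role"]},
    "DataQuality": {"type": "object"},  # no "required" array declared
}


def derive_natural_key(definition: Dict[str, Any]) -> Optional[str]:
    """Return the first required field, or None when nothing is required."""
    required = definition.get("required") or []
    return required[0] if required else None


def build_key_table(
    defs: Dict[str, Dict[str, Any]], fallbacks: Dict[str, str]
) -> Dict[str, Optional[str]]:
    """Derive keys from the schema; use a hand-maintained fallback only where needed."""
    return {
        entity: derive_natural_key(definition) or fallbacks.get(entity)
        for entity, definition in defs.items()
    }


# DataQuality still needs a hardcoded fallback until the schema declares one.
KEY_TABLE = build_key_table(SCHEMA_DEFS, fallbacks={"DataQuality": "name"})
print(KEY_TABLE)
```

The sketch relies on the convention that the first required entry is the identifier, which is exactly the guarantee the spec would need to give for a derived table to be trustworthy.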

If this PR is merged, the author intends to raise issues and submit PRs to both upstream projects — bitol-io/open-data-contract-standard to add the missing required declarations to the JSON schema, and datacontract/open-data-contract-standard-python to surface those as non-optional Pydantic fields — which would allow a follow-up PR here to replace the hardcoded key table with fully derived logic.

Changes

  • datacontract/changelog/changelog.py — ODCS semantic diff and changelog builder
  • datacontract/changelog/normalize.py — pre-diff normalization (named lists → keyed dicts)
  • datacontract/output/text_changelog_results.py — plain-text renderer
  • datacontract/model/changelog.py — ChangelogResult and ChangelogEntry models
  • datacontract/data_contract.py — DataContract.changelog() method
  • datacontract/api.py — POST /changelog endpoint
  • datacontract/cli.py — datacontract changelog command
  • tests/test_changelog*.py — comprehensive test suite covering diff engine, normalization, rendering, and API
  • tests/fixtures/changelog/* — fixture files for ODCS changelog testing
  • tests/fixtures/breaking/* — removed leftover fixture files from the DCS-based implementation
  • API.md — changelog endpoint documentation
  • CHANGELOG.md — unreleased entry
  • README.md — updated with new changelog functionality
  • pyproject.toml — dependency updates

Testing

pytest tests/test_changelog*.py tests/test_cli.py tests/test_api.py

199 tests, all passing.

  • Tests pass
  • ruff format
  • README.md updated
  • CHANGELOG.md entry added

export_args=kwargs,
)

def _to_odcs_dict(self) -> dict:
Contributor

This should not be part of the central data_contract file.

Author

Moving it outside the class entirely would break encapsulation since it accesses private instance attributes (_data_contract, _data_contract_file, _data_contract_str).

An option would be to nest this inside changelog(), although it might hurt readability.

Here is what we have, let me know where you want to go with this:

Option A — current (class method)

  • Keeps encapsulation; accesses _data_contract, _data_contract_file, _data_contract_str cleanly

Option B — moved to changelog.py

  • Breaks encapsulation; private attributes would be exposed across module boundary

Option C — nested inside changelog()

  • Preserves encapsulation but hurts readability; buries a reusable helper

Contributor

The diff function or the build_changelog could take OpenDataContractStandard parameters.
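
One way to read this suggestion, sketched under assumptions: the changelog module exposes a function that accepts already-parsed contract models, so it never touches the private attributes of DataContract. The ContractModel class below is a hypothetical stand-in for an OpenDataContractStandard object, the build_changelog signature is invented for illustration, and the top-level comparison is only a placeholder for the real diff engine.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class ContractModel:
    """Hypothetical stand-in for an OpenDataContractStandard model object."""

    version: str
    status: str

    def model_dump(self) -> Dict[str, Any]:
        # Mirrors the Pydantic-style method name; here it just wraps asdict().
        return asdict(self)


def build_changelog(
    v1: ContractModel, v2: ContractModel, v1_label: str = "", v2_label: str = ""
) -> Dict[str, Any]:
    """Compare two models via their public dump; no access to DataContract internals."""
    before, after = v1.model_dump(), v2.model_dump()
    changes = [
        {"path": key, "old": before.get(key), "new": after.get(key)}
        for key in sorted(before.keys() | after.keys())
        if before.get(key) != after.get(key)  # placeholder for the real semantic diff
    ]
    return {"v1": v1_label, "v2": v2_label, "changes": changes}


result = build_changelog(
    ContractModel(version="1.0.0", status="draft"),
    ContractModel(version="1.1.0", status="draft"),
    v1_label="v1.odcs.yaml",
    v2_label="v2.odcs.yaml",
)
print(result["changes"])
```

With a shape like this, DataContract.changelog() would only resolve its two models and labels and delegate, which keeps the helper testable in isolation.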

@jochenchrist
Contributor

Just the one remark. The rest is OK.

… report

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@davidb-tada davidb-tada force-pushed the feat/semantic-diff-writer branch from e139e55 to 7019c1a Compare April 4, 2026 11:01
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@davidb-tada davidb-tada force-pushed the feat/semantic-diff-writer branch from bab986f to eb719af Compare April 5, 2026 09:28
davidb-tada and others added 2 commits April 6, 2026 13:10
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
v1_label = self._data_contract_file or ""
v2_label = other._data_contract_file or ""

raw_diff = diff(self._to_odcs_dict(), other._to_odcs_dict())
Contributor

We should move the logic to the changelog file for testability and to keep data_contract.py small.
