Added changelog command and API endpoint #1118
davidb-tada wants to merge 4 commits into datacontract:main
            export_args=kwargs,
        )

    def _to_odcs_dict(self) -> dict:
This should not be part of the central data_contract file.
Moving it outside the class entirely would break encapsulation since it accesses private instance attributes (_data_contract, _data_contract_file, _data_contract_str).
One option would be to nest this inside changelog(), although that might hurt readability.
Here is what we have; let me know where you want to go with this:
Option A — current (class method)
- Keeps encapsulation; accesses _data_contract, _data_contract_file, _data_contract_str cleanly
Option B — moved to changelog.py
- Breaks encapsulation; private attributes would be exposed across module boundary
Option C — nested inside changelog()
- Preserves encapsulation but hurts readability; buries a reusable helper
The diff function or the build_changelog could take OpenDataContractStandard parameters.
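One way to read this suggestion, as a rough sketch only: the diff logic lives in the changelog module and receives already-parsed contract objects, so it never touches the private attributes of DataContract. The names ContractLike, to_dict, FakeContract and this build_changelog signature are hypothetical stand-ins, not the actual datacontract-cli or OpenDataContractStandard API, and the comparison is deliberately reduced to top-level keys.

```python
from typing import Any, Protocol


class ContractLike(Protocol):
    """Hypothetical stand-in for a parsed contract model."""

    def to_dict(self) -> dict[str, Any]: ...


def build_changelog(v1: ContractLike, v2: ContractLike,
                    v1_label: str = "", v2_label: str = "") -> dict[str, Any]:
    """Module-level helper: compares two parsed contracts by top-level key only."""
    before, after = v1.to_dict(), v2.to_dict()
    changed = sorted(k for k in before.keys() | after.keys()
                     if before.get(k) != after.get(k))
    return {"v1": v1_label, "v2": v2_label, "changed_keys": changed}


class FakeContract:
    """Test double so the sketch runs without the real model package."""

    def __init__(self, data: dict[str, Any]) -> None:
        self._data = data

    def to_dict(self) -> dict[str, Any]:
        return dict(self._data)


result = build_changelog(
    FakeContract({"version": "1.0.0", "status": "active"}),
    FakeContract({"version": "2.0.0", "status": "active"}),
    v1_label="v1.odcs.yaml",
    v2_label="v2.odcs.yaml",
)
```

With this shape, DataContract.changelog() would only need to pass its parsed contract and file label along, and the helper can be unit-tested without constructing a DataContract.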
Just the one remark. The rest is OK.
… report Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
        v1_label = self._data_contract_file or ""
        v2_label = other._data_contract_file or ""

        raw_diff = diff(self._to_odcs_dict(), other._to_odcs_dict())
We should move the logic to the changelog file for testability and to keep data_contract.py small.
feat(changelog): semantic changelog for ODCS data contracts with text report
Note
This PR replaces #1112, and implements change requests from the original PR review.
It has a narrower scope, and does not implement the text and html renderers.
The functionality is exposed as changelog rather than diff. The changelog output and associated code are more closely aligned to the existing code base.
Summary
Reintroducing the much-loved changelog functionality, with a wider capability including semantic diff with field-level granularity. This PR adds a datacontract changelog command that produces a semantic changelog of two data contract YAML files, with a plain-text report format.
Note: Breaking change classification (which changes are breaking vs non-breaking) is out of scope for this PR and will follow in a subsequent one.
    # Plain-text changelog to stdout
    datacontract changelog v1.odcs.yaml v2.odcs.yaml
A POST /changelog endpoint is also added to the API server, accepting the two contracts as YAML strings in the request body.
Motivation
When evolving a data contract, understanding what changed between two versions is a prerequisite to assessing impact. This provides a structured, human-readable changelog rather than a raw YAML text diff, with changes grouped by field path and presented at both a summary (rolled-up) and detail (leaf) level.
How it works
The diff engine normalises both contracts before diffing, converting named lists (e.g. schema[], customProperties[], authoritativeDefinitions[]) from positional arrays to dicts keyed by their natural identifier. This eliminates false positives from list reordering and produces stable, meaningful field paths in the output, e.g.
    schema.orders.properties.order_id.logicalType Changed
rather than
    schema[0].properties[1].logicalType Changed
Natural key limitation and upstream path - Future Considerations
This section provides context around the current semantic / natural key implementation and envisaged improvements.
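To make the natural key idea concrete, here is a minimal sketch of the normalise-then-diff approach described under How it works. It is not the code in normalize.py or changelog.py: the NATURAL_KEYS mapping, the function names and the Added/Removed labels are assumptions for illustration, and only a few list names are covered.

```python
from typing import Any

# Hypothetical subset of a key table: list field name -> natural key field.
NATURAL_KEYS = {"schema": "name", "properties": "name", "customProperties": "property"}


def normalize(node: Any, field: str = "") -> Any:
    """Recursively turn named lists into dicts keyed by their natural key."""
    if isinstance(node, dict):
        return {k: normalize(v, k) for k, v in node.items()}
    if isinstance(node, list):
        key = NATURAL_KEYS.get(field)
        if key and all(isinstance(i, dict) and key in i for i in node):
            return {str(i[key]): normalize(i) for i in node}
        # No natural key: keep the list positional and compare it as one value.
        return [normalize(i) for i in node]
    return node


def diff(old: Any, new: Any, path: str = "") -> list[tuple[str, str]]:
    """Return (field path, change type) pairs for two normalised trees."""
    if isinstance(old, dict) and isinstance(new, dict):
        changes = []
        for k in sorted(old.keys() | new.keys()):
            sub = f"{path}.{k}" if path else k
            if k not in new:
                changes.append((sub, "Removed"))
            elif k not in old:
                changes.append((sub, "Added"))
            else:
                changes.extend(diff(old[k], new[k], sub))
        return changes
    return [] if old == new else [(path, "Changed")]


v1 = {"schema": [
    {"name": "orders", "properties": [
        {"name": "customer_id", "logicalType": "string"},
        {"name": "order_id", "logicalType": "string"},
    ]},
]}
# Same contract with the properties reordered and one logicalType edited.
v2 = {"schema": [
    {"name": "orders", "properties": [
        {"name": "order_id", "logicalType": "integer"},
        {"name": "customer_id", "logicalType": "string"},
    ]},
]}

changes = diff(normalize(v1), normalize(v2))
```

In this example the reordering of properties produces no entry; only the edited logicalType is reported, at the path schema.orders.properties.order_id.logicalType.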
The natural key for each named list (the field used to key items when converting arrays to dicts before diffing) is currently hardcoded in normalize.py. Each entry in the table below was determined by inspecting the required array in the ODCS v3 JSON Schema and selecting the field that acts as the stable semantic identifier:
| Entity | Natural key | Basis |
|---|---|---|
| SchemaObject | name | required: [name] in JSON schema |
| SchemaProperty | name | required: [name] in JSON schema |
| Server | server | required: [server, type]; server chosen as identifier |
| SLAProperty | property | required: [property, value]; property chosen as identifier |
| CustomProperty | property | required: [property, value]; property chosen as identifier |
| Role | role | required: [role] in JSON schema |
| SupportItem | channel | required: [channel] in JSON schema |
| TeamMember | username | required: [username] in JSON schema |
| DataQuality | name | no required in JSON schema; name inferred; positional fallback if absent |
The DataQuality case illustrates the root problem: the JSON schema does not declare any field as required for that object, so there is no machine-readable source of truth from which the natural key can be derived. The key must instead be inferred from domain knowledge and hardcoded with a fallback.
The correct long-term fix is for the natural key to be declared in the spec itself: specifically, by ensuring that each named-list entity type in odcs-json-schema-v3.1.0.json has a required array whose first entry is unambiguously its stable identifier. That metadata would then flow through to the Pydantic model package as a non-optional field, allowing normalize.py to derive the natural key table dynamically by reflecting over model_fields rather than maintaining it by hand.
If this PR is merged, the author intends to raise issues and submit PRs to both upstream projects: bitol-io/open-data-contract-standard, to add the missing required declarations to the JSON schema, and datacontract/open-data-contract-standard-python, to surface those as non-optional Pydantic fields. That would allow a follow-up PR here to replace the hardcoded key table with fully derived logic.
Changes
- datacontract/changelog/changelog.py: ODCS semantic diff and changelog builder
- datacontract/changelog/normalize.py: pre-diff normalization (named lists → keyed dicts)
- datacontract/output/text_changelog_results.py: plain-text renderer
- datacontract/model/changelog.py: ChangelogResult and ChangelogEntry models
- datacontract/data_contract.py: DataContract.changelog() method
- datacontract/api.py: POST /changelog endpoint
- datacontract/cli.py: datacontract changelog command
- tests/test_changelog*.py: comprehensive test suite covering diff engine, normalization, rendering, and API
- tests/fixtures/changelog/*: fixture files for ODCS changelog testing
- tests/fixtures/breaking/*: removed leftover fixture files from the DCS-based implementation
- API.md: changelog endpoint documentation
- CHANGELOG.md: unreleased entry
- README.md: updated with new changelog functionality
- pyproject.toml: dependency updates
Testing
    pytest tests/test_changelog*.py tests/test_cli.py tests/test_api.py
199 tests, all passing.
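Returning to the upstream path described under Future Considerations: if every named-list entity declared its identifier first in required, the key table could be derived rather than hardcoded. The sketch below illustrates the idea against a hand-written, heavily simplified schema fragment. The PR text proposes reflecting over Pydantic model_fields; to stay dependency-free, this sketch reads required directly from a schema dict instead. The definitions layout and the derive_natural_keys helper are assumptions, not the real structure of odcs-json-schema-v3.1.0.json or of normalize.py.

```python
from typing import Any, Optional

# Hand-written fragment for illustration only; not copied from the ODCS schema.
SCHEMA_FRAGMENT: dict[str, Any] = {
    "definitions": {
        "Server": {"type": "object", "required": ["server", "type"]},
        "Role": {"type": "object", "required": ["role"]},
        "DataQuality": {"type": "object"},  # no required array declared
    }
}


def derive_natural_keys(schema: dict[str, Any]) -> dict[str, Optional[str]]:
    """Map each entity to the first entry of its required array, or None."""
    keys: dict[str, Optional[str]] = {}
    for entity, definition in schema.get("definitions", {}).items():
        required = definition.get("required") or []
        keys[entity] = required[0] if required else None
    return keys


natural_keys = derive_natural_keys(SCHEMA_FRAGMENT)
# Entities mapped to None would still need a hardcoded key or positional fallback.
missing = [entity for entity, key in natural_keys.items() if key is None]
```

Under these assumptions, DataQuality is the only entity left without a derivable key, which mirrors the limitation described above.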