
Lutaml integration#175

Draft
andrew2net wants to merge 52 commits into
mainfrom
lutaml-integration

@andrew2net
Contributor

No description provided.


@hound hound Bot left a comment

Some files could not be reviewed due to errors:

Error: unrecognized cop Performance/CaseWhenSplat found in .rubocop.yml, unrecognized cop Performance/Count found in .rubocop.yml, unrecognized cop Performance/Detect found in .rubocop.yml, unrecognized cop Performance/FlatMap found in .rubocop.yml, unrecognized cop Performance/ReverseEach found in .rubocop.yml, unrecognized cop Performance/Size found in .rubocop.yml, unrecognized cop Performance/StringReplacement found in .rubocop.yml

@andrew2net andrew2net requested a review from ronaldtse April 23, 2025 01:31
OpenSSL 3.3.0 causes an OpenSSL::SSL::SSLError
Also includes some small bug fixes
…; update scraper to use except method for params; modify bibliography_spec to enable tests and adjust expectations; fix scraper_spec to enable isoref test; update VCRs
…pubid

Add handling for Pubid::Core::Identifier objects in HitCollection#find and create_pubid methods.
This allows the code to work with Pubid::Core::Identifier instances directly and use Relaton::Index
binary search for better performance when possible.
Use ID keys to control the index ID structure.

Cache parsed index data in tests to avoid re-parsing the zip/YAML every test, improving test performance.
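
The test-side caching described above could look roughly like the following. This is a minimal sketch, not the code from this PR: `IndexFixtureCache`, its methods, and the sample YAML are hypothetical names and data chosen for illustration. It memoizes the parsed structure per key so repeated test setups reuse it.

```ruby
require "yaml"

# Hypothetical helper (not part of this PR): memoizes parsed index data
# per key so repeated test setups do not re-parse the same YAML.
module IndexFixtureCache
  @cache = {}
  @parse_count = 0

  class << self
    attr_reader :parse_count

    # Returns the cached value for +key+, running the block only on a miss.
    def fetch(key)
      @cache.fetch(key) do
        @parse_count += 1
        @cache[key] = yield
      end
    end

    # Drops all cached entries, e.g. between test suites.
    def clear
      @cache.clear
      @parse_count = 0
    end
  end
end

# Invented sample data standing in for an unzipped index file.
yaml = "- id: ISO 9001\n  file: data/iso-9001.yaml\n"

first = IndexFixtureCache.fetch("index-v1") { YAML.safe_load(yaml) }
second = IndexFixtureCache.fetch("index-v1") { YAML.safe_load(yaml) }
```

Both calls return the same object, and the YAML is parsed only once. A shared cache like this assumes tests do not mutate the parsed data.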
andrew2net and others added 28 commits March 23, 2026 17:08
remove dates outside the cut-off dates
and add corresponding tests
* update Gemfile to use GH version of relaton-bib & lutaml-model 0.8.0

* fix: update document identifier to use pubid for consistency in file writing

* feat: add model ItemData to Bibdata and Bibitem classes; update schema version mapping in Ext class

* Update VCRs

* refactor: update ItemBase class to inherit from Lutaml::Model::Serializable and remove unnecessary attributes

* chore: remove stale commented-out attribute mutation in Contributor

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add key_value mapping for attributes in Ext class

* chore: remove unused create_relation method from ItemData class

* fix Gemfile

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the ICS-page scraper-driven DataFetcher (and its persistent
queue/threaded model) with a streaming consumer of
`iso_deliverables_metadata.jsonl` and `iso_technical_committees.jsonl`,
plus a new DataParser that converts one record into an Iso::ItemData.
Adds `iso-open-data` (incremental, gated on upstream Last-Modified) and
`iso-open-data-all` (full refresh) source modes; Scraper is retained
only as a fallback for `Bibliography.get` lookups missing from the
curated index. Refreshes fixtures, cassettes, and docs accordingly.
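
For reviewers unfamiliar with the streaming approach, here is a minimal sketch of reading a JSONL feed one record at a time. The method name `each_jsonl_record` and the sample records are assumptions for illustration; only the `reference` and `publicationDate` field names come from the commit messages in this PR.

```ruby
require "json"
require "stringio"

# Yields one parsed record per non-blank line, so the whole file never has
# to be held in memory. Returns an Enumerator when no block is given.
def each_jsonl_record(io)
  return enum_for(:each_jsonl_record, io) unless block_given?

  io.each_line do |line|
    line = line.strip
    next if line.empty?

    yield JSON.parse(line)
  end
end

# Invented sample input; a real feed would be read from a file or HTTP body.
sample = StringIO.new(<<~JSONL)
  {"reference": "ISO 1:2016", "publicationDate": "2016-11-01"}

  {"reference": "ISO 2:1973", "publicationDate": null}
JSONL

references = each_jsonl_record(sample).map { |rec| rec["reference"] }
```

Each record can then be handed to a parser on its own, which is what makes a persistent queue unnecessary in a design like this.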
Build a publicationDate index alongside the reference index so
DataParser can stamp each emitted relation's bibitem with a
`published` date when the related document is itself present in
the Open Data feed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
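
The idea of a publication-date index can be sketched as below. This is an illustration under assumptions, not the DataParser code: records are plain hashes, and the helper names and the shape of the emitted bibitem hash are hypothetical.

```ruby
# Builds a lookup from reference to publication date, skipping records
# that have no date (hypothetical helper, not the PR's implementation).
def build_publication_date_index(records)
  records.each_with_object({}) do |rec, index|
    date = rec["publicationDate"]
    index[rec["reference"]] = date if date
  end
end

# Returns a simplified bibitem hash for a related document, adding a
# `published` date only when the related reference is in the index.
def stamp_relation(relation_ref, date_index)
  bibitem = { "docid" => relation_ref }
  published = date_index[relation_ref]
  bibitem["date"] = { "type" => "published", "on" => published } if published
  bibitem
end

# Invented sample records.
records = [
  { "reference" => "ISO 1:2016", "publicationDate" => "2016-11-01" },
  { "reference" => "ISO 2:1973", "publicationDate" => nil },
]

date_index = build_publication_date_index(records)
known = stamp_relation("ISO 1:2016", date_index)
unknown = stamp_relation("ISO 3:1973", date_index)
```

A related document that is missing from the feed, or has no date, simply gets no `published` entry.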
Align spec fixtures and assertions with the Open Data ingest output:
relation types use obsoletedBy/updatedBy, abstracts are flattened,
place uses <city>, and the corrected-date case is skipped since
Open Data does not expose it.
Open Data emits stub records with a `Withdrawn` reference prefix for
abandoned projects (publicationDate: null, stage *.98). The previous
"Withdrawn" → "ISO" rewrite produced strings like "ISO 1701/Add 1" that
pubid-iso can't parse, leaving String docids in the index and crashing
`index.save` whenever a Pubid entry sorted ahead of them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
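
The commit message above describes the problem but not the exact fix. One plausible handling, assuming such stubs are simply excluded before indexing, is sketched here; `withdrawn_stub?` and the sample records are hypothetical.

```ruby
# Assumption: a stub record is recognizable by its `Withdrawn` reference
# prefix, as described in the commit message. Skipping is one possible
# handling, not necessarily what this PR does.
def withdrawn_stub?(record)
  record["reference"].to_s.start_with?("Withdrawn")
end

# Invented sample records.
records = [
  { "reference" => "ISO 1:2016", "publicationDate" => "2016-11-01" },
  { "reference" => "Withdrawn 1701/Add 1", "publicationDate" => nil },
]

indexable = records.reject { |rec| withdrawn_stub?(rec) }
```

Filtering before identifier parsing keeps unparseable strings out of the index, so every docid in it has the same type when the index is sorted.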
Picks up the `<date type="published">` elements that Open Data ingest
now attaches to related bibitems, and re-records the HTTP cassettes
that drifted in the meantime.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ISO's Azure Blob endpoint intermittently returns HTTP 403
AuthorizationFailure, causing the scheduled crawler to fail. Retry up to
4 times (30/60/120/240s) before raising.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
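
The retry schedule in the last commit message (up to 4 retries at 30/60/120/240s) can be sketched as follows. This is a minimal illustration, not the crawler's code: `TransientHttpError`, `with_retries`, and the injectable `sleeper` are hypothetical names, and the error class stands in for whatever the fetcher raises on an HTTP 403.

```ruby
# Hypothetical error class standing in for an HTTP 403 AuthorizationFailure.
class TransientHttpError < StandardError; end

RETRY_DELAYS = [30, 60, 120, 240].freeze

# Runs the block, retrying after each delay in +delays+ when it raises
# TransientHttpError. Re-raises once all delays are used up. The sleeper
# is injectable so tests do not have to wait.
def with_retries(delays: RETRY_DELAYS, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    yield
  rescue TransientHttpError
    raise if attempt >= delays.size

    sleeper.call(delays[attempt])
    attempt += 1
    retry
  end
end

# Usage with a fake sleeper: fails twice, then succeeds.
slept = []
calls = 0
result = with_retries(sleeper: ->(seconds) { slept << seconds }) do
  calls += 1
  raise TransientHttpError, "403 AuthorizationFailure" if calls < 3

  :ok
end
```

With this schedule a permanently failing request is attempted five times in total (one initial try plus four retries) before the error propagates.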