
Port #125 (preserve XML-fragment markup) to v2/lutaml-integration#127

Merged: andrew2net merged 2 commits into lutaml-integration from port-preserve-xml-markup-v2 on May 20, 2026

@andrew2net
Contributor

Summary

  • Ports fd20c9d (#125, Preserve XML-fragment markup in Bibcollection title/author) onto the v2 lutaml-integration branch.
  • Switches Bibcollection.from_xml to read collection title and author via inner_html (new ElementFinder#find_html) so XML-fragment markup and entities survive into the in-memory model.
  • Applies the strip_html Liquid filter (| strip_html) to the <head><title> position in _index.liquid, so the browser tab title stays plain text while the coverpage title keeps its inline markup.
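
To illustrate the split between the tab title and the coverpage title, here is a minimal plain-Ruby sketch. It only approximates what a tag-stripping filter such as Liquid's strip_html does and is not the filter's actual source. The method name strip_html_sketch, the constants, and the sample title are hypothetical.

```ruby
# Hypothetical stand-in for a tag-stripping filter; NOT Liquid's real code.
# Blocks whose content should vanish entirely (scripts, styles, comments).
STRIP_BLOCKS_SKETCH = Regexp.union(
  %r{<script.*?</script>}m,
  /<!--.*?-->/m,
  %r{<style.*?</style>}m
)
# Any remaining tag, opening or closing.
STRIP_TAGS_SKETCH = /<.*?>/m

def strip_html_sketch(fragment)
  fragment.to_s.gsub(STRIP_BLOCKS_SKETCH, "").gsub(STRIP_TAGS_SKETCH, "")
end

# Assumed sample value, as it might be held in memory after reading inner_html.
collection_title = "A <em>test</em> &amp; playground"

coverpage_title = collection_title                     # markup kept
head_title      = strip_html_sketch(collection_title)  # tags removed, entities kept

puts coverpage_title
puts head_title
```

Note that the sketch removes tags only; the &amp; entity is left as is, which is what a <title> element needs.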

Skipped vs the original commit (intentional on v2):

  • relaton-cli.gemspec: original bumped relaton to ~> 1.20.3; v2 is on ~> 2.1.0 and stays there.
  • .github/workflows/rake.yml: workflow_dispatch: is already present on this branch.

Refs metanorma/isodoc#785.

Test plan

  • bundle exec rspec spec/relaton/cli/xml_to_html_renderer_spec.rb — 8 examples, 0 failures, including the new index-with-markup.xml regression context.
  • Full bundle exec rspec on CI.

🤖 Generated with Claude Code

Port of fd20c9d (#125) to the v2/lutaml-integration branch.

Switch Bibcollection.from_xml to read the collection title and author
via inner_html instead of Nokogiri's .text, so the in-memory strings
keep their XML-fragment form (markup + entities intact). Apply the
strip_html Liquid filter on the HTML <title> tag position so the
browser tab title stays plain text. Adds find_html to ElementFinder
alongside find_text. Adds a regression spec with markup and &amp; in
both the collection title and the author name.

Refs metanorma/isodoc#785.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@andrew2net andrew2net merged commit 93a9b49 into lutaml-integration May 20, 2026
11 checks passed
andrew2net added a commit that referenced this pull request May 29, 2026
Port the write-path fix from #128 to v2/lutaml-integration. The read-path
half of #128 (find_html + strip_html) was already ported via #127; this
commit ports the remaining to_xml escaping.

bibcollection.rb to_xml was writing the collection title and author
directly into XML without escaping, producing bare & in the output
when the values came from YAML (e.g. name: "A test & playground ...").
A bare & is invalid XML; libxml2 in recovery mode emits FATAL
"xmlParseEntityRef: no name" and then silently drops all subsequent
&amp; entities in the same document — corrupting every individual
document title's & in the collection index HTML output.

Add a private xml_escape helper that escapes only unencoded & (not
already-encoded &amp;, &#nnn;, &#xhh;) and leaves inline markup tags
(<em>, <strong>, etc.) untouched, so valid HTML fragments round-tripped
via find_html pass through unchanged.
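
The escaping rule described above can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the helper from bibcollection.rb: the name xml_escape_sketch and the exact regular expression are hypothetical, and only the three already-encoded forms named above (&amp;, &#nnn;, &#xhh;) are treated as encoded.

```ruby
# Hypothetical sketch of an "escape only bare ampersands" helper.
# A bare & is one NOT followed by "amp;", a decimal reference ("#38;")
# or a hexadecimal reference ("#x26;").
BARE_AMPERSAND_SKETCH = /&(?!amp;|#\d+;|#x\h+;)/

def xml_escape_sketch(value)
  # Tags such as <em> contain no ampersand, so they pass through untouched.
  value.to_s.gsub(BARE_AMPERSAND_SKETCH, "&amp;")
end

puts xml_escape_sketch("A test & playground")       # bare & gets escaped
puts xml_escape_sketch("A <em>test</em> &amp; B")   # already encoded, unchanged
puts xml_escape_sketch("R&#38;D and R&#x26;D")      # numeric references, unchanged
```

Because the negative lookahead skips already-encoded references, applying the helper twice gives the same result as applying it once, which matters when a value read via find_html is written back out.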

Fixes metanorma/isodoc#785.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>