Skip to content

fix: preserve carriage returns inside CDATA#835

Open
viraatdas wants to merge 2 commits into
NaturalIntelligence:masterfrom
viraatdas:libra/issues/2-carriage-return-characters-r-are-incorrectly-rep
Open

fix: preserve carriage returns inside CDATA#835
viraatdas wants to merge 2 commits into
NaturalIntelligence:masterfrom
viraatdas:libra/issues/2-carriage-return-characters-r-are-incorrectly-rep

Conversation

@viraatdas

Copy link
Copy Markdown

Fixes #512

Problem

parseXml normalizes line endings with xmlData.replace(/\r\n?/g, "\n") at the top of OrderedObjParser, which also rewrites \r inside CDATA sections. CDATA content must be preserved byte for byte, so a literal carriage return inside CDATA comes back as \n.

Fix

Replace the blanket normalization with a CDATA-aware pass that skips CDATA sections (and tags/comments) and normalizes line endings only outside CDATA. Carriage returns inside CDATA are now preserved; everything else still normalizes as before. Edits src/xmlparser/OrderedObjParser.js (source), not the generated bundle.

Tests

Added a regression test ("should preserve carriage returns in CDATA") that fails before the change and passes after, plus a guard test for normalization outside CDATA. Full jasmine suite passes.

libra-exla[bot] and others added 2 commits June 1, 2026 00:23
The 'should normalize carriage returns outside CDATA' test passed on
master both with and without the source change, so it guarded no
behavior. Keep the 'should preserve carriage returns in CDATA' test,
which exercises the actual fix.
@amitguptagwl

Copy link
Copy Markdown
Member

Thanks for the PR. new line and carriage return were being removed in starting to simplify regex if I remember correctly. But now most of the code is not using regex. So I'm not sure if we need to remove them at all. Will check once

@amitguptagwl

Copy link
Copy Markdown
Member

I went through this and I felt that we need 2 more conditions to handle. Current edit suggest to handle CDATA only but the new line problem is real for other tags too like pre or script for HTML. Same with XML tags with xml:space="preserve". So I was wondering if we can provide an option for the user

preserveSpaceFor : ["pre","script","style"]

It should by default support CDATA and xml:space="preserve". What is your view on this?

But this will not be backward comparible. so we'll have to introduce a flag or rather support Expressions

preserveSpaceFor : ["..pre","[xml:space="preserve"]"]

But expressions support at the time of traversing not before parsing starts. I think this need to be designed properly. However, @nodable/flexible-xml-parser would be easier to have this feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Carriage Return characters (\r) are incorrectly replaced with Newline characters (\n)

2 participants