
fix(xml): decode XML character entities in attribute values (fixes #2877, #1199) #2971

Open
MaxwellM34 wants to merge 1 commit into software-mansion:main from MaxwellM34:fix/issue-2877-decode-xml-entities

What

Fixes #2877 (and the older, never-fixed #1199 — same root cause).

SVG attribute values containing XML numeric character references like &#xD; and &#xA;, or the five standard named entities (&amp; &lt; &gt; &quot; &apos;), were passed through unchanged by the JS parser in src/xml.tsx and ended up in the native renderer.

The native side cannot handle raw entity references in attribute values (especially d on <path>) and throws an UnexpectedData error in native code. Because the throw happens on the native side, neither React error boundaries nor the <SvgXml onError> prop can catch it, and the whole app crashes — exactly what the issue describes.

Reproduction

The exact SVG from the issue body, condensed:

<ErrorBoundary>
  <SvgXml
    onError={(e) => console.log('caught:', e)}
    xml={`<svg viewBox="0 0 10 10" xmlns="http://www.w3.org/2000/svg">
      <path d="M0,0&#xD;&#xA;L10,10" fill="none"/>
    </svg>`}
  />
</ErrorBoundary>

Before this PR: native crash, neither the error boundary nor onError fires.
After this PR: path d is decoded to "M0,0\r\nL10,10" before reaching native (the two references become a literal CR LF, which is plain whitespace in path data); renders cleanly.

What I changed

src/xml.tsx:

  1. New exported decodeXmlEntities(value: string): string that handles:
    • The 5 standard XML named entities (amp, lt, gt, quot, apos).
    • Decimal numeric character references (&#NNN;).
    • Hex numeric character references (&#xHHH; / &#XHHH;, including 4-byte code points like &#x1F600;).
    • Unknown / malformed references are left intact rather than silently dropped, so a typo'd entity stays visible in the output instead of disappearing. This was an explicit choice: #1199 ("SvgUri Error. UnexpectedData &#10;&#9;&#9;") originally proposed a regex that stripped them, which I think is the wrong default.
  2. getAttributeValue() now calls decodeXmlEntities on the raw value before returning, so every parsed attribute comes out fully decoded.
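
For reviewers who want the decoding rules in one place, here is a minimal sketch of a decoder with the behavior listed above. It is not the code from this PR: `decodeXmlEntitiesSketch`, `namedEntity`, and `fromCodePointCompat` are hypothetical names, and the single-regex approach is an assumption about one reasonable way to implement it. The single pass matters because it keeps `&amp;lt;` from being decoded twice.

```typescript
// Hypothetical sketch, not the implementation in src/xml.tsx.

// The five predefined XML entities; anything else is treated as unknown.
function namedEntity(name: string): string | undefined {
  switch (name) {
    case 'amp':
      return '&';
    case 'lt':
      return '<';
    case 'gt':
      return '>';
    case 'quot':
      return '"';
    case 'apos':
      return "'";
    default:
      return undefined;
  }
}

// Builds a string from a code point, using a surrogate pair above U+FFFF.
function fromCodePointCompat(codePoint: number): string {
  if (codePoint <= 0xffff) {
    return String.fromCharCode(codePoint);
  }
  const offset = codePoint - 0x10000;
  return String.fromCharCode(0xd800 + (offset >> 10), 0xdc00 + (offset & 0x3ff));
}

function decodeXmlEntitiesSketch(value: string): string {
  // Fast path: most attribute values contain no references at all.
  if (value.indexOf('&') === -1) {
    return value;
  }
  // One pass over the string: decimal ref, hex ref, or named ref.
  return value.replace(
    /&(?:#([0-9]+)|#[xX]([0-9a-fA-F]+)|([A-Za-z][A-Za-z0-9]*));/g,
    (match: string, dec?: string, hex?: string, name?: string): string => {
      if (name !== undefined) {
        const known = namedEntity(name);
        // Unknown names such as &nbsp; are returned unchanged.
        return known !== undefined ? known : match;
      }
      const codePoint =
        dec !== undefined ? parseInt(dec, 10) : parseInt(hex as string, 16);
      // Values beyond the Unicode range are left intact rather than dropped.
      if (!(codePoint <= 0x10ffff)) {
        return match;
      }
      return fromCodePointCompat(codePoint);
    }
  );
}
```

With this sketch, the `d` value from the reproduction becomes `M0,0`, a CR LF, then `L10,10`, while `&nbsp;` and `&#xZZ;` come back unchanged.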

__tests__/xml.test.tsx (new file):

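Tests covering the decoder in isolation, plus three integration tests through parse(), including the exact path d string from the #2877 reproduction. All 10 pass. The existing css.test.tsx snapshot failures on this branch are pre-existing on main (stale snapshots from a prior class → className change) and unrelated to this PR.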

Design notes I want to flag for review

  • Why not pull in he (the de-facto JS entity decoder)? Two reasons: this parser is intentionally zero-dependency, and he decodes the full HTML5 named-entity set, which for SVG (XML) is incorrect — &nbsp; etc. are not valid XML entities and should arguably stay raw. The 5-entity XML-strict approach also has a much smaller footprint.
  • Why leave unknown refs intact? Silent stripping is dangerous in a parser people pipe untrusted SVG through — it changes the rendered output in invisible ways. Preserving the literal &nbsp; makes the bad input visible to the developer.
  • Why decode in getAttributeValue rather than wherever the value is consumed? Centralizing it at the parser boundary means every downstream consumer (web view, native view, AST→React transform) sees a normalized string. Decoding at each consumer would be both repetitive and easy to miss.
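
To make the "leave unknown refs intact" note concrete, here is a small comparison of stripping versus decoding. Both helpers are hypothetical and written only for this illustration: `stripReferences` is not the regex proposed in #1199, and `decodeNumericOnly` is a cut-down stand-in for the decoder (numeric references in the Basic Multilingual Plane only).

```typescript
// Hypothetical helpers for illustration only.

// Removes anything that looks like a reference, known or not.
function stripReferences(value: string): string {
  return value.replace(/&#?[A-Za-z0-9]+;/g, '');
}

// Decodes decimal and hex numeric references; leaves everything else as-is.
function decodeNumericOnly(value: string): string {
  return value.replace(
    /&#(?:([0-9]+)|[xX]([0-9a-fA-F]+));/g,
    (match: string, dec?: string, hex?: string): string => {
      const codePoint =
        dec !== undefined ? parseInt(dec, 10) : parseInt(hex as string, 16);
      // Kept short: only code points up to U+FFFF are decoded here.
      return codePoint <= 0xffff ? String.fromCharCode(codePoint) : match;
    }
  );
}

// A points attribute whose only separator between pairs is an encoded space.
const points = '0,0&#x20;10,10';

// Stripping fuses the two coordinate pairs into different numbers.
const strippedPoints = stripReferences(points);

// Decoding restores the separator the author intended.
const decodedPoints = decodeNumericOnly(points);

// An HTML-only entity: stripping hides the mistake, preserving exposes it.
const strippedTypo = stripReferences('a&nbsp;b');
const preservedTypo = decodeNumericOnly('a&nbsp;b');
```

Here `strippedPoints` is `0,010,10`, which silently changes the geometry, while `decodedPoints` is `0,0 10,10`. Likewise `strippedTypo` is `ab`, whereas `preservedTypo` keeps the literal `&nbsp;` where a developer can see it.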

This contribution was AI-assisted (Claude). The fix and tests were drafted with LLM help and reviewed against the issue repro before submission. Happy to revise anything — or close if the approach isn't what you'd want here.

Fixes software-mansion#2877 and software-mansion#1199.

SVG attribute values containing XML numeric character references
(e.g. &#xD;, &#xA;) or the standard named entities (&amp; &lt; etc)
were passed through unchanged by the JS parser and ended up in the
native renderer, which throws an 'UnexpectedData' error in native
code. Because the throw happens on the native side, neither React
error boundaries nor the <SvgXml onError> prop can catch it — the
whole app crashes.

This commit adds an exported decodeXmlEntities(value) helper and calls
it inside getAttributeValue() so every parsed attribute value is fully
decoded before reaching the AST (and the native renderer).

Decoded:
  - the five standard XML named entities: &amp; &lt; &gt; &quot; &apos;
  - decimal numeric character references: &#NNN;
  - hex numeric character references: &#xHHH; / &#XHHH;
    (including 4-byte code points like &#x1F600;)

Unknown or malformed references are left intact rather than dropped, so
a typo'd entity remains visible in the output rather than silently
disappearing.

Adds 10 tests in __tests__/xml.test.tsx covering the decoder in
isolation plus three integration tests through parse(), including the
exact path-d string from the software-mansion#2877 reproduction case.

Signed-off-by: Maxwellm34 <maxwellmcinnis123@gmail.com>

Development

Successfully merging this pull request may close these issues.

UnexpectedData cannot be caught by an error boundary (#2877)
SvgUri Error. UnexpectedData &#10;&#9;&#9; (#1199)
