
LLSD serialize: fix crash/correctness bugs and optimize hot paths#306

Merged
RyeMutt merged 2 commits into
develop from
rye/llsd-serialize-hardening
Jun 11, 2026

Conversation

@RyeMutt

@RyeMutt RyeMutt commented Jun 11, 2026

Member

Summary

Hardening and optimization pass over the LLSD XML / Binary / Notation serializers and parsers. These run on untrusted, server-fed data (capability responses, the object cache, mesh/navmesh assets) in all three formats and are hot paths, so this fixes the crash/correctness defects and speeds up the per-byte loops.

Memory-safety / crash fixes

  • unzip_llsdNavMesh double-free / use-after-free — on a mid-stream inflate error (corrupt gzip from a server-fed navmesh) the cleanup freed result and in, then fell through and realloc()'d the freed result and freed both a second time after the loop. Now returns immediately; inflateInit2 result is also checked.
  • Binary array preallocation trusted the wire size. emptyReservedArray(size) took the size field verbatim: 0xFFFFFFFF sign-extended into a multi-GB reserve() (uncaught length_error), and 0x7FFFFFFF was a 16 GB OOM. Negative sizes now fail; preallocation is capped at 4096 and append() grows past it.
  • Notation b16 overrun / infinite loop — odd hex-digit counts read past the chunk terminator into uninitialized stack and could overrun the write buffer; an unterminated b16"... looped forever while growing memory. Both fail cleanly.
  • Negative b(len) / s(len) sizes sign-extended into huge resize() requests. Now PARSE_FAILURE.
  • UB in ctype calls. isspace/isalpha were called on negative char values (high-bit bytes in untrusted input), which is UB on the MSVC ctype table. The parse loops now keep the int from peek()/get().
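
The capped preallocation described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `make_reserved_array` and `kMaxReserve` are hypothetical names, and the element type is a plain `int` rather than LLSD. The point is that the wire size only ever acts as a bounded hint, while later appends are free to grow the container.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical cap, mirroring the 4096 figure quoted in the description.
constexpr std::size_t kMaxReserve = 4096;

// Prepares `out` as an empty array whose capacity is a bounded hint.
// Returns false for a negative wire size (e.g. 0xFFFFFFFF read as a
// signed 32-bit value), leaving the caller to report a parse failure.
bool make_reserved_array(std::int32_t wire_size, std::vector<int>& out)
{
    if (wire_size < 0)
    {
        return false;
    }
    out.clear();
    // Never trust the wire size for more than kMaxReserve slots up front;
    // push_back() still grows the vector past the cap if the data is real.
    out.reserve(std::min(static_cast<std::size_t>(wire_size), kMaxReserve));
    return true;
}
```

With this shape, a claimed size of 0x7FFFFFFF costs at most a 4096-element reservation, and a short or truncated stream simply stops appending.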

Correctness fixes

  • Notation reals silently lost precision — the formatter used the default stream precision (6 significant digits), corrupting any real not representable in 6 digits on the round trip (inventory cache, derender lists, poser files, …). format() now sets max_digits10 (17); an installed realFormat still wins.
  • Binary parser accepted truncated scalars — truncated i/r/u payloads parsed as zero-valued success (only d failed), and a bogus map-key marker was silently swallowed as an empty key, desynchronizing the stream. Both now fail.
  • deserialize() + SIZE_UNLIMITED computed (-1 − header_len) as the byte budget, failing any sized payload behind a header.
  • b64"" set failbit; <binary /> parsed to undef instead of an empty LLSD::Binary (regression vs the old apr path). Both fixed.
  • strip_deprecated_header left the '\n' after <? LLSD/Binary ?> (which the binary parser rejects) and reported 18 bytes while skipping 17. Now consumes the newline and reports the actual count.
  • XML with no <llsd> element returned PARSE_FAILURE only by accident (a bogus (char)EOF byte forced an expat error); made explicit via mSawLLSDElement.
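
The precision loss in the first bullet is easy to reproduce with plain iostreams. The sketch below is independent of the LLSD formatter; `round_trip` is a hypothetical helper that formats a double at a given precision and parses it back.

```cpp
#include <cassert>
#include <iomanip>
#include <limits>
#include <sstream>

// Formats `value` with `precision` significant digits, then parses the
// text back into a double, as a text serializer round trip would.
double round_trip(double value, int precision)
{
    std::ostringstream out;
    out << std::setprecision(precision) << value;
    std::istringstream in(out.str());
    double parsed = 0.0;
    in >> parsed;
    return parsed;
}
```

Calling `round_trip(3.141592653589793, 6)` goes through the text `3.14159` and returns a different value, while passing `std::numeric_limits<double>::max_digits10` (17 for IEEE 754 doubles) preserves the value exactly.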

Performance

  • deserialize_string_delim (every notation string + map key) and get_till_eol (every byte of every caps/EventPoll XML body): per-char istream::get() with sentry → streambuf::sbumpc() into a std::string.
  • Removed the boost::regex whitespace strip per <binary> element — simdutf forgiving-base64 skips whitespace natively and binary_length_from_base64 sizes exactly.
  • serialize_string and XML escapeString: per-char ostream<< → run-based bulk writes.
  • b16 formatting via a nibble LUT + single write; pretty indents via the fill constructor; skipped XML content no longer buffered.
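
As an illustration of the run-based bulk writes mentioned above, here is a minimal sketch of the technique. It is not the PR's escapeString: `escape_xml_runs` is a hypothetical name and the set of escaped characters is an assumption chosen for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Writes `in` to `out`, escaping '<', '>' and '&'. Each run of characters
// that needs no escaping is flushed with one write() call instead of one
// operator<< per character.
void escape_xml_runs(std::ostream& out, const std::string& in)
{
    std::size_t run_start = 0;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        const char* replacement = nullptr;
        switch (in[i])
        {
        case '<': replacement = "&lt;"; break;
        case '>': replacement = "&gt;"; break;
        case '&': replacement = "&amp;"; break;
        default: break;
        }
        if (replacement)
        {
            // Flush the pending unescaped run, then the entity.
            out.write(in.data() + run_start,
                      static_cast<std::streamsize>(i - run_start));
            out << replacement;
            run_start = i + 1;
        }
    }
    // Flush whatever follows the last escaped character.
    out.write(in.data() + run_start,
              static_cast<std::streamsize>(in.size() - run_start));
}
```

For typical payloads where escapes are rare, this reduces the number of stream operations from one per byte to roughly one per escaped character.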

Deliberately unchanged

  • Binary date ('d') stays host-byte-order on both sides — matches the sim's C++ implementation; "fixing" it would break wire compat.
  • XML formatter keeps precision 25 for settings-file byte stability.

Tests

Adds coverage for the hostile-input, edge-case, and round-trip paths above (hostile size fields, truncated scalars, b16/b64 edges, <binary/>, SIZE_UNLIMITED + header, header strip, zip_llsd/navmesh-gzip corruption round-trips). llsdserialize suite 83/83; commonmisc, llsd, llsdutil, llbase64, io, llsdmessagebuilder/reader, llmessageconfig all pass. Full tree builds clean.

🤖 Generated with Claude Code

These parsers run on untrusted, server-fed data (caps responses, object
cache, mesh/navmesh assets) in all three formats, so harden the defects
and speed up the per-byte loops.

Memory-safety / crash:
- unzip_llsdNavMesh: on a mid-stream inflate error the cleanup freed
  result+in then fell through and realloc()'d the freed result and
  double-freed both after the loop. Return immediately; also check
  inflateInit2.
- Binary parseArray trusted the wire size field for emptyReservedArray():
  0xFFFFFFFF sign-extended into a multi-GB reserve(), 0x7FFFFFFF OOM'd.
  Reject negative sizes; cap preallocation at 4096.
- Notation b16: odd hex-digit counts read past the chunk terminator into
  uninitialized stack; an unterminated b16"... infinite-looped while
  growing memory. Both fail cleanly now.
- Negative b(len)/s(len) sizes sign-extended into huge resize() requests.
- isspace/isalpha were called on negative char values (high-bit bytes in
  untrusted input), UB on the MSVC ctype tables; keep the int from
  peek()/get().

Correctness:
- Notation reals used the default stream precision (6 digits), silently
  corrupting any real that needs more on the round trip. Set max_digits10.
- Binary parser accepted truncated i/r/u payloads as zero-valued success
  and silently swallowed a bogus map-key marker as an empty key,
  desyncing the stream. Both fail now.
- deserialize() with SIZE_UNLIMITED computed (-1 - header_len) as the byte
  budget, failing any sized payload behind a header.
- b64"" set failbit; <binary /> parsed to undef instead of empty Binary.
- strip_deprecated_header left the post-header newline (which the binary
  parser rejects) and misreported the skipped byte count.
- XML with no <llsd> element returned PARSE_FAILURE only by accident;
  made explicit via mSawLLSDElement.

Performance:
- deserialize_string_delim and get_till_eol: per-char istream::get() with
  sentry -> streambuf::sbumpc() into std::string.
- Dropped the boost::regex whitespace strip per <binary> element; simdutf
  forgiving-base64 skips whitespace and sizes exactly.
- serialize_string / XML escapeString: run-based bulk writes.
- b16 formatting via nibble LUT + single write; pretty indents via fill
  ctor; skipped XML content no longer buffered.

Adds tests for the hostile-input, edge-case, and round-trip paths above
(llsdserialize suite 83/83).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

coderabbitai Bot commented Jun 11, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d31d371b-a4af-4fb4-89cc-49fc7e0280e8

📥 Commits

Reviewing files that changed from the base of the PR and between db64020 and 2b5a363.

📒 Files selected for processing (1)
  • indra/llcommon/tests/stringize_test.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • indra/llcommon/tests/stringize_test.cpp

📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes

    • Safer parsing/serialization: stricter input validation, correct byte-budget handling, explicit failures for malformed/truncated data, and rejection of negative/hostile size fields
    • XML parsing requires a top-level element and avoids mis-buffering; deprecated-header handling is more reliable
    • Compression helpers more robust against allocation and inflate errors
  • Performance

    • Faster string and binary I/O with bulk reads/writes and more efficient hex/format output
  • Tests

    • Expanded regression and edge-case tests covering formats, encoding edge cases, compression, and failure modes
  • New Features

    • Notation formatter updated for exact real-number round-tripping

Walkthrough

Hardens LLSD parsing/formatting: consistent llssize byte-budgeting, safer stream/ctype usage, validated binary/notation sizes and markers, streambuf-based string IO, exact Real round-trip formatting, XML/base64 changes, improved zlib error handling, and expanded regression tests.
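
The byte-budget fix mentioned here (the `(-1 - header_len)` bug from the description) comes down to treating the unlimited sentinel as a sentinel rather than as a number. A minimal sketch, assuming the sentinel is -1 as the description implies; `ssize`, `kSizeUnlimited` and `remaining_budget` are hypothetical stand-ins for llssize, SIZE_UNLIMITED and the real computation, and clamping at zero is an assumption of this sketch:

```cpp
#include <cassert>

using ssize = long long;              // stand-in for llssize
constexpr ssize kSizeUnlimited = -1;  // stand-in for SIZE_UNLIMITED

// Returns the byte budget left after consuming a header of `header_len`
// bytes. The unlimited sentinel is passed through untouched instead of
// being decremented into a meaningless negative budget.
ssize remaining_budget(ssize max_bytes, ssize header_len)
{
    if (max_bytes == kSizeUnlimited)
    {
        return kSizeUnlimited;
    }
    const ssize left = max_bytes - header_len;
    return left < 0 ? 0 : left;  // assumption: clamp rather than go negative
}
```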

Changes

LLSD Serialization Robustness & Performance

| Layer | File(s) | Summary |
| --- | --- | --- |
| Public API declaration for notation formatter | indra/llcommon/llsdserialize.h | Declares explicit override of format() method with EFormatterOptions in LLSDNotationFormatter and restores base-class overload visibility. |
| Parsing budget consistency | indra/llcommon/llsdserialize.cpp | Change parse_using to llssize, compute a remaining byte budget (preserving SIZE_UNLIMITED) for subsequent parsing, and pass it consistently to XML/binary/notation dispatch. |
| XML parser validation and base64 handling | indra/llcommon/llsdserialize_xml.cpp | Add mSawLLSDElement validation, buffered line reader via rdbuf()->sbumpc(), optimized escapeString, skip-path content discarding, and replace regex-based base64 whitespace stripping with simdutf decoding. |
| Notation parser ctype safety | indra/llcommon/llsdserialize.cpp | Use int for peek()/ctype classification, cast delimiters for deserialize_string_delim calls, and ensure correct putback() char casting across URI/date/map/array parsing. |
| Notation binary parsing (b, b64, b16) | indra/llcommon/llsdserialize.cpp | Reject negative b(len) sizes, read b64/b16 encoded payloads directly from streambuf with termination enforcement, treat EOF as parse failure, and validate hex-pair alignment. |
| Binary parser payload validation | indra/llcommon/llsdserialize.cpp | Mark parse failures on truncated fixed-width reads, reject negative sizes for raw binary/map/array, validate map key markers to avoid desync, and cap array reserve to MAX_RESERVE. |
| String parsing and serialization optimization | indra/llcommon/llsdserialize.cpp | Refactor deserialize_string_delim to use sbumpc() into a std::string with escape-handling via direct +=, add negative-length guard in deserialize_string_raw, and rewrite serialize_string to emit unescaped runs in bulk. |
| Notation formatter precision and hex rendering | indra/llcommon/llsdserialize.cpp | Override notation format() to temporarily set ostream precision to std::numeric_limits<F64>::max_digits10 for exact Real round-tripping; optimize indentation and emit uppercase hex using a static table plus ostream::write. |
| Compression utility error handling | indra/llcommon/llsdserialize.cpp | unzip_llsd returns ZR_MEM_ERROR on output allocation failure; map inflateInit/inflateInit2 failures to error codes and return; simplify inflate loop condition and ensure immediate cleanup/return on inflate errors to avoid double-free; use constexpr deprecated-header size and consistent header_size reporting. |
| Comprehensive test additions | indra/llcommon/tests/llsdserialize_test.cpp, indra/llcommon/tests/stringize_test.cpp | Add <zlib.h> include and tests for Real round-trips, SIZE_UNLIMITED header handling, deprecated header stripping, zip/unzip and navmesh unzip round-trips and corruption rejection, XML empty <binary>, notation b16/b64/negative-size edge cases, binary robustness tests, and update expected stringize precision. |
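
The "static table plus ostream::write" approach to hex output listed in the summary above can be sketched as follows. This is an illustrative stand-alone version, not the formatter from the PR; `write_hex_upper` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Emits `bytes` as uppercase hex. Each byte is split into two nibbles that
// index a 16-entry lookup table; the encoded text is built in one buffer
// and handed to the stream with a single write().
void write_hex_upper(std::ostream& out, const std::vector<std::uint8_t>& bytes)
{
    static const char kNibble[] = "0123456789ABCDEF";
    std::string buf;
    buf.reserve(bytes.size() * 2);
    for (std::uint8_t b : bytes)
    {
        buf.push_back(kNibble[b >> 4]);    // high nibble
        buf.push_back(kNibble[b & 0x0F]);  // low nibble
    }
    out.write(buf.data(), static_cast<std::streamsize>(buf.size()));
}
```

Compared with streaming each byte through `std::hex` and `std::setw`, this avoids per-byte formatting state changes on the stream.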

🎯 4 (Complex) | ⏱️ ~75 minutes

🐰 Bits and bytes now dance with care,
No negative sizes lurking there,
Precision kept so reals return,
Hex shouts loud with uppercase burn,
Zlib wakes guarded, safe and fair.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 18.18% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'LLSD serialize: fix crash/correctness bugs and optimize hot paths' clearly and concisely summarizes the main objective: hardening LLSD serialization with security fixes and performance improvements. |
| Description check | ✅ Passed | The description is comprehensive and well-structured, covering memory-safety fixes, correctness improvements, performance optimizations, and test coverage. It exceeds the template requirements with detailed technical context and explicitly references deliberately unchanged behavior. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@indra/llcommon/llsdserialize.cpp`:
- Around line 885-915: The code reads entire quoted payloads (string delimiters,
"b64"/"b16" literal bodies) into memory before calling account(), allowing
attackers to exceed mMaxBytesLeft; fix by threading the remaining-byte budget
into deserialize_string_delim() and into the base64/base16 read loops used in
the b64/b16 branches so each character read decrements and checks the remaining
budget immediately and returns fail when it would go negative instead of calling
account() after buffering; update calls to deserialize_string_delim(...) to
accept a remaining/limit parameter (propagating from mMaxBytesLeft or
remaining), and in the b64/b16 loops decrement/check that same budget on each
sbumpc() iteration and abort with istr.setstate(failbit) or return false when
the budget is exceeded, keeping the existing account(...) calls but ensuring
they cannot be bypassed by large quoted literals.

In `@indra/llcommon/tests/llsdserialize_test.cpp`:
- Around line 1514-1515: The test currently uses ensureParse("empty b64",
"b64\"\"", LLSD(LLSD::Binary()), 1) but the parser must reject empty base64;
change the test so it asserts a parse failure for the input "b64\"\"" instead of
expecting an empty LLSD::Binary. Locate the call to ensureParse in
llsdserialize_test.cpp and replace it with the appropriate failure assertion
(e.g. use the test helper that expects failure or set the expected-failure flag)
so that "b64\"\"" sets the parser failbit rather than producing LLSD::Binary().
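
The first inline comment above asks for the byte budget to be charged per character inside the read loop rather than after buffering. A minimal sketch of that shape follows; it is not the project's deserialize_string_delim. `read_delimited` is a hypothetical name, escape handling is omitted, and the budget is assumed to be finite (the SIZE_UNLIMITED case would need a separate branch).

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Reads bytes up to the closing `delim`, charging every consumed byte
// against `budget` before it is stored. Returns false and sets failbit if
// the budget runs out or the stream ends before the delimiter.
bool read_delimited(std::istream& in, char delim, long long& budget,
                    std::string& out)
{
    std::streambuf* sb = in.rdbuf();
    for (;;)
    {
        if (budget <= 0)
        {
            in.setstate(std::ios::failbit);  // budget exhausted mid-literal
            return false;
        }
        const int c = sb->sbumpc();
        if (c == std::char_traits<char>::eof())
        {
            in.setstate(std::ios::failbit | std::ios::eofbit);
            return false;  // unterminated literal
        }
        --budget;
        if (static_cast<char>(c) == delim)
        {
            return true;
        }
        out.push_back(static_cast<char>(c));
    }
}
```

With this structure, an oversized quoted literal is rejected after at most `budget` bytes have been buffered, instead of being read in full first.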
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 1e22ff4a-120b-40c2-a485-639b98f505a4

📥 Commits

Reviewing files that changed from the base of the PR and between ce11b86 and 4c42ed3.

📒 Files selected for processing (4)
  • indra/llcommon/llsdserialize.cpp
  • indra/llcommon/llsdserialize.h
  • indra/llcommon/llsdserialize_xml.cpp
  • indra/llcommon/tests/llsdserialize_test.cpp

Comment thread indra/llcommon/llsdserialize.cpp
Comment thread indra/llcommon/tests/llsdserialize_test.cpp
LLSDNotationFormatter now emits Reals at max_digits10 so they round-trip
exactly; the golden notation string still expected the old 6-digit form.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@RyeMutt RyeMutt force-pushed the rye/llsd-serialize-hardening branch from db64020 to 2b5a363 on June 11, 2026 17:34

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
indra/llcommon/llsdserialize.cpp (1)

1708-1708: ⚡ Quick win

Missing explicit cast for delimiter parameter.

The call to deserialize_string_delim(istr, value, c) passes c (an int from istr.get()) directly, causing implicit narrowing conversion. Other call sites at lines 667, 690 explicitly cast to char for consistency and clarity.

♻️ Proposed fix
-        rv = deserialize_string_delim(istr, value, c);
+        rv = deserialize_string_delim(istr, value, (char)c);
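
For context on why the value is an int in the first place: the parse loops keep the int returned by get()/peek() so that EOF and ctype checks stay well-defined for high-bit bytes, and narrowing to char only belongs at a call boundary like the one above. A minimal sketch of that pattern, with `next_non_space` as a hypothetical helper that is not taken from the codebase:

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Skips leading whitespace and returns the next byte as an int, or EOF.
// get() yields either EOF or a value in [0, 255], so passing it to
// std::isspace is well-defined even for bytes >= 0x80. Storing it in a
// plain char first could make it negative where char is signed.
int next_non_space(std::istream& in)
{
    for (;;)
    {
        const int c = in.get();
        if (c == std::char_traits<char>::eof())
        {
            return c;
        }
        if (!std::isspace(c))
        {
            return c;
        }
    }
}
```

A caller that needs a char, for example to pass a delimiter on, would apply `static_cast<char>(c)` only after ruling out EOF.
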
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@indra/llcommon/llsdserialize.cpp` at line 1708, Call site passes the int
variable c (from istr.get()) into deserialize_string_delim causing an implicit
narrowing; change the call at deserialize_string_delim(istr, value, c) to
explicitly cast c to char (e.g., static_cast<char>(c)) so it matches other call
sites and avoids implicit conversion — update the call where c is used and
ensure the variable name and function deserialize_string_delim are referenced
consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@indra/llcommon/llsdserialize.cpp`:
- Line 1708: Call site passes the int variable c (from istr.get()) into
deserialize_string_delim causing an implicit narrowing; change the call at
deserialize_string_delim(istr, value, c) to explicitly cast c to char (e.g.,
static_cast<char>(c)) so it matches other call sites and avoids implicit
conversion — update the call where c is used and ensure the variable name and
function deserialize_string_delim are referenced consistently.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 7f56ee3f-ffd5-475f-8a7d-d8f22e3ee0f4

📥 Commits

Reviewing files that changed from the base of the PR and between 48b0e7e and db64020.

📒 Files selected for processing (3)
  • indra/llcommon/llsdserialize.cpp
  • indra/llcommon/tests/llsdserialize_test.cpp
  • indra/llcommon/tests/stringize_test.cpp
💤 Files with no reviewable changes (1)
  • indra/llcommon/tests/llsdserialize_test.cpp

@RyeMutt RyeMutt merged commit 54ff67f into develop Jun 11, 2026
17 checks passed
@RyeMutt RyeMutt deleted the rye/llsd-serialize-hardening branch June 11, 2026 18:47