LLSD serialize: fix crash/correctness bugs and optimize hot paths#306
Conversation
These parsers run on untrusted, server-fed data (caps responses, object cache, mesh/navmesh assets) in all three formats, so harden the defects and speed up the per-byte loops. Memory-safety / crash: - unzip_llsdNavMesh: on a mid-stream inflate error the cleanup freed result+in then fell through and realloc()'d the freed result and double-freed both after the loop. Return immediately; also check inflateInit2. - Binary parseArray trusted the wire size field for emptyReservedArray(): 0xFFFFFFFF sign-extended into a multi-GB reserve(), 0x7FFFFFFF OOM'd. Reject negative sizes; cap preallocation at 4096. - Notation b16: odd hex-digit counts read past the chunk terminator into uninitialized stack; an unterminated b16"... infinite-looped while growing memory. Both fail cleanly now. - Negative b(len)/s(len) sizes sign-extended into huge resize() requests. - isspace/isalpha were called on negative char values (high-bit bytes in untrusted input), UB on the MSVC ctype tables; keep the int from peek()/get(). Correctness: - Notation reals used the default stream precision (6 digits), silently corrupting any real that needs more on the round trip. Set max_digits10. - Binary parser accepted truncated i/r/u payloads as zero-valued success and silently swallowed a bogus map-key marker as an empty key, desyncing the stream. Both fail now. - deserialize() with SIZE_UNLIMITED computed (-1 - header_len) as the byte budget, failing any sized payload behind a header. - b64"" set failbit; <binary /> parsed to undef instead of empty Binary. - strip_deprecated_header left the post-header newline (which the binary parser rejects) and misreported the skipped byte count. - XML with no <llsd> element returned PARSE_FAILURE only by accident; made explicit via mSawLLSDElement. Performance: - deserialize_string_delim and get_till_eol: per-char istream::get() with sentry -> streambuf::sbumpc() into std::string. - Dropped the boost::regex whitespace strip per <binary> element; simdutf forgiving-base64 skips whitespace and sizes exactly. - serialize_string / XML escapeString: run-based bulk writes. - b16 formatting via nibble LUT + single write; pretty indents via fill ctor; skipped XML content no longer buffered. Adds tests for the hostile-input, edge-case, and round-trip paths above (llsdserialize suite 83/83). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughSummary by CodeRabbit
WalkthroughHardens LLSD parsing/formatting: consistent llssize byte-budgeting, safer stream/ctype usage, validated binary/notation sizes and markers, streambuf-based string IO, exact Real round-trip formatting, XML/base64 changes, improved zlib error handling, and expanded regression tests. ChangesLLSD Serialization Robustness & Performance
🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@indra/llcommon/llsdserialize.cpp`:
- Around line 885-915: The code reads entire quoted payloads (string delimiters,
"b64"/"b16" literal bodies) into memory before calling account(), allowing
attackers to exceed mMaxBytesLeft; fix by threading the remaining-byte budget
into deserialize_string_delim() and into the base64/base16 read loops used in
the b64/b16 branches so each character read decrements and checks the remaining
budget immediately and returns fail when it would go negative instead of calling
account() after buffering; update calls to deserialize_string_delim(...) to
accept a remaining/limit parameter (propagating from mMaxBytesLeft or
remaining), and in the b64/b16 loops decrement/check that same budget on each
sbumpc() iteration and abort with istr.setstate(failbit) or return false when
the budget is exceeded, keeping the existing account(...) calls but ensuring
they cannot be bypassed by large quoted literals.
In `@indra/llcommon/tests/llsdserialize_test.cpp`:
- Around line 1514-1515: The test currently uses ensureParse("empty b64",
"b64\"\"", LLSD(LLSD::Binary()), 1) but the parser must reject empty base64;
change the test so it asserts a parse failure for the input "b64\"\"" instead of
expecting an empty LLSD::Binary. Locate the call to ensureParse in
llsdserialize_test.cpp and replace it with the appropriate failure assertion
(e.g. use the test helper that expects failure or set the expected-failure flag)
so that "b64\"\"" sets the parser failbit rather than producing LLSD::Binary().
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 1e22ff4a-120b-40c2-a485-639b98f505a4
📒 Files selected for processing (4)
indra/llcommon/llsdserialize.cppindra/llcommon/llsdserialize.hindra/llcommon/llsdserialize_xml.cppindra/llcommon/tests/llsdserialize_test.cpp
LLSDNotationFormatter now emits Reals at max_digits10 so they round-trip exactly; the golden notation string still expected the old 6-digit form. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
db64020 to
2b5a363
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
indra/llcommon/llsdserialize.cpp (1)
1708-1708: ⚡ Quick winMissing explicit cast for delimiter parameter.
The call to
deserialize_string_delim(istr, value, c)passesc(anintfromistr.get()) directly, causing implicit narrowing conversion. Other call sites at lines 667, 690 explicitly cast tocharfor consistency and clarity.♻️ Proposed fix
- rv = deserialize_string_delim(istr, value, c); + rv = deserialize_string_delim(istr, value, (char)c);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@indra/llcommon/llsdserialize.cpp` at line 1708, Call site passes the int variable c (from istr.get()) into deserialize_string_delim causing an implicit narrowing; change the call at deserialize_string_delim(istr, value, c) to explicitly cast c to char (e.g., static_cast<char>(c)) so it matches other call sites and avoids implicit conversion — update the call where c is used and ensure the variable name and function deserialize_string_delim are referenced consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@indra/llcommon/llsdserialize.cpp`:
- Line 1708: Call site passes the int variable c (from istr.get()) into
deserialize_string_delim causing an implicit narrowing; change the call at
deserialize_string_delim(istr, value, c) to explicitly cast c to char (e.g.,
static_cast<char>(c)) so it matches other call sites and avoids implicit
conversion — update the call where c is used and ensure the variable name and
function deserialize_string_delim are referenced consistently.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 7f56ee3f-ffd5-475f-8a7d-d8f22e3ee0f4
📒 Files selected for processing (3)
indra/llcommon/llsdserialize.cppindra/llcommon/tests/llsdserialize_test.cppindra/llcommon/tests/stringize_test.cpp
💤 Files with no reviewable changes (1)
- indra/llcommon/tests/llsdserialize_test.cpp
Summary
Hardening and optimization pass over the LLSD XML / Binary / Notation serializers and parsers. These run on untrusted, server-fed data (capability responses, the object cache, mesh/navmesh assets) in all three formats and are hot paths, so this fixes the crash/correctness defects and speeds up the per-byte loops.
Memory-safety / crash fixes
unzip_llsdNavMeshdouble-free / use-after-free — on a mid-stream inflate error (corrupt gzip from a server-fed navmesh) the cleanup freedresultandin, then fell through andrealloc()'d the freedresultand freed both a second time after the loop. Now returns immediately;inflateInit2result is also checked.emptyReservedArray(size)took the size field verbatim:0xFFFFFFFFsign-extended into a multi-GBreserve()(uncaughtlength_error),0x7FFFFFFFwas a 16 GB OOM. Negative sizes now fail; preallocation is capped at 4096 andappend()grows past it.b16overrun / infinite loop — odd hex-digit counts read past the chunk terminator into uninitialized stack and could overrun the write buffer; an unterminatedb16"...looped forever while growing memory. Both fail cleanly.b(len)/s(len)sizes sign-extended into hugeresize()requests. NowPARSE_FAILURE.isspace/isalphawere called on negativecharvalues (high-bit bytes in untrusted input), UB on the MSVC ctype table. The parse loops keep theintfrompeek()/get().Correctness fixes
format()now setsmax_digits10(17); an installedrealFormatstill wins.i/r/upayloads parsed as zero-valued success (onlydfailed), and a bogus map-key marker was silently swallowed as an empty key, desynchronizing the stream. Both now fail.deserialize()+SIZE_UNLIMITEDcomputed(-1 − header_len)as the byte budget, failing any sized payload behind a header.b64""set failbit;<binary />parsed to undef instead of an emptyLLSD::Binary(regression vs the old apr path). Both fixed.strip_deprecated_headerleft the'\n'after<? LLSD/Binary ?>(which the binary parser rejects) and reported 18 bytes while skipping 17. Now consumes the newline and reports the actual count.<llsd>element returnedPARSE_FAILUREonly by accident (a bogus(char)EOFbyte forced an expat error); made explicit viamSawLLSDElement.Performance
deserialize_string_delim(every notation string + map key) andget_till_eol(every byte of every caps/EventPoll XML body): per-charistream::get()with sentry →streambuf::sbumpc()into astd::string.boost::regexwhitespace strip per<binary>element — simdutf forgiving-base64 skips whitespace natively andbinary_length_from_base64sizes exactly.serialize_stringand XMLescapeString: per-charostream<<→ run-based bulk writes.b16formatting via a nibble LUT + single write; pretty indents via the fill constructor; skipped XML content no longer buffered.Deliberately unchanged
'd') stays host-byte-order on both sides — matches the sim's C++ implementation; "fixing" it would break wire compat.Tests
Adds coverage for the hostile-input, edge-case, and round-trip paths above (hostile size fields, truncated scalars,
b16/b64edges,<binary/>,SIZE_UNLIMITED+ header, header strip,zip_llsd/navmesh-gzip corruption round-trips).llsdserializesuite 83/83;commonmisc,llsd,llsdutil,llbase64,io,llsdmessagebuilder/reader,llmessageconfigall pass. Full tree builds clean.🤖 Generated with Claude Code