Harden LLMessageSystem UDP paths, fix packet ring corruption, cut per-packet allocations#304
Conversation
…tore asserts bufferInboundPacket() received straight into the ring slot at mHeadIndex before knowing whether a packet had actually arrived. When the ring is full, that slot holds the oldest *unread* packet, and the would-block receive that terminates every drainSocket() pass reset its size to 0 and desynced the byte/packet accounting. Since the drain path runs exactly when message processing falls behind and the ring pins at its cap, every drain under sustained overload corrupted one buffered packet and leaked its byte count. Receive into a scratch buffer and only commit the slot once we have a real packet, the same way the SOCKS branch already works. This was the single root cause behind all three asserts disabled in cbf99f6 (they were raw assert(), so they only ever fired in Debug). With the clobbering gone they are true invariants again; restore them as llassert with comments on why each holds. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Signed-off-by: Rye <rye@alchemyviewer.org>
…ive stats - zeroCodeExpand() overflow case 3 reset outptr to the buffer start and kept decoding, handing a scrambled packet to the parser. Discard the malformed packet instead, mirroring overflow cases 1 and 2. - sendMessage() computed the appended-ack budget with unsigned math, so an over-MTU payload (ChildAgentUpdate, SendXferPacket) wrapped to a huge space_left and got acks appended despite having no room. Signed math yields a negative count and skips the append. - Record the message number in the receive-count list. It was never set (TODO from the babbage era), so dumpReceiveCounts() -- the diagnostic that fires on message storms -- has always attributed nothing. - operator<< probed mMessageNumbers with operator[], permanently inserting a null entry for every missing id it touched. Use find(). - Log packets at exactly the minimum valid size (>= vs >). - Fix the big-endian MVT_S16Array swizzle iterating n % 2 elements instead of n / 2. Inert on little-endian builds. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Signed-off-by: Rye <rye@alchemyviewer.org>
…truncation When more than 255 bytes were stuffed into a Variable-1 field, addData() NUL-terminated the payload by casting away const and writing into the caller's buffer -- undefined behavior through std::string::c_str() (both addString overloads route here) and a straight crash for a string literal. Clamp the size, let addData() copy the payload, then terminate the *stored* copy. Wire bytes and stored bytes are unchanged; only the caller mutation is gone. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Signed-off-by: Rye <rye@alchemyviewer.org>
…raffic Hardening: - decodeData() trusted the wire-claimed size of MVT_VARIABLE fields. A malformed or hostile packet could drive the copy up to 64KB past the receive buffer via a 2-byte length, or hand new U8[] a negative size via the 4-byte form. Clamp the claimed size to the bytes remaining in the packet and log it as ran-off-end. Regression test included (test<46>, verified to fail against the unclamped decoder). Allocation overhaul -- previously every variable of every block of every inbound packet cost a map-node insert plus a new U8[] + copy, all freed again at clearMessage(): - LLMsgVarData stores payloads up to 24 bytes inline, sized to cover every fixed wire type (largest is LLVector3d), so only long Variable fields heap-allocate. getData() computes the pointer so vector growth cannot dangle into a moved-from element. - Replace the per-block LLIndexedVector (vector + std::map index) with an insertion-ordered flat vector scanned by prehashed name pointer: same API and iteration order, zero per-entry node allocations. operator[] inserts keyed by name so repeated misses return the same entry, and the reader's getSize() overloads query with find() so lookups never insert at all. - decodeData() takes the variable ref returned by addVariable() instead of re-finding it by name for addData(); reader getData() does one lookup instead of find() + operator[]. test<47> covers the Variable-1 truncation fix from the previous commit: caller buffer untouched, stored copy clamped and NUL-terminated. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Signed-off-by: Rye <rye@alchemyviewer.org>
…sion getData() now returns the number of bytes it copied, and both getString() overloads NUL-terminate at exactly that length. The std::string overload no longer zero-fills a 1501-byte stack buffer on every chat/name/string read, and the char* overload no longer leaves the span between the copied data and buffer_size-1 uninitialized, where an unterminated wire string could expose stale stack contents to the caller's strlen. Reinstate the outbound zerocode compression stats (TODO'd out since the babbage era, leaving summarizeLogs reporting zeros) by accounting them in LLMessageSystem::sendMessage(), which owns the counters and can detect the buffer swap compressMessage() performs only when compression wins. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Signed-off-by: Rye <rye@alchemyviewer.org>
…ss the wrap A SOCKS datagram no larger than its header made bufferInboundPacket() report 0, which drainSocket() reads as "socket empty", stranding any packets still queued behind it. Discard the runt but report its raw size so the drain keeps going; the existing accounting counts it as received-but-dropped. checkPacketInID()'s gap filler compared packet ids with plain <, so a small forward gap spanning the 24-bit wrap (expecting 0xFFFFF8, got 0x000002) was misread as an out-of-order arrival: log spam plus a resync that never marked the skipped ids as potentially lost. Use the modular forward distance instead; the fill loop already wraps. A genuinely old id yields a huge modular distance and still takes the out-of-order path. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Signed-off-by: Rye <rye@alchemyviewer.org>
|
Warning Review limit reached
More reviews will be available in 14 minutes and 5 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis PR refactors the LL message system to support inline storage for small variable-data payloads, adds bounds-aware decoding to prevent buffer overruns, and strengthens packet ring buffering invariants. The changes include a new ChangesMessage System Safety and Inline Storage
🎯 4 (Complex) | ⏱️ ~60 minutes Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@indra/llmessage/llmessagetemplate.cpp`:
- Around line 49-56: The copy at the end of the constructor uses
htolememcpy(dest, data, mType, size) which applies endian-swizzling based on
mType even though new entries created via LLMsgBlkData::addData() are
initialized as MVT_U8; change the call to use the incoming/declared type (the
variable that represents the incoming data's MVT, not mType) so the initial copy
uses that type (e.g., pass the incomingType or dataType variable) when calling
htolememcpy, leaving mType for later updates; update references around
mData/mInlineData/INLINE_DATA_SIZE and htolememcpy accordingly.
In `@indra/llmessage/lltemplatemessagereader.cpp`:
- Around line 676-681: The code zeroes tsize but leaves decode_pos unchanged,
causing remaining payload bytes to be mis-parsed; when the on-wire length is
invalid (tsize > llmax(mReceiveSize - decode_pos, 0)), after calling
logRanOffEndOfPacket (and respecting custom), advance decode_pos to mReceiveSize
(i.e. consume the rest of the packet) and set tsize = 0 so later field reads
will see no bytes rather than reinterpret packet garbage; alternatively you may
choose to bail out by returning an error from the enclosing decode
function—apply this change in the block referencing decode_pos, tsize,
mReceiveSize and logRanOffEndOfPacket.
- Around line 684-685: The code calls vardata.addData(&buffer[decode_pos],
tsize, mvci.getType()) which discards the original wire data_size causing
LLMsgVarData::mDataSize to remain -1; update the call site in
lltemplatemessagereader.cpp so addData is invoked with the original data_size
(not tsize) so the stored LLMsgVarData preserves the wire length-width; this
will restore correct behavior for LLSDMessageBuilder::copyFromMessageData() /
copyToBuilder() round-trips by ensuring getDataSize() reflects the wire format
used when decoding.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 25127de0-c891-4f96-85ec-5b2fcd90e50f
📒 Files selected for processing (11)
indra/llmessage/llcircuit.cppindra/llmessage/llmessagetemplate.cppindra/llmessage/llmessagetemplate.hindra/llmessage/llpacketring.cppindra/llmessage/llsdmessagebuilder.cppindra/llmessage/lltemplatemessagebuilder.cppindra/llmessage/lltemplatemessagereader.cppindra/llmessage/lltemplatemessagereader.hindra/llmessage/message.cppindra/llmessage/message.hindra/llmessage/tests/lltemplatemessagebuilder_test.cpp
…ared type When a variable-length field claims more bytes than remain, advance decode_pos to the end of the packet so subsequent fields zero-fill -- consistent with the other ran-off-end paths -- rather than parsing the malformed field's payload as later fields. LLMsgVarData::addData() now swizzles with the caller-declared type instead of mType, which is still the default MVT_U8 when an entry is created without addVariable() (pre-existing; identical under the old LLIndexedVector). No effect on little-endian builds. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> Signed-off-by: Rye <rye@alchemyviewer.org>
Review sweep of
LLMessageSystemand its hot UDP receive/decode/dispatch paths. Six commits, each standalone; defect fixes first, then the allocation work.Security / robustness
lltemplatemessagereader.cpp):decodeData()trusted the wire-claimed size ofMVT_VARIABLEfields. A malformed or hostile packet could drive the copy up to 64KB past the receive buffer via a 2-byte length, or handnew U8[]a negative size via the 4-byte form. The claimed size is now clamped to the bytes remaining in the packet. Regression test included (test<46>, verified to fail against the unclamped decoder).zeroCodeExpand()overflow case 3 kept decoding into a scrambled buffer after resettingoutptr; it now discards the packet like cases 1 and 2.lltemplatemessagebuilder.cpp): oversized payloads were NUL-terminated by casting away const and writing into the caller's buffer — UB throughstd::string::c_str(), a crash for a literal. The stored copy is terminated instead; wire bytes unchanged (test<47>).getString(char*)could expose stale stack: the span between the copied data andbuffer_size-1was left uninitialized, so an unterminated wire string read back as garbage. Both overloads now terminate at the exact copied length (getData()reports it).Packet ring (root cause behind the asserts disabled in cbf99f6)
bufferInboundPacket()received straight into the ring slot atmHeadIndexbefore knowing whether a packet had arrived. With the ring full, that slot is the oldest unread packet, and the would-block receive terminating everydrainSocket()pass clobbered it to size 0 and desynced the byte accounting — once per drain under sustained overload. It now receives into a scratch buffer and only commits the slot for a real packet (matching the SOCKS branch), and the three asserts are restored asllassertsince they're true invariants again.Also: a runt SOCKS wrapper no longer terminates the drain loop early with packets still queued.
Correctness
sendMessage()computed the appended-ack budget with unsigned math, so over-MTU payloads (ChildAgentUpdate, SendXferPacket) wrapped to a hugespace_leftand got acks appended despite having no room.checkPacketInID()'s gap filler compared ids with plain<, misreading a small forward gap across the 24-bit wrap as out-of-order (log spam + resync without loss marking). It now uses the modular forward distance.dumpReceiveCounts()(the message-storm diagnostic) attributes counts instead of printing nothing.sendMessage()where the counters live.operator<<(LLMessageSystem)probedmMessageNumberswithoperator[], permanently inserting null entries; big-endianMVT_S16Arrayswizzle iteratedn % 2elements instead ofn / 2.Hot-path allocations
Previously every variable of every block of every inbound packet cost a map-node insert plus a
new U8[]+ copy, all freed again atclearMessage()— thousands of allocations per frame under object-update load.LLMsgVarDatastores payloads ≤ 24 bytes inline (covers every fixed wire type; largest isLLVector3d), so only long Variable fields heap-allocate.getData()computes the pointer, so vector growth can't dangle.LLIndexedVector(vector +std::mapindex) is replaced by an insertion-ordered flat vector scanned by prehashed name pointer — same API and iteration order, zero per-entry node allocations.operator[]inserts keyed by name; the reader'sgetSize()overloads query withfind()so lookups never insert.decodeData()takes the variable ref fromaddVariable()instead of re-finding it by name;getString(std::string&)no longer zero-fills a 1501-byte buffer per call.Testing
All llmessage suites pass in RelWithDebInfo (builder 47/47 incl. two new regression tests, llsdmessagebuilder 46/46, llsdmessagereader 21/21, message, parser, dispatcher 4/4); library also builds clean in Debug with the restored asserts active. Both new tests verified to fail against the pre-fix code.
🤖 Generated with Claude Code