Harden the view layer against malformed input (found via real arm64 binaries) #37
Merged
…inaries)

Ran the full view-node dumper under ASan/UBSan over real macOS arm64 binaries (Python extension .so files from arm64 wheels) plus thousands of truncated and byte-flipped variants. This surfaced, and this commit fixes, a batch of real crashes that the header-only moex-parse and the bundled samples never reached:

- Wrong static_cast in the rebase opcode view (REBASE_OPCODE_DO_REBASE_ADD_ADDR_ULEB cast to the ADD_ADDR_ULEB wrapper): an undefined downcast.
- Unbounded table walks reading past a truncated/crafted file: symbol table (nlist), indirect symbols & relocations, data-in-code, function starts, string table, section data sizes, and the DYLD_INFO rebase/bind opcode streams. Each is now clamped to the mapped file.
- nsects not bounded to the segment command size, so section structs could be read past the file during node construction.
- Stack overflow from unbounded view-tree recursion (default max_depth=0) in the dumper walks and the GUI tree build; both now cap recursion depth.
- ParseStringLiteral now only returns NUL-terminated strings so callers can safely treat them as C strings.
- Signed LEB128 sign-extension left-shifted a negative value (UB); it now uses an unsigned shift.

After the fixes, ~1300 ASan/UBSan fuzz iterations over the real binaries report no out-of-bounds access and no out-of-memory. (A pervasive but benign misaligned-read UB from raw pointer dereferences into the mapping remains; it does not fault on x86_64/arm64 and is tracked for the NodeData migration.)

https://claude.ai/code/session_013kBiVXftgoEsyGVyrvfGok
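To illustrate two of the fixes above together (clamping an opcode-stream read to the mapped bytes, and sign-extending with an unsigned shift), here is a minimal sketch of a bounded signed-LEB128 reader. It is not the code from this commit: `ReadSleb128` and its cursor/end interface are hypothetical names chosen for the example, and it assumes C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical helper (not from the PR): decode one signed LEB128 value from
// the byte range [cursor, end). Returns std::nullopt if the stream ends before
// the final byte or the encoding is longer than 64 bits allows. On success the
// cursor is advanced past the value; on failure it is left unchanged.
std::optional<std::int64_t> ReadSleb128(const std::uint8_t*& cursor,
                                        const std::uint8_t* end) {
    assert(cursor <= end);
    std::uint64_t result = 0;  // accumulate in an unsigned type
    unsigned shift = 0;
    const std::uint8_t* p = cursor;
    while (p < end) {  // never dereference at or past `end`
        const std::uint8_t byte = *p++;
        if (shift >= 64) {
            return std::nullopt;  // over-long encoding; shifting further would be UB
        }
        result |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
        shift += 7;
        if ((byte & 0x80) == 0) {
            // Final byte: bit 6 carries the sign. Fill the upper bits by
            // shifting an unsigned all-ones value, rather than shifting -1.
            if (shift < 64 && (byte & 0x40) != 0) {
                result |= ~std::uint64_t{0} << shift;
            }
            cursor = p;
            // Modular conversion; well-defined since C++20 and two's
            // complement on mainstream compilers before that.
            return static_cast<std::int64_t>(result);
        }
    }
    return std::nullopt;  // truncated: ran out of bytes mid-value
}
```

A caller walking an opcode stream would stop (or mark the node as truncated) as soon as this returns `std::nullopt`, instead of continuing to read past the mapping.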
Summary
I sourced real macOS arm64 binaries by downloading arm64 Python wheels from PyPI (their `.so` extensions are real Mach-O arm64), then ran the full view-node dumper under ASan/UBSan over them plus thousands of truncated and byte-flipped variants. This surfaced, and this PR fixes, a batch of real crashes that the header-only `moex-parse` and the bundled samples never reached:

- Wrong `static_cast` in the rebase opcode view (`REBASE_OPCODE_DO_REBASE_ADD_ADDR_ULEB` was cast to the `ADD_ADDR_ULEB` wrapper).
- Unbounded table walks reading past a truncated/crafted file: symbol table (`nlist`), indirect symbols & relocations, data-in-code, function starts, string table, section data sizes, and the `DYLD_INFO` rebase/bind opcode streams.
- `nsects` not bounded to the segment command size → section structs read past the file during node construction.
- Stack overflow from unbounded view-tree recursion (default `max_depth=0`) in the dumper and the GUI tree build → both now cap recursion depth.
- `ParseStringLiteral` now only returns NUL-terminated strings (safe for `std::string`/`strlen`).

Verified
`tests/regression/run_all.sh` passes; builds with Qt6 + Capstone.

Known remaining (tracked, not fixed here)
A pervasive but benign misaligned-read UB from raw-pointer dereferences into the mapping (does not fault on x86_64/arm64); fully eliminating it needs the `NodeData` migration.

https://claude.ai/code/session_013kBiVXftgoEsyGVyrvfGok
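For readers unfamiliar with the kind of clamping described in the summary, the following is a rough sketch of the general technique, not the code merged in this PR. `ClampEntryCount` and `BoundedCString` are hypothetical names introduced for illustration; the sketch assumes C++17 and that the file is available as a contiguous mapping of known size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string_view>

// Hypothetical helper: number of fixed-size entries starting at `offset` that
// lie fully inside a region of `region_size` bytes, capped at the count the
// file declares. Division is used instead of offset + count * entry_size so
// that crafted values cannot overflow the arithmetic.
std::size_t ClampEntryCount(std::size_t region_size, std::uint64_t offset,
                            std::uint64_t declared_count,
                            std::size_t entry_size) {
    if (entry_size == 0 || offset >= region_size) {
        return 0;
    }
    const std::uint64_t available = (region_size - offset) / entry_size;
    return static_cast<std::size_t>(
        declared_count < available ? declared_count : available);
}

// Hypothetical helper: view of the string at `offset` in a string table, but
// only if a NUL terminator exists inside the table. Without that check a
// caller using strlen or std::string could run off the end of the mapping.
std::optional<std::string_view> BoundedCString(const char* table,
                                               std::size_t table_size,
                                               std::uint64_t offset) {
    assert(table != nullptr || table_size == 0);
    if (offset >= table_size) {
        return std::nullopt;
    }
    const char* start = table + offset;
    const std::size_t remaining = table_size - static_cast<std::size_t>(offset);
    const void* nul = std::memchr(start, '\0', remaining);
    if (nul == nullptr) {
        return std::nullopt;  // unterminated: not safe to treat as a C string
    }
    return std::string_view(
        start, static_cast<std::size_t>(static_cast<const char*>(nul) - start));
}
```

The same count helper covers the `nsects` case if the region is taken to be the load command itself: with a 72-byte `segment_command_64` header and 80-byte `section_64` entries, `ClampEntryCount(cmdsize, 72, nsects, 80)` yields at most the number of section structs the command can actually hold.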
Generated by Claude Code