Harden LEB128 decoding against malformed Mach-O input #29

Merged: everettjf merged 11 commits into master from claude/review-and-plan-g4p9o on May 22, 2026.

@everettjf (Owner)

Add an explicit end-of-buffer bound to readUnsignedLeb128/readSignedLeb128
so truncated or crafted dyld info / function-starts streams can no longer
read past the mapped file. Thread the bound through every caller, replace
release-stripped asserts on segment indices with runtime checks, and bound
the bind-opcode symbol-name scan with memchr instead of strlen.
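A minimal sketch of what end-bounded decoders can look like, assuming a pointer-pair interface (`p`, `end`) and `std::optional` to signal failure. The names `readUleb128Bounded`, `readSleb128Bounded` and `readCStringBounded` are hypothetical and are not the libmoex signatures; only the idea comes from the description: never read at or past `end`, and bound the symbol-name scan with `memchr` over the remaining span instead of `strlen`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>

// Decode one ULEB128 value from [p, end). On success p is advanced past it;
// on failure p is left untouched and std::nullopt is returned.
std::optional<uint64_t> readUleb128Bounded(const uint8_t*& p, const uint8_t* end) {
    uint64_t result = 0;
    unsigned shift = 0;
    for (const uint8_t* cur = p; cur < end;) {
        const uint8_t byte = *cur++;
        const uint64_t slice = byte & 0x7f;
        // Treat encodings whose payload no longer fits in 64 bits as malformed.
        if (shift >= 64 || (shift == 63 && slice > 1)) return std::nullopt;
        result |= slice << shift;
        shift += 7;
        if ((byte & 0x80) == 0) { p = cur; return result; }
    }
    return std::nullopt;  // stream ended before the terminating byte
}

// Signed variant: same bound, then sign-extend from bit 6 of the last byte.
std::optional<int64_t> readSleb128Bounded(const uint8_t*& p, const uint8_t* end) {
    uint64_t result = 0;
    unsigned shift = 0;
    uint8_t byte = 0;
    const uint8_t* cur = p;
    do {
        if (cur >= end || shift >= 64) return std::nullopt;  // truncated or overlong
        byte = *cur++;
        result |= static_cast<uint64_t>(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    if (shift < 64 && (byte & 0x40)) result |= ~uint64_t{0} << shift;
    p = cur;
    return static_cast<int64_t>(result);
}

// Read a NUL-terminated name, searching only up to `end` with memchr instead
// of letting strlen walk off the mapped region.
std::optional<std::string> readCStringBounded(const uint8_t*& p, const uint8_t* end) {
    if (p >= end) return std::nullopt;
    const void* nul = std::memchr(p, 0, static_cast<size_t>(end - p));
    if (nul == nullptr) return std::nullopt;  // no terminator before end
    const auto* z = static_cast<const uint8_t*>(nul);
    std::string name(reinterpret_cast<const char*>(p), static_cast<size_t>(z - p));
    p = z + 1;
    return name;
}
```

Leaving `p` untouched on failure lets a caller report the exact offset where a malformed stream began.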

Also extend crash-regression with truncated copies of the sample binaries
so the LEB128 and trie-walk bounds checks are exercised against real
LC_DYLD_INFO / LC_FUNCTION_STARTS payloads, and add the missing noexcept on
the non-Apple NodeException::what() override so the parser builds under
modern GCC/Clang.
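The truncation idea can be sketched as a small harness, assuming a parser callable that takes a pointer and a length and reports whether it accepted the input. `sweepTruncations` is a hypothetical name and not the project's crash-regression code. Copying each prefix into an exactly-sized heap allocation is one way to make an over-read visible under AddressSanitizer, rather than letting it silently land in the rest of the original file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <memory>
#include <vector>

// Feed every prefix of `input` (lengths 0..size) to `parse` and count how many
// prefixes it rejected. The real goal is that no prefix crashes the parser.
size_t sweepTruncations(const std::vector<uint8_t>& input,
                        const std::function<bool(const uint8_t*, size_t)>& parse) {
    size_t rejected = 0;
    for (size_t len = 0; len <= input.size(); ++len) {
        // Exactly `len` bytes: touching copy[len] is a heap overflow ASan can flag.
        std::unique_ptr<uint8_t[]> copy(new uint8_t[len]);
        if (len != 0) std::memcpy(copy.get(), input.data(), len);
        if (!parse(copy.get(), len)) ++rejected;
    }
    return rejected;
}
```

Run under `-fsanitize=address`, every truncated sample becomes a check that the bounds logic holds at each possible cut point, not just at the one a hand-written test happens to pick.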

https://claude.ai/code/session_013kBiVXftgoEsyGVyrvfGok

claude added 11 commits May 20, 2026 20:31
Bounds-check the NodeData copy against the mapped file before reading, so a
struct that straddles EOF can no longer be memcpy'd out of range. Read the
load-command header through an aligned copy and require cmdsize to honor the
pointer-size alignment, and reject fat slices whose offset is not 8-byte
aligned. Together these stop the parser from dereferencing Mach-O headers and
load commands at misaligned addresses on crafted input (undefined behaviour
flagged by UBSan).
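A sketch of the three checks, assuming the mapped file is available as a flat byte buffer. `LoadCommandHeader` mirrors the two leading fields of `struct load_command` from `<mach-o/loader.h>` so the example builds off macOS; `readStructAt`, `cmdsizeIsValid` and `fatSliceOffsetIsValid` are hypothetical helpers. The 4/8-byte `cmdsize` rule is the one documented in `loader.h`; the 8-byte fat-slice rule is the one described in this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <type_traits>

// Same layout as the first two fields of struct load_command.
struct LoadCommandHeader {
    uint32_t cmd;
    uint32_t cmdsize;
};

// Copy a T out of [base, base + size) at `offset`, refusing any read that
// would straddle the end of the buffer.
template <typename T>
std::optional<T> readStructAt(const uint8_t* base, size_t size, size_t offset) {
    static_assert(std::is_trivially_copyable<T>::value, "memcpy needs a trivially copyable T");
    if (offset > size || size - offset < sizeof(T)) return std::nullopt;  // straddles EOF
    T out;
    std::memcpy(&out, base + offset, sizeof(T));  // aligned local copy, no misaligned deref
    return out;
}

// cmdsize must cover the header, fit in what is left, and be a multiple of
// 8 (64-bit images) or 4 (32-bit images).
bool cmdsizeIsValid(const LoadCommandHeader& lc, bool is64, size_t remaining) {
    const uint32_t align = is64 ? 8 : 4;
    return lc.cmdsize >= sizeof(LoadCommandHeader) &&
           lc.cmdsize % align == 0 &&
           lc.cmdsize <= remaining;
}

bool fatSliceOffsetIsValid(uint64_t offset) { return offset % 8 == 0; }
```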

FatHeaderViewNode built its table with the no-argument CreateTableView(),
leaving TableViewData::GetRAW unset, so the AddRow(field,...) template called
an empty std::function, which threw std::bad_function_call and aborted the
process. This crashed both the Fat Header node in the GUI and any --cli dump of
a fat binary.

Give the fat header table a real GetRAW so row offsets are correct, and make
the AddRow template tolerate an unset callback as a safety net. Extend the CLI
smoke test to analyze the fat sample so the regression is covered.
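A simplified sketch of the safety net, with hypothetical stand-ins for `TableViewData` and `AddRow` (the real signatures are not shown in this PR). Because calling an empty `std::function` throws `std::bad_function_call`, the template checks the callback first; falling back to a base offset when it is unset is an assumed behaviour for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

struct Row {
    uint64_t offset;
    std::string name;
    std::string value;
};

struct TableViewData {
    // Returns the address of the raw struct the rows describe.
    std::function<const void*()> GetRAW;
    uint64_t baseFileOffset = 0;  // file offset of that raw struct
    std::vector<Row> rows;

    // `field` is expected to be a member of the struct GetRAW() points at.
    template <typename Field>
    void AddRow(const Field& field, std::string name, std::string value) {
        uint64_t offset = baseFileOffset;
        if (GetRAW) {  // an empty std::function would throw if called
            const auto* raw = static_cast<const char*>(GetRAW());
            offset += static_cast<uint64_t>(reinterpret_cast<const char*>(&field) - raw);
        }
        rows.push_back({offset, std::move(name), std::move(value)});
    }
};
```

With a real `GetRAW`, each row's offset is the field's position inside the raw struct plus the struct's file offset, which is why giving the fat header table a proper callback also fixes its row offsets.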

Opening a large binary or dyld shared cache previously blocked the UI thread
for the whole parse because LayoutController::initModel parsed and built the
tree synchronously inside LayoutDockWidget::openFile.

Split the controller into a thread-safe parse() (libmoex only, no Qt objects)
and a GUI-thread buildModel(), and run parse() through QtConcurrent with a
QFutureWatcher. The tree is cleared while parsing and rebuilt on completion;
re-entrant opens are ignored until the in-flight parse finishes so the worker
never races a deleted controller.
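The shape of this change can be sketched in standard C++ instead of Qt: `std::async` stands in for `QtConcurrent::run`, and polling the `std::future` stands in for the `QFutureWatcher::finished` signal. `parse`, `ParseResult` and `Opener` are hypothetical. What carries over from the description is that the worker runs only the GUI-free parse step, and that a second open is refused while one is in flight.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <thread>

// Stand-in for the parsed libmoex result.
struct ParseResult {
    std::string path;
    size_t nodeCount = 0;
};

// Worker-side step: must not touch GUI objects.
ParseResult parse(const std::string& path) {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));  // simulate work
    return {path, path.size()};
}

class Opener {
public:
    // Returns false, and does nothing, while a parse is already in flight.
    bool open(const std::string& path) {
        if (pending_.valid()) return false;
        pending_ = std::async(std::launch::async, parse, path);
        return true;
    }
    // Called on the GUI thread; returns true once the model can be rebuilt.
    bool poll() {
        if (!pending_.valid()) return false;
        if (pending_.wait_for(std::chrono::seconds(0)) != std::future_status::ready) return false;
        last_ = pending_.get();  // get() invalidates the future, re-enabling open()
        return true;
    }
    const ParseResult& last() const { return last_; }

private:
    std::future<ParseResult> pending_;
    ParseResult last_;
};
```

One property of this substitution: a future returned by `std::async` blocks in its destructor until the task finishes, so the worker cannot outlive the object that owns it. In the Qt version that guarantee has to come from the re-entrancy guard instead.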

The two dialog sources included their generated uic headers with a lowercased
name (ui_aboutdialog.h) that only resolves on a case-insensitive filesystem,
so AutoUic failed on Linux. Match the .ui filename case so the project
configures and builds out of the box with stock qt6-base-dev.

Make the CLI smoke test fall back to the plain build/MachOExplorer binary when
there is no macOS .app bundle, and add a build_linux.sh helper plus Linux build
notes. The full regression suite now passes on Linux.

The Code Signature node previously showed only the raw __LINKEDIT blob. Parse
the embedded-signature SuperBlob (big-endian, byte-wise so unaligned offsets
stay safe and every access is bounds-checked) and list each sub-blob with its
slot type, magic, offset and length. When an entitlements blob is present,
decode and display the plist XML line by line with file offsets, capped so a
large blob cannot flood the table.
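A sketch of a bounds-checked SuperBlob walk, assuming the signature has already been sliced out of `__LINKEDIT` into a byte buffer. The layout (big-endian `magic`, `length`, `count`, then `count` pairs of `type`/`offset`) and the `0xfade0cc0` embedded-signature magic follow Apple's code-signing blob definitions (for example `cs_blobs.h` in XNU); the function names and the skip-on-bad-entry policy are assumptions for illustration. In that scheme the entitlements sub-blob uses slot type 5 and magic `0xfade7171`, with the plist XML following its 8-byte blob header.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Read a big-endian uint32 byte by byte, so unaligned offsets are safe.
std::optional<uint32_t> readBE32(const uint8_t* data, size_t size, size_t off) {
    if (off > size || size - off < 4) return std::nullopt;
    return (uint32_t(data[off]) << 24) | (uint32_t(data[off + 1]) << 16) |
           (uint32_t(data[off + 2]) << 8) | uint32_t(data[off + 3]);
}

struct SubBlob {
    uint32_t slotType, magic, offset, length;
};

constexpr uint32_t kEmbeddedSignatureMagic = 0xfade0cc0;  // CSMAGIC_EMBEDDED_SIGNATURE

// Returns the sub-blob list, or nullopt if the SuperBlob header is malformed.
// Index entries pointing outside the blob are skipped rather than trusted.
std::optional<std::vector<SubBlob>> parseSuperBlob(const uint8_t* data, size_t size) {
    auto magic = readBE32(data, size, 0);
    auto length = readBE32(data, size, 4);
    auto count = readBE32(data, size, 8);
    if (!magic || !length || !count || *magic != kEmbeddedSignatureMagic) return std::nullopt;
    const size_t limit = std::min<size_t>(size, *length);  // never trust length past the buffer
    std::vector<SubBlob> out;
    for (uint32_t i = 0; i < *count; ++i) {
        const size_t entry = 12 + size_t(i) * 8;
        auto type = readBE32(data, limit, entry);
        auto off = readBE32(data, limit, entry + 4);
        if (!type || !off) break;  // index table runs off the end
        auto bmagic = readBE32(data, limit, *off);
        auto blen = readBE32(data, limit, size_t(*off) + 4);
        if (!bmagic || !blen || *blen > limit - *off) continue;  // sub-blob out of range
        out.push_back({*type, *bmagic, *off, *blen});
    }
    return out;
}
```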

Selecting a node with a large table (symbols, dyld cache images, ObjC
metadata) ran ViewNode::Init() synchronously on the GUI thread, briefly
freezing the UI. Build uninitialized nodes through QtConcurrent and display
them when ready; already-parsed nodes still render instantly. Builds are
single-flight and the most recent selection always wins, so rapid clicking
never races or shows a stale table.
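Single-flight with latest-wins can be sketched as a small state machine, assuming completions are delivered back on the GUI thread so no locking is needed. `SingleFlightBuilder` and its methods are hypothetical, and nodes are plain strings here.

```cpp
#include <cassert>
#include <optional>
#include <string>

class SingleFlightBuilder {
public:
    // The user selected `node`. Returns the node to start building now, or
    // nullopt if a build is already running (the selection is remembered).
    std::optional<std::string> request(const std::string& node) {
        latest_ = node;
        if (running_) return std::nullopt;
        running_ = true;
        inFlight_ = node;
        return node;
    }
    // The in-flight build finished. Returns true if it still matches the current
    // selection and should be shown; otherwise the result is dropped and `next`
    // receives a build for the newest selection. Intermediate clicks are never built.
    bool finished(std::optional<std::string>& next) {
        running_ = false;
        next.reset();
        if (inFlight_ == latest_) return true;
        next = request(latest_);
        return false;
    }

private:
    bool running_ = false;
    std::string inFlight_;
    std::string latest_;
};
```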

There was no way to search across the whole structure tree; only individual
tables could be filtered. Add a search field above the layout tree backed by a
recursive QSortFilterProxyModel: typing keeps matching nodes and their
ancestors and expands the tree to reveal them; clearing the field restores the
default expansion. Selection handling maps proxy indices back to the source model so
node activation keeps working.
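The keep-ancestors rule amounts to a post-order walk: a node stays visible if it matches or any of its descendants does. Qt's `QSortFilterProxyModel` provides this behaviour through `setRecursiveFilteringEnabled(true)` (Qt 5.10 and later); the plain C++ sketch below only illustrates the rule, with a hypothetical `Node` type and case-sensitive substring matching.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical tree node standing in for an entry in the layout model.
struct Node {
    std::string name;
    std::vector<std::unique_ptr<Node>> children;
};

Node& addChild(Node& parent, std::string name) {
    auto child = std::make_unique<Node>();
    child->name = std::move(name);
    parent.children.push_back(std::move(child));
    return *parent.children.back();
}

// Returns true if `node` or any descendant contains `needle`. Every node on a
// path to a match is appended to `visible` (children before parents): the set
// a view would keep and expand.
bool collectVisible(const Node& node, const std::string& needle,
                    std::vector<const Node*>& visible) {
    bool descendantMatched = false;
    for (const auto& child : node.children)  // |= so every child is still visited
        descendantMatched |= collectVisible(*child, needle, visible);
    const bool selfMatched = node.name.find(needle) != std::string::npos;
    if (selfMatched || descendantMatched) visible.push_back(&node);
    return selfMatched || descendantMatched;
}
```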

The layout search only matched node display names. Add a custom filter proxy
that also searches the cell contents of any node whose view data has already
been built, so once a table (symbols, strings, ...) has been opened its rows
become searchable from the layout search box. Nodes that have not been parsed
are still matched by name, so lazy loading is preserved and no node is forced
to parse just to search.
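A sketch of the lazy-aware match predicate, assuming each node exposes its table cells only once they have been built. `LayoutNode` and `nodeMatches` are hypothetical; the property taken from the description is that an unbuilt node is matched by name alone and is never parsed just to answer a search.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// builtCells stays empty until the node's view data has been built.
struct LayoutNode {
    std::string name;
    std::optional<std::vector<std::vector<std::string>>> builtCells;  // rows x columns
};

// Match on the display name; consult cell contents only if already built.
bool nodeMatches(const LayoutNode& n, const std::string& needle) {
    if (n.name.find(needle) != std::string::npos) return true;
    if (!n.builtCells) return false;  // unbuilt: name-only, no forced parse
    for (const auto& row : *n.builtCells)
        for (const auto& cell : row)
            if (cell.find(needle) != std::string::npos) return true;
    return false;
}
```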

The disassembly-based xref scanner did not compile against modern Capstone:
cs_regs_access requires uint16_t register arrays (cs_regs is uint16_t[64], not
uint8_t), and ReadPointerAtVm takes a MachHeader* but was passed the
MachHeaderPtr shared_ptr. Use the correct register array type and pass the raw
pointer so a Capstone-enabled build succeeds and the xref report resolves
call/jump targets.

Build the project (Qt6 + Capstone) and run the regression suite on Ubuntu for
every push to master and every pull request, so the Linux build and the parser
hardening / crash regressions stay green.

everettjf merged commit f6027ea into master on May 22, 2026.
1 check passed.