perf: eliminate redundant node clones during Trie::recalculate_root_hash() #6935

cylewitruk-stacks wants to merge 3 commits into stacks-network:develop

Conversation
Pull request overview
This PR optimizes MARF trie insert performance by reducing redundant node cloning/writes during Trie::recalculate_root_hash(), especially when computing the root hash with skiplist updates enabled.
Changes:
- Gate the "pre-flush then final write" behavior to only the root node when `update_skiplist == true`.
- Add `TrieStorageConnection::write_node_hash()` to update only a node's stored hash in the uncommitted trie state (avoiding a full node clone/write).
- Minor logging/message and comment hygiene updates in trie operations.
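To make the second change concrete, here is a minimal, hypothetical sketch of what a hash-only update looks like next to a full node write. The types (`UncommittedTrie`, `TrieNodeType`, the `u32` pointer key) are simplified stand-ins, not the real stackslib API; only the contrast between "clone the whole body" and "update the stored hash in place" is the point.

```rust
use std::collections::HashMap;

// Simplified stand-ins for the real MARF types (hypothetical, for illustration).
#[derive(Clone, PartialEq, Debug)]
struct TrieHash([u8; 32]);

#[derive(Clone)]
struct TrieNodeType {
    // child pointers, path bytes, etc. are elided
    payload: Vec<u8>,
}

// Uncommitted trie state: node pointer -> (node body, node hash).
struct UncommittedTrie {
    nodes: HashMap<u32, (TrieNodeType, TrieHash)>,
}

impl UncommittedTrie {
    /// Full write: clones the entire node body and stores it with its hash.
    fn write_nodetype(&mut self, ptr: u32, node: &TrieNodeType, hash: TrieHash) {
        // `node.clone()` copies the whole node body on every call.
        self.nodes.insert(ptr, (node.clone(), hash));
    }

    /// Hash-only write: updates the stored hash in place, with no node clone.
    fn write_node_hash(&mut self, ptr: u32, hash: TrieHash) -> Result<(), String> {
        let entry = self
            .nodes
            .get_mut(&ptr)
            .ok_or_else(|| "node not in uncommitted state".to_string())?;
        entry.1 = hash;
        Ok(())
    }
}
```

In this model, the hash-only path can only succeed for a node that already exists in uncommitted state, which matches the PR's usage: the root was just pre-flushed, so only its hash needs replacing.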
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| stackslib/src/chainstate/stacks/index/trie.rs | Avoids redundant pre-flush writes for non-root ancestors; uses a hash-only update for the root's final write when updating the skiplist. |
| stackslib/src/chainstate/stacks/index/storage.rs | Introduces `TrieStorageConnection::write_node_hash()` to support hash-only updates in uncommitted state. |
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@ Coverage Diff @@
##           develop    #6935       +/-  ##
============================================
+ Coverage    59.40%   70.02%    +10.61%
============================================
  Files          412      412
  Lines       218187   218190         +3
  Branches       338      338
============================================
+ Hits        129610   152778     +23168
+ Misses       88577    65412     -23165
```

... and 317 files with indirect coverage changes. Continue to review the full report in Codecov by Sentry.
federico-stacks left a comment
the change makes sense to me. Just added a nit
```rust
let root_ptr = storage.root_trieptr();
let is_root_node = node.is_node256() && ptr == root_ptr;

if is_root_node && update_skiplist {
```
nit: I noticed this gate condition appears twice within the method. Would it make sense to extract it into a local variable (e.g., should_flush_root or a similar name) to avoid duplication and improve readability?
I was actually concerned that that might reduce readability instead, as this already reads pretty clearly. The "cleanest" alternative to me probably would have been to `let is_preflushed_root = if is_root_node && update_skiplist { ...; true } else { false };`-ish and gate the below on that, which would then read more like control flow over a previous side effect. 🤷
not a blocker at all! Feel free to resolve the comment if you think otherwise :)
I'll leave it open to see if anyone else has any preferences :) I don't have any strong opinions either way.
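For readers following this thread, the two shapes being debated can be sketched side by side. This is a hypothetical simplification: the real method performs storage writes and hash derivation where these functions only record booleans, so only the gating structure is illustrated.

```rust
// Shape 1: the gate condition repeated inline (as in the PR).
// Returns (did_pre_flush, final_write_is_hash_only).
fn inline_gate(is_root_node: bool, update_skiplist: bool) -> (bool, bool) {
    let mut pre_flushed = false;
    if is_root_node && update_skiplist {
        pre_flushed = true; // stands in for the pre-flush write_nodetype() call
    }
    // ... root hash / skiplist derivation would happen here ...
    let hash_only_final = is_root_node && update_skiplist;
    (pre_flushed, hash_only_final)
}

// Shape 2: record the pre-flush side effect in a flag, then gate
// the final write on that flag (the alternative described above).
fn flag_gate(is_root_node: bool, update_skiplist: bool) -> (bool, bool) {
    let is_preflushed_root = if is_root_node && update_skiplist {
        // pre-flush write_nodetype() would go here
        true
    } else {
        false
    };
    // ... root hash / skiplist derivation would happen here ...
    (is_preflushed_root, is_preflushed_root)
}
```

Both shapes are behaviorally identical; the trade-off is purely whether the second check reads as a repeated condition or as a reference to an earlier side effect.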
```rust
// Flush the root node's pointers before calculating the skiplist hash.
// Root hash derivation performs ancestor lookups that expect the current
// trie structure to be materialized. Not needed when skipping the skiplist.
storage.write_nodetype(ptr.ptr(), &node, TrieHash::default())?;
```
Nit: it's equivalent code, but just for defensive purposes, can you continue to use TrieHash([0; 32])?
How would you feel about using a TrieHash::ZERO or similar const (here and elsewhere)? I'm just a bit averse to inline "magic values" -- but I'm fine reverting it as well.
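The named-constant suggestion would look roughly like the sketch below. This is hypothetical: the real `TrieHash` lives in stackslib with its own derives and methods, and may already define something similar; only the associated-const pattern is shown.

```rust
// Hypothetical sketch; the real TrieHash is defined in stackslib.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct TrieHash(pub [u8; 32]);

impl TrieHash {
    /// Named all-zero placeholder hash, replacing the inline
    /// `TrieHash([0; 32])` "magic value" at each call site.
    pub const ZERO: TrieHash = TrieHash([0u8; 32]);
}
```

Call sites would then read `storage.write_nodetype(ptr.ptr(), &node, TrieHash::ZERO)?`, which states the intent (placeholder hash) without relying on the reader recognizing the zeroed array.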
```rust
// however, since we're going to update the hash in the next write anyways, just write an empty buff
storage.write_nodetype(ptr.ptr(), &node, TrieHash([0; 32]))?;
let root_ptr = storage.root_trieptr();
let is_root_node = node.is_node256() && ptr == root_ptr;
```
nit: ptr == root_ptr should be both necessary and sufficient. The code already ensures that the root is always a node256.
You're right, I must've overlooked the explicit check above 👍 Will fix when I'm in a place to switch back to this branch.
jcnelson left a comment
Couple nits but LGTM. Will approve once resolved. Thanks @cylewitruk-stacks!
Description
Background
`Trie::recalculate_root_hash()` is called on every MARF insert (regardless of trie hashing mode). Currently, all ancestor nodes are pre-flushed with a zero hash prior to being written with their final hash, which results in two `write_nodetype()` calls (and two full-node clones) per node.

The pre-flush exists to materialize a node's child pointers before `get_trie_root_hash()` performs Merkle skiplist ancestor lookups, but that lookup only happens for root nodes, and only when `update_skiplist == true` (i.e., non-`Deferred` hashing modes). Non-root ancestors never trigger ancestor lookups, and neither does the root in `Deferred` mode.

The Change

This PR gates the two-write pattern on `is_root_node && update_skiplist` and thus eliminates one full-node clone per ancestor per insert.

Applicable issues
Additional info (benefits, drawbacks, caveats)
Core Optimization:
- New `TrieStorageConnection::write_node_hash()` function which updates only a node's stored hash without cloning and overwriting the full node body.
- Non-root ancestors: write the final hash directly via `write_nodetype()` — one clone instead of two (both hash modes).
- Root in `Deferred` mode: write the final hash directly via `write_nodetype()` — one clone instead of two (no skiplist lookup, no pre-flush needed).
- Root in `Immediate` (and `All`) modes: child pointers are still pre-flushed via `write_nodetype()` before skiplist hash derivation, but the final write is now a hash-only update via `write_node_hash()` — one clone instead of two.

Minor Hygiene:
- `trace!` and comments cleanup.

Benchmark Diff:
Using `marf-bench` in PR #6932: write-path allocation counts are down 3-8% and allocated bytes are down 3-18% (with synthetic benchmark data), along with some minor improvements in timings.
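The per-insert savings behind those numbers can be expressed as a simple clone-count model. This is a hypothetical illustration derived from the description above, assuming a path of `ancestors` non-root nodes plus the root; it is not code from the PR.

```rust
// Full-node clones per insert before the PR: every node on the path
// (ancestors + root) was pre-flushed with a zero hash and then written
// again with its final hash -- two clones each.
fn clones_before(ancestors: u32) -> u32 {
    2 * (ancestors + 1)
}

// After the PR: non-root ancestors get a single full write (one clone);
// the root is either pre-flushed then finished with a hash-only update
// (skiplist modes) or written once directly (Deferred mode) -- one clone
// either way.
fn clones_after(ancestors: u32) -> u32 {
    ancestors + 1
}
```

Under this model the PR halves the full-node clones on the write path, consistent with "one full-node clone per ancestor per insert" eliminated.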
Checklist