Eliminate unnecessary parallel contention in recursive index search#150
Merged
Conversation
AgentElement
approved these changes
Jun 15, 2026
AgentElement
left a comment
Collaborator
There was a problem hiding this comment.
Good catches, LGTM
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Our
assembly::recurse_index_searchfunction previously used a combination of atomics and parallel updates to find the best assembly index across child states (best_child_index) and the total number of descendant states searched (states_searched) within the recursive calls. The advantage of this approach, at least forbest_child_index, is that we could attempt to update the globalbest_indexused across all states' branch-and-bound efforts as soon as a better bound is discovered. (There is no analogous advantage for updatingstates_searchedin parallel.) The disadvantage is that child threads are all contending to update their parent's atomic variables in parallel, potentially causing unnecessary slow-downs.This PR moves all of these updates into a more idiomatic
map/collectcall where child states return their results to the parent without having to wait on their siblings. The parent then calculatesbest_child_indexandstates_searchedafter all children are done, eliminating contention. The only potential downside of this is that better index bounds discovered by a child will not propagate tobest_indexuntil all its siblings also finish and the parent identifies it as the new minimum. But in practice......benchmarks indicate that this is 0–10% improvement for all search scenarios. There are a few minor (< 3%) regressions in the match enumeration benchmarks, though my code does not touch those parts of the algorithm (not sure what's happening there). Everything else benefits from removing contention over atomics.
Benchmark e7b976e (this PR) vs. 86b2482 (current main) as a baseline
Requesting review from @AgentElement who I am confident can critique my Rust concurrency.