Eliminate unnecessary parallel contention in recursive index search by jdaymude · Pull Request #150 · DaymudeLab/assembly-theory

jdaymude · 2026-06-13T00:19:26Z

Our assembly::recurse_index_search function previously used a combination of atomics and parallel updates to find the best assembly index across child states (best_child_index) and the total number of descendant states searched (states_searched) within the recursive calls. The advantage of this approach, at least for best_child_index, is that we could attempt to update the global best_index used across all states' branch-and-bound efforts as soon as a better bound was discovered. (There is no analogous advantage for updating states_searched in parallel.) The disadvantage is that child threads all contend to update their parent's atomic variables in parallel, potentially causing unnecessary slowdowns.

This PR moves all of these updates into a more idiomatic map/collect call where child states return their results to the parent without having to wait on their siblings. The parent then calculates best_child_index and states_searched after all children are done, eliminating contention. The only potential downside is that a better index bound discovered by a child will not propagate to best_index until all of its siblings have also finished and the parent identifies it as the new minimum. But in practice...

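...benchmarks indicate that this is a roughly 0–10% improvement for most search scenarios, with two bench_memoize cases (gdb17_200/no-memoize and checks/no-memoize) improving by about 16–17%. There are a few minor (< 3%) regressions in the match enumeration benchmarks, though my code does not touch those parts of the algorithm (not sure what's happening there). Nearly everything else benefits from removing contention over atomics; the one exception is bench_bounds/gdb13_1201/int-matchable, which regresses by about 3%.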
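For readers who want to see the two patterns side by side, here is a minimal sketch of the change described above. It is not the crate's actual code: the `State` type, both function names, and the use of `std::thread::scope` (chosen only to keep the example dependency-free) are assumptions for illustration. The real `recurse_index_search` also does bounding and memoization that are omitted here.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical toy search state: its own candidate index plus child states.
struct State {
    index: usize,
    children: Vec<State>,
}

/// Old pattern (assumed shape): children write into their parent's atomics.
fn search_contended(state: &State, best_index: &AtomicUsize) -> (usize, usize) {
    // A better bound reaches the global best_index as soon as it is seen.
    best_index.fetch_min(state.index, Ordering::Relaxed);
    let best_child_index = AtomicUsize::new(state.index);
    let states_searched = AtomicUsize::new(1); // count this state itself
    thread::scope(|s| {
        for child in &state.children {
            let (best_child, searched) = (&best_child_index, &states_searched);
            s.spawn(move || {
                let (child_best, child_searched) = search_contended(child, best_index);
                // Every sibling hits the same two atomics: this is the contention.
                best_child.fetch_min(child_best, Ordering::Relaxed);
                searched.fetch_add(child_searched, Ordering::Relaxed);
            });
        }
    });
    (
        best_child_index.load(Ordering::Relaxed),
        states_searched.load(Ordering::Relaxed),
    )
}

/// New pattern: children return their results; the parent reduces them.
fn search_collect(state: &State, best_index: &AtomicUsize) -> (usize, usize) {
    // map/collect: each child computes (best index, states searched) on its
    // own and writes to nothing shared while it runs.
    let results: Vec<(usize, usize)> = thread::scope(|s| {
        let handles: Vec<_> = state
            .children
            .iter()
            .map(|child| s.spawn(move || search_collect(child, best_index)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    // Only the parent touches these values, after all children are done.
    let best_child_index = results.iter().map(|r| r.0).fold(state.index, usize::min);
    let states_searched = 1 + results.iter().map(|r| r.1).sum::<usize>();

    // One global update per parent. This is the delayed propagation trade-off:
    // a child's better bound is published only after its siblings finish.
    best_index.fetch_min(best_child_index, Ordering::Relaxed);
    (best_child_index, states_searched)
}
```

Both functions return the same (best index, states searched) pair for a given tree; they differ only in when and how often shared atomics are written. Calling either with a fresh `AtomicUsize::new(usize::MAX)` as the global bound is enough to try the sketch on a small hand-built `State` tree.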

Benchmark e7b976e (this PR) vs. 86b2482 (current main) as a baseline

bench_matches/gdb13_1201/nauty                        
                        time:   [55.455 ms 55.491 ms 55.529 ms]
                        change: [+2.5558% +2.6540% +2.7541%] (p = 0.00 < 0.05)
                        Performance has regressed.
bench_matches/gdb13_1201/tree-nauty                        
                        time:   [34.372 ms 34.389 ms 34.404 ms]
                        change: [-0.7790% -0.6026% -0.4228%] (p = 0.00 < 0.05)
                        Change within noise threshold.
bench_matches/gdb17_200/nauty                        
                        time:   [201.28 ms 201.33 ms 201.38 ms]
                        change: [+1.4360% +1.4862% +1.5343%] (p = 0.00 < 0.05)
                        Performance has regressed.
bench_matches/gdb17_200/tree-nauty                        
                        time:   [105.69 ms 105.80 ms 105.91 ms]
                        change: [-1.5228% -1.3726% -1.2234%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_matches/checks/nauty                        
                        time:   [30.040 ms 30.048 ms 30.054 ms]
                        change: [+2.3781% +2.4219% +2.4626%] (p = 0.00 < 0.05)
                        Performance has regressed.
bench_matches/checks/tree-nauty                        
                        time:   [14.088 ms 14.103 ms 14.116 ms]
                        change: [+0.4625% +0.5498% +0.6364%] (p = 0.00 < 0.05)
                        Change within noise threshold.
bench_matches/coconut_55/nauty                        
                        time:   [177.38 ms 177.49 ms 177.60 ms]
                        change: [+1.5123% +1.5720% +1.6454%] (p = 0.00 < 0.05)
                        Performance has regressed.
bench_matches/coconut_55/tree-nauty                        
                        time:   [83.930 ms 83.973 ms 84.015 ms]
                        change: [+0.0319% +0.2535% +0.4635%] (p = 0.03 < 0.05)
                        Change within noise threshold.

bench_bounds/gdb13_1201/no-bounds                        
                        time:   [59.919 ms 60.344 ms 60.760 ms]
                        change: [-0.2976% +0.8455% +1.9369%] (p = 0.16 > 0.05)
                        No change in performance detected.
bench_bounds/gdb13_1201/log                        
                        time:   [59.565 ms 59.832 ms 60.120 ms]
                        change: [-1.5341% -0.5862% +0.3994%] (p = 0.25 > 0.05)
                        No change in performance detected.
bench_bounds/gdb13_1201/int                        
                        time:   [56.687 ms 57.097 ms 57.479 ms]
                        change: [-3.0753% -2.1457% -1.2216%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_bounds/gdb13_1201/int-vec                        
                        time:   [53.897 ms 54.260 ms 54.634 ms]
                        change: [-1.7727% -0.8509% +0.1070%] (p = 0.09 > 0.05)
                        No change in performance detected.
bench_bounds/gdb13_1201/int-matchable                        
                        time:   [57.130 ms 57.468 ms 57.808 ms]
                        change: [+1.5966% +3.0318% +4.3650%] (p = 0.00 < 0.05)
                        Performance has regressed.
bench_bounds/gdb17_200/no-bounds                        
                        time:   [204.70 ms 206.61 ms 208.53 ms]
                        change: [-3.5225% -2.4647% -1.3238%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_bounds/gdb17_200/log                        
                        time:   [145.96 ms 146.75 ms 147.55 ms]
                        change: [-5.8140% -5.0916% -4.3907%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_bounds/gdb17_200/int                        
                        time:   [48.790 ms 49.053 ms 49.317 ms]
                        change: [-8.2354% -7.6077% -6.9951%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_bounds/gdb17_200/int-vec                        
                        time:   [47.663 ms 47.897 ms 48.129 ms]
                        change: [-7.3690% -6.7550% -6.1313%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_bounds/gdb17_200/int-matchable                        
                        time:   [38.135 ms 38.369 ms 38.602 ms]
                        change: [-10.262% -9.4983% -8.6938%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_bounds/checks/no-bounds                        
                        time:   [163.18 ms 165.88 ms 168.46 ms]
                        change: [-5.9487% -3.4512% -0.6589%] (p = 0.02 < 0.05)
                        Change within noise threshold.
bench_bounds/checks/log time:   [95.671 ms 96.596 ms 97.491 ms]
                        change: [-8.4133% -7.0966% -5.5236%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_bounds/checks/int time:   [10.821 ms 10.879 ms 10.959 ms]
                        change: [-9.0838% -8.1021% -7.0562%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_bounds/checks/int-vec                        
                        time:   [7.6657 ms 7.6929 ms 7.7167 ms]
                        change: [-11.000% -10.193% -9.4266%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_bounds/checks/int-matchable                        
                        time:   [6.1039 ms 6.1228 ms 6.1519 ms]
                        change: [-10.987% -10.361% -9.6373%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_bounds/coconut_55/no-bounds                        
                        time:   [1.8498 s 1.9084 s 1.9756 s]
                        change: [-10.654% -6.2739% -1.5195%] (p = 0.01 < 0.05)
                        Performance has improved.
bench_bounds/coconut_55/log                        
                        time:   [1.3219 s 1.3788 s 1.4420 s]
                        change: [-11.443% -5.7299% +0.2989%] (p = 0.07 > 0.05)
                        No change in performance detected.
bench_bounds/coconut_55/int                        
                        time:   [129.05 ms 130.79 ms 132.53 ms]
                        change: [-9.2253% -7.5879% -5.9182%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_bounds/coconut_55/int-vec                        
                        time:   [114.42 ms 115.38 ms 116.27 ms]
                        change: [-9.3261% -8.0173% -6.6800%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_bounds/coconut_55/int-matchable                        
                        time:   [42.754 ms 42.898 ms 43.038 ms]
                        change: [-6.4748% -6.0767% -5.6924%] (p = 0.00 < 0.05)
                        Performance has improved.

bench_memoize/gdb13_1201/no-memoize                        
                        time:   [42.528 ms 42.819 ms 43.083 ms]
                        change: [-2.8592% -1.9585% -1.0690%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_memoize/gdb13_1201/nauty-index                        
                        time:   [59.407 ms 59.810 ms 60.212 ms]
                        change: [-2.1398% -1.3728% -0.6672%] (p = 0.00 < 0.05)
                        Change within noise threshold.
bench_memoize/gdb13_1201/tree-nauty-index                        
                        time:   [56.595 ms 56.988 ms 57.391 ms]
                        change: [-3.0719% -1.9975% -0.8188%] (p = 0.00 < 0.05)
                        Change within noise threshold.
bench_memoize/gdb17_200/no-memoize                        
                        time:   [23.480 ms 23.578 ms 23.752 ms]
                        change: [-20.173% -17.076% -13.750%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_memoize/gdb17_200/nauty-index                        
                        time:   [40.466 ms 40.601 ms 40.734 ms]
                        change: [-11.073% -10.502% -9.9432%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_memoize/gdb17_200/tree-nauty-index                        
                        time:   [38.091 ms 38.312 ms 38.529 ms]
                        change: [-10.556% -9.9717% -9.4088%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_memoize/checks/no-memoize                        
                        time:   [4.1325 ms 4.1529 ms 4.1796 ms]
                        change: [-16.418% -15.756% -15.025%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_memoize/checks/nauty-index                        
                        time:   [7.0126 ms 7.0464 ms 7.0740 ms]
                        change: [-9.7850% -9.1110% -8.4609%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_memoize/checks/tree-nauty-index                        
                        time:   [6.1725 ms 6.2131 ms 6.2626 ms]
                        change: [-10.512% -9.8740% -9.1781%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_memoize/coconut_55/no-memoize                        
                        time:   [33.601 ms 33.772 ms 34.003 ms]
                        change: [-9.5923% -8.3510% -7.1997%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_memoize/coconut_55/nauty-index                        
                        time:   [47.445 ms 47.573 ms 47.710 ms]
                        change: [-6.2327% -5.8082% -5.4051%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_memoize/coconut_55/tree-nauty-index                        
                        time:   [42.710 ms 42.857 ms 43.016 ms]
                        change: [-7.2619% -6.8085% -6.3095%] (p = 0.00 < 0.05)
                        Performance has improved.

Requesting review from @AgentElement who I am confident can critique my Rust concurrency.

AgentElement

Good catches, LGTM

jdaymude added 2 commits June 12, 2026 16:03

perf: Move states searched tallying out of parallel atomic

f3771f1

perf: Also move best index updates out of child state parallelism

e7b976e

jdaymude requested a review from AgentElement June 13, 2026 00:19

jdaymude self-assigned this Jun 13, 2026

jdaymude added the optimization An optimization to existing implementation label Jun 13, 2026

AgentElement approved these changes Jun 15, 2026


AgentElement merged commit e7ca3d4 into main Jun 15, 2026
11 checks passed

jdaymude deleted the parallel-collect branch June 15, 2026 22:23

AgentElement merged 2 commits into main from parallel-collect
