Skip to content

Track removed matches by match index instead of offset#149

Merged
Garrett-Pz merged 1 commit into
mainfrom
match-identifiers
Jun 11, 2026
Merged

Track removed matches by match index instead of offset#149
Garrett-Pz merged 1 commit into
mainfrom
match-identifiers

Conversation

@jdaymude

@jdaymude jdaymude commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Each assembly state (i.e., collection of fragments) keeps track of the matches (i.e., pairs of isomorphic, edge-disjoint subgraphs) that were removed from the original molecule to arrive at the current state. Before this PR, the state::removal_order variable identified removed matches by their index in the state's list of compatible matches as returned by matches::matches_to_remove. These identifiers are local to each state and depend on the (potentially nondeterministic) best assembly index found across all assembly states thus far.

This PR changes the identifiers to use the match indices computed by matches::matches_to_remove directly, as suggested in #141. This ensures consistent references to matches across states, enabling, for example, pathway extraction as desired for #22.

Benchmarks indicate that this is a unilateral improvement, especially for memoization:

Benchmark 1063655 (this PR) vs. af9f9b2 (current main) as a baseline
bench_matches/gdb13_1201/nauty                        
                        time:   [54.112 ms 54.143 ms 54.177 ms]
                        change: [+0.5351% +0.6232% +0.7109%] (p = 0.00 < 0.05)
                        Change within noise threshold.
bench_matches/gdb13_1201/tree-nauty                        
                        time:   [33.946 ms 33.999 ms 34.034 ms]
                        change: [-0.7976% -0.5837% -0.3486%] (p = 0.00 < 0.05)
                        Change within noise threshold.
bench_matches/gdb17_200/nauty                        
                        time:   [197.78 ms 197.81 ms 197.85 ms]
                        change: [+0.6810% +0.7358% +0.7998%] (p = 0.00 < 0.05)
                        Change within noise threshold.
bench_matches/gdb17_200/tree-nauty                        
                        time:   [105.29 ms 105.33 ms 105.36 ms]
                        change: [-1.0344% -0.9088% -0.7812%] (p = 0.00 < 0.05)
                        Change within noise threshold.
bench_matches/checks/nauty                        
                        time:   [29.482 ms 29.493 ms 29.501 ms]
                        change: [-0.9254% -0.8738% -0.8218%] (p = 0.00 < 0.05)
                        Change within noise threshold.
bench_matches/checks/tree-nauty                        
                        time:   [14.010 ms 14.017 ms 14.026 ms]
                        change: [-0.5240% -0.4370% -0.3428%] (p = 0.00 < 0.05)
                        Change within noise threshold.
bench_matches/coconut_55/nauty                        
                        time:   [174.74 ms 174.78 ms 174.83 ms]
                        change: [-0.7346% -0.6968% -0.6573%] (p = 0.00 < 0.05)
                        Change within noise threshold.
bench_matches/coconut_55/tree-nauty                        
                        time:   [83.694 ms 83.781 ms 83.877 ms]
                        change: [+0.2011% +0.3107% +0.4453%] (p = 0.00 < 0.05)
                        Change within noise threshold.

bench_bounds/gdb13_1201/no-bounds                        
                        time:   [62.314 ms 62.611 ms 62.905 ms]
                        change: [-2.7559% -2.0603% -1.3649%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_bounds/gdb13_1201/log                        
                        time:   [61.791 ms 62.032 ms 62.293 ms]
                        change: [-2.6149% -2.0680% -1.5045%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_bounds/gdb13_1201/int                        
                        time:   [58.577 ms 58.881 ms 59.193 ms]
                        change: [-2.6606% -1.9214% -1.1213%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_bounds/gdb13_1201/int-vec                        
                        time:   [56.223 ms 56.384 ms 56.546 ms]
                        change: [-2.0538% -1.5293% -1.0005%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_bounds/gdb13_1201/int-matchable                        
                        time:   [58.775 ms 59.075 ms 59.339 ms]
                        change: [-2.3757% -1.6314% -0.9255%] (p = 0.00 < 0.05)
                        Change within noise threshold.
bench_bounds/gdb17_200/no-bounds                        
                        time:   [211.75 ms 213.38 ms 215.06 ms]
                        change: [-0.3847% +0.5865% +1.5312%] (p = 0.26 > 0.05)
                        No change in performance detected.
bench_bounds/gdb17_200/log                        
                        time:   [152.79 ms 153.53 ms 154.27 ms]
                        change: [-1.4634% -0.6304% +0.2018%] (p = 0.16 > 0.05)
                        No change in performance detected.
bench_bounds/gdb17_200/int                        
                        time:   [51.803 ms 52.002 ms 52.221 ms]
                        change: [-4.1993% -3.6976% -3.1788%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_bounds/gdb17_200/int-vec                        
                        time:   [50.464 ms 50.642 ms 50.814 ms]
                        change: [-5.0088% -4.4325% -3.8253%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_bounds/gdb17_200/int-matchable                        
                        time:   [40.221 ms 40.425 ms 40.661 ms]
                        change: [-8.4451% -7.7739% -7.0551%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_bounds/checks/no-bounds                        
                        time:   [168.81 ms 171.81 ms 174.56 ms]
                        change: [-2.5650% -0.4905% +1.5683%] (p = 0.66 > 0.05)
                        No change in performance detected.
bench_bounds/checks/log time:   [102.58 ms 104.15 ms 105.57 ms]
                        change: [-2.1526% -0.1010% +1.7503%] (p = 0.92 > 0.05)
                        No change in performance detected.
bench_bounds/checks/int time:   [11.486 ms 11.576 ms 11.679 ms]
                        change: [-2.8341% -1.7868% -0.6392%] (p = 0.00 < 0.05)
                        Change within noise threshold.
bench_bounds/checks/int-vec                        
                        time:   [8.2036 ms 8.2430 ms 8.2888 ms]
                        change: [-3.1910% -2.4937% -1.7369%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_bounds/checks/int-matchable                        
                        time:   [6.4451 ms 6.4654 ms 6.4864 ms]
                        change: [-7.8567% -7.0892% -6.2846%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_bounds/coconut_55/no-bounds                        
                        time:   [1.9063 s 1.9498 s 1.9930 s]
                        change: [-1.9724% +0.8771% +3.6953%] (p = 0.56 > 0.05)
                        No change in performance detected.
bench_bounds/coconut_55/log                        
                        time:   [1.3770 s 1.4014 s 1.4244 s]
                        change: [-0.7213% +2.4648% +6.1264%] (p = 0.17 > 0.05)
                        No change in performance detected.
bench_bounds/coconut_55/int                        
                        time:   [138.63 ms 140.12 ms 141.53 ms]
                        change: [-2.3719% -1.1357% +0.2561%] (p = 0.11 > 0.05)
                        No change in performance detected.
bench_bounds/coconut_55/int-vec                        
                        time:   [122.99 ms 124.43 ms 125.64 ms]
                        change: [-2.1988% -0.6514% +0.7441%] (p = 0.40 > 0.05)
                        No change in performance detected.
bench_bounds/coconut_55/int-matchable                        
                        time:   [44.383 ms 44.510 ms 44.630 ms]
                        change: [-4.6370% -4.1721% -3.6967%] (p = 0.00 < 0.05)
                        Performance has improved.

bench_memoize/gdb13_1201/no-memoize                        
                        time:   [45.396 ms 45.770 ms 46.141 ms]
                        change: [-2.9750% -1.6782% -0.3553%] (p = 0.02 < 0.05)
                        Change within noise threshold.
bench_memoize/gdb13_1201/nauty-index                        
                        time:   [61.192 ms 61.452 ms 61.751 ms]
                        change: [-2.4070% -1.7159% -1.0387%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_memoize/gdb13_1201/tree-nauty-index                        
                        time:   [58.438 ms 58.745 ms 59.078 ms]
                        change: [-3.1534% -2.3646% -1.6131%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_memoize/gdb17_200/no-memoize                        
                        time:   [25.785 ms 25.976 ms 26.286 ms]
                        change: [-14.313% -10.176% -5.8851%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_memoize/gdb17_200/nauty-index                        
                        time:   [42.912 ms 43.166 ms 43.435 ms]
                        change: [-7.2339% -6.5954% -5.8989%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_memoize/gdb17_200/tree-nauty-index                        
                        time:   [40.005 ms 40.223 ms 40.435 ms]
                        change: [-9.0074% -8.3548% -7.7102%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_memoize/checks/no-memoize                        
                        time:   [4.5218 ms 4.5416 ms 4.5641 ms]
                        change: [-9.2962% -8.5597% -7.8636%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_memoize/checks/nauty-index                        
                        time:   [7.2325 ms 7.2573 ms 7.2867 ms]
                        change: [-8.5966% -7.2042% -5.9937%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_memoize/checks/tree-nauty-index                        
                        time:   [6.4385 ms 6.4591 ms 6.4829 ms]
                        change: [-9.2160% -8.0030% -7.0526%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_memoize/coconut_55/no-memoize                        
                        time:   [35.357 ms 35.521 ms 35.700 ms]
                        change: [-4.7073% -3.3918% -2.1883%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_memoize/coconut_55/nauty-index                        
                        time:   [48.753 ms 48.872 ms 48.999 ms]
                        change: [-4.4748% -4.1005% -3.6904%] (p = 0.00 < 0.05)
                        Performance has improved.
bench_memoize/coconut_55/tree-nauty-index                        
                        time:   [44.296 ms 44.472 ms 44.649 ms]
                        change: [-4.4658% -3.9191% -3.3711%] (p = 0.00 < 0.05)
                        Performance has improved.

@jdaymude jdaymude added the architecture Refactors and non-functional structure improvements label Jun 10, 2026
@jdaymude jdaymude marked this pull request as ready for review June 11, 2026 00:12
@jdaymude jdaymude requested a review from Garrett-Pz June 11, 2026 00:12
@jdaymude jdaymude self-assigned this Jun 11, 2026

@Garrett-Pz Garrett-Pz left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Garrett-Pz Garrett-Pz merged commit 86b2482 into main Jun 11, 2026
11 checks passed
@jdaymude jdaymude deleted the match-identifiers branch June 11, 2026 16:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

architecture Refactors and non-functional structure improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants