vg giraffe chaining mode speedup by faithokamoto · Pull Request #4910 · vgteam/vg

faithokamoto · 2026-05-18T21:09:33Z

Changelog Entry

To be copied to the draft changelog by merger:

Significant time and memory optimizations to vg giraffe chaining/long-read mode

Description

A collection of small tweaks to optimize around ziptrees. Each tweak was individually tested to make sure it actually was a speed improvement. Output does not change.

Ziptree construction changes:
- When in add_child_to_chain(), instead of using the zipcode to look up the chain component for the new child, use the memory within the sorting information; much quicker to access.
- When in sort_one_interval() and we're sorting the children of a chain, instead of separately looking up each thing the zipcode stores, page through the entire zipcode at once (see get_chain_child_sort_info())
Ziptree distance_iterator change: instead of using pos.index to look up the item at the current index, store an iterator for the backing vector and dereference that as needed.
- Yes this makes the code a bit more fragile, but as long as no one tries to move pos.index outside of the helper shift_index_by() / change_index_to() functions, it'll work.
- I'm still storing the full ziptree & the index, since I need them for stuff like easy teleportation (hard to jump an iterator to an index if you don't know where you are) and being able to look up values within distance matrices.
DP changes:
- The vector which holds transitions gets space pre-allocated.
- Instead of generating a bunch of transitions and then iterating through them to throw out the ones which aren't read-reachable, now we check each transition for whether it's read-reachable before we even bother saving it. (Note this does mean I had to kill the missing-transition checker, since we are purposely missing transitions, but whatever it was never really used.)

I'm futzing around with a final optimization, will push that if it turns out to help.

faithokamoto · 2026-05-18T23:19:39Z

Nope the last thing I was playing with ended up being bad, so this is it.

adamnovak

This looks pretty good, but I think the sort_value_t generation could be substantially simplified to avoid throwing around the big tuples.

adamnovak · 2026-05-20T15:52:52Z

+            ZipCode::code_type_t code_type;
+            size_t offset_in_chain;
+            size_t length;
+            size_t chain_component;
+            bool is_reversed;
+            std::tie(code_type, offset_in_chain, length, chain_component, is_reversed) \
+                = seed.zipcode.get_chain_child_sort_info(interval.depth+1);


Nope we're unpacking it ourselves and not just < comparing it.

When I see sorting and tuples together I assume we're building a sort key and we're going to compare those tuples with the tuple operator<, but that's not what's happening here.

This is probably a good candidate for a struct, which then gives a place to explain how/why we can't just make an operator<. But then we need to ask why that struct isn't already just the same as sort_value_t? Why can't get_chain_child_sort_info() take seed.pos and node_offset, which seem to be the only inputs to the code below here to set up sort_values_by_seed[zipcode_sort_order[i]] that don't come out of get_chain_child_sort_info(), and then return the completed sort_value_t, instead of a whole collection of things that we use only to build it?

I set it up so now the ZipCodeForest takes responsibility for figuring out its sort values, and it actually just directly sets the various fields of the sort_value_t by reference instead of return.

faithokamoto added 8 commits May 14, 2026 16:57

minor speedup

4b3b06c

look up less

afb622b

look up even less

8f8789e

reuse old lookup

686b032

try out this helper function

788506b

maybe use an iterator hehe

95dc0c3

avoid over-reallocating

4b7d7f6

save less stuff

74e5199

Merge branch 'master' into speedup

98990a2

adamnovak reviewed May 20, 2026

View reviewed changes

move responsibility elsewhere for cleaner code

a608e55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

vg giraffe chaining mode speedup#4910

vg giraffe chaining mode speedup#4910
faithokamoto wants to merge 10 commits into
masterfrom
speedup

faithokamoto commented May 18, 2026 •

edited

Loading

Uh oh!

faithokamoto commented May 18, 2026

Uh oh!

adamnovak left a comment

Uh oh!

Uh oh!

adamnovak May 20, 2026

Uh oh!

faithokamoto May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

faithokamoto commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changelog Entry

Description

Uh oh!

faithokamoto commented May 18, 2026

Uh oh!

adamnovak left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

adamnovak May 20, 2026

Choose a reason for hiding this comment

Uh oh!

faithokamoto May 20, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

faithokamoto commented May 18, 2026 •

edited

Loading