Conversation
This is what I managed to get out of Anthropic Claude on the subject of removing fragmenting and coalescing the two steps so that we go straight from zip code trees to chaining. I had to make a couple of changes to get it to pass the Giraffe tests. I read through the code and it looks plausible, but it's possible that the funnel logic is wrong or that the less apt parameter defaults were kept. This needs to be evaluated for mapping and calling accuracy against the version that has the fragmenting code but defaults to bypassing it.

If we're getting rid of chaining, the vg/src/algorithms/chain_items.cpp Lines 329 to 332 in aa8171c But if we simply chain all seeds directly, then every seed will correspond to an anchor border, since every seed will be its own anchor.
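
To make the counting argument concrete, here is a toy sketch. It does not use vg's real seed or anchor classes: `ToyAnchor`, `make_anchors`, and `count_border_seeds` are hypothetical names invented for illustration, and grouping seeds into fixed-size runs is only an assumed stand-in for whatever grouping would otherwise produce multi-seed anchors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy type; NOT vg's actual anchor class.
struct ToyAnchor {
    std::size_t first_seed; // index of the first seed the anchor covers
    std::size_t last_seed;  // index of the last seed it covers (inclusive)
};

// Group consecutive seeds into anchors of up to seeds_per_anchor seeds each.
// Fixed-size grouping is an assumption made only to keep the example small.
std::vector<ToyAnchor> make_anchors(std::size_t seed_count, std::size_t seeds_per_anchor) {
    assert(seeds_per_anchor > 0);
    std::vector<ToyAnchor> anchors;
    for (std::size_t start = 0; start < seed_count; start += seeds_per_anchor) {
        std::size_t end = std::min(start + seeds_per_anchor, seed_count) - 1;
        anchors.push_back({start, end});
    }
    return anchors;
}

// Count the seeds that sit on an anchor border, i.e. that are the first or
// last seed of some anchor. A single-seed anchor contributes one such seed.
std::size_t count_border_seeds(const std::vector<ToyAnchor>& anchors) {
    std::size_t border_seeds = 0;
    for (const ToyAnchor& anchor : anchors) {
        border_seeds += (anchor.first_seed == anchor.last_seed) ? 1 : 2;
    }
    return border_seeds;
}
```

With 12 seeds grouped 4 per anchor, only 6 of the seeds are on a border; with 1 seed per anchor, all 12 are, which is the situation described above.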

@faithokamoto We can't get rid of the abstraction of having an vg/src/minimizer_mapper_from_chains.cpp Lines 1386 to 1391 in 0c86c8f So I think even with the removal of fragmenting, we still have to deal with having seeds in play that are not

I checked this on calling with It looks like this removes a few calling errors. I also evaluated speed previously and we don't get too much slower. So I think this is ready.

This should also be tested on R10.
I tested this on R10y2025 reads, and there it increases calling errors. So on HiFi we reduce total errors by 0.13%, and on R10 we increase them by 0.019%. |
Changelog Entry
To be copied to the draft changelog by merger:
Description
To avoid metaphysical angst about why recombination penalties at fragmenting make things worse instead of better, this PR removes fragmenting entirely (on top of some commits merely bypassing it).
Bypassing fragmenting seems to decrease speed substantially on simulated HiFi reads, increase accuracy somewhat on simulated HiFi reads, and decrease speed somewhat on real HiFi reads. (I haven't gotten R10 results yet because my whole-node timing jobs are still in queue.)
This code was almost entirely synthesized by Anthropic Claude, using almost all of its patience (aka token limit for the day). I reviewed it, and it appears to have done what I wanted and glommed the two step functions together (even though it did this by writing a new one and then deleting the old ones), but this still needs to be tested for mapping and calling accuracy effects (vs. d1625a9).