Split recombination penalty and enable by default#4911
Open
adamnovak wants to merge 25 commits into
Open
Conversation
…--rec-consistency-bonus
This reverts commit d8a8146.
…mbination mode by default from payload type
…be tempted to lower it
…naligned mapping generation
…ejecting path-payload minimizers
Member
Author
|
@jltsiren What if we changed the path cover generation to somehow mark the resulting GBZ as being a path cover, maybe with a tag? Then I could look at it when doing minimizer indexing and know not to make a recombination-aware minimizer index because I don't have real haplotypes. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Changelog Entry
To be copied to the draft changelog by merger:
vg giraffe--rec-penalty-chainparameter has been split into--rec-penalty(for chaining),--rec-consistency-bonus(a bonus for haplotype consistency used during chaining but not incorporated into the chain score), and--rec-penalty-aln(used to penalize alignment scores per recombination).--rec-modetovg minimizernow just makes it fail if recombination-aware minimizer indexing isn't on (because of too many haplotypes).vg giraffe, if a recombination-aware minimizer index file is loaded and you are using thehifiorr10presets. To turn it off, pass--no-rec-mode. There's no longer a distinction between.pathminimizer and zipcodes files and normal ones. If using a path cover instead of real haplotypes, like with an unphased VCF as a Giraffe input, you must pass--no-rec-modemanually or else recombination-aware mapping will be used despite being inappropriate for a path cover. The wiki will need to be updated to reflect this.hifiandr10presets forvg giraffehave been updated with tuned recombination penalty settings.vg giraffeno longer produces alignments with nonempty path and negative or zero score. Potential alignment that would reach or go below a score of 0 (perhaps because of--rec-penalty-aln) will be removed, and if needed an unmapped alignment record will be emitted for the read.Description
This sets up recombination-aware mapping to be on by default when possible, using separately-tuned parameters for the different ways we use recombination information.
To write the tests cleanly, I had to get rid of negative-score alignments, which I don't think we want anyway, but I'm currently testing to make sure that doesn't do anything bad to our calling results. (I'm not evaluating what it does to our mapping results, because I don't think a read put at the correct position with a negative score should ever have really counted.)
I tried to smarten up how we decide whether to use recombination-aware mapping, but there's a couple things that could change:
giraffe_main.cppand guess what the IndexRegistry is going to do. I didn't like any of these options so I'm making the user remember to turn it off, for now.