fix: mismatch-filtered reads misclassified as TooShort in Log.final.out
Summary
When all transcript candidates for a read are removed by the mismatch-count or mismatch-rate filter (--outFilterMismatchNmax, --outFilterMismatchNoverLmax), rustar-aligner classifies the read as UnmappedReason::TooShort rather than UnmappedReason::TooManyMismatches. This inflates the "Number of reads unmapped: too short" counter in Log.final.out and keeps "Number of reads unmapped: too many mismatches" at zero.
STAR reference behaviour
STAR's ReadAlign_mappedFilter.cpp distinguishes:
- lines 20–26: score/match threshold failure → unmappedShort → Log.final.out "too short"
- lines 27–30: mismatch count or rate exceeded → unmappedMismatch → Log.final.out "too many mismatches"
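The ordering above can be sketched in Rust as a small decision function. This is only an illustration of the precedence described in the list (score/match checked first, mismatches second); the classify helper, its boolean parameters, and log_label are hypothetical names, not part of STAR or rustar-aligner.

```rust
/// Unmapped categories named in the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnmappedReason {
    TooShort,
    TooManyMismatches,
}

impl UnmappedReason {
    /// Label fragment as it appears in Log.final.out.
    fn log_label(self) -> &'static str {
        match self {
            UnmappedReason::TooShort => "too short",
            UnmappedReason::TooManyMismatches => "too many mismatches",
        }
    }
}

/// Hypothetical helper: the score/match check comes first, so a candidate
/// that fails both checks is still reported as "too short".
fn classify(fails_score_or_match: bool, fails_mismatch: bool) -> Option<UnmappedReason> {
    if fails_score_or_match {
        Some(UnmappedReason::TooShort)
    } else if fails_mismatch {
        Some(UnmappedReason::TooManyMismatches)
    } else {
        None // passed both checks; not unmapped for either reason
    }
}
```

Under this sketch, a candidate counts as "too many mismatches" only when it has already passed the score/match thresholds.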
Location
src/align/read_align.rs, lines 636–638:
    let unmapped_reason = if transcripts.is_empty() {
        Some(UnmappedReason::TooShort) // always TooShort: wrong when the mismatch filters fired
    } else {
        None
    };
The retain() closure at lines 415–522 already tracks filter reasons via a HashMap<&str, usize> (filter_reasons), but this map is only used for log::debug! output and is dropped before the catch-all above.
Fix
Replace the string-keyed filter_reasons HashMap with two booleans in the retain() closure:
    let mut any_mismatch_filter = false;
    let mut any_other_filter = false;

    transcripts.retain(|t| {
        // score / match filters:
        if /* score or match condition */ {
            any_other_filter = true;
            return false;
        }
        // mismatch count / rate filters:
        if /* mismatch condition */ {
            any_mismatch_filter = true;
            return false;
        }
        true
    });

    let unmapped_reason = if transcripts.is_empty() {
        if any_mismatch_filter && !any_other_filter {
            Some(UnmappedReason::TooManyMismatches)
        } else {
            Some(UnmappedReason::TooShort)
        }
    } else {
        None
    };
The existing filter_reasons HashMap can be removed (the two booleans replace its only production use).
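A self-contained sketch of the two-boolean approach, useful for checking the classification logic in isolation. The Transcript and Filters types, their fields, and the filter_transcripts function are simplified, hypothetical stand-ins for illustration; the real conditions in src/align/read_align.rs cover more thresholds than the two shown here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnmappedReason {
    TooShort,
    TooManyMismatches,
}

/// Hypothetical, simplified transcript candidate (not the real type).
#[derive(Debug, Clone)]
struct Transcript {
    score: i32,
    n_mismatch: u32,
}

/// Hypothetical thresholds standing in for the real filter parameters.
struct Filters {
    min_score: i32,
    max_mismatch: u32,
}

/// Drops failing candidates and reports why the read is unmapped, if it is.
fn filter_transcripts(
    transcripts: &mut Vec<Transcript>,
    f: &Filters,
) -> Option<UnmappedReason> {
    let mut any_mismatch_filter = false;
    let mut any_other_filter = false;

    // Vec::retain accepts an FnMut closure, so it may update the flags.
    transcripts.retain(|t| {
        // Score filter (assumed stand-in for the score/match group).
        if t.score < f.min_score {
            any_other_filter = true;
            return false;
        }
        // Mismatch filter (assumed stand-in for the count/rate group).
        if t.n_mismatch > f.max_mismatch {
            any_mismatch_filter = true;
            return false;
        }
        true
    });

    if !transcripts.is_empty() {
        return None; // at least one candidate survived
    }
    if any_mismatch_filter && !any_other_filter {
        Some(UnmappedReason::TooManyMismatches)
    } else {
        // Includes the case where no filter fired (no candidates at all),
        // which keeps the previous catch-all behaviour.
        Some(UnmappedReason::TooShort)
    }
}
```

With min_score = 50 and max_mismatch = 3, a read whose only candidate has score 90 and 7 mismatches is reported as TooManyMismatches, while a read that also has a candidate rejected on score stays TooShort.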
Related
- Log.final.out folds all unmapped reads into "too short"; "other" bucket always 0 #48 (the "other = 0" part was fixed separately; the mismatch bucket was not).
- uT:A: tag value once unmapped-record tags are implemented