fix: mismatch-filtered reads misclassified as TooShort in Log.final.out #79

@Psy-Fer

Summary

When all transcript candidates for a read are removed by the mismatch-count or mismatch-rate filter (--outFilterMismatchNmax, --outFilterMismatchNoverLmax), rustar-aligner classifies the read as UnmappedReason::TooShort rather than UnmappedReason::TooManyMismatches. This inflates the "Number of reads unmapped: too short" counter in Log.final.out and leaves "Number of reads unmapped: too many mismatches" at zero.

STAR reference behaviour

STAR's ReadAlign_mappedFilter.cpp distinguishes:

  • lines 20–26: score/match threshold failure → unmappedShort → Log.final.out "too short"
  • lines 27–30: mismatch count or rate exceeded → unmappedMismatch → Log.final.out "too many mismatches"

Location

src/align/read_align.rs, lines 636–638:

let unmapped_reason = if transcripts.is_empty() {
    Some(UnmappedReason::TooShort)   // always TooShort — wrong when mismatches fired
} else {
    None
};

The retain() closure at lines 415–522 already tracks filter reasons via a HashMap<&str, usize> (filter_reasons), but this map is only used for log::debug! output and is dropped before the catch-all above.

Fix

Replace the string-keyed filter_reasons HashMap with two booleans, declared before the retain() call and set inside its closure:

let mut any_mismatch_filter = false;
let mut any_other_filter = false;

transcripts.retain(|t| {
    // score / match filters:
    if /* score or match condition */ {
        any_other_filter = true;
        return false;
    }
    // mismatch count / rate filters:
    if /* mismatch condition */ {
        any_mismatch_filter = true;
        return false;
    }
    true
});

let unmapped_reason = if transcripts.is_empty() {
    if any_mismatch_filter && !any_other_filter {
        Some(UnmappedReason::TooManyMismatches)
    } else {
        Some(UnmappedReason::TooShort)
    }
} else {
    None
};

The existing filter_reasons HashMap can be removed (the two booleans replace its only production use).
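
To make the proposed classification concrete, here is a minimal self-contained sketch of the two-boolean approach. The Transcript and Filters types, their field names, and the filter_transcripts function are hypothetical stand-ins and are not taken from read_align.rs. The real closure applies more filters than the two shown, and the exact form of the mismatch-rate check is an assumption.

```rust
/// Hypothetical stand-in for a transcript candidate (field names are assumptions).
#[derive(Debug, Clone)]
struct Transcript {
    score: i32,
    n_mismatch: u32,
    aligned_len: u32,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnmappedReason {
    TooShort,
    TooManyMismatches,
}

/// Hypothetical thresholds, loosely modelled on the score and
/// --outFilterMismatchNmax / --outFilterMismatchNoverLmax options.
struct Filters {
    score_min: i32,
    mismatch_n_max: u32,
    mismatch_n_over_l_max: f64,
}

/// Drops transcripts that fail a filter and reports why the read ended up
/// unmapped, or None if at least one transcript survives.
fn filter_transcripts(
    transcripts: &mut Vec<Transcript>,
    filters: &Filters,
) -> Option<UnmappedReason> {
    let mut any_mismatch_filter = false;
    let mut any_other_filter = false;

    transcripts.retain(|t| {
        // Score filter (stands in for all score / match-length checks).
        if t.score < filters.score_min {
            any_other_filter = true;
            return false;
        }
        // Mismatch count and mismatch rate filters (rate formula is assumed).
        let rate_limit = filters.mismatch_n_over_l_max * f64::from(t.aligned_len);
        if t.n_mismatch > filters.mismatch_n_max || f64::from(t.n_mismatch) > rate_limit {
            any_mismatch_filter = true;
            return false;
        }
        true
    });

    if transcripts.is_empty() {
        // Only report TooManyMismatches when nothing but the mismatch filters fired.
        if any_mismatch_filter && !any_other_filter {
            Some(UnmappedReason::TooManyMismatches)
        } else {
            Some(UnmappedReason::TooShort)
        }
    } else {
        None
    }
}
```

Under these assumptions, a read whose only candidate has a passing score but too many mismatches is reported as TooManyMismatches, while a read with any score-filtered candidate still falls back to TooShort. An input that is already empty before filtering also yields TooShort, which matches the current catch-all.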
