CompactDNA is not effective for small length sequences #2

@Trecek

Description

| Sequence Length | Bases/µs | Sequences/µs |
|---|---|---|
| 5 | 818.21 | 163.64 |
| 10 | 1,593.39 | 159.34 |
| 15 | 2,454.58 | 163.64 |
| 20 | 2,418.44 | 120.92 |
| 25 | 4,066.13 | 162.65 |
| 30 | 4,908.97 | 163.63 |
| 35 | 5,761.40 | 164.61 |
| 40 | 6,560.00 | 164.00 |

Shorter sequences may perform worse because of how I extract the score from each u64.
Currently I pack as many whole sequences as I can into each u64.
I use 3 bits per base, so 21 bases fit into one u64.
This means that when a u64 holds multiple sequences, extracting them takes extra steps per u64. It would likely be better to use a different integer width than to pack many sequences into one u64. There may also be more effective ways to sum the scores.
It may be better to pack just one sequence into the most efficient integer width.
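
To make the trade-off concrete, here is a minimal Rust sketch of the two layouts discussed above: several short sequences sharing one u64, versus one sequence per integer. It is not the actual CompactDNA code. The base encoding and the names `encode_base`, `pack_seq`, `pack_many`, `extract_seq`, `seqs_per_word`, and `min_width_bits` are all hypothetical, "most efficient width" is read here as "smallest width that fits", and scoring is left out because the issue does not describe it.

```rust
/// Bits per base, as stated in the issue.
const BITS_PER_BASE: u32 = 3;
/// Whole bases that fit in one u64: 64 / 3 = 21.
const BASES_PER_U64: u32 = u64::BITS / BITS_PER_BASE;

/// Hypothetical 3-bit code table; the real encoding may differ.
fn encode_base(b: u8) -> u64 {
    match b {
        b'A' => 0,
        b'C' => 1,
        b'G' => 2,
        b'T' => 3,
        _ => 4, // assumption: anything else is treated as N
    }
}

/// Pack one sequence of at most 21 bases into the low bits of a u64.
/// The first base ends up in the highest occupied bits.
fn pack_seq(seq: &[u8]) -> u64 {
    assert!(seq.len() as u32 <= BASES_PER_U64);
    seq.iter()
        .fold(0u64, |acc, &b| (acc << BITS_PER_BASE) | encode_base(b))
}

/// How many whole sequences of `len` bases fit in one u64.
fn seqs_per_word(len: u32) -> u32 {
    assert!(len > 0);
    BASES_PER_U64 / len
}

/// Layout 1: several equal-length sequences share one u64.
/// Sequence `i` occupies bits [i * len * 3, (i + 1) * len * 3).
fn pack_many(seqs: &[&[u8]], len: u32) -> u64 {
    assert!(len > 0 && seqs.len() as u32 <= seqs_per_word(len));
    let bits = len * BITS_PER_BASE;
    seqs.iter().enumerate().fold(0u64, |word, (i, s)| {
        assert_eq!(s.len() as u32, len);
        word | (pack_seq(s) << (i as u32 * bits))
    })
}

/// Pull sequence `idx` back out of a shared word. This shift and mask
/// is the extra per-sequence work the issue refers to.
fn extract_seq(word: u64, len: u32, idx: u32) -> u64 {
    assert!(len > 0 && idx < seqs_per_word(len));
    let bits = len * BITS_PER_BASE; // at most 63, so the shift below is safe
    let mask = (1u64 << bits) - 1;
    (word >> (idx * bits)) & mask
}

/// Layout 2: one sequence per integer. Returns the smallest unsigned
/// width in bits that holds `len` bases, or None if it exceeds 64 bits.
fn min_width_bits(len: u32) -> Option<u32> {
    let bits = len * BITS_PER_BASE;
    [8u32, 16, 32, 64].iter().copied().find(|&w| bits <= w)
}
```

Under these assumptions, a 5-base sequence needs 15 bits, so four of them share one u64 in the first layout, while the second layout would store each one in a u16 and skip the shift-and-mask step. Whether that actually improves throughput would need to be benchmarked.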