
Split reference sequences into different fasta files by their annotation from treesapp purity #86

@cmorganl

Description

treesapp purity is generally good at indicating whether a reference package will classify off-target homologs. However, it does not tell users what to do with the reference package once off-target hits have been found.

While not a perfect solution, the sequences of a reference package could be split into separate FASTA files according to the orthologous group each sequence was classified to. An additional FASTA file containing all unclassified sequences would be written as well. Users could then concatenate the sequences they believe belong to their target protein family and recreate the refpkg.

This would be especially useful when misannotated, non-homologous sequences were included in the initial sequence set used by treesapp create.
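A minimal sketch of the proposed splitting step follows. It is not TreeSAPP code: it assumes the per-sequence orthologous-group assignments from treesapp purity have already been collected into a plain dictionary mapping each reference sequence header to a group label, because the exact purity output format is not specified here. The function names (`split_by_group`, `concatenate`) and the `unclassified.fasta` file name are hypothetical.

```python
import os
import re
from collections import defaultdict
from typing import Dict, Iterable, List

UNCLASSIFIED = "unclassified"  # hypothetical label/file name for sequences with no assignment


def read_fasta(path: str) -> Dict[str, str]:
    """Parse a FASTA file into {header: sequence}; headers exclude the leading '>'."""
    seqs: Dict[str, str] = {}
    header = None
    chunks: List[str] = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    seqs[header] = "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        seqs[header] = "".join(chunks)
    return seqs


def write_fasta(path: str, records: Dict[str, str]) -> None:
    """Write {header: sequence} records, one sequence per line."""
    with open(path, "w") as handle:
        for name, seq in records.items():
            handle.write(f">{name}\n{seq}\n")


def safe_name(label: str) -> str:
    """Replace characters that are awkward in file names with underscores."""
    return re.sub(r"[^A-Za-z0-9_.-]+", "_", label)


def split_by_group(seqs: Dict[str, str], assignments: Dict[str, str],
                   out_dir: str) -> Dict[str, str]:
    """Write one FASTA per orthologous group, plus one for unclassified sequences.

    seqs: every reference sequence in the refpkg, {header: sequence}.
    assignments: assumed {header: orthologous group} for classified sequences only.
    Returns {group label: path of the FASTA written for it}.
    """
    os.makedirs(out_dir, exist_ok=True)
    groups: Dict[str, Dict[str, str]] = defaultdict(dict)
    groups[UNCLASSIFIED] = {}  # always write the unclassified file, even if empty
    for name, seq in seqs.items():
        groups[assignments.get(name, UNCLASSIFIED)][name] = seq
    paths: Dict[str, str] = {}
    for group, records in groups.items():
        path = os.path.join(out_dir, safe_name(group) + ".fasta")
        write_fasta(path, records)
        paths[group] = path
    return paths


def concatenate(paths: Iterable[str], out_path: str) -> None:
    """Merge the FASTA files a user judges to be on-target into a single file."""
    merged: Dict[str, str] = {}
    for path in paths:
        merged.update(read_fasta(path))
    write_fasta(out_path, merged)
```

The merged FASTA would then be the input for rebuilding the refpkg with treesapp create. Note that this sketch does not guard against a real group label that is literally "unclassified", or against two labels that collapse to the same file name after `safe_name`.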

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests