Stitching fragmented ORFs #91

@cmorganl

Is your feature request related to a problem? Please describe.
A user, Aditi Nagaraj, found that Prodigal (run within treesapp assign) had fragmented a single RpoB protein sequence into five consecutive ORFs.

Example outputs can be generated from the following command with rpob_test.txt and the RpoB reference package from RefPkgs:

treesapp assign \
-i rpob_test.txt -o RpoB_fragment_test/ \
--refpkg_dir RefPkgs/Translation/RpoB/seed_refpkg/final_outputs/ 

Describe the solution you'd like
A single ORF should be reported in cases where the whole protein sequence has been fragmented into pieces.

The 'stitching' can happen after the profile HMM alignment results have been parsed. A new function needs to be written that compares the alignment positions of ORFs on a single contig or scaffold (i.e. parent sequence). If it finds multiple ORFs from the same parent sequence that are located on the same strand and whose profile HMM positions do not overlap, then the ORFs must be stitched.
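A minimal sketch of that comparison, to make the criteria concrete. It is not existing TreeSAPP code: the `HmmHit` record, its field names, and `find_stitchable_groups` are hypothetical, and the greedy chaining is only one possible way to pick non-overlapping ORFs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HmmHit:
    """Hypothetical record for one ORF's profile HMM alignment."""
    orf_name: str
    parent: str      # contig or scaffold the ORF was predicted on
    strand: str      # "+" or "-"
    hmm_start: int   # first aligned position on the profile HMM (inclusive)
    hmm_end: int     # last aligned position on the profile HMM (inclusive)


def find_stitchable_groups(hits):
    """Return lists of ORFs that share a parent sequence and strand and
    whose profile HMM positions do not overlap."""
    by_parent_strand = defaultdict(list)
    for hit in hits:
        by_parent_strand[(hit.parent, hit.strand)].append(hit)

    groups = []
    for candidates in by_parent_strand.values():
        # Walk the ORFs in profile order and build a non-overlapping chain.
        candidates.sort(key=lambda h: (h.hmm_start, h.hmm_end))
        chain = [candidates[0]]
        for hit in candidates[1:]:
            # Keep the hit only if it begins after the previous one ends.
            if hit.hmm_start > chain[-1].hmm_end:
                chain.append(hit)
        if len(chain) > 1:
            groups.append(chain)
    return groups
```

ORFs that overlap an earlier member of the chain on the profile are simply left out of the group. The sketch also does not check that the order of the ORFs on the parent sequence agrees with their order on the profile, which a real implementation would probably want to confirm.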

Stitching involves going back to the (untranslated) input sequences, finding the start and stop positions, deducing the frame in which the ORFs were translated, and conceptually translating a single sequence with the same translation table that Prodigal used.
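The stitching step could look roughly like the sketch below. Everything in it is an assumption made for illustration: `stitch_orfs` and `reverse_complement` are hypothetical helpers, ORF coordinates are taken to be 1-based and inclusive on the parent sequence, and the codon table is the standard amino acid assignment, which translation table 11 shares (assuming Prodigal was run with table 11). Only the simple case where all fragments sit in the same reading frame is handled; fragments separated by a frameshift raise an error.

```python
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Codon -> amino acid map in the usual TCAG ordering (assumed table 11).
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES),
    AMINO_ACIDS,
))
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def stitch_orfs(parent_seq, orfs):
    """Translate the span covering all fragments as a single sequence.

    orfs: list of (start, end, strand) tuples with 1-based, inclusive
    coordinates on parent_seq (a hypothetical input layout).
    """
    strands = {strand for _, _, strand in orfs}
    if len(strands) != 1:
        raise ValueError("ORFs are not all on the same strand")
    strand = strands.pop()

    # The coding start is the left edge on "+" and the right edge on "-".
    anchors = [start if strand == "+" else end for start, end, _ in orfs]
    if len({anchor % 3 for anchor in anchors}) != 1:
        raise ValueError("ORFs are in different frames (possible frameshift)")

    span_start = min(start for start, _, _ in orfs)
    span_end = max(end for _, end, _ in orfs)
    span = parent_seq[span_start - 1:span_end]
    if strand == "-":
        span = reverse_complement(span)

    span = span.upper()
    usable = len(span) - len(span) % 3
    # Unknown or ambiguous codons become "X"; stop codons become "*".
    return "".join(CODON_TABLE.get(span[i:i + 3], "X")
                   for i in range(0, usable, 3))
```

Stop codons that separated the original fragments appear as `*` inside the returned sequence, so how to treat them (mask, drop, or report) is left open here.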
