TreeSAPP Create Bug When Using Guarantee Flag

I noticed that when creating reference packages with guaranteed sequences from TIGRFAM (passed via `--guarantee`), the sequence headers get truncated. As a result, when the sequences are queried against NCBI, they are misclassified as "r__Root".

For example, here is the base `treesapp create` command for NapA:

`treesapp create -c NapA -p 0.85 --min_taxonomic_rank c -n 16 -i RefPkgs/Nitrogen_metabolism/Denitrification/NapA/ENOG501NS3T.faa --guarantee RefPkgs/Nitrogen_metabolism/Denitrification/NapA/TIGR01706.faa --cluster --trim_align --outdet_align --headless --fast --overwrite -o TS_Make_Lin_Table_For_Eval/Base/NapA/ --profile RefPkgs/Nitrogen_metabolism/Denitrification/NapA/TIGR01706.HMM --deduplicate --min_seq_length 600`

These are the sequence headers in the TIGRFAM file:
```
>SP|Q56350|NAPA_PARDT/2-831
>SP|P39185|NAPA_ALCEU/2-831
```

In both the accession table and any trees generated for this package, these headers are truncated to:
```
SP|   r__Root
```

When they should be:
```
Q56350  r__Root; d__Bacteria; p__Pseudomonadota; c__Alphaproteobacteria; o__Rhodobacterales; f__Paracoccaceae; g__Paracoccus; s__Paracoccus pantotrophus
P39185  r__Root; d__Bacteria; p__Pseudomonadota; c__Betaproteobacteria; o__Burkholderiales; f__Burkholderiaceae; g__Cupriavidus; s__Cupriavidus necator
```
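TreeSAPP's actual header-parsing code is not shown in this issue, so the following is only a hypothetical sketch of how the observed symptom could arise. These TIGRFAM headers follow the UniProtKB layout `db|accession|entry_name`, so a parser that keeps the first `|`-delimited field gets the database tag `SP` rather than the accession. Both function names below are illustrative, not part of TreeSAPP.

```python
import re

# UniProtKB-style FASTA header: >db|accession|entry_name, optionally followed by /start-end
UNIPROT_HEADER = re.compile(r"^>?(?:sp|tr)\|([^|]+)\|", re.IGNORECASE)


def first_pipe_field(header: str) -> str:
    """Hypothetical naive parser: returns only the first '|'-delimited field."""
    return header.lstrip(">").split("|", 1)[0]


def uniprot_accession(header: str) -> str:
    """Return the accession from a UniProtKB-style header, else the first whitespace token."""
    match = UNIPROT_HEADER.match(header)
    if match:
        return match.group(1)
    return header.lstrip(">").split()[0]


for h in (">SP|Q56350|NAPA_PARDT/2-831", ">SP|P39185|NAPA_ALCEU/2-831"):
    # The naive parser yields the database tag, the UniProt-aware one yields the accession
    print(first_pipe_field(h), uniprot_accession(h))
```

With the two NapA headers, the naive parser returns `SP` for both, whereas the UniProt-aware version returns `Q56350` and `P39185`, the accessions that resolve to the expected lineages.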

Removing the `SP|` prefix from the headers fixes the issue; however, I'm wondering whether the header truncation itself needs to be addressed.
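The manual workaround can be scripted. This is a minimal sketch, assuming the only change needed is dropping a leading `SP|` (or `TR|`) tag from each header line; the helper names and output file name are hypothetical and not part of TreeSAPP.

```python
import re

# Hypothetical helper: matches a leading "SP|" or "TR|" database tag on a FASTA header line
DB_PREFIX = re.compile(r"^>(?:sp|tr)\|", re.IGNORECASE)


def strip_db_prefix(lines):
    """Yield FASTA lines with any leading 'SP|'/'TR|' tag removed from header lines."""
    for line in lines:
        # Sequence lines never start with '>', so they pass through unchanged
        yield DB_PREFIX.sub(">", line)


def strip_fasta_file(in_path, out_path):
    """Write a copy of in_path whose headers no longer carry the database tag."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(strip_db_prefix(src))
```

For example, `strip_fasta_file("TIGR01706.faa", "TIGR01706.noprefix.faa")` turns `>SP|Q56350|NAPA_PARDT/2-831` into `>Q56350|NAPA_PARDT/2-831`, and the new file can then be passed to `--guarantee`.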

The truncation does not seem to occur when these prefixed headers are in the main FASTA input file (for example, for RadA), or when `treesapp update` is used and the final clustered sequences are then passed to `treesapp create`.

 - TreeSAPP Version [e.g. 0.11.4]

TreeSAPP Create Bug When Using Guarantee Flag #98

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

TreeSAPP Create Bug When Using Guarantee Flag #98

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions