TreeSAPP classify bug for TaxIDs that have been merged #99

@janstett


NCBI sometimes merges old TaxIDs into new ones. Depending on how the sequence header is structured, TreeSAPP can't match a merged TaxID to a lineage and erroneously assigns the sequence to Root.

For example, in the SoxZ package:

1525715.IX54_08960 is read as having TaxID 1525715.
However, 1525715 has been merged into 1545044,
so TreeSAPP classifies this sequence as Root.

The correct taxonomy is:
Bacteria; Pseudomonadota; Alphaproteobacteria; Rhodobacterales; Paracoccaceae; Paracoccus; Paracoccus sanguinis
https://www.ncbi.nlm.nih.gov/protein/694216822

When the protein accession is listed without a TaxID prefix, the problem does not occur. It seems to mainly affect sequences that originate from eggNOG.
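
A possible workaround, as a rough sketch only: remap old TaxIDs through NCBI's `merged.dmp` (from the taxdump archive, one `old_id | new_id |` record per line) before the lineage lookup. This is not TreeSAPP's actual code. The function names `load_merged`, `taxid_from_header` and `resolve_taxid` are hypothetical, and the header pattern assumes the eggNOG-style `<taxid>.<locus_tag>` layout seen above. The demo uses an in-memory record built from the merge described in this issue instead of a real file.

```python
import io
import re


def load_merged(handle):
    """Parse merged.dmp-style lines ('old_id\t|\tnew_id\t|') into a dict."""
    merged = {}
    for line in handle:
        fields = [field.strip() for field in line.split("|")]
        if len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
            merged[int(fields[0])] = int(fields[1])
    return merged


def taxid_from_header(header):
    """Return the leading numeric TaxID of a '<taxid>.<locus>' header, else None."""
    match = re.match(r"^>?(\d+)\.", header)
    return int(match.group(1)) if match else None


def resolve_taxid(taxid, merged):
    """Follow merge records until reaching a TaxID that is not itself merged."""
    seen = set()
    while taxid in merged and taxid not in seen:  # 'seen' guards against cycles
        seen.add(taxid)
        taxid = merged[taxid]
    return taxid


# Demo record taken from the merge reported in this issue.
demo_dump = io.StringIO("1525715\t|\t1545044\t|\n")
merged = load_merged(demo_dump)

old_taxid = taxid_from_header("1525715.IX54_08960")
print(old_taxid, "->", resolve_taxid(old_taxid, merged))  # 1525715 -> 1545044
```

With a remapping step like this, the lineage lookup would be done on 1545044 rather than on the retired 1525715. Headers without a numeric prefix return `None` from `taxid_from_header` and would go through the existing accession-based path.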



Labels: bug