ARETE use case #12

@rbeiko

The software should be able to read phylogenetic output files produced by ARETE. These will comprise a reference tree and a set of (typically thousands of) gene trees that contain subsets of the leaves in the reference tree. Leaf labels produced by the pipeline should be consistent between the reference and gene trees.
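
A quick consistency check along these lines could catch label mismatches before any comparison is run. This is only a minimal sketch: it assumes unquoted Newick labels, and the function names and example trees are hypothetical, not part of ARETE or rSPR.

```python
import re

# Matches a token that directly follows "(" or ",", i.e. a leaf label.
# Internal-node labels and support values follow ")" and are therefore skipped.
# Assumption: labels are unquoted and contain no Newick punctuation.
_LEAF = re.compile(r"[(,]\s*([^():,;\s]+)")


def leaf_labels(newick: str) -> set[str]:
    """Return the set of leaf labels in an unquoted Newick string."""
    return set(_LEAF.findall(newick))


def unexpected_leaves(reference_newick: str, gene_newick: str) -> set[str]:
    """Labels present in the gene tree but absent from the reference tree."""
    return leaf_labels(gene_newick) - leaf_labels(reference_newick)


# Hypothetical example trees (support values written as internal-node labels).
reference = "((A:0.1,B:0.2)0.95:0.05,(C:0.1,D:0.3)0.80:0.02,E:0.4);"
gene_ok = "((A:0.1,B:0.2)0.90:0.05,C:0.1,E:0.4);"
gene_bad = "((A:0.1,B_1:0.2)0.90:0.05,C:0.1,E:0.4);"

print(unexpected_leaves(reference, gene_ok))   # empty: gene leaves are a subset
print(unexpected_leaves(reference, gene_bad))  # reports the mismatched label
```

A gene tree that yields a non-empty set here would need its labels fixed (or be skipped) before being passed on.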

Output trees will have been inferred using IQ-TREE or (more frequently) FastTree.

  • Trees will have support values; in FastTree output these are scaled from 0 to 1.
  • They will generally not be rooted. The user may specify an outgroup when they invoke the pipeline, but in general this will be impractical as outgroups may be complicated and uncertain. Midpoint rooting is acceptable as a default if rooting is necessary.
  • Multifurcations and zero-length branches are possible (and probable) so the software should expect these.
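
To make the last point concrete, a wrapper could scan each tree for multifurcations and zero-length branches before handing it to rSPR. The following is a rough sketch with a deliberately minimal Newick parser; it assumes unquoted labels and no comments, and `Node`, `parse_newick` and `summarize` are hypothetical names. Rooting is not covered here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str = ""                 # leaf name, or support value on internal nodes
    length: Optional[float] = None  # branch length leading to this node
    children: list["Node"] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse an unquoted Newick string into a Node tree (minimal sketch)."""
    text = "".join(text.split())    # drop whitespace and line breaks
    if not text.endswith(";"):
        text += ";"                 # the terminator keeps the scans below bounded
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if text[pos] == "(":
            pos += 1                # consume "("
            node.children.append(parse_node())
            while text[pos] == ",":
                pos += 1
                node.children.append(parse_node())
            pos += 1                # consume ")"
        start = pos
        while text[pos] not in ":,();":
            pos += 1
        node.label = text[start:pos]
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            node.length = float(text[start:pos])
        return node

    return parse_node()


def summarize(root: Node) -> dict:
    """Count leaves, multifurcations and zero-length branches in a tree."""
    stats = {"leaves": 0, "multifurcations": 0, "zero_length": 0}
    stack = [(root, True)]
    while stack:
        node, is_root = stack.pop()
        if not node.children:
            stats["leaves"] += 1
        # An unrooted tree is conventionally written with a trifurcating root,
        # so the root only counts when it has more than three children.
        elif len(node.children) > (3 if is_root else 2):
            stats["multifurcations"] += 1
        if node.length == 0:
            stats["zero_length"] += 1
        stack.extend((child, False) for child in node.children)
    return stats


# Hypothetical unrooted tree with one multifurcation and two zero-length branches.
example = "((A:0.1,B:0.0,C:0.2)0.85:0.05,D:0.3,(E:0.1,F:0.1)0.99:0.0);"
print(summarize(parse_newick(example)))
```

In practice an established tree library would be a safer choice than a hand-rolled parser; the sketch only illustrates the checks themselves.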

My recommendation is some combination of building a wrapper script (probably using Python) that can do the necessary conversions and invoke rSPR, and/or modifying the rSPR code itself to perform some of these tasks internally.

  • Assume the input consists of a reference tree and one file for each gene tree. Producing a single file with one line per gene tree would probably be OK although we need to keep the connection with the original file names.
  • It would be great if rSPR could perform the comparisons in batches of user-defined size. Doing each comparison as a separate process will lead to thousands of jobs that need to wait in the queue.
  • rSPR should first run the approximation on the entire set of trees (maybe this can be done as a single job?).
  • The user should be able to set a threshold approximation score above which a tree is not submitted for the exact analysis. This can be based on the highest approximation score for any given cluster of a tree, not necessarily the entire tree.
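
The wrapper-side parts of this list (combining files while keeping the file names, batching, and threshold filtering) might look roughly like the sketch below. Everything here is an assumption made for illustration: the function names, the file names, and in particular the `approx_scores` mapping of file name to per-cluster approximate scores, since how rSPR would report those scores is not specified above. No rSPR invocation is shown.

```python
import tempfile
from pathlib import Path
from typing import Dict, Iterable, Iterator, List


def combine_gene_trees(tree_files: Iterable[Path], combined: Path, index: Path) -> int:
    """Write one Newick per line to `combined`, plus a line-number -> file-name
    table to `index` so results can be traced back to the original files."""
    count = 0
    with combined.open("w") as out, index.open("w") as idx:
        for count, path in enumerate(sorted(tree_files), start=1):
            newick = " ".join(path.read_text().split())  # force a single line
            out.write(newick + "\n")
            idx.write(f"{count}\t{path.name}\n")
    return count


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items (user-defined size)."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def select_for_exact(approx_scores: Dict[str, List[int]], max_score: int) -> List[str]:
    """Keep trees whose highest per-cluster approximate score is <= max_score."""
    return [name for name, scores in approx_scores.items()
            if max(scores, default=0) <= max_score]


# Demonstration on throwaway files with hypothetical names and scores.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    for name in ("geneA.tre", "geneB.tre", "geneC.tre"):
        (tmp_dir / name).write_text("((A:0.1,B:0.2)0.9:0.1,\nC:0.3,D:0.4);\n")
    n_trees = combine_gene_trees(tmp_dir.glob("*.tre"),
                                 tmp_dir / "all_genes.nwk",
                                 tmp_dir / "all_genes.index.tsv")
    index_lines = (tmp_dir / "all_genes.index.tsv").read_text().splitlines()
    combined_lines = (tmp_dir / "all_genes.nwk").read_text().splitlines()

scores = {"geneA.tre": [1, 2], "geneB.tre": [0, 7], "geneC.tre": [3]}
keep = select_for_exact(scores, max_score=3)
print(n_trees, index_lines)
print(list(batches(keep, size=1)))
```

Each batch would then correspond to one queued job for the exact analysis, which keeps the number of jobs at roughly the number of selected trees divided by the batch size.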
