
ramithuh/diff-evol-tree-search


Differentiable Search of Evolutionary Trees from Leaves

This work introduces a differentiable approach to phylogenetic tree construction that optimizes both the tree and the ancestral sequences.

Preprint: https://www.biorxiv.org/content/10.1101/2023.07.23.550206v1

Optimization of seqs and tree

To run the examples in Colab, click the link below.

Open In Colab

Setup

conda create -n trees python=3.12 -y && conda activate trees
pip install -r requirements.txt

GPU is auto-detected. Use -g 0 to select a specific GPU.

Three optimization modes

# Bilevel optimization (implicit differentiation) — best results
python search_bilevel.py -l 16 -m 50 -sl 256 -nl 20 -e 5000 -ai 1 -ic 100 -lr 0.1 -lr_seq 0.01 -tLs "[0,0.005,10,50]" -s 42

# Alternating optimization (tree update -> seq update loop)
python search_alt.py -l 16 -m 50 -sl 256 -nl 20 -e 5000 -ai 1 -ic 100 -lr 0.1 -lr_seq 0.01 -tLs "[0,0.005,10,50]" -s 42

# Joint optimization (single optimizer, both param sets)
python search_joint.py -l 16 -m 50 -sl 256 -nl 20 -e 5000 -ic 100 -lr 0.1 -tLs "[0,0.005,10,50]" -s 42
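To make the difference between the alternating and joint schedules concrete, here is a minimal sketch on a toy two-variable objective. Everything in it is hypothetical: `toy_loss`, `grads`, and the scalars `tree` and `seq` stand in for the real tree and sequence parameters, and none of it is taken from the repository's scripts. The bilevel mode (implicit differentiation) is not shown.

```python
def toy_loss(tree, seq):
    """Toy stand-in for the real objective; minimized at tree == seq == 3."""
    return (tree - seq) ** 2 + (seq - 3.0) ** 2


def grads(tree, seq):
    """Analytic gradients of toy_loss with respect to tree and seq."""
    d_tree = 2.0 * (tree - seq)
    d_seq = -2.0 * (tree - seq) + 2.0 * (seq - 3.0)
    return d_tree, d_seq


def joint(steps, lr):
    """Joint schedule: one step size, both parameter sets updated together."""
    tree, seq = 0.0, 0.0
    for _ in range(steps):
        d_tree, d_seq = grads(tree, seq)
        tree, seq = tree - lr * d_tree, seq - lr * d_seq
    return tree, seq


def alternating(steps, lr, lr_seq, ai):
    """Alternating schedule: one tree update, then `ai` sequence updates."""
    tree, seq = 0.0, 0.0
    for _ in range(steps):
        d_tree, _ = grads(tree, seq)
        tree -= lr * d_tree
        for _ in range(ai):
            _, d_seq = grads(tree, seq)
            seq -= lr_seq * d_seq
    return tree, seq
```

On this toy problem both schedules reach the same minimizer. The sketch only illustrates the update order and the separate step sizes that correspond to -lr and -lr_seq, not the behaviour of the actual tree search.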

Key parameters:

  • -l : number of leaves
  • -sl : sequence length
  • -m : mutations per bifurcation
  • -nl : alphabet size
  • -e : epochs/steps
  • -ic : initialization count to run in parallel (vmapped)
  • -ai : in alternating mode, the number of sequence updates per tree update; in bilevel mode, the number of inner solver steps taken before implicit differentiation computes the outer gradient.
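The parameters -l, -sl, -m and -nl suggest a synthetic setup in which a root sequence is evolved down a perfect binary tree. The following is a minimal sketch of such a generator under stated assumptions: that "mutations per bifurcation" means each child receives that many substitutions relative to its parent, and that letters are integers in `range(alphabet_size)`. The function names `mutate` and `simulate_leaves` are hypothetical and do not come from the repository.

```python
import random


def mutate(seq, n_mutations, alphabet_size, rng):
    """Copy seq and change n_mutations distinct sites to a different letter."""
    child = list(seq)
    for site in rng.sample(range(len(seq)), n_mutations):
        choices = [a for a in range(alphabet_size) if a != child[site]]
        child[site] = rng.choice(choices)
    return child


def simulate_leaves(n_leaves, seq_len, n_mutations, alphabet_size, seed):
    """Evolve a random root down a perfect binary tree; return the leaves."""
    # A perfect binary tree has a power-of-two number of leaves.
    assert n_leaves >= 1 and n_leaves & (n_leaves - 1) == 0
    rng = random.Random(seed)
    level = [[rng.randrange(alphabet_size) for _ in range(seq_len)]]
    while len(level) < n_leaves:
        # Each node bifurcates into two independently mutated children.
        level = [mutate(parent, n_mutations, alphabet_size, rng)
                 for parent in level for _ in range(2)]
    return level
```

With the values used in the commands above, `simulate_leaves(16, 256, 50, 20, 42)` would return 16 sequences of length 256 over a 20-letter alphabet. The actual data generation in the scripts may differ.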

While running, the script prints the surrogate_cost, hard_cost and loss side by side every 200 steps. Tree visualizations and sequence heatmaps are saved to figures/.
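The README does not define hard_cost. One plausible reading, assumed here, is a parsimony-style cost: the total number of substitutions along the edges of a discrete tree. A minimal sketch of that quantity follows; `hamming`, `parsimony_cost` and the parent-array encoding are illustrative choices, not the repository's implementation.

```python
def hamming(a, b):
    """Number of sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def parsimony_cost(sequences, parent):
    """Sum of parent-child Hamming distances over all edges of a tree.

    parent[i] is the index of node i's parent, or -1 for the root.
    """
    return sum(hamming(sequences[i], sequences[p])
               for i, p in enumerate(parent) if p >= 0)
```

For example, with two leaves "ACGT" and "ACGA" under a root "ACGT", `parsimony_cost(["ACGT", "ACGA", "ACGT"], [2, 2, -1])` evaluates to 1, because only the second leaf differs from the root, at one site.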


Current limitations:
  • The ground-truth trees we evaluate against (optimal solutions) are perfect binary trees. We still need to evaluate on more diverse ground-truth trees whose leaves sit at uneven levels.
  • The method assumes site-wise independence, which we would like to remove.

We are working on these aspects in another repo: https://github.com/ramithuh/differentiable-trees. Once those changes are tested and verified, this repo will be updated. If you have any suggestions, comments, or feedback, feel free to reach out to us.

About

We introduce a differentiable approach to phylogenetic tree construction that optimizes the tree and ancestral sequences directly in their original representation, and therefore requires no prior training data.
