
Scripts to reproduce iTRAILS results in StatisticalPopulationGenomics-2ndEd


StatisticalPopulationGenomics-2ndEd/iTRAILS


This repository contains the files needed to run the example analysis from the iTRAILS chapter of Statistical Population Genomics, 2nd edition. The steps should be run in the following order:

1- Install the iTRAILS package

iTRAILS can be installed as a pip package:

pip install itrails

Or as a conda package from conda-forge:

conda install conda-forge::itrails
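After installing, you can confirm that the three command-line tools used in the steps below are on your PATH with a small shell check. This is a minimal sketch: the function name check_itrails_cli is hypothetical, and it only tests that the commands can be found, not that they run correctly.

```shell
# Hypothetical helper: report which iTRAILS commands used in this README are missing from PATH.
check_itrails_cli() {
  local missing=0 cmd
  for cmd in itrails-optimize itrails-viterbi itrails-posterior; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "not found on PATH: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_itrails_cli && echo "iTRAILS CLI ready"
```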

2- Download the alignment file

Download the example four-way alignment file, covering a chromosome 1 region in human, chimpanzee, gorilla and Sumatran orangutan (outgroup), from Zenodo.


Save the file as ./data/input/optimization/4way_chr1_human_chimp_gorilla_orangutan.maf.
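A minimal shell sketch for moving the download to the path used by the commands below and sanity-checking it. The function names are hypothetical, and the source path is whatever location your download landed in. The checks assume the usual MAF layout: the file begins with a ##maf header line, and each alignment block opens with a line starting with a. Counting those lines gives the number of alignment blocks, which is handy later because the example outputs refer to block 32.

```shell
# Hypothetical helpers for step 2; names are illustrative, not part of iTRAILS.

# Move a downloaded MAF to the path used by the optimization command.
# $1: path of the downloaded file; $2 (optional): destination directory.
place_maf() {
  local src="$1" dest_dir="${2:-./data/input/optimization}"
  mkdir -p "$dest_dir" || return 1
  mv "$src" "$dest_dir/4way_chr1_human_chimp_gorilla_orangutan.maf"
}

# Check that a file exists and starts with the "##maf" header line.
check_maf() {
  local maf="$1"
  if [ ! -f "$maf" ]; then
    echo "missing: $maf" >&2
    return 1
  fi
  if ! head -n 1 "$maf" | grep -q '^##maf'; then
    echo "no ##maf header in: $maf" >&2
    return 1
  fi
}

# Count alignment blocks: in MAF, each block opens with an "a" line.
count_maf_blocks() {
  grep -cE '^a([[:space:]]|$)' "$1"
}

# Example (adjust the source path to wherever your download landed):
# place_maf ~/Downloads/4way_chr1_human_chimp_gorilla_orangutan.maf
# check_maf ./data/input/optimization/4way_chr1_human_chimp_gorilla_orangutan.maf
# count_maf_blocks ./data/input/optimization/4way_chr1_human_chimp_gorilla_orangutan.maf
```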

3- Run the parameter optimization

Use the Command Line Interface (CLI) to run the optimization:

itrails-optimize "./data/input/optimization/example_config.yaml" --input "./data/input/optimization/4way_chr1_human_chimp_gorilla_orangutan.maf" --output "./data/output/optimization/example"

The output included here was produced by running the optimization for 3 hours on an HPC cluster with 64 CPUs and 1 GB of memory allocated per CPU. The optimization results are in ./data/output/optimization.
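If your cluster uses the Slurm scheduler (an assumption; other schedulers need different directives), a batch script requesting the resources described above might look like this sketch. The #SBATCH options are standard Slurm options and the job name is arbitrary. The script only reserves the resources, so check the iTRAILS documentation for how it makes use of multiple CPUs.

```shell
#!/usr/bin/env bash
#SBATCH --job-name=itrails-optimize
#SBATCH --cpus-per-task=64
#SBATCH --mem-per-cpu=1G
# Matches the 3 h run described above; request a margin if your run needs it.
#SBATCH --time=03:00:00

# Slurm starts in the submission directory: submit from the repository root
# so the relative paths below resolve.
mkdir -p ./data/output/optimization

if command -v itrails-optimize >/dev/null 2>&1; then
  itrails-optimize "./data/input/optimization/example_config.yaml" \
    --input "./data/input/optimization/4way_chr1_human_chimp_gorilla_orangutan.maf" \
    --output "./data/output/optimization/example"
else
  echo "itrails-optimize not found; install iTRAILS first (step 1)" >&2
fi
```

Submit it with sbatch, for example `sbatch run_optimization.sh` (the script file name is hypothetical).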

4- Run Viterbi decoding

Use the Command Line Interface (CLI) to run Viterbi decoding, taking the best model found by the optimization as input:

itrails-viterbi --config-file "./data/output/optimization/example.best_model.yaml" --reference "hg38"

For tidiness, the Viterbi results in ./data/output/viterbi/exampleblck32.viterbi.csv are restricted to alignment block 32. The full output of the Viterbi command, named example.viterbi.csv, contains the same information, in the same order, for every alignment block.

5- Run posterior decoding

Use the Command Line Interface (CLI) to run posterior decoding, taking the best model found by the optimization as input:

itrails-posterior --config-file "./data/output/optimization/example.best_model.yaml" --reference "hg38"

For tidiness, the posterior results in ./data/output/posterior/exampleblck32.posterior.csv are restricted to alignment block 32. The full output of the posterior command, named example.posterior.csv, contains the same information, in the same order, for every alignment block.
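Assuming the full example.viterbi.csv and example.posterior.csv files have a column identifying the alignment block, a block-32 subset like the ones shipped here could be extracted with a header-aware filter such as the sketch below. This README does not name that column, so inspect the header first (for example with head -n 1) and pass the column name in. The function name is hypothetical, and the sketch splits on commas, so it assumes no quoted fields that contain commas.

```shell
# Hypothetical helper: print the header plus rows whose column named $2 equals $3.
# Naive comma splitting: assumes no quoted fields that contain commas.
filter_csv_by_column() {
  local csv="$1" col="$2" value="$3"
  awk -F',' -v col="$col" -v val="$value" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      print
      next
    }
    $idx == val
  ' "$csv"
}

# Usage (BLOCK_COLUMN is a placeholder: set it to the block column in your file's header):
# filter_csv_by_column example.viterbi.csv "$BLOCK_COLUMN" 32 > exampleblck32.viterbi.csv
# filter_csv_by_column example.posterior.csv "$BLOCK_COLUMN" 32 > exampleblck32.posterior.csv
```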

6- Plot the posterior probabilities together with the most likely topology from posterior and Viterbi decoding

The final figure of the chapter (Figure 3) can be reproduced by running the R script ./generate_plot.R in an R environment with the tidyverse package installed. The figure shows the aggregated likelihood of each tree topology along alignment block 32, together with the most likely topologies from Viterbi decoding and from posterior decoding.

Figure: GenInt2 TRAILS plot
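A small wrapper for step 6 that checks for Rscript and tidyverse before running the plotting script. The function name plot_figure is hypothetical. It runs ./generate_plot.R relative to the current directory, so call it from the repository root.

```shell
# Hypothetical wrapper for step 6: run ./generate_plot.R if R and tidyverse are available.
plot_figure() {
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found; install R first" >&2
    return 1
  fi
  # requireNamespace() returns FALSE (without erroring) when a package is absent.
  if ! Rscript -e 'if (!requireNamespace("tidyverse", quietly = TRUE)) quit(status = 1)'; then
    echo "tidyverse missing; install it in R with install.packages(\"tidyverse\")" >&2
    return 1
  fi
  Rscript ./generate_plot.R
}

# Usage, from the repository root:
# plot_figure
```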
