This project implements the Longest Common Subsequence (LCS) algorithm and applies it to construct gene genealogy trees using both local and global approaches.
Genetic sequences are analyzed to infer evolutionary relationships. The project uses the LCS method to compute similarity matrices and builds hierarchical relationships (trees) based on the resulting scores.
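As a rough illustration of the idea above, the following sketch fills an LCS length table for two sequences and then collects the pairwise LCS lengths into a similarity matrix. The function names `lcs_length_matrix` and `similarity_matrix` and the sample sequences are hypothetical and chosen for this example; the notebook's own implementation may be organized differently (for instance, it may use numpy arrays instead of nested lists).

```python
def lcs_length_matrix(a, b):
    """Return the (len(a)+1) x (len(b)+1) table of LCS lengths.

    table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    """
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Matching characters extend the LCS of the shorter prefixes.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Otherwise keep the better result from dropping one character.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table


def similarity_matrix(sequences):
    """Return a symmetric matrix of pairwise LCS lengths (hypothetical helper)."""
    n = len(sequences)
    sim = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            score = lcs_length_matrix(sequences[i], sequences[j])[-1][-1]
            sim[i][j] = sim[j][i] = score
    return sim


if __name__ == "__main__":
    # Made-up sample sequences, for illustration only.
    genes = ["ACGTAC", "ACGGAC", "TTGTAC"]
    for row in similarity_matrix(genes):
        print(row)
```

A higher LCS length means two sequences share a longer ordered run of bases, so a tree-building step can treat larger entries as closer relatives. Raw lengths favor long sequences; dividing by the longer sequence's length is one possible normalization, which this sketch does not apply.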
- LCS Implementation: Calculates the length matrix for gene sequences.
- Local Approach: Constructs genealogy trees using pairwise similarity and a bottom-up strategy.
- Global Approach: Constructs trees using a global similarity matrix and dynamic programming techniques.
- Visualization: Displays hierarchical trees and similarity matrices for interpretation.
- Probability Calculation: Calculates the probabilities of insertion, deletion, and mutation in gene sequences using the edit distance algorithm.
- Efficient LCS matrix computation.
- Dynamic programming-based similarity inference.
- Modular code with a focus on clarity and tree construction logic.
- Includes annotated explanations in markdown cells.
- Python 3
- Jupyter Notebook
- matplotlib (for visualization)
- numpy, plus standard-library modules such as itertools
- Clone the repository:
git clone https://github.com/ganbnuray/Genealogy-Tree-Reconstruction-Using-DNA-sequences.git
- Install required packages (if not already available):
pip install matplotlib numpy
- Open the notebook:
jupyter notebook genetreereconstruction.ipynb
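The probability calculation mentioned in the feature list can be pictured with a small edit-distance sketch. It computes the Levenshtein distance table, traces back one optimal alignment, and reports the relative frequency of insertions, deletions, and mutations (substitutions) along that alignment. Treating those frequencies as probability estimates is an assumption made for this example, as is the function name `edit_operation_frequencies`; the notebook may define and estimate these probabilities differently.

```python
def edit_operation_frequencies(source, target):
    """Return (distance, counts, frequencies) for turning source into target.

    Counts come from a single optimal alignment; other optimal alignments
    may split the same distance differently between operation types.
    """
    m, n = len(source), len(target)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i  # delete every character of the source prefix
    for j in range(n + 1):
        dist[0][j] = j  # insert every character of the target prefix

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or mutation
            )

    # Trace back from the bottom-right corner, preferring diagonal moves.
    counts = {"insertion": 0, "deletion": 0, "mutation": 0}
    i, j = m, n
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and source[i - 1] == target[j - 1]
                and dist[i][j] == dist[i - 1][j - 1]):
            i, j = i - 1, j - 1  # characters match, no operation
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            counts["mutation"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            counts["deletion"] += 1
            i -= 1
        else:
            counts["insertion"] += 1
            j -= 1

    total = sum(counts.values())
    freqs = {op: (c / total if total else 0.0) for op, c in counts.items()}
    return dist[m][n], counts, freqs


if __name__ == "__main__":
    # Made-up sample sequences, for illustration only.
    print(edit_operation_frequencies("ACGT", "AGT"))
```

Because the counts are taken from one alignment, a more careful estimate might aggregate over many sequence pairs before normalizing.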