🧬 Command-line tool for global protein/DNA sequence alignment using a simplified version of the Needleman-Wunsch algorithm

stanuch/seq-global-align

Sequence Global Alignment Tool

Project Overview

This project is a command-line tool for performing global alignment of DNA or protein sequences using the Needleman-Wunsch algorithm. It allows users to:

  • Load sequences from FASTA files
  • Perform a global alignment between two sequences
  • Read and display the information included in FASTA files
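
As an illustration of the FASTA loading step, here is a minimal sketch of a parser. It is not the repository's code: the function name read_fasta and the sample records are hypothetical, and it assumes the usual layout in which a header line starts with ">" and the sequence follows on one or more lines.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical in-memory example standing in for a .fasta file.
example = StringIO(">seq1 example record\nGATTACA\nGATT\n>seq2\nGCATGC\n")
records = list(read_fasta(example))
for header, sequence in records:
    print(header, len(sequence))
```

The header text after ">" is the kind of information a tool like this can display alongside the sequence length.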

Key Features

  • FASTA file input: Supports standard DNA/protein sequence files
  • Global alignment: Implements a simplified Needleman-Wunsch algorithm for full-sequence comparison
  • Similarity: Calculates a similarity percentage from the optimal alignment of the two sequences
  • Scoring system: Match and mismatch scores, gap penalties, and a substitution matrix (BLOSUM62)
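
The README does not say how the similarity percentage is defined. One common convention, assumed here purely for illustration, is the share of alignment columns in which both sequences carry the same residue. The function name similarity_percent is hypothetical.

```python
def similarity_percent(aligned_a, aligned_b):
    """Percentage of alignment columns holding identical residues.

    Assumption: both inputs are already aligned (same length, '-' for gaps)
    and a gap never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0.0
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * identical / len(aligned_a)


# Two of the three columns match, so this prints roughly 66.67.
print(round(similarity_percent("GAT", "G-T"), 2))
```

Other definitions divide by the length of the shorter sequence or count conservative substitutions as similar, so the number the tool reports may differ from this sketch.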

Needleman-Wunsch Algorithm

The algorithm works by dividing the global alignment problem into smaller subproblems. It uses a scoring system that assigns values for matches, mismatches, and gaps. It then systematically fills a matrix in which each cell holds the best score achievable for aligning the two sequence prefixes that end at that cell. After filling the matrix, the algorithm traces back from the last (bottom-right) cell to the first (top-left) cell to determine the optimal alignment.
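
The matrix described above can be written as a recurrence. The notation is an assumption made for this sketch and does not come from the repository: F(i, j) is the best score for aligning the first i characters of sequence a with the first j characters of sequence b, s(x, y) is the match/mismatch (or substitution matrix) score, and g is a constant, typically negative, penalty per gap position.

```latex
\begin{aligned}
F(0, 0) &= 0, \qquad F(i, 0) = i \cdot g, \qquad F(0, j) = j \cdot g, \\
F(i, j) &= \max
\begin{cases}
F(i-1, j-1) + s(a_i, b_j) & \text{(align } a_i \text{ with } b_j\text{)} \\
F(i-1, j) + g & \text{(gap in } b\text{)} \\
F(i, j-1) + g & \text{(gap in } a\text{)}
\end{cases}
\end{aligned}
```

The score of the optimal global alignment is F(m, n), where m and n are the lengths of a and b.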

The simplified version focuses on the core concepts:

  • Initialization of the scoring matrix with gap penalties.
  • Matrix filling using a basic scoring scheme for matches, mismatches, and gaps.
  • Traceback to reconstruct one optimal alignment, without considering multiple equally optimal solutions.
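
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code in src/main.py: the function name needleman_wunsch, the default scores (match = 1, mismatch = -1, gap = -1), and the linear gap penalty are all assumptions, and a substitution matrix such as BLOSUM62 is left out for brevity.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (aligned_a, aligned_b, score) for one optimal global alignment."""

    def pair_score(x, y):
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1

    # 1. Initialization: the first row and column hold cumulative gap penalties.
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # 2. Matrix filling: each cell takes the best of the three possible moves.
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1])
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diagonal, up, left)

    # 3. Traceback: follow one optimal path, preferring the diagonal on ties.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1])):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1

    return "".join(reversed(top)), "".join(reversed(bottom)), score[-1][-1]


aligned_a, aligned_b, total = needleman_wunsch("GAT", "GT")
print(aligned_a)  # GAT
print(aligned_b)  # G-T
print(total)      # 1
```

The fixed tie-breaking order in the traceback is what limits the result to a single optimal alignment. Listing every equally optimal alignment would require branching at each tie.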

This simplified approach retains the main idea of finding a global alignment while being easier to understand and implement, which makes it well suited to educational use and quick sequence comparisons.

How to use

  1. Clone the repository
git clone https://github.com/stanuch/seq-global-align.git
cd seq-global-align
  2. Install dependencies
  • Make sure you have Python 3 installed, then run:
pip install -r requirements.txt
  3. Prepare your FASTA files
  • Place your sequence files in the sequences/ folder (already included in the repo).
  • Use the .fasta format.
  • When running the program, enter only the file name without the extension (e.g., for seq1.fasta, just type seq1).
  4. Run the program
python src/main.py
  5. Follow the prompts
  • Enter the names of the two FASTA files you want to align.
  • The program will read the sequences, perform a global alignment using the Needleman-Wunsch algorithm, and display the results in your terminal.
