sdegeorgia/LAAVA-summary

LAAVA: Long-read Adeno-Associated Virus Analysis

LAAVA is a comprehensive internal pipeline for detecting, characterizing, and visualizing recombinant AAV (rAAV) integration sites using Oxford Nanopore long-read sequencing data.

⚠️ This repository is a summary only. The full pipeline code is not publicly available because of licensing restrictions and its internal-use status.


Overview

LAAVA supports scalable rAAV integration analysis from raw FASTQ reads through to annotated visual summaries across multiple gene therapy research projects.

Pipeline Summary

  1. Primary alignment to project-specific AAV transgene or plasmid sequences
  2. Secondary alignment to full reference genome (e.g., mm39)
  3. Assembly of target-mapped reads using Canu
  4. Integration site detection and chromosomal distribution analysis
  5. Visualization of mapping rates, read characteristics, and coverage profiles
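
The first three steps can be sketched as a dry-run command builder. The pipeline itself is written in Bash and R and is not public, so the Python below is only an illustration: the function name, output layout, thread count, and `genome_size` parameter are hypothetical, and it assumes that the reads aligned to the genome are the ones that mapped to the target. The `minimap2`, `samtools`, and `canu` options used are standard ones, but the real pipeline may use different settings.

```python
import shlex
from pathlib import Path


def dual_alignment_commands(sample, fastq, target_ref, genome_ref,
                            genome_size, outdir="results", threads=8):
    """Build (but do not run) shell commands for one sample.

    All file names and the directory layout are hypothetical.
    """
    out = Path(outdir) / sample
    target_bam = out / f"{sample}.target.bam"
    captured_fq = out / f"{sample}.target_mapped.fastq"
    genome_bam = out / f"{sample}.genome.bam"
    t = str(threads)

    def q(*args):
        # Quote every argument so paths with spaces stay intact.
        return shlex.join(str(a) for a in args)

    def align(ref, reads, bam):
        # minimap2 Nanopore preset piped into a coordinate sort.
        return (q("minimap2", "-ax", "map-ont", "-t", t, ref, reads)
                + " | " + q("samtools", "sort", "-@", t, "-o", bam, "-"))

    return [
        # Step 1: primary alignment to the AAV transgene or plasmid.
        align(target_ref, fastq, target_bam),
        q("samtools", "index", target_bam),
        # Keep each mapped read once: drop unmapped (0x4), secondary (0x100)
        # and supplementary (0x800) records.
        q("samtools", "fastq", "-F", "0x904", target_bam) + " > " + q(captured_fq),
        # Step 2: secondary alignment of the captured reads to the genome
        # (assumption: only target-mapped reads are carried forward).
        align(genome_ref, captured_fq, genome_bam),
        # Step 3: Canu assembly; genomeSize is a caller-supplied estimate.
        q("canu", "-p", sample, "-d", out / "canu",
          f"genomeSize={genome_size}", "-nanopore", captured_fq),
    ]


commands = dual_alignment_commands(
    "s1", "fasta_files/s1.fastq", "refs/raav.fa", "refs/mm39.fa", genome_size="5k")
for command in commands:
    print(command)
```

Returning strings instead of executing them keeps the sketch testable without the tools installed; each line could be written into an LSF job script.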

My Role

  • Designed and implemented the pipeline core using Bash shell scripts and R
  • Automated dual-alignment strategy with metadata-driven sample handling
  • Built modular project structure to support different targets across studies
  • Developed Docker-compatible workflow for HPC (LSF) execution
  • Generated publication-ready plots and integration summaries in R

Tools & Technologies

| Tool | Purpose |
|---|---|
| minimap2 | Long-read alignment to target and genome references |
| samtools | BAM processing and QC |
| Canu | Local assembly of high-coverage integration loci |
| R | Summary statistics and ggplot2 visualizations |
| Docker | Containerization for reproducible, portable execution |
| LSF | HPC job scheduling |

Input Files

  • Sample metadata CSV — defines sample names, references, and FASTQ paths
  • FASTQ files — stored in a central fasta_files/ directory
  • Target reference FASTA files — e.g., rAAV, transgene, or plasmid
  • Full genome FASTA — for secondary genome-wide integration alignment
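
Metadata-driven sample handling could look roughly like the sketch below. The actual CSV layout is not published, so the column names (`sample`, `target_ref`, `genome_ref`, `fastq`) and the `Sample` record are assumptions made for illustration; only the `fasta_files/` directory name comes from the list above.

```python
import csv
import io
from dataclasses import dataclass
from pathlib import Path

# Hypothetical column names; the real metadata schema is not public.
REQUIRED_COLUMNS = ("sample", "target_ref", "genome_ref", "fastq")


@dataclass(frozen=True)
class Sample:
    name: str
    target_ref: Path
    genome_ref: Path
    fastq: Path


def read_metadata(handle, fastq_dir="fasta_files"):
    """Parse a sample sheet and resolve FASTQ names against fastq_dir."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"metadata is missing columns: {missing}")

    samples, seen = [], set()
    for row in reader:
        name = row["sample"].strip()
        if name in seen:
            # Duplicate names would silently overwrite per-sample outputs.
            raise ValueError(f"duplicate sample name: {name}")
        seen.add(name)
        samples.append(Sample(
            name=name,
            target_ref=Path(row["target_ref"].strip()),
            genome_ref=Path(row["genome_ref"].strip()),
            fastq=Path(fastq_dir) / row["fastq"].strip(),
        ))
    return samples


sheet = io.StringIO(
    "sample,target_ref,genome_ref,fastq\n"
    "s1,refs/raav.fa,refs/mm39.fa,s1.fastq\n"
    "s2,refs/plasmid.fa,refs/mm39.fa,s2.fastq\n"
)
samples = read_metadata(sheet)
print([s.name for s in samples])
```

Validating the header and sample names up front means a malformed sheet fails before any alignment job is submitted.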

Output Summary

  • BAMs for primary and secondary alignments
  • BED files for predicted integration coordinates
  • Coverage reports and per-sample summary statistics
  • Plots:
    • combined_capture_efficiency_plot.png
    • secondary_alignment_mapping_rates.png
    • combined_coverage_plot.png
    • violin_plot.png (read length distributions)
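
As an illustration of what a coverage report might contain, the sketch below computes the fraction of reference positions at or above each depth threshold. It assumes the input is three-column, tab-separated per-base depth text (contig, position, depth) such as `samtools depth -a` produces; the function name and the choice to report fractions are hypothetical, and the real pipeline performs this kind of summary in R.

```python
def coverage_breadth(depth_lines, thresholds=(25, 50, 100)):
    """Return {threshold: fraction of positions with depth >= threshold}."""
    counts = {t: 0 for t in thresholds}
    total = 0
    for line in depth_lines:
        if not line.strip():
            continue  # tolerate blank lines
        _contig, _pos, depth = line.split("\t")[:3]
        total += 1
        d = int(depth)
        for t in thresholds:
            if d >= t:
                counts[t] += 1
    if total == 0:
        return {t: 0.0 for t in thresholds}
    return {t: counts[t] / total for t in thresholds}


example = [
    "raav\t1\t10",
    "raav\t2\t30",
    "raav\t3\t60",
    "raav\t4\t120",
]
breadth = coverage_breadth(example)
print(breadth)  # {25: 0.75, 50: 0.5, 100: 0.25}
```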

Integration Site Analysis

  • Hotspot identification from BED coordinates
  • Chromosome-level distribution analysis
  • Coverage threshold tracking at 25x, 50x, 100x, etc.
  • Comparative read mapping efficiency across multiple samples
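
One simple way to approach the first two items is fixed-window binning of BED records. This is a sketch only: the window size, the minimum site count, and the function names are hypothetical choices, and the actual hotspot method used by the pipeline is not described here.

```python
from collections import Counter


def parse_bed(lines):
    """Yield (chrom, start, end) from BED text, skipping header lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        yield chrom, int(start), int(end)


def chromosome_counts(sites):
    """Number of predicted integration sites per chromosome."""
    return Counter(chrom for chrom, _start, _end in sites)


def hotspots(sites, window=1_000_000, min_sites=3):
    """Windows holding at least min_sites sites, as (chrom, start, end, n).

    Sites are assigned to a window by their BED start coordinate.
    """
    bins = Counter((chrom, start // window) for chrom, start, _end in sites)
    return sorted(
        (chrom, idx * window, (idx + 1) * window, n)
        for (chrom, idx), n in bins.items()
        if n >= min_sites
    )


bed = [
    "chr1\t100\t101",
    "chr1\t5000\t5001",
    "chr1\t900000\t900001",
    "chr2\t42\t43",
]
sites = list(parse_bed(bed))
print(chromosome_counts(sites))  # Counter({'chr1': 3, 'chr2': 1})
print(hotspots(sites))           # [('chr1', 0, 1000000, 3)]
```

Fixed windows are easy to reason about but can split a cluster that straddles a boundary; a sliding window or distance-based clustering would avoid that at the cost of more code.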

Interpretation Examples

| Observation | Interpretation |
|---|---|
| High target alignment + low genome mapping | High integration specificity |
| Widespread genome mapping + hotspots | Off-target insertion sites present |
| Low target alignment | Poor capture / sample quality |

Citation

If referencing this pipeline in your work:

Sophia DeGeorgia. LAAVA: Long-read AAV Integration Analysis Pipeline. Internal research tool. 2024.
Tools include: minimap2, samtools, Canu, ggplot2

Also cite key dependencies:

  • Li H. Minimap2. Bioinformatics (2018).
  • Koren S. et al. Canu. Genome Research (2017).
  • Li H. et al. SAMtools. Bioinformatics (2009).

Disclaimer

This repository summarizes an internally developed bioinformatics pipeline.
The source code is not open-source and is not available for distribution.


Related Projects
