Panmethyl is a Nextflow pipeline that maps epigenetic data from long reads to pangenomes.
apptainer pull --arch amd64 panmethyl.sif library://cgroza/collection/panmethyl:latest

It takes the following inputs:
- `--out`: Directory in which panmethyl will write the output files.
- `--bams`: CSV file listing the BAM files to be mapped to the pangenome.
The format of this CSV file is:
sample,path
name1,path/to/bam1
name2,path/to/bam2

These BAM files must be annotated with the appropriate epigenetic information.
For example, the location of modified bases must be encoded in the MM tag
and the likelihoods of modification must be encoded in the ML tag.
- `--graph`: Pangenome in GFA format.
- `--lift`: Lift nucleotide positions from graph coordinates to path (assembly) coordinates. Assumes a GFA with `P` lines. Useful for generating frequency distributions.
- `--bed`: BED file enumerating the intervals over which to summarize modification levels. Intervals must be in path (assembly) coordinates.
- `--aligner`: Aligner used to map the reads to the graph. Possible values are `minigraph` and `GraphAligner`.
- `--code`: Modification code in the MM/ML tags, according to the SAM specification. For example, `C+m`.
- `--motif`: Nucleotides (or dinucleotides) targeted by the modification. Possible values are `A`, `T`, `G`, `C`, `CG`.
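
The `--bams` sample sheet described above is small enough to write by hand, but it can also be generated from a list of samples. The following is a minimal sketch in Python using only the standard library. The helper name `write_sample_sheet` is hypothetical and not part of panmethyl, and rejecting duplicate sample names is a precaution of this sketch, not a documented requirement of the pipeline.

```python
import csv
import io


def write_sample_sheet(samples, handle):
    """Write (sample, path) pairs as CSV with the header 'sample,path'.

    Hypothetical helper for illustration; not part of panmethyl.
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["sample", "path"])
    seen = set()
    for name, path in samples:
        # Assumption: sample names should be unique (not stated in the README).
        if name in seen:
            raise ValueError(f"duplicate sample name: {name}")
        seen.add(name)
        writer.writerow([name, path])


# Build the example sheet from the README in memory.
buffer = io.StringIO()
write_sample_sheet(
    [("name1", "path/to/bam1"), ("name2", "path/to/bam2")],
    buffer,
)
sheet = buffer.getvalue()
print(sheet, end="")
```

To write the sheet to disk instead, pass a file opened with `open("bams.csv", "w", newline="")` as `handle`; the file name here is only an example.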
Panmethyl outputs one plain-text CSV file for each entry in
`--bams`. Each file lists the graph node, the position of the
modified base, its strand, the coverage on the modified base, and the average
methylation level, encoded on a scale from 0 to 255 (as in the ML tag).
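
Downstream analyses often want the methylation level as a fraction, not on the 0 to 255 scale. The sketch below reads such a file in Python and rescales the last column. It rests on several assumptions that are not confirmed by this README: that the columns appear in the order listed above, that the file has no header row unless `has_header=True` is passed, and that coverage is an integer. The function name `read_methylation` and the two example rows are made up for illustration. Dividing by 255 is a simple normalization chosen here; the SAM specification defines an individual ML value N as the probability interval from N/256 to (N+1)/256.

```python
import csv
import io


def read_methylation(handle, has_header=False):
    """Parse a per-sample output CSV into a list of dicts.

    Assumed column order: node, position, strand, coverage, level (0-255).
    Hypothetical helper for illustration; not part of panmethyl.
    """
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the header row if one is present
    records = []
    for node, position, strand, coverage, level in reader:
        records.append(
            {
                "node": node,  # node identifiers are kept as strings
                "position": int(position),
                "strand": strand,
                "coverage": int(coverage),
                # Rescale the 0-255 average level to the range [0, 1].
                "fraction": float(level) / 255.0,
            }
        )
    return records


# Made-up rows, only to exercise the parser.
example = io.StringIO("s1,10,+,12,255\ns2,4,-,8,51\n")
records = read_methylation(example)
for record in records:
    print(record["node"], record["position"], round(record["fraction"], 3))
```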
If using panmethyl, please cite:
- Groza, C., Ge, B., Cheung, W. A., Pastinen, T., & Bourque, G. (2025). Expanded methylome and quantitative trait loci detection by long-read profiling of personal DNA. Genome Research, 35(4), 644–652. https://doi.org/10.1101/gr.279240.124