This repository provides a tool for constructing variation graphs using the Locally Consistent Parsing (LCP) tool. It processes genomic sequences from FASTA files, integrates variations from VCF files, and generates variation graphs in rGFA/GFA format for genome assembly and analysis.
- Efficient Variation Graph Construction: Utilizes LCP cores to represent sequences and variations.
- Variation Representation: Creates bubble structures in the graph to represent sequence differences.
- rGFA/GFA Output: Generates graphs in rGFA/GFA format, suitable for graph-based genome analysis.
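To make the bubble structure concrete, here is a hand-written toy graph in plain GFA 1 syntax. It is an illustrative sketch and not actual lcpan output: the file name toy_bubble.gfa, the segment names s1 to s4, the sequences, and the helper gfa_summary are all assumptions made for this example. A single SNP (A>G) appears as two parallel segments, s2 (reference) and s3 (alternative), that share the same predecessor and successor.

```shell
# Toy example only (NOT produced by lcpan): reference ACGT-A-TTCA with a SNP A>G.
# Segments s2 (ref allele) and s3 (alt allele) form a bubble between s1 and s4.

gfa_summary() {
  # Count segment (S) and link (L) records in a tab-separated GFA file.
  awk -F'\t' '$1=="S"{s++} $1=="L"{l++} END{printf "segments=%d links=%d\n", s, l}' "$1"
}

# Header, then four segments, then four links (printf reuses the format per pair).
printf 'H\tVN:Z:1.0\n' > toy_bubble.gfa
printf 'S\t%s\t%s\n' s1 ACGT s2 A s3 G s4 TTCA >> toy_bubble.gfa
printf 'L\t%s\t+\t%s\t+\t0M\n' s1 s2 s1 s3 s2 s4 s3 s4 >> toy_bubble.gfa

gfa_summary toy_bubble.gfa   # prints: segments=4 links=4
```

The two paths s1, s2, s4 and s1, s3, s4 spell the reference and alternative sequences respectively.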
Clone this repository and use the provided Makefile to build the project.
git clone --recursive --depth 1 https://github.com/BilkentCompGen/lcpan.git
cd lcpan
# install lcptools
make install
# compile lcpan
make
Run make clean to remove the generated binaries and the executable.
Run the tool using the following command-line options:
./lcpan [PROGRAM] [OPTIONS]
Program:
- -vg: Constructs a variation graph using a reference genome and VCF. In this mode, the initial partitioning is done with LCP, and each segment is further divided into sub-segments if variations are present.
- -vgx: Constructs an expanded variation graph using a reference genome and VCF. In this mode, each variation is represented by an alternative arc, which connects the latest non-overlapping LCP core to the first LCP core afterward.
Options:
- -r | --ref: Path to the input FASTA file.
- -v | --vcf: Path to the input VCF file.
- -p | --prefix: Prefix for the log and output files [default lcpan].
- -s | --no-verlap: Output overlapping gfa.
- -l | --level: LCP parsing level (integer) [default 5].
- -t | --thread: Number of threads (integer) [default 1].
- -v | --verbose: Verbose output [default false].
- --gfa: Output in GFA (Graphical Fragment Assembly) format.
- --rgfa: Output in rGFA (reference GFA) format [default].
- --skip-masked: Skip masked (N) characters. In this mode, segments will contain only nucleotides.
- --tload-factor: How much workload is assigned per thread relative to the pool size [default 2].
The lcpan tool runs in parallel and therefore generates multiple output files. These files depend on one another, except the first, which stores the partitioned reference genome. When the program finishes, run the lcpan-merge.sh script on the log file (for example, lcpan-merge.sh lcpan.log) to merge all the files.
./lcpan -vg -r genome.fasta -v variations.vcf -p output -l 4
bash lcpan-merge.sh output.log
The first command constructs a variation graph from the input FASTA and VCF files, applying LCP parsing at level 4 with a single thread, and writes the result to multiple files. The second command merges those files using the lcpan-merge.sh script.
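Because the per-thread output files depend on one another, one quick sanity check after merging is to confirm that every link refers to a segment defined in the same file. The following is a minimal sketch, assuming the merged graph is a GFA 1 style file with tab-separated S and L records. The function name gfa_dangling_links and the path merged.gfa are hypothetical and not part of lcpan; substitute whatever file lcpan-merge.sh writes.

```shell
gfa_dangling_links() {
  # The file is read twice: pass 1 collects segment names from S records,
  # pass 2 counts L records whose source or target segment is undefined.
  awk -F'\t' '
    NR == FNR { if ($1 == "S") seg[$2] = 1; next }
    $1 == "L" && !(($2 in seg) && ($4 in seg)) { n++ }
    END { print n + 0 }
  ' "$1" "$1"
}

# Hypothetical usage; merged.gfa stands in for the merged output file:
# gfa_dangling_links merged.gfa
```

A result of 0 means every link endpoint was found among the segments; a nonzero count suggests that some output files were left out of the merge.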
If you use LCPan in your work, please cite:
- LCPan: efficient variation graph construction using Locally Consistent Parsing. Akmuhammet Ashyralyyev, Zülal Bingöl, Begüm Filiz Öz, Salem Malikic, Uzi Vishkin, S. Cenk Sahinalp, Can Alkan. arXiv: 2511.12205, 2025.
lcpan is released under the BSD 3-Clause License, which allows for redistribution and use in source and binary forms, with or without modification, under certain conditions. For more detailed terms, please refer to the license file.