SvPhaser

Haplotype-aware structural-variant (SV) phasing and genotyping from long-read data


SvPhaser assigns haplotype-aware genotypes to pre-called structural variants (SVs) using HP-tagged long-read alignments (PacBio HiFi, ONT Q20+, etc.).

It fills a critical gap in long-read SV analysis:

  • SV callers (e.g. Sniffles2) discover variants
  • SvPhaser phases and genotypes them (1|0, 0|1, 1|1, or ./.)
  • with explicit read-level evidence and a quantitative genotype quality (GQ)

SvPhaser is caller-agnostic, deterministic, and designed for large-scale benchmarking and biological interpretation.


Key features

  • Post-hoc SV phasing from HP-tagged BAM/CRAM (no re-calling required)
  • Per-chromosome parallelization (efficient on HPC and multi-core systems)
  • SV-type-aware evidence detection (DEL / INS / INV / BND / DUP)
  • Deterministic Δ-based decision logic (no HMMs, no sampling)
  • Explicit confidence modeling via GQ and reason codes
  • CSV-first design for transparent benchmarking and debugging
  • VCF-compliant output with rich SVP_* INFO annotations
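
To make the idea of a deterministic, Δ-based decision concrete, here is a minimal Python sketch. It is not SvPhaser's implementation: it assumes Δ is the normalized absolute difference between HP1 and HP2 supporting reads, and the function name `classify_sv`, the threshold semantics, and the handling of the intermediate zone are all illustrative assumptions. The authoritative rules are in docs/Methodology.md.

```python
def classify_sv(hp1, hp2, nohp, min_support=10, min_tagged_support=3,
                major_delta=0.60, equal_delta=0.10):
    """Hypothetical delta-based genotype decision (illustrative only)."""
    tagged_total = hp1 + hp2            # reads carrying an HP tag
    support_total = tagged_total + nohp  # all supporting reads

    # Too little evidence: leave the genotype uncalled.
    if tagged_total == 0 or tagged_total < min_tagged_support:
        return "./."
    if support_total < min_support:
        return "./."

    # ASSUMPTION: delta is the normalized haplotype imbalance in [0, 1].
    delta = abs(hp1 - hp2) / tagged_total

    if delta >= major_delta:
        # One haplotype clearly dominates: heterozygous, phased.
        return "1|0" if hp1 > hp2 else "0|1"
    if delta <= equal_delta:
        # Both haplotypes are supported about equally: homozygous alt.
        return "1|1"
    # Intermediate imbalance: treated as ambiguous in this sketch.
    return "./."
```

Because the rule is a pure function of the read counts and thresholds, the same inputs always give the same genotype, which is what makes this style of logic easy to benchmark and debug.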

Installation

From PyPI (recommended)

# Requires Python >= 3.9
pip install svphaser

Optional extras:

pip install "svphaser[plots]"   # plotting utilities
pip install "svphaser[bench]"   # benchmarking helpers
pip install "svphaser[dev]"     # development + linting

From source

git clone https://github.com/SFGLab/SvPhaser.git
cd SvPhaser
pip install -e .

Inputs & requirements

SvPhaser requires only two inputs:

  1. Unphased SV VCF (.vcf / .vcf.gz)

    • Produced by an SV caller (e.g. Sniffles2)
    • May contain an RNAMES INFO field for precise read support
  2. HP-tagged BAM/CRAM

    • Long-read alignments with haplotype tags (HP=1/2)
    • Generated by an upstream phasing pipeline (e.g. WhatsHap)

⚠️ If the BAM/CRAM does not contain HP tags, SvPhaser cannot assign haplotypes.
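
Before running SvPhaser, it can be worth confirming that the alignments actually carry HP tags. The sketch below is a stand-alone check and not part of SvPhaser; the function names are hypothetical. It uses pysam (a third-party package assumed to be installed) only in the thin wrapper, so the counting logic works on any iterable of objects that expose `has_tag`.

```python
from itertools import islice


def hp_tagged_fraction(reads, limit=10000):
    """Return the fraction of the first `limit` reads that carry an HP tag."""
    total = tagged = 0
    for read in islice(reads, limit):
        total += 1
        if read.has_tag("HP"):
            tagged += 1
    return tagged / total if total else 0.0


def bam_hp_tagged_fraction(path, limit=10000):
    """Open a BAM with pysam and sample its first reads (use mode "rc" for CRAM)."""
    import pysam  # third-party; imported lazily so the helper above has no dependency

    with pysam.AlignmentFile(path, "rb") as bam:
        return hp_tagged_fraction(bam.fetch(until_eof=True), limit)
```

A result of 0.0 means none of the sampled reads has an HP tag, so the file should be haplotagged upstream first.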


Quick start (CLI)

svphaser phase \
  sample_unphased.vcf.gz \
  sample.sorted_phased.bam \
  --out-dir results/ \
  --min-support 10 \
  --min-tagged-support 3 \
  --major-delta 0.60 \
  --equal-delta 0.10 \
  --support-mode hybrid \
  --dynamic-window \
  --tie-to-hom-alt \
  --gq-bins "30:High,10:Moderate" \
  --threads 32
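
The `--gq-bins` value pairs GQ thresholds with labels. How SvPhaser applies them internally is not described here, so the Python sketch below only illustrates one plausible reading: a GQ at or above a threshold receives that threshold's label, checked from the highest threshold down. The function names and the `default` fallback are hypothetical.

```python
def parse_gq_bins(spec):
    """Parse a spec such as "30:High,10:Moderate" into (threshold, label) pairs."""
    bins = []
    for item in spec.split(","):
        threshold, label = item.split(":", 1)
        bins.append((int(threshold), label.strip()))
    # Highest threshold first so the first match is the best bin.
    return sorted(bins, reverse=True)


def label_gq(gq, bins, default=None):
    """Return the label of the first bin whose threshold gq reaches (assumed rule)."""
    for threshold, label in bins:
        if gq >= threshold:
            return label
    return default


bins = parse_gq_bins("30:High,10:Moderate")
```

With these bins, `label_gq(35, bins)` gives "High", `label_gq(12, bins)` gives "Moderate", and anything below 10 falls through to `default`.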

Outputs

For an input sample.vcf.gz, SvPhaser produces:

  • sample_phased.csv: primary analysis artifact

    • Per-SV read support (hp1, hp2, nohp)
    • Derived metrics (tagged_total, support_total, Δ)
    • Final decisions (gt, gq, reason)
  • sample_phased.vcf(.gz): interoperability output

    • FORMAT/GT, FORMAT/GQ
    • Optional SVP_* INFO annotations when --svp-info is enabled

The CSV is intended for benchmarking, visualization, and interpretation; the VCF is a downstream-consumable representation.
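
As an example of CSV-first analysis, the Python sketch below tallies calls per genotype and their mean GQ using only the standard library. It assumes the CSV header uses the column names `gt` and `gq` exactly as listed above; check the header of your own output file. The function name and the toy rows are made up for illustration and are not real SvPhaser output.

```python
import csv
import io
from collections import Counter


def summarize_calls(handle):
    """Return {genotype: (count, mean GQ)} from an open CSV handle."""
    counts = Counter()
    gq_sums = Counter()
    for row in csv.DictReader(handle):
        counts[row["gt"]] += 1
        gq_sums[row["gt"]] += float(row["gq"])
    return {gt: (n, gq_sums[gt] / n) for gt, n in counts.items()}


# Toy data standing in for sample_phased.csv (values are invented).
toy = io.StringIO(
    "hp1,hp2,nohp,gt,gq,reason\n"
    "9,1,2,1|0,40,toy\n"
    "1,9,0,0|1,30,toy\n"
    "8,0,3,1|0,20,toy\n"
)
summary = summarize_calls(toy)
```

For a real run, pass an open file instead, for example `summarize_calls(open("results/sample_phased.csv", newline=""))`, adjusting the path to your output directory.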


Algorithm & methodology

➡️ docs/Methodology.md gives a full, implementation-faithful description of the algorithm, including:

  • evidence collection
  • haplotype decision logic
  • pseudoalgorithm
  • workflow diagram

This document is the authoritative reference for reviewers and users seeking algorithmic clarity.


Python API

from pathlib import Path
from svphaser.phasing.io import phase_vcf

phase_vcf(
    Path("sample.vcf.gz"),
    Path("sample.sorted_phased.bam"),
    out_dir=Path("results"),
    min_support=10,
    min_tagged_support=3,
    major_delta=0.60,
    equal_delta=0.10,
    support_mode="hybrid",
    dynamic_window=True,
    tie_to_hom_alt=True,
    gq_bins="30:High,10:Moderate",
    threads=8,
)

Repository structure

SvPhaser/
├─ src/svphaser/        # core package
├─ docs/                # methodology & design notes
├─ tests/               # unit + regression tests
├─ notebooks/           # benchmarking & analysis
├─ pyproject.toml
├─ README.md
└─ CHANGELOG.md

Citing SvPhaser

If SvPhaser contributes to your research, please cite:

@software{svphaser2026,
  author  = {Pranjul Mishra and Sachin Gadakh},
  title   = {SvPhaser: Haplotype-aware phasing of structural variants from long-read data},
  version = {2.1.x},
  year    = {2026},
  url     = {https://github.com/SFGLab/SvPhaser},
  note    = {PyPI: https://pypi.org/project/svphaser/}
}

For maximum reproducibility, include the exact git commit hash used.
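
One way to capture this information alongside your results is sketched below in Python. It is a stand-alone helper, not part of SvPhaser, and the function name is hypothetical. It reads the installed package version through `importlib.metadata` and, if you point it at a source checkout, the commit hash through `git rev-parse HEAD` (this assumes `git` is on your PATH).

```python
import subprocess
from importlib import metadata


def run_provenance(package="svphaser", repo_dir=None):
    """Collect the installed version and, optionally, the git commit of a checkout."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None  # package is not installed in this environment

    commit = None
    if repo_dir is not None:
        try:
            result = subprocess.run(
                ["git", "rev-parse", "HEAD"],
                cwd=repo_dir, check=True, capture_output=True, text=True,
            )
            commit = result.stdout.strip()
        except (OSError, subprocess.CalledProcessError):
            commit = None  # not a git checkout, or git is unavailable

    return {"package": package, "version": version, "commit": commit}
```

Writing the returned dictionary next to the phased outputs keeps the exact software state attached to each analysis.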


License

SvPhaser is released under the MIT License — see LICENSE.


Contact

Developed at SFG Lab (BioAI).

Bug reports and feature requests: please open a GitHub issue.
