CLI
VCF Input Formats
Outputs
Benchmarking
Annotated VCF
Stats
Plots
Consensus
Annotated VCF
Stats
The full list of verix parameters is provided below.
verix bench: compares a query VCF against a target (truth) VCF
usage: verix bench [options] -q QUERY_VCF -t TARGET_VCF -o OUTPUT_DIR
params:
-h, --help show this help message and exit
-q, --query VCF file with query SVs
-t, --target VCF file with target SVs
--plot Generate PDF report with summary plots (default: False)
-o, --output_dir Output directory
-d, --match_thr Max distance between matching breakpoints (default: 500)
-s, --sizemin Minimum SV interval size (default: 0)
-S, --sizemax Maximum SV interval size (default: None)
-b, --merge_thr Collapse breakends in a CSV within this distance into a single breakpoint (default: 1)
--enforce_type Require SV types to match (default: False)
--enforce_genotype Require SV genotypes to match (default: False)
-f, --formats {multi,single,default} [{multi,single,default} ...]
Format type for each VCF (for bench: query, target) (default: default[default...])
-l, --csv_links LINK [LINK ...]
INFO field for CSV linking in each VCF (for bench: query, target) (default: [])
-svt, --types TYPES [TYPES ...]
INFO field for SV type extraction (default: SVTYPE[SVTYPE...])
verix consensus: merges multiple VCF files into a single consensus callset
usage: verix consensus [options] -i VCF [VCF ...] -o OUTPUT_DIR
params:
-h, --help show this help message and exit
-i, --inputs VCF [VCF ...] List of VCF files to merge
-n, --names NAME [NAME ...] Ordered list of names for each VCF (default: [])
-o, --output_dir Output directory
-d, --match_thr Max distance between matching breakpoints (default: 500)
-s, --sizemin Minimum SV interval size (default: 0)
-S, --sizemax Maximum SV interval size (default: None)
-b, --merge_thr Collapse breakends in a CSV within this distance into a single breakpoint (default: 1)
--enforce_type Require SV types to match (default: False)
--enforce_genotype Require SV genotypes to match (default: False)
-f, --formats {multi,single,default} [{multi,single,default} ...]
Format type for each VCF (for bench: query, target) (default: default[default...])
-l, --csv_links LINK [LINK ...]
INFO field for CSV linking in each VCF (for bench: query, target) (default: [])
-svt, --types TYPES [TYPES ...]
INFO field for SV type extraction (default: SVTYPE[SVTYPE...])
- --match_thr: a pair of breakpoints is considered a candidate match if they are at most this many base pairs apart
- --merge_thr: breakends from the same CSV record within this distance are collapsed into a single breakpoint at the median position
- --sizemin/--sizemax: size filters applied to intervals between consecutive breakpoints on the same chromosome within a CSV; events with at least one interval smaller than --sizemin or larger than --sizemax are removed
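The three thresholds above can be illustrated with a minimal sketch. This is not verix's implementation: the function names are hypothetical, and the grouping rule (chaining sorted positions), the use of the lower median, the same-chromosome requirement for a candidate match, and sorting positions before measuring intervals are all assumptions made for illustration.

```python
from statistics import median_low


def is_candidate_match(query_bp, target_bp, match_thr=500):
    """Breakpoints are (chrom, pos) tuples. Assumes a candidate match also
    requires the same chromosome (not stated explicitly in the text above)."""
    return query_bp[0] == target_bp[0] and abs(query_bp[1] - target_bp[1]) <= match_thr


def collapse_breakends(positions, merge_thr=1):
    """Chain sorted positions whose gap to the previous position is
    <= merge_thr, then replace each group by its (lower) median."""
    groups = []
    for pos in sorted(positions):
        if groups and pos - groups[-1][-1] <= merge_thr:
            groups[-1].append(pos)
        else:
            groups.append([pos])
    return [median_low(group) for group in groups]


def passes_size_filter(breakpoints, sizemin=0, sizemax=None):
    """Return False if any interval between consecutive breakpoints on the
    same chromosome is smaller than sizemin or larger than sizemax."""
    by_chrom = {}
    for chrom, pos in breakpoints:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()
        for left, right in zip(positions, positions[1:]):
            size = right - left
            if size < sizemin or (sizemax is not None and size > sizemax):
                return False
    return True
```

With the defaults (sizemin 0, sizemax None), the size filter in this sketch keeps every event, matching the documented defaults.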
verix assembles CSVs by linking related VCF records. The --formats parameter can be used to specify
how to group input records in each VCF:
- default: collects breakpoints from POS, END, the TARGET INFO field, and BND mates in the ALT field
- single: uses default fields and a custom INFO field containing internal breakpoints specified using --csv_links (see details on the field format below)
- multi: uses default fields and links records that share the same ID specified as a custom INFO field using --csv_links
Note: formats can be mixed across inputs but not within the same VCF file.
Genotype parsing: if an input VCF file contains multiple samples, only the genotype of the first sample is retained.
SV type parsing: the SVTYPE field (or the field specified via --types) is used to extract the type from each
linked record; if a CSV spans records with multiple distinct types, the values are joined with + in sorted order
to form a consolidated type string.
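As a small illustration of the type consolidation rule, the sketch below joins the distinct types of the linked records with + in sorted order. The function name is hypothetical and the code is not taken from verix.

```python
def consolidate_sv_type(record_types):
    """Join the distinct SV types of the linked records with '+',
    in sorted order (e.g. INV and DUP become 'DUP+INV')."""
    return "+".join(sorted(set(record_types)))


print(consolidate_sv_type(["INV", "DUP", "INV"]))
```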
The INFO field provided using --csv_links (storing the internal breakpoints of a CSV record) is expected in the
following format:
<prefix>-<bp_1>[-<bp_2>...]
where each <bp_i> is either <chr>:<pos> or just <pos> (defaults to the record's CHROM).
Note: the leading <prefix> is ignored by the verix parser (anything before the first - is discarded).
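The field format can be read with a few lines of Python. The sketch below is an illustration only, not the verix parser: the function name is hypothetical, and it assumes chromosome names contain no - character.

```python
def parse_csv_link(value, record_chrom):
    """Parse '<prefix>-<bp_1>[-<bp_2>...]' into a list of (chrom, pos).

    Everything before the first '-' is discarded. A breakpoint without
    an explicit chromosome falls back to the record's CHROM.
    Assumption: chromosome names do not contain '-'.
    """
    _prefix, _, rest = value.partition("-")
    breakpoints = []
    for token in rest.split("-"):
        if not token:
            continue
        if ":" in token:
            chrom, pos = token.rsplit(":", 1)
        else:
            chrom, pos = record_chrom, token
        breakpoints.append((chrom, int(pos)))
    return breakpoints


print(parse_csv_link("CSV7-chr1:1000-2500-chr3:40", "chr1"))
```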
All verix outputs are written to the --output_dir folder, which is created if it doesn't exist.
Each run also produces a main.log file recording input parameters and execution details.
The outputs of each command are described below.
The bench command generates the following files:
- matches.vcf: query SVs annotated with match information (see below)
- report.json: summary statistics
- report.pdf: diagnostic plots (only with --plot)
- query.vcf and target.vcf: SVs assembled from the query and target input VCFs, respectively (in a single record VCF format, with all breakpoints listed in the BKPS field)
matches.vcf contains the assembled query SVs (one line per SV). Each record includes:
- SVTYPE: consolidated SV type
- BKPS: comma-separated list of all consolidated breakpoints (chr:pos,...)
- CHROM2: chromosome of the last breakpoint, set only when it differs from the record's CHROM
Match-related INFO fields:
- BEST_MATCH_CLASS: overall match classification, one of:
  - complete: every query breakpoint matched every target breakpoint
  - partial: some query breakpoints matched, and either the rest matched no target or the target has extra breakpoints
  - aggregate: distinct subsets of query breakpoints matched multiple distinct target events
  - spurious: no query breakpoints matched any target
- SPURIOUS: number of query breakpoints that matched no target (omitted when 0)
- FRAGMENTED: flag set when the optimal target was also matched by other queries (only for target breakpoints that were not matched by this query)
- MATCHES: pipe-separated list of all candidate alignments considered (see format below)
For records not classified as spurious, the following fields describe the optimal alignment:
- BEST_MATCH_ID, BEST_MATCH_TYPE: target event ID and SV type
- BEST_N_MATCHED, BEST_BND_DIST: number of matched breakpoints and total breakpoint distance
- BEST_MATCH_COV: full if every target breakpoint was matched, partial otherwise
Each alignment entry in MATCHES is a comma-separated tuple:
<target_ID>,<target_SVTYPE>,<n_matched>,<bp_dist>,<target_coverage>,<bp_pair_1>,<bp_pair_2>,...
where:
- <target_ID>, <target_SVTYPE>: ID and SV type of the matched target event
- <n_matched>: number of matched breakpoints
- <bp_dist>: total breakpoint distance
- <target_coverage>: full if every target breakpoint was matched, partial otherwise
- <bp_pair_i>: matched breakpoint pair <query_chr:pos>-<target_chr:pos> (e.g. chr1:817452-chr1:817452)
Example: a query CSV on chr2 with four breakpoints is classified as aggregate because two distinct subsets of
its breakpoints align to two different target events; the best alignment is the one with the smaller breakpoint distance:
chr2 1450200 q_42 END=1492800;SVTYPE=INVDUP;BKPS=chr2:1450200,chr2:1471050,chr2:1488300,chr2:1492800;
BEST_MATCH_CLASS=aggregate;BEST_MATCH_ID=truth_88;BEST_MATCH_TYPE=INV;BEST_MATCH_COV=full;BEST_N_MATCHED=2;BEST_BND_DIST=15;
MATCHES=truth_88,INV,2,15,full,chr2:1450200-chr2:1450195,chr2:1471050-chr2:1471060|truth_91,DUP,2,42,full,chr2:1488300-chr2:1488280,chr2:1492800-chr2:1492840
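For downstream analysis, the MATCHES value can be split back into its alignment tuples. The following is a minimal sketch based only on the format described above. The Alignment type and function names are hypothetical, and the code assumes that IDs and SV types contain no commas or pipes and that chromosome names contain no - character.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    target_id: str
    target_type: str
    n_matched: int
    bp_dist: float
    target_coverage: str
    pairs: list  # [((query_chrom, query_pos), (target_chrom, target_pos)), ...]


def _parse_bp(text):
    # "chr2:1450200" -> ("chr2", 1450200)
    chrom, pos = text.rsplit(":", 1)
    return chrom, int(pos)


def parse_matches(value):
    """Split a MATCHES string into one Alignment per pipe-separated entry."""
    alignments = []
    for entry in value.split("|"):
        fields = entry.split(",")
        target_id, target_type, n_matched, bp_dist, coverage = fields[:5]
        pairs = []
        for pair in fields[5:]:
            query_bp, _, target_bp = pair.partition("-")
            pairs.append((_parse_bp(query_bp), _parse_bp(target_bp)))
        alignments.append(
            Alignment(target_id, target_type, int(n_matched),
                      float(bp_dist), coverage, pairs)
        )
    return alignments


matches = (
    "truth_88,INV,2,15,full,chr2:1450200-chr2:1450195,chr2:1471050-chr2:1471060"
    "|truth_91,DUP,2,42,full,chr2:1488300-chr2:1488280,chr2:1492800-chr2:1492840"
)
for alignment in parse_matches(matches):
    print(alignment.target_id, alignment.n_matched, alignment.bp_dist)
```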
report.json is a report with various summary statistics, including:
- n_query: number of query SVs after parsing and size filtering
- n_target: number of target SVs after parsing and size filtering
- tp-query, tp-target, fp, fn, precision, recall, f1 (note: only complete matches count as TP)
- class_proportions: the fraction of query events in each match category
- by_class: maps each non-spurious match category to metrics computed over the optimal alignment of each query in this category:
  - num_matches: number of query records
  - mean_breakpoint_distance: mean per-breakpoint distance
  - mean_breakpoint_hit_rate: mean fraction of target breakpoints matched
  - mean_spurious_breakpoint_rate: mean fraction of query breakpoints that matched no target
  - mean_targets_per_record: mean number of distinct target events appearing in the candidate alignments of each query
  - query_type_counts: count of query records in this category broken down by predicted SV type
  - query_type_proportions: per SV type, the fraction of all query records of that type that fell into this category
  - target_type_counts: count of unique target events in this category broken down by truthset SV type
  - target_type_proportions: per SV type, the fraction of all target events of that type that fell into this category
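The text above does not spell out how precision, recall, and f1 are derived from the counts. The sketch below uses the conventional definitions, with precision computed on the query side and recall on the target side. This is an assumption, not a confirmed description of verix, and the function name is hypothetical.

```python
def summarize_counts(tp_query, tp_target, fp, fn):
    """Conventional benchmarking metrics (assumed, not confirmed for verix):
    precision = tp_query / (tp_query + fp)
    recall    = tp_target / (tp_target + fn)
    f1        = harmonic mean of precision and recall
    """
    precision = tp_query / (tp_query + fp) if (tp_query + fp) else 0.0
    recall = tp_target / (tp_target + fn) if (tp_target + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


print(summarize_counts(tp_query=8, tp_target=8, fp=2, fn=8))
```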
With --plot, verix bench writes a multi-page PDF of diagnostic plots summarizing how query events
aligned to the target: breakpoint-level accuracy and coverage, the prevalence of spurious and fragmented
calls, and the correspondence between query and target SV types, stratified by SV type and match category.
The consensus command generates the following files:
- merged.vcf: the consensus callset (see below)
- report.json: summary statistics
merged.vcf contains one record per SV cluster (group of matching SVs); the SV with the most breakpoints
is used as the cluster representative (its CHROM, POS, END, SVTYPE, BKPS, and optionally CHROM2
are used to write the record).
Consensus-related INFO fields:
- SUPPORT: number of distinct input callsets contributing SVs to this cluster
- SUPPORT_COUNT: comma-separated count vector, one entry per input VCF, giving the number of SVs that input contributed to this cluster
- SUPPORT_BINARY: concatenated 0/1 string, one character per input VCF, where 1 indicates that the input has at least one SV in this cluster and 0 that it has none
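The relationship between the three support fields can be shown with a short sketch that derives them from a per-input count vector. The function name is hypothetical and the code only mirrors the field descriptions above.

```python
def support_fields(counts):
    """counts: number of SVs each input VCF contributed to one cluster,
    in input order. Returns the three consensus INFO values."""
    return {
        "SUPPORT": sum(1 for c in counts if c > 0),
        "SUPPORT_COUNT": ",".join(str(c) for c in counts),
        "SUPPORT_BINARY": "".join("1" if c > 0 else "0" for c in counts),
    }


# Three inputs: the first contributed two SVs, the second none, the third one.
print(support_fields([2, 0, 1]))
```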
A per-sample REC FORMAT field lists the input records merged into each consensus call.
For each input sample, the value is either . (no contribution) or a |-separated list of records, each formatted
as <sv_id>,<sv_type>,<bp1>,<bp2>,....
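A REC value can be unpacked as follows. This is a minimal sketch of a reader for the format described above, with a hypothetical function name. Breakpoints are kept as raw strings because their exact notation is not specified here, and the example value is made up for illustration.

```python
def parse_rec(value):
    """Parse one sample's REC value into a list of merged input records.

    '.' means the input contributed nothing to this consensus call.
    """
    if value == ".":
        return []
    records = []
    for item in value.split("|"):
        sv_id, sv_type, *breakpoints = item.split(",")
        records.append({"id": sv_id, "type": sv_type, "breakpoints": breakpoints})
    return records


# Hypothetical REC value with two merged records from one input.
print(parse_rec("sv1,DEL,chr1:100,chr1:900|sv2,DEL,chr1:105,chr1:910"))
```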
report.json is a report with various input and consensus-based stats, including:
- n_total_variants: total number of variants summed across all input callsets (after parsing and size filtering)
- n_variants_in_sample: maps each input sample name to its variant count
- n_clusters: number of consensus clusters
- support_vec_types: list of distinct support patterns observed across clusters
- max_cluster_size, min_cluster_size: maximum and minimum total number of input SVs contained in any cluster