scPlOver

scPlOver is a method for inferring DNA content from overlapping fragments in single-cell DNA sequencing (scDNA-seq) data.

COMING SOON: end-to-end pipeline to run scPlOver starting from a BAM file (including counting fragment overlaps and running HMMCopy)

Requirements

Tested versions are given in parentheses; scPlOver is likely to work with other versions.

Python (3.9)
numpy (1.23.0)
scipy (1.13.1)
pandas (2.1.4)
statsmodels (0.14.1)
anndata (0.10.3)
click (8.1.7)

Input

See test/FUCCI_test_input.h5ad for an example.

AnnData object of cells x bins, with obs indices containing cell IDs, var indices containing bins in chr:start-end format, and the following layers:

X containing read count
state containing integer copy number state
overlaps containing the number of fragment overlaps
overlap_bases containing the number of overlap bases
n_fragments containing the total number of fragments
mean_fragment_length containing the average fragment length
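
Before running, it can help to confirm that the expected layers are present. The sketch below is not part of scPlOver: `missing_layers` is a hypothetical helper that accepts any collection of layer names (for example `adata.layers.keys()`). It leaves out `X` on the assumption that read counts live in the main AnnData matrix rather than in `adata.layers`; adjust if your file stores them differently.

```python
# Layer names taken from the list above; X is omitted on the assumption
# that read counts are stored in the main matrix (adata.X).
REQUIRED_LAYERS = (
    "state",
    "overlaps",
    "overlap_bases",
    "n_fragments",
    "mean_fragment_length",
)


def missing_layers(layer_names):
    """Return the required layer names absent from layer_names, in list order."""
    present = set(layer_names)
    return [name for name in REQUIRED_LAYERS if name not in present]


# Example with a plain list; with AnnData, pass adata.layers.keys() instead.
example_layers = ["state", "overlaps", "overlap_bases"]
print(missing_layers(example_layers))
```

An empty list means every layer named above was found.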

Required var fields:

chr: chromosome
start: bin start position
end: bin end position
gc: GC content
in_blacklist: flag indicating whether bin is present in blacklist
map: mappability (currently unused)
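
Since the var index uses the chr:start-end format and the same coordinates appear in the chr, start, and end fields, a small parser can be used to check that the two agree. This is a minimal sketch, not code from scPlOver; `parse_bin` and `BIN_PATTERN` are hypothetical names, and whether coordinates are 0- or 1-based is not specified here, so the parser only checks the shape of the string.

```python
import re

# Matches names such as "chr1:1-500000": chromosome, colon, start, dash, end.
BIN_PATTERN = re.compile(r"^(?P<chr>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_bin(name):
    """Split a bin name in chr:start-end format into (chr, start, end)."""
    match = BIN_PATTERN.match(name)
    if match is None:
        raise ValueError(f"bin name not in chr:start-end format: {name!r}")
    start, end = int(match["start"]), int(match["end"])
    if end < start:
        raise ValueError(f"bin end precedes start: {name!r}")
    return match["chr"], start, end


print(parse_bin("chr1:1-500000"))
```

With AnnData, the parsed tuple for each entry of `adata.var_names` could be compared against the corresponding row of `adata.var[["chr", "start", "end"]]`.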

Usage

python run_scplover_adata.py \
  --adata <path> \
  --output_row <path> \
  --output_table <path> \
  --output_adata <path> \
  --cell_df_dir <dir> \
  [options]
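
When launching many runs from a script, it may be convenient to assemble the command as an argument list. The sketch below uses only flags documented in this README; `build_command` is a hypothetical helper, not part of scPlOver, and it only builds the list without executing anything.

```python
import shlex

# The five required path arguments, in the order shown in the usage template.
REQUIRED = ("adata", "output_row", "output_table", "output_adata", "cell_df_dir")


def build_command(paths, options=None, flags=()):
    """Return the run_scplover_adata.py command as a list of arguments."""
    missing = [name for name in REQUIRED if name not in paths]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    cmd = ["python", "run_scplover_adata.py"]
    for name in REQUIRED:
        cmd += [f"--{name}", str(paths[name])]
    # Options that take a value, e.g. {"max_k": 12}.
    for name, value in (options or {}).items():
        cmd += [f"--{name}", str(value)]
    # Boolean switches, e.g. ["fit_transitions"].
    for name in flags:
        cmd.append(f"--{name}")
    return cmd


paths = {
    "adata": "test/FUCCI_test_input.h5ad",
    "output_row": "test/output/FUCCI_test_best_fits.csv",
    "output_table": "test/output/FUCCI_test_full_table.csv",
    "output_adata": "test/output/FUCCI_test_adata.h5ad",
    "cell_df_dir": "test/output/cell_dfs",
}
cmd = build_command(paths, {"max_k": 12, "cores": 8}, flags=["fit_transitions"])
print(shlex.join(cmd))
# To execute from the repository root: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with paths that contain spaces.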

Example usage

Running on test data (should take on the order of 10 minutes with 8 cores):

python run_scplover_adata.py \
  --adata test/FUCCI_test_input.h5ad \
  --output_row test/output/FUCCI_test_best_fits.csv \
  --output_table test/output/FUCCI_test_full_table.csv \
  --output_adata test/output/FUCCI_test_adata.h5ad \
  --cell_df_dir test/output/cell_dfs \
  --max_k 12 \
  --covariance_type full \
  --min_mean_scale 0.8 \
  --max_mean_scale 1.2 \
  --bases_dist_quantile 0.8 \
  --lowess_frac 0.2 \
  --fit_transitions \
  --iqr_threshold 2 \
  --min_bins_per_state 50 \
  --cores 8

Typical usage in the paper for experimental datasets (values in braces are workflow placeholders):

python run_scplover_adata.py --adata {input.adata} \
  --output_row {output.row} \
  --output_table {output.full_table} \
  --output_adata {output.adata} \
  --cell_df_dir {params.cell_df_dir} \
  --max_k 12 \
  --cells {params.cells_arg} \
  --iqr_threshold 2 \
  --covariance_type full \
  --correct_gc \
  --bases_dist_quantile 0.8 \
  --lowess_frac 0.2 \
  --min_mean_scale 0.8 \
  --max_mean_scale 1.2 \
  --means scale \
  --fit_transitions \
  --min_bins_per_state 50

Required arguments

Argument	Description
`--adata`	Path to input `.h5ad` file containing read counts and overlap bases per bin per cell
`--output_row`	Output CSV with one row per cell (best-scoring model result)
`--output_table`	Output CSV with all results across all ploidy initializations
`--output_adata`	Output `.h5ad` with `ghmm_state` layer added containing inferred copy number states
`--cell_df_dir`	Directory to write per-cell regression DataFrames (one CSV per cell)

Cell selection (mutually exclusive)

Argument	Default	Description
`--cells`	all cells	Comma-separated list of cell IDs to process
`--cells_file`	—	File with one cell ID per line to process
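
The two selection formats can be produced from the same list of cell IDs. The helpers below are hypothetical (they are not part of scPlOver) and the cell IDs are made up; the sketch only follows the formats described in the table: a comma-separated string for `--cells` and one ID per line for `--cells_file`.

```python
import tempfile
from pathlib import Path


def cells_args(cell_ids):
    """Return the --cells argument pair for a list of cell IDs."""
    ids = [cell_id.strip() for cell_id in cell_ids]
    if not ids or any(not cell_id or "," in cell_id for cell_id in ids):
        raise ValueError("cell IDs must be non-empty and must not contain commas")
    return ["--cells", ",".join(ids)]


def cells_file_args(cell_ids, path):
    """Write one cell ID per line to path and return the --cells_file pair."""
    path = Path(path)
    path.write_text("".join(f"{cell_id}\n" for cell_id in cell_ids))
    return ["--cells_file", str(path)]


cell_ids = ["cell_001", "cell_002", "cell_003"]  # hypothetical IDs
inline_args = cells_args(cell_ids)
print(inline_args)

with tempfile.TemporaryDirectory() as tmp:
    file_args = cells_file_args(cell_ids, Path(tmp) / "cells.txt")
    written = Path(file_args[1]).read_text().splitlines()
    print(written)
```

Because the options are mutually exclusive, pass only one of the two argument pairs. The file form is likely the more practical choice for long lists.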

Model options

Argument	Default	Description
`--max_k`	`12`	Maximum copy number state (max supported: 29)
`--covariance_type`	`full`	Covariance structure: `full`, `diag`, or `spherical`
`--means`	`fixed`	Mean treatment: `fixed` (held at initial values), `scale` (learned per-feature scale factor), or `free` (fully learned)
`--fit_transitions`	`False`	If set, learn transition matrix from data (default: fixed)
`--include_increments`	`False`	If set, also explore ploidy initializations formed by integer increments from the input copy number states (otherwise, consider multiples only)
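
To make the difference between multiples and increments concrete, here is one possible reading of the `--include_increments` description. It is an illustration only: `candidate_profiles` is a hypothetical function, and how scPlOver actually enumerates ploidy initializations (for example, how it treats state 0 or how `--max_k` limits the candidates) is an assumption here.

```python
def candidate_profiles(states, max_k=12, include_increments=False):
    """Enumerate candidate copy-number profiles derived from an input profile.

    Multiples scale every state by the same integer factor; increments add
    the same integer to every state. Candidates whose largest state would
    exceed max_k are skipped (assumed behaviour).
    """
    top = max(states)
    if top <= 0:
        raise ValueError("input profile needs at least one positive state")
    candidates, seen = [], set()

    def add(profile):
        key = tuple(profile)
        if key not in seen:  # avoid listing the same profile twice
            seen.add(key)
            candidates.append(profile)

    factor = 1
    while factor * top <= max_k:
        add([factor * s for s in states])
        factor += 1
    if include_increments:
        shift = 1
        while top + shift <= max_k:
            add([s + shift for s in states])
            shift += 1
    return candidates


print(candidate_profiles([1, 2, 3], max_k=6))
print(candidate_profiles([1, 2, 3], max_k=6, include_increments=True))
```

Under this reading, increments add candidates such as a uniform gain of one copy, which no integer multiple of the input profile can represent.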

Mean scaling bounds (used when `--means scale`)

Argument	Default	Description
`--min_mean_scale`	`0`	Lower bound on per-feature mean scale factor
`--max_mean_scale`	`inf`	Upper bound on per-feature mean scale factor
`--scale_reads`	`False`	If set, apply mean scaling to the reads dimension; otherwise only the overlap bases dimension is scaled
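
As a rough illustration of what the bounds mean, the sketch below clamps a proposed scale factor into the interval given by `--min_mean_scale` and `--max_mean_scale`. This README does not say whether scPlOver enforces the bounds by clipping or through constrained optimization, so `clamp_scale` is a hypothetical stand-in, not the actual implementation.

```python
import math


def clamp_scale(scale, min_mean_scale=0.0, max_mean_scale=math.inf):
    """Restrict a mean scale factor to [min_mean_scale, max_mean_scale]."""
    if min_mean_scale > max_mean_scale:
        raise ValueError("min_mean_scale must not exceed max_mean_scale")
    return min(max(scale, min_mean_scale), max_mean_scale)


# Bounds used in the example commands: 0.8 and 1.2.
for proposed in (0.5, 1.0, 1.35):
    print(proposed, "->", clamp_scale(proposed, 0.8, 1.2))
```

With the defaults (`0` and `inf`), any non-negative scale factor passes through unchanged, so the bounds only take effect when set explicitly.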

Filtering options

Argument	Default	Description
`--iqr_threshold`	`None`	IQR multiplier for outlier removal per state (e.g. `5`); disabled if not set
`--min_bins_per_state`	`0`	Minimum number of bins a state must have; bins in rarer states are removed before fitting
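
The two filters can be pictured with a small standard-library sketch. It applies a conventional IQR fence (quartiles widened by the multiplier) within each state and drops states with too few bins. The function name, the use of a single value per bin, the quartile method, and the order of the two steps are all assumptions; scPlOver's own filtering may differ in each of these.

```python
import statistics
from collections import defaultdict


def filter_bins(values, states, iqr_threshold=None, min_bins_per_state=0):
    """Return the sorted indices of bins kept after both filters."""
    by_state = defaultdict(list)
    for idx, state in enumerate(states):
        by_state[state].append(idx)

    kept = []
    for idxs in by_state.values():
        # Drop every bin of a state that has too few bins (assumed to run first).
        if len(idxs) < min_bins_per_state:
            continue
        # IQR fence within the state; quartiles need at least two values.
        if iqr_threshold is not None and len(idxs) >= 2:
            q1, _, q3 = statistics.quantiles([values[i] for i in idxs], n=4)
            margin = iqr_threshold * (q3 - q1)
            low, high = q1 - margin, q3 + margin
            idxs = [i for i in idxs if low <= values[i] <= high]
        kept.extend(idxs)
    return sorted(kept)


values = [10, 11, 9, 10, 100, 10, 11, 9, 50]
states = [2, 2, 2, 2, 2, 2, 2, 2, 3]
print(filter_bins(values, states, iqr_threshold=2, min_bins_per_state=2))
```

In this toy input, the bin with value 100 falls outside the fence for state 2, and state 3 is dropped because it has a single bin.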

GC correction options

Argument	Default	Description
`--correct_gc`	`False`	If set, apply LOWESS-based modal quantile GC correction to reads and overlap bases
`--lowess_frac`	`0.2`	Fraction of data used in LOWESS smoothing during GC correction
`--clip_corrected_values`	`False`	If set, clip GC-corrected values to a valid range
`--bases_dist_quantile`	`0.8`	Quantile of the overlap bases distribution used during GC correction

Performance

Argument	Default	Description
`--cores`	`1`	Number of parallel worker processes for fitting cells
