Using the Tool SBS ID DBS Input
S M Ashiqul Islam edited this page Jan 27, 2026 · 1 revision
SigProfilerMatrixGenerator generates mutational matrices for Single Base Substitutions (SBS), Insertions/Deletions (ID), and Doublet Base Substitutions (DBS) from input variant files.
From within a Python session, generate matrices as follows:
python3
>>> from SigProfilerMatrixGenerator.scripts import SigProfilerMatrixGeneratorFunc as matGen
>>> matrices = matGen.SigProfilerMatrixGeneratorFunc("test", "GRCh37", "/Users/test/Desktop/test/")
From within an R session:
R
> library("reticulate")
> use_python("path_to_your_python3")
> py_config()
> library("SigProfilerMatrixGeneratorR")
> matrices <- SigProfilerMatrixGeneratorR("test", "GRCh37", "/Users/test/Desktop/test/")
The function takes three required parameters:
| Parameter | Type | Description | Example |
|---|---|---|---|
| project | string | Project name for this instance of matrix generation | "alexandrov_lab_test_1" |
| genome | string | Reference genome to use | "GRCh37" |
| vcfFiles | string | Full path to the input files folder | "/Users/test/Desktop/test/" |
The following optional parameters are also available:
| Parameter | Type | Default | Description |
|---|---|---|---|
| exome | boolean | False | Downsamples mutational matrices to exome regions |
| bed_file | string | None | Path to BED file for custom region downsampling |
| chrom_based | boolean | False | Outputs chromosome-based matrices |
| plot | boolean | False | Integrates with SigProfilerPlotting for visualizations |
| tsb_stat | boolean | False | Outputs transcriptional strand bias test results |
| seqInfo | boolean | False | Outputs original mutations with SigProfilerMatrixGenerator classification |
| cushion | integer | 100 | Adds an X bp cushion to the exome/bed_file ranges |
Note: All string arguments must be surrounded by quotation marks (e.g., "test"), and all boolean arguments must be True or False.
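The rules in the note above can be checked before a run is started. The following is a minimal sketch, not part of the tool: `build_kwargs` and `run` are hypothetical helper names, and the defaults are copied from the optional-parameters table on this page. The only library call used is `SigProfilerMatrixGeneratorFunc`, as shown in the Python session above.

```python
# Defaults copied from the optional-parameters table on this page.
OPTIONAL_DEFAULTS = {
    "exome": False,
    "bed_file": None,
    "chrom_based": False,
    "plot": False,
    "tsb_stat": False,
    "seqInfo": False,
    "cushion": 100,
}

BOOLEAN_PARAMS = ("exome", "chrom_based", "plot", "tsb_stat", "seqInfo")


def build_kwargs(**overrides):
    """Hypothetical helper: merge overrides into the defaults and check types."""
    unknown = set(overrides) - set(OPTIONAL_DEFAULTS)
    if unknown:
        raise TypeError(f"unknown optional parameter(s): {sorted(unknown)}")
    kwargs = {**OPTIONAL_DEFAULTS, **overrides}
    for name in BOOLEAN_PARAMS:
        if not isinstance(kwargs[name], bool):
            raise TypeError(f"{name} must be True or False")
    if kwargs["bed_file"] is not None and not isinstance(kwargs["bed_file"], str):
        raise TypeError("bed_file must be a string path or None")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(kwargs["cushion"], bool) or not isinstance(kwargs["cushion"], int):
        raise TypeError("cushion must be an integer")
    return kwargs


def run(project, genome, vcf_dir, **overrides):
    """Hypothetical wrapper: validate the options, then call the tool."""
    # Imported lazily so build_kwargs can be used without the package installed.
    from SigProfilerMatrixGenerator.scripts import (
        SigProfilerMatrixGeneratorFunc as matGen,
    )
    return matGen.SigProfilerMatrixGeneratorFunc(
        project, genome, vcf_dir, **build_kwargs(**overrides)
    )
```

For example, `run("test", "GRCh37", "/Users/test/Desktop/test/", exome=True, plot=True)` would pass the validated options through, whereas `build_kwargs(plot="True")` raises a `TypeError` because the value is a string rather than a boolean.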
This tool supports the following input formats:
| Format | Description | Example |
|---|---|---|
| MAF | Mutation Annotation Format | example.maf |
| VCF | Variant Call Format (each sample as separate file) | example.vcf |
| ICGC | ICGC submission format | ICGC docs |
| Simple text | Tab-delimited text file | See example below |
The simple text format requires the following columns:
- Sample name
- Chromosome
- Position
- Reference allele
- Alternate allele
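As an illustration of the column list above, the sketch below writes a small tab-delimited file with those five fields. The header labels, the column order, the file name, and the two records are all assumptions made for this example; they are not taken from the tool, so check its own example file for the exact layout it expects.

```python
import csv
import tempfile
from pathlib import Path

# Illustrative labels only; the header the tool requires may differ.
COLUMNS = ["Sample", "Chromosome", "Position", "Ref", "Alt"]


def write_simple_text(path, records):
    """Write records as tab-delimited rows under the illustrative header."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(COLUMNS)
        writer.writerows(records)


# Made-up records: (sample name, chromosome, position, reference, alternate).
records = [
    ("sampleA", "1", 12345, "C", "T"),
    ("sampleA", "X", 67890, "G", "A"),
]

out_path = Path(tempfile.mkdtemp()) / "example.txt"
write_simple_text(out_path, records)
```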
The final output is divided into three folders: input/, logs/, and output/.
input/ contains copies of the user-provided input files.
logs/ contains error and log files for the submitted job:
- sigProfilerMatrixGenerator_[project]_[genome].err
- sigProfilerMatrixGenerator_[project]_[genome].out
output/ contains the following subfolders:
- DBS/ - Doublet base substitution matrices
- SBS/ - Single base substitution matrices
- ID/ - Insertion/deletion matrices
- TSB/ - Transcriptional strand bias results
- plots/ - Generated visualizations
- vcf_files/ - Processed VCF files
Output files have extensions indicating which arguments were passed:
| Extension | Description |
|---|---|
| .all | Default - all mutations |
| .exome | Mutations mapped to exome regions (exome=True) |
| .region | Mutations mapped to custom BED file regions |
| .chrX | Chromosome-specific mutations (chrom_based=True) |
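When sorting through a results folder, the extension table can be turned into a small lookup. This is a sketch under two assumptions: that the `X` in `.chrX` stands for a chromosome label, and that the extension is the last dot-separated part of the file name. The function name `describe_matrix_file` and the sample file names are hypothetical.

```python
import re

# Meanings taken from the extension table on this page.
EXTENSION_MEANING = {
    "all": "all mutations (default)",
    "exome": "mutations mapped to exome regions (exome=True)",
    "region": "mutations mapped to custom BED file regions",
}

# Assumption: chromosome-based files end in "chr" followed by a chromosome label.
CHROM_PATTERN = re.compile(r"^chr(\w+)$")


def describe_matrix_file(filename):
    """Return a short description based on the file's final extension."""
    if "." not in filename:
        return "unrecognized extension"
    suffix = filename.rsplit(".", 1)[-1]
    if suffix in EXTENSION_MEANING:
        return EXTENSION_MEANING[suffix]
    match = CHROM_PATTERN.match(suffix)
    if match:
        return f"mutations on chromosome {match.group(1)} (chrom_based=True)"
    return "unrecognized extension"
```

A call such as `describe_matrix_file("test.SBS96.exome")` returns the exome description, and any name that matches none of the listed extensions falls through to `"unrecognized extension"`.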