Commit ceb662f

Add reference and minor improvements in README.md
1 parent b5d4ec7 commit ceb662f

1 file changed

Lines changed: 84 additions & 108 deletions

README.md

@@ -1,4 +1,6 @@
1-
![PyMaSC](https://raw.githubusercontent.com/ronin-gw/PyMaSC/master/pymasc.png)
1+
<div align="center">
2+
<img src="https://raw.githubusercontent.com/ronin-gw/PyMaSC/master/pymasc.png" alt="PyMaSC Logo">
3+
</div>
24

35
PyMaSC
46
======
@@ -15,7 +17,28 @@ PyMaSC
1517
Python implementation to calc mappability-sensitive cross-correlation
1618
for fragment length estimation and quality control for ChIP-Seq.
1719

18-
Visit [PyMaSC web site](https://pymasc.sb.ecei.tohoku.ac.jp/) for more information and to get human genome mappability tracks.
20+
🐾 Visit [PyMaSC web site](https://pymasc.sb.ecei.tohoku.ac.jp/) for more information and to get human genome mappability tracks.
21+
22+
📃 If you use this software in your work, please cite the following paper.
23+
24+
> Anzawa, Hayato, Hitoshi Yamagata, and Kengo Kinoshita. "Theoretical characterisation of strand cross-correlation in ChIP-seq." BMC bioinformatics 21.1 (2020): 417. https://doi.org/10.1186/s12859-020-03729-6
25+
26+
* * *
27+
28+
<ol>
29+
<li><a href="#introduction">Introduction</a></li>
30+
<li><a href="#install">Install</a></li>
31+
<li>
32+
<a href="#usage">Usage</a>
33+
<ul>
34+
<li><a href="#pymasc-command">pymasc command</a></li>
35+
<li><a href="#pymasc-precalc-command">pymasc-precalc command</a></li>
36+
<li><a href="#pymasc-plot-command">pymasc-plot command</a></li>
37+
</ul>
38+
</li>
39+
<li><a href="#computation-details">Computation details</a></li>
40+
<li><a href="#references">References</a></li>
41+
</ol>
1942

2043
* * *
2144

@@ -110,129 +133,75 @@ SAM and BAM file format are acceptable.
110133

111134
#### Output files
112135

113-
#### General options
114-
115-
##### -v / --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
116-
Set logging message level. (Default: info)
136+
| Name | Description |
137+
|------------------|-------------|
138+
| `*_stats.tab`   | Tab-delimited run summary, including statistics such as NSC (normalized strand coefficient), RSC (relative strand coefficient) and VSN (virtual S/N ratio). |
139+
| `*.pdf` | A multipage figure summarizing the run: cross-correlation curves (naïve and MaSC) versus shift, with the inferred fragment length highlighted.|
140+
| `*_cc.tab` | Naïve strand cross-correlation coefficients by shift (rows). Columns are `shift` (in bp), `whole` (all chromosomes), followed by per-chromosome values. |
141+
| `*_mscc.tab` | Mappability-sensitive cross-correlation (MSCC) coefficients by shift with the same layout as `*_cc.tab`. Produced only when a mappability BigWig is supplied. |
142+
| `*_nreads.tab` | Positive/negative strand read counts reported as `pos-neg` pairs for `whole` and per chromosome. The `raw` row reports the number of reads. If a mappability track is supplied, the number of reads in doubly mappable positions at each shift is also reported. |
117143

118-
##### --disable-progress
119-
Disable progress bars.
120-
Note that progress bar will be disabled automatically if stderr is not connected to terminal.
144+
Additionally, PyMaSC generates a JSON file as a cache for mappability analyses. See the `--mappability-stats` option for details.
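
As a rough illustration of the `*_cc.tab` layout described above, the following sketch reads such a table with the Python standard library. It assumes the file starts with a header row naming the columns (`shift`, `whole`, then one column per chromosome); the sample values and chromosome names are made up, and `read_cc_table` is a hypothetical helper, not part of PyMaSC.

```python
import csv
import io

# Hypothetical miniature of a `*_cc.tab` file (values are invented).
SAMPLE_CC_TAB = (
    "shift\twhole\tchr1\tchr2\n"
    "0\t0.010\t0.011\t0.009\n"
    "1\t0.012\t0.013\t0.011\n"
    "2\t0.015\t0.014\t0.016\n"
    "3\t0.013\t0.012\t0.014\n"
)


def read_cc_table(handle):
    """Return (shifts, columns) where columns maps a column name to its coefficients."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Assumption: the first row is a header containing a `shift` column.
    columns = {name: [] for name in reader.fieldnames if name != "shift"}
    shifts = []
    for row in reader:
        shifts.append(int(row["shift"]))
        for name in columns:
            columns[name].append(float(row[name]))
    return shifts, columns


shifts, columns = read_cc_table(io.StringIO(SAMPLE_CC_TAB))

# Shift with the largest genome-wide coefficient in this toy table.
best_index = max(range(len(shifts)), key=lambda i: columns["whole"][i])
best_shift = shifts[best_index]
print(best_shift)
```

Note that picking the raw maximum is only a way to inspect the table; PyMaSC's own fragment length estimate involves smoothing and masking, as described under the analysis parameters.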
121145

122-
##### --color {TRUE,FALSE}
123-
Switch coloring log output. (Default: auto; enable if stderr is connected to terminal)
124-
125-
#### --version
126-
Show program's version number and exit
146+
#### General options
127147

148+
| Option & Argument | Description | Default |
149+
|------------------------|-------------|---------|
150+
| `-v, --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}` | Set logging message level. | `INFO` |
151+
| `--disable-progress`   | Disable progress bars. Note that progress bars are disabled automatically if stderr is not connected to a terminal. | auto |
152+
| `--color {TRUE,FALSE}` | Switch coloring of log output. | auto (enabled if stderr is connected to a terminal) |
153+
| `--version`            | Show the program's version number and exit. | |
128154

129155
#### Processing settings
130156

131-
##### -p / --process [int]
132-
Set number of worker process. (Default: 1)
133-
For indexed BAM file, PyMaSC parallel process each reference (chromosome).
134-
135-
#### --successive
136-
Calc with successive algorithm instead of bitarray implementation (Default: false)
137-
Bitarray implementation is recommended in most situation. See `Computation details`
138-
for more information.
139-
140-
##### --skip-ncc
141-
Both `-m/--mappability` and `--skip-ncc` specified, PyMaSC skips calculate naïve cross-correlation
142-
and calculates only mappability-sensitive cross-correlation. (Default: False)
143-
144-
##### --skip-plots
145-
Skip output figures. (Default: False)
146-
157+
| Option & Argument | Description | Default |
158+
|----------------------|-------------|---------|
159+
| `-p, --process [int]` | Set the number of worker processes. For an indexed BAM file, PyMaSC processes each reference (chromosome) in parallel. | 1 |
160+
| `--successive`       | Calculate with the successive algorithm instead of the bitarray implementation. The bitarray implementation is recommended in most situations. See `Computation details` for more information. | off (use bitarray implementation) |
161+
| `--skip-ncc`         | If both `-m/--mappability` and `--skip-ncc` are specified, PyMaSC skips the naïve cross-correlation and calculates only the mappability-sensitive cross-correlation. | off |
162+
| `--skip-plots`       | Skip output figures. | off |
147163

148164
#### Input alignment file settings
149165

150-
##### -r / --read-length [int]
151-
Specify read length explicitly. (Default: get representative by scanning)
152-
PyMaSC needs representative value of read length to plot figures and to calc
153-
mappability-sensitive cross-correlation. By default, PyMaSC scans input file
154-
read length to get representative read length. If read length is specified, PyMaSC
155-
skips this step.
156-
Note that this option must be specified to treat unseekable input (like stdin).
157-
158-
##### --readlen-estimator {MEAN,MEDIAN,MODE,MIN,MAX}
159-
Specify how to get representative value of read length. (Default: median)
160-
161-
##### -l / --library-length
162-
Specify expected fragment length. (Default: None)
163-
PyMaSC supplies additional NSC and RSC values calculated from this value.
164-
166+
| Option & Argument | Description | Default |
167+
|---------------------------|-------------|---------|
168+
| `-r, --read-length [int]` | Specify the read length explicitly. PyMaSC needs a representative read length to plot figures and to calculate mappability-sensitive cross-correlation. By default, PyMaSC scans the input file to obtain it; if a read length is specified, PyMaSC skips this step. Note that this option must be specified for unseekable input (such as stdin). | auto (representative value obtained by scanning) |
169+
| `--readlen-estimator {MEAN,MEDIAN,MODE,MIN,MAX}` | Specify how to obtain the representative read length. | median |
170+
| `-l, --library-length`    | Specify the expected fragment length. PyMaSC supplies additional NSC and RSC values calculated from this value. | None |
165171

166172
#### Input mappability file settings
167173

168-
##### -m / --mappability [BigWig file]
169-
Specify mappability (alignability, uniqueness) track to calculate mappability-sensitive
170-
cross-correlation.
171-
Input file must be BigWig format and each track's score should indicate mappability
172-
in [0, 1] (1 means uniquely mappable position).
173-
If BigWig file is not supplied, PyMaSC will calculate only naïve cross-correlation.
174-
175-
##### --mappability-stats [json file]
176-
Read and save path to the json file which contains mappability region statistics.
177-
(Default: same place, same base name as the mappability BigWig file)
178-
If there is no statistics file for specified BigWig file, PyMaSC calculate total
179-
length of doubly mappable region for each shift size automatically and save them
180-
to reuse for next calculation and faster computing.
181-
`pymasc-precalc` performs this calculation for specified BigWig file (this is not
182-
necessary, of course).
183-
174+
| Option & Argument | Description | Default |
175+
|-----------------------------------|-------------|---------|
176+
| `-m, --mappability [BigWig file]` | Specify a mappability (alignability, uniqueness) track to calculate mappability-sensitive cross-correlation. The input file must be in BigWig format, and each track's score should indicate mappability in [0, 1] (1 means a uniquely mappable position). If a BigWig file is not supplied, PyMaSC calculates only naïve cross-correlation. | |
177+
| `--mappability-stats [json file]` | Path of the JSON file that stores mappability region statistics, used for both reading and saving. If there is no statistics file for the specified BigWig file, PyMaSC automatically calculates the total length of the doubly mappable region for each shift size and saves it for reuse in later, faster runs. `pymasc-precalc` performs this calculation for a specified BigWig file in advance (this step is optional). | auto (same place, same base name as the mappability BigWig file) |
184178

185179
#### Input file filtering arguments
186180

187-
##### -q / --mapq [int]
188-
Input reads which mapping quality less than specified score will be discarded. (Default: 1)
189-
MAPQ >= 1 is recommended because MAPQ=0 contains multiple hit reads.
190-
191-
##### -i / --include-chrom [pattern ...]
192-
Specify chromosomes to calculate. Unix shell-style wildcards (`.`, `*`, `[]` and `[!]`)
193-
are acceptable. This option can be declared multiple times to re-include chromosomes
194-
specified in a just before -e/--exclude-chrom option. Note that this option is case-sensitive.
195-
196-
##### -e / --exclude-chrom [pattern ...]
197-
As same as the -i/--include-chrom option, specify chromosomes to exclude from calculation.
198-
This option can be declared multiple times to re-exclude chromosomes specified in
199-
a just before -i/--include-chrom option.
200-
181+
| Option & Argument | Description | Default |
182+
|-------------------------------------|-------------|---------|
183+
| `-q, --mapq [int]`                  | Input reads whose mapping quality is less than the specified score are discarded. MAPQ >= 1 is recommended because reads with MAPQ=0 include multi-hit reads. | 1 |
184+
| `-i, --include-chrom [pattern ...]` | Specify chromosomes to include in the calculation. Unix shell-style wildcards (`.`, `*`, `[]` and `[!]`) are accepted. This option can be given multiple times, for example to re-include chromosomes excluded by a preceding `-e/--exclude-chrom` option. Note that this option is case-sensitive. | |
185+
| `-e, --exclude-chrom [pattern ...]` | Like the `-i/--include-chrom` option, but specifies chromosomes to exclude from the calculation. This option can be given multiple times, for example to re-exclude chromosomes included by a preceding `-i/--include-chrom` option. | |
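
The interplay of ordered include and exclude patterns can be sketched with Python's `fnmatch` module. This is an illustration only: whether PyMaSC uses `fnmatch` internally is an assumption, `apply_rules` is a hypothetical helper, and the rule that unmatched chromosomes are kept is also assumed. Note that `fnmatch` spells the single-character wildcard `?`.

```python
from fnmatch import fnmatchcase


def apply_rules(names, rules):
    """Apply ("include" | "exclude", pattern) rules in order; the last matching rule wins."""
    kept = []
    for name in names:
        keep = True  # assumption: names matched by no rule are kept
        for action, pattern in rules:
            # fnmatchcase never normalizes case, so matching is case-sensitive
            if fnmatchcase(name, pattern):
                keep = action == "include"
        if keep:
            kept.append(name)
    return kept


names = ["chr1", "chr2", "chrX", "chrM", "Chr3"]
# Exclude everything starting with "chr", then re-include chr1 and chr2.
rules = [("exclude", "chr*"), ("include", "chr[12]")]
selected = apply_rules(names, rules)
print(selected)
```

In this toy run `Chr3` survives because the patterns are case-sensitive and therefore never match it.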
201186

202187
#### Analysis Parameters
203188

204-
##### -d / --max-shift [int]
205-
PyMaSC calculate cross-correlation with shift size from 0 to this value. (Default: 1000)
206-
207-
##### --chi2-pval [float]
208-
P-value threshold to check strand specificity. (Default: 0.05)
209-
PyMaSC performs chi-square test between number of reads mapped to positive- and negative-strand.
210-
211-
##### -w / --smooth-window [int]
212-
Before mean fragment length estimation, PyMaSC applies moving average filter to
213-
mappability-sensitive cross-correlation. This option specify filter's window size.
214-
(Default: 15)
215-
216-
##### --mask-size [int]
217-
If difference between a read length and the estimated library length is equal or
218-
less than the length specified by this option, PyMaSC masks correlation coefficients
219-
in the read length +/- specified length and try to estimate mean library length again.
220-
(Default: 5, Specify < 1 to disable)
221-
222-
##### --bg-avr-width [int]
223-
To obtain the minimum coefficients of cross-correlation, PyMaSC gets the median
224-
of the end of specified bases from calculated cross-correlation coefficients.
225-
(Default: 50bp)
189+
| Option & Argument | Description | Default |
190+
|-----------------------------|-------------|---------|
191+
| `-d, --max-shift [int]`     | PyMaSC calculates cross-correlation for shift sizes from 0 to this value. | 1000 |
192+
| `--chi2-pval [float]`       | P-value threshold to check strand specificity. PyMaSC performs a chi-square test between the numbers of reads mapped to the positive and negative strands. | 0.05 |
193+
| `-w, --smooth-window [int]` | Before mean fragment length estimation, PyMaSC applies a moving average filter to the mappability-sensitive cross-correlation. This option specifies the filter's window size. | 15 |
194+
| `--mask-size [int]`         | If the difference between the read length and the estimated library length is equal to or less than the length specified by this option, PyMaSC masks the correlation coefficients within the read length +/- the specified length and tries to estimate the mean library length again. Specify < 1 to disable. | 5 |
195+
| `--bg-avr-width [int]`      | To obtain the minimum cross-correlation coefficient, PyMaSC takes the median over the last specified number of bases of the calculated cross-correlation coefficients. | 50 |
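
Two of the parameters above correspond to standard techniques, sketched here with the Python standard library. Neither function is PyMaSC's actual code: the 1:1 expected strand ratio in the chi-square test and the edge handling of the moving average (the window is truncated at both ends) are assumptions made for illustration.

```python
import math


def strand_chi2_pvalue(n_pos, n_neg):
    """Chi-square goodness-of-fit test (1 d.o.f.) against an assumed 1:1 strand ratio."""
    expected = (n_pos + n_neg) / 2
    stat = ((n_pos - expected) ** 2 + (n_neg - expected) ** 2) / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2))


def moving_average(values, window):
    """Centered moving average; the window shrinks near the ends (assumed behavior)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical read counts: a p-value below --chi2-pval suggests strand imbalance.
p_value = strand_chi2_pvalue(5200, 4800)
print(p_value < 0.05)

# A window of 3 spreads a single spike over its neighbours.
print(moving_average([0, 0, 3, 0, 0], 3))
```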
226196

227197
#### Output options
228198

229-
##### -o / --outdir [path]
230-
Specify output directory. (Default: current directory)
231-
232-
##### -n / --name [NAME...]
233-
By default, output files are written to `outdir/input_file_base_name`. This option
234-
overwrite output file base name.
199+
| Option & Argument | Description | Default |
200+
|------------------------|-------------|---------|
201+
| `-o, --outdir [path]`  | Specify the output directory. | (current directory) |
202+
| `-n, --name [NAME...]` | By default, output files are written to `outdir/input_file_base_name`. This option overrides the output file base name. | (input_file_base_name) |
235203

204+
---
236205

237206
### `pymasc-precalc` command
238207

@@ -248,7 +217,7 @@ overwrite output file base name.
248217
[-r MAX_READLEN]
249218

250219
#### Usage example
251-
Calculate total length of doubly mappable region.
220+
Calculate total length of doubly mappable regions.
252221
`wgEncodeCrgMapabilityAlign36mer_mappability.json` will be written.
253222

254223
$ pymasc -p 4 -r 50 -d 1000 -m wgEncodeCrgMapabilityAlign36mer.bigWig
@@ -259,6 +228,7 @@ Note that actual max shift size is,
259228
- 0 to `read_length` (if `max_shift` < `read_len` * 2)
260229
- 0 to `max_shift` - `read_len` + 1 (if `max_shift` >= `read_len` * 2)
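
The two cases above can be written as a small function. This is a sketch of the stated rule only, not code taken from PyMaSC; `effective_max_shift` is a hypothetical name, and `read_length` and `read_len` are assumed to refer to the same value.

```python
def effective_max_shift(max_shift, read_len):
    """Upper bound of the shift range, following the two cases listed above."""
    if max_shift < read_len * 2:
        return read_len
    return max_shift - read_len + 1


# With `-r 50 -d 1000`, as in the usage example.
print(effective_max_shift(1000, 50))
# A max shift shorter than twice the read length falls back to the read length.
print(effective_max_shift(80, 50))
```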
261230

231+
---
262232

263233
### `pymasc-plot` command
264234

@@ -297,7 +267,7 @@ was generated for a mappability BigWig file by PyMaSC to obtain mappable region
297267
#### Input argument
298268
Specify a prefix to `pymasc` output files. For example, set `output/ENCFF000VPI`
299269
to plot figures from `output/ENCFF000VPI_stats.tab`, `output/ENCFF000VPI_nreads.tab`
300-
and `output/ENCFF000VPI_cc.tab` (and/or `output/ENCFF000VPI_masc.tab`). `*_stats.tab`
270+
and `output/ENCFF000VPI_cc.tab` (and/or `output/ENCFF000VPI_mscc.tab`). `*_stats.tab`
301271
and either or both of `*_cc.tab` and `*_mscc.tab` must exist.
302272
To specify these files individually, use `--stats`, `--nreads`, `--cc` and `--masc`
303273
options.
@@ -326,6 +296,12 @@ efficiency and robustness for shift size.
326296

327297
References
328298
----------
329-
* Ramachandran, Parameswaran, et al. "MaSC: mappability-sensitive cross-correlation
330-
for estimating mean fragment length of single-end short-read sequencing data."
331-
Bioinformatics 29.4 (2013): 444-450.
299+
* PyMaSC paper
300+
* Anzawa, Hayato, Hitoshi Yamagata, and Kengo Kinoshita.
301+
"Theoretical characterisation of strand cross-correlation in ChIP-seq."
302+
BMC bioinformatics 21.1 (2020): 417.
303+
* Original paper on the MaSC algorithm
304+
* Ramachandran, Parameswaran, et al.
305+
"MaSC: mappability-sensitive cross-correlation for estimating
306+
mean fragment length of single-end short-read sequencing data."
307+
Bioinformatics 29.4 (2013): 444-450.
