Commit a640384

Update README.md
1 parent f5865dc commit a640384

1 file changed

Lines changed: 17 additions & 7 deletions

README.md

@@ -1,9 +1,9 @@
 [![Release](https://img.shields.io/github/v/release/bcgsc/nanosim?include_prereleases)](https://github.com/bcgsc/NanoSim/releases)
-[![Downloads](https://img.shields.io/github/downloads/bcgsc/Nanosim/total?logo=github)](https://github.com/bcgsc/NanoSim/archive/v2.5.0.zip)
+[![Downloads](https://img.shields.io/github/downloads/bcgsc/Nanosim/total?logo=github)](https://github.com/bcgsc/NanoSim/archive/v2.6.0.zip)
 [![Conda](https://img.shields.io/conda/dn/bioconda/nanosim?label=Conda)](https://anaconda.org/bioconda/nanosim)
 [![Stars](https://img.shields.io/github/stars/bcgsc/NanoSim.svg)](https://github.com/bcgsc/NanoSim/stargazers)

-![NanoSim](https://github.com/bcgsc/NanoSim/blob/master/NanoSim%20logo.png)
+![NanoSim](https://github.com/bcgsc/NanoSim/blob/master/NanoSim_logo.png)

 NanoSim is a fast and scalable read simulator that captures the technology-specific features of ONT data, and allows for adjustments upon improvement of nanopore sequencing technology.

@@ -246,7 +246,7 @@ usage: simulator.py genome [-h] -rg REF_G [-c MODEL_PREFIX] [-o OUTPUT]
 [-med MEDIAN_LEN] [-sd SD_LEN] [--seed SEED]
 [-k KMERBIAS] [-b {albacore,guppy,guppy-flipflop}]
 [-s STRANDNESS] [-dna_type {linear,circular}]
-[--perfect] [-t NUM_THREADS]
+[--perfect] [--fastq] [-t NUM_THREADS]

 optional arguments:
 -h, --help show this help message and exit
@@ -285,6 +285,7 @@ optional arguments:
 Specify the dna type: circular OR linear (Default =
 linear)
 --perfect Ignore error profiles and simulate perfect reads
+--fastq Output fastq files instead of fasta files
 -t NUM_THREADS, --num_threads NUM_THREADS
 Number of threads for simulation (Default = 1)
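
The `--fastq` flag added in this hunk switches the output from fasta to fastq. As a rough illustration of what that difference means for downstream scripts, the sketch below converts fastq text back to fasta by dropping the separator and quality lines. It assumes the common four-line fastq layout (no wrapped sequences); the function name and the sample records are made up for illustration and are not actual NanoSim output.

```python
def fastq_to_fasta(fastq_text):
    """Convert four-line FASTQ records to FASTA text (illustrative helper)."""
    # Ignore blank lines so a trailing newline does not break the record count.
    lines = [ln for ln in fastq_text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")

    out = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        record_no = i // 4 + 1
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record {record_no}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in record {record_no}")
        # FASTA keeps only the name and the bases; quality scores are dropped.
        out.append(">" + header[1:])
        out.append(seq)
    return "\n".join(out) + "\n" if out else ""


# Hypothetical two-read sample, not real simulator output.
sample = "@read_1\nACGT\n+\nIIII\n@read_2\nGG\n+\n!!\n"
print(fastq_to_fasta(sample), end="")
```

The quality line is the only information lost in this direction, which is why the reverse conversion is not possible without inventing scores.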
@@ -298,10 +299,10 @@ __transcriptome mode usage:__
 usage: simulator.py transcriptome [-h] -rt REF_T [-rg REF_G] -e EXP
 [-c MODEL_PREFIX] [-o OUTPUT] [-n NUMBER]
 [-max MAX_LEN] [-min MIN_LEN] [--seed SEED]
-[-k KMERBIAS] [-b {albacore, guppy}]
+[-k KMERBIAS] [-b {albacore,guppy}]
 [-r {dRNA,cDNA_1D,cDNA_1D2}] [-s STRANDNESS]
-[--no_model_ir] [--perfect] [-t NUM_THREADS]
-[--uracil]
+[--no_model_ir] [--perfect] [--polya POLYA]
+[--fastq] [-t NUM_THREADS] [--uracil]

 optional arguments:
 -h, --help show this help message and exit
@@ -340,14 +341,18 @@ optional arguments:
 0 and 1
 --no_model_ir Simulate intron retention events
 --perfect Ignore profiles and simulate perfect reads
+--polya POLYA Simulate polyA tails for given list of transcripts
+--fastq Output fastq files instead of fasta files
 -t NUM_THREADS, --num_threads NUM_THREADS
 Number of threads for simulation (Default = 1)
 --uracil Converts the thymine (T) bases to uracil (U) in the
 output fasta format
 ```


-\* Notice: the use of `max_len` and `min_len` in genome mode will affect the read length distributions. If the range between `max_len` and `min_len` is too small, the program will run more slowly accordingly.
+\* Notice: the use of `max_len` and `min_len` in genome mode will affect the read length distributions. If the range between `max_len` and `min_len` is too small, the program will run more slowly accordingly.
+
+\* Notice: the transcript names in the expression tsv file and in the polyadenylated transcript list have to be consistent with the ones in the reference transcripts; otherwise the tool won't recognize them and won't know where to find them to extract reads for simulation.

 __Example runs:__
 1 If you want to simulate _E. coli_ genome, then circular command must be chosen because it's a circular genome
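
The notice added in this hunk about consistent transcript names can be checked before starting a run. The following is a minimal sketch under several assumptions that the README does not confirm: the expression tsv has a header row with the transcript ID in its first column, the polyA list holds one transcript ID per line, and a reference ID is the first whitespace-delimited token after `>`. All function names and sample IDs are hypothetical.

```python
import io


def fasta_ids(handle):
    """Collect IDs from FASTA header lines (first token after '>')."""
    return {
        line[1:].split()[0]
        for line in handle
        if line.startswith(">") and line[1:].strip()
    }


def tsv_ids(handle, has_header=True):
    """Collect first-column values from a tab-separated file (assumed layout)."""
    rows = [line.split("\t")[0].strip() for line in handle if line.strip()]
    return set(rows[1:] if has_header else rows)


def list_ids(handle):
    """Collect IDs from a plain list, one per line (assumed layout)."""
    return {line.strip() for line in handle if line.strip()}


def missing_from_reference(reference, expression, polya):
    """Report names used in the inputs that the reference does not contain."""
    ref = fasta_ids(reference)
    return {
        "expression": sorted(tsv_ids(expression) - ref),
        "polya": sorted(list_ids(polya) - ref),
    }


# In-memory stand-ins for the three input files; IDs and columns are made up.
reference = io.StringIO(">tx1 gene=A\nACGT\n>tx2\nGGCC\n")
expression = io.StringIO("target_id\tcount\ntx1\t10\ntx3\t5\n")
polya = io.StringIO("tx2\ntx4\n")

report = missing_from_reference(reference, expression, polya)
print(report)
```

With real data, the `io.StringIO` objects would be replaced by open file handles. An empty list under both keys means every name was found in the reference.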
@@ -371,6 +376,9 @@ __Example runs:__
 7 If you want to simulate five thousand cDNA/directRNA reads from the mouse reference transcriptome without modeling intron retention
 `./simulator.py transcriptome -rt Mus_musculus.GRCm38.cdna.all.fa -c mouse_cdna -e abundance.tsv -n 5000 --no_model_ir`

+8 If you want to simulate two thousand cDNA/directRNA reads from the human reference transcriptome with polyA tails, mimicking homopolymer bias (starting from homopolymer length >= 6), with reads in fastq format
+`./simulator.py transcriptome -rt Homo_sapiens.GRCh38.cdna.all.fa -c Homo_sapiens_model -e abundance.tsv -rg Homo_sapiens.GRCh38.dna.primary.assembly.fa --polya transcripts_with_polya_tails --fastq -k 6 --basecaller guppy -r dRNA`
+
 ## Explanation of output files
 ### 1. Characterization stage
 #### 1.1 Characterization stage (genome)
@@ -425,6 +433,8 @@ __Example runs:__

 The information in the header can help users to locate the read easily.

+__Specific to transcriptome simulation__: for reads that include retained introns, the header contains the information starting from `Retained_intron`, with each genomic interval separated by `;`.
+
 2. `simulated_error_profile`
 Contains all the information of errors introduced into each read, including error type, position, original bases and current bases.
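
The retained-intron part of a read header, as described in the line added by this hunk, could be pulled out roughly as follows. The README only states that the information starts at `Retained_intron` and that intervals are separated by `;`. The characters between the marker and the first interval, the interval syntax, and the sample header below are all assumptions made for illustration.

```python
def retained_intron_intervals(header):
    """Return the ';'-separated items that follow 'Retained_intron', if any."""
    marker = "Retained_intron"
    _, found, tail = header.partition(marker)
    if not found:
        return []  # read has no retained-intron annotation
    # Assumption: the marker may be followed by a separator such as a space,
    # underscore, colon, pipe or equals sign before the first interval.
    tail = tail.lstrip(" _:|=\t")
    return [part.strip() for part in tail.split(";") if part.strip()]


# Hypothetical header; real NanoSim headers may be laid out differently.
example = ">tx1_0_aligned Retained_intron chr1:100-200;chr1:350-420"
print(retained_intron_intervals(example))
```

The intervals are returned as raw strings because their internal format is not specified here. If a real header carries further fields after the last interval, they would end up in the final item and need extra handling.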