NanoSim is a fast and scalable read simulator that captures the technology-specific features of ONT data, and allows for adjustments upon improvement of nanopore sequencing technology.
  --perfect             Ignore profiles and simulate perfect reads
  --polya POLYA         Simulate polyA tails for given list of transcripts
  --fastq               Output fastq files instead of fasta files
  -t NUM_THREADS, --num_threads NUM_THREADS
                        Number of threads for simulation (Default = 1)
  --uracil              Converts the thymine (T) bases to uracil (U) in the
                        output fasta format
```
\* Notice: the use of `max_len` and `min_len` in genome mode will affect the read length distributions. If the range between `max_len` and `min_len` is too small, the program will run more slowly.

\* Notice: the transcript names in the expression tsv file and in the polyadenylated transcript list have to be consistent with the ones in the reference transcripts; otherwise the tool won't recognize them and won't know where to find them to extract reads for simulation.
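
The name-consistency requirement above can be checked before starting a simulation. The following is a minimal sketch, not part of NanoSim: the helper names are hypothetical, and it assumes that the reference is a FASTA file whose transcript name is the first whitespace-delimited token after `>`, that the expression file is tab-separated with one header row and the transcript name in the first column, and that the polyA list holds one name per line. Adjust these assumptions to match your own files.

```python
import csv
import io


def fasta_names(handle):
    """Collect sequence names: the first whitespace-delimited token after '>'."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def expression_names(handle, has_header=True):
    """Collect names from the first column of a tab-separated file (assumed layout)."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the assumed header row
    return {row[0] for row in reader if row and row[0]}


def list_names(handle):
    """Collect names from a plain list, one name per line (assumed layout)."""
    return {line.strip() for line in handle if line.strip()}


def missing_names(reference, candidates):
    """Return the sorted candidate names that are absent from the reference."""
    return sorted(candidates - reference)


# In-memory stand-ins for the three files; replace with open(path) handles.
reference = fasta_names(io.StringIO(">tx1 gene=A\nACGT\n>tx2\nGGCC\n"))
expression = expression_names(io.StringIO("target_id\ttpm\ntx1\t10.0\ntx3\t2.5\n"))
polya = list_names(io.StringIO("tx2\ntx4\n"))

print("Not in reference (expression file):", missing_names(reference, expression))
print("Not in reference (polyA list):", missing_names(reference, polya))
```

Any name reported by `missing_names` would need to be renamed or dropped so that all three files agree.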
__Example runs:__
1 If you want to simulate the _E. coli_ genome, the circular option must be chosen because it is a circular genome
@@ -371,6 +376,9 @@ __Example runs:__
7 If you want to simulate five thousand cDNA/directRNA reads from the mouse reference transcriptome without modeling intron retention
8 If you want to simulate two thousand cDNA/directRNA reads from the human reference transcriptome with polyA tails, mimicking homopolymer bias (starting from homopolymer length >= 6), and with reads in fastq format
The information in the header can help users to locate the read easily.
__Specific to transcriptome simulation__: for reads that include retained introns, the header contains the information starting from `Retained_intron`, with each genomic interval separated by `;`.
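
As a rough illustration of reading that part of the header, the sketch below splits a header at `Retained_intron` and then on `;`. It relies only on the two facts stated above. The function name and the sample header are hypothetical, and the characters stripped between the marker and the first interval are an assumption; check a real header from your own output before using it.

```python
def retained_intron_intervals(header, marker="Retained_intron"):
    """Return the ';'-separated intervals that follow the marker, or [] if absent."""
    _, found, tail = header.partition(marker)
    if not found:
        return []
    # Assumption: a separator such as '_' or ':' may sit between the marker
    # and the first interval; strip it if present.
    tail = tail.lstrip("_:|= ")
    # Intervals are kept as opaque strings; their inner format is not assumed.
    return [part.strip() for part in tail.split(";") if part.strip()]


# Hypothetical header, for illustration only.
example = "read_0_Retained_intron_chr1:100-200;chr1:350-420"
print(retained_intron_intervals(example))
print(retained_intron_intervals("read_1_no_marker_here"))
```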
2. `simulated_error_profile`
Contains all the information about the errors introduced into each read, including error type, position, original bases and current bases.