GitHub - bilille/miRkwood

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 2,748 Commits
R		R
cgi-bin		cgi-bin
docker		docker
galaxy/miRkwood		galaxy/miRkwood
html		html
provisioning		provisioning
resources		resources
tests-browser		tests-browser
.dockerignore		.dockerignore
LICENCE		LICENCE
README.txt		README.txt
Vagrantfile		Vagrantfile
install-dependencies.sh		install-dependencies.sh
miRkwood_installation.md		miRkwood_installation.md

Repository files navigation

SYNOPSIS

miRkwood is an application that allows for the fast and easy identification of microRNAs. It is specifically designed for plant microRNAs.

INSTALL

See file miRkwood_installation.md.

USAGE

miRkwood comes in two distinct pipelines, according to the input data type.

-mirkwood.pl (abinitio pipeline): scans a genomic sequence and finds all potential microRNA precursors.
Input: a FASTA file.

-mirkwood-bed.pl (smallRNAseq pipeline): analyses small RNA deep sequencing data and find all potential microRNAs.
Input : a BED file.

OPTIONS

-mirkwood.pl: perl -I/{miRkwood_path}/cgi-bin/lib/ mirkwood.pl [options]
Mandatory options:
--input
Path to the fasta file.

--output
Output directory. If non existing it will be created. The directory
must be empty.

Additional options:
--both-strands
Scan both strands.

--species-mask
Mask coding regions against the given organism.

--shuffles
Compute thermodynamic stability (shuffled sequences).

--filter-mfei
Select only sequences with MFEI < -0.6.

--filter-rrna
Filter out ribosomal RNAs (using RNAmmer).

--filter-trna
Filter out tRNAs (using tRNAscan-SE).

--align
Flag conserved mature miRNAs (alignment with miRBase + miRdup).

--varna
Allow the structure generation using Varna.

--help
Print a brief help message and exits.

--man
Prints the manual page and exits.

-mirkwood-bed.pl: perl -I/{miRkwood_path}/cgi-bin/lib/ mirkwood-bed.pl [options]
Mandatory options:
--input
Path to the BED file (created with our script mirkwood-bam2bed.pl).

--genome
Path to the genome (fasta format).

--output
Output directory. If non existing it will be created. The directory
must be empty.

Additional options:
--shuffles
Compute thermodynamic stability (shuffled sequences).

--align
Flag conserved mature miRNAs (alignment with miRBase + miRdup).

--no-filter-mfei
Don't filter out sequences with MFEI >= -0.6. Default : only keep
sequences with MFEI < -0.6.

--mirbase
If you have a gff file containing known miRNAs for this assembly,
use this option to give the path to this file.

--gff
List of annotation files (gff or gff3 format). Reads matching with
an element of these files will be filtered out. For instance you can
filter out CDS by providing a suitable GFF file.

--no-filter-bad-hairpins
By default the candidates with a quality score of 0 and no
conservation are discarded from results and are stored in a BED
file. Use this option to keep all results.

--min-read-positions-nb
Minimum number of positions for each read to be kept. Default : 0.

--max-read-positions-nb
Maximum number of positions for each read to be kept. Default : 5
(reads that map at more than 5 positions are filtered out).

--varna
Allow the structure generation using Varna.

--help
Print a brief help message and exits.

--man
Prints the manual page and exits.

OUTPUT

For both pipelines:

alignments : folder containing all alignments files
(only if option --align is on).

images: folder containing images created by VARNA
(only if option --varna is on).

results: folder containing all results files, in several
formats (csv, fa, gff, html and txt).

sequences: folder containing sequences for each candidate
in fasta and dotbracket format, alternatives sequences
if they exist and optimal structure if it is different
from the stemloop structure.

YML: folder containing all candidates data in YAML format.

basic_candidates.yml: contains a summary of all candidates
with basic informations (this file is needed to create
the results files).

log.log: log file (hey, what did you expect?)

run_options.cfg: config file with the chosen options.

ab initio pipeline only:

masks: folder containing results of BlastX, rnammer and tRNAscan-SE.

input_sequences.fas: your sequences.

smallRNAseq pipeline only:

read_clouds: folder containing all text files for the candidates
read clouds.

bed_sizes.txt: tabulated file with the number of reads in each BED file.

summary.txt: contains a summary of your options and of results.

Depending on the options you chose for your job you may find
some of the following files:

your_bed_your_GFF.tar.gz: a compressed BED containing all reads matching
to features from your GFF file, for each GFF file that you
provided.

your_bed_multimapped.tar.gz: a compressed BED containing all reads from your
input BED file mapping at less than --min-read-positions-nb positions
or more than --max-read-positions-nb positions.

your_bed_miRNAs.tar.gz: a compressed BED containing all reads from your
input BED file corresponding to miRNAs present in miRBase.

your_bed_orphan_clusters.tar.gz: a compressed BED containing all reads from your
input BED file that fall into a peak but that don't correspond to
a valid miRNA candidate.

your_bed_orphan_hairpins.tar.gz: a compressed BED containing all candidates
with a quality score of 0 and no conservation. By default
these candidates are excluded from final results, but you can
change this behaviour with flag option --no-filter-bad-hairpins.

your_bed_filtered.bed: a BED containing all reads from your
input BED file that have not been filtered out in one of the
previous categories.