Example Pipeline

In our example analysis, we investigate the differences between the microbiomes of 20 rural and 20 recently urbanized subjects from the Chinese province of Hunan. For more information on this dataset, please review the analysis that the Fodor Lab published in the September 2017 issue of the journal Microbiome: https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-017-0338-7

Step 1: Prepare BioLockJ Config File

The BioLockJ project Config chinaKrakenFullDB.properties lists 5 BioModules to run (lines 3-7) and sets 13 properties:

  #BioModule biolockj.module.implicit.RegisterNumReads
  #BioModule biolockj.module.classifier.wgs.KrakenClassifier
  #BioModule biolockj.module.report.taxa.NormalizeTaxaTables
  #BioModule biolockj.module.report.r.R_PlotPvalHistograms
  #BioModule biolockj.module.report.r.R_PlotOtus

In addition to the 5 listed BioModules, 4 implicit BioModules will also run:

| Mod# | Module | Description |
| --- | --- | --- |
| 1 | ImportMetadata | Always run 1st (for all pipelines) |
| 2 | KrakenParser | Always run after KrakenClassifier |
| 3 | AddMetadataToOtuTables | Always run just before the 1st R module |
| 4 | CalculateStats | Always run as the 1st R module |
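To make the module declarations concrete, here is a minimal Python sketch that pulls the `#BioModule` lines out of Config text and prints them in run order. The helper name `list_biomodules` is hypothetical and this is not BioLockJ's own parser; it only assumes that each module is declared on a line of the form `#BioModule <class name>`, as shown above.

```python
def list_biomodules(config_text):
    """Return module class names from '#BioModule <class>' lines, in file order."""
    modules = []
    for line in config_text.splitlines():
        parts = line.strip().split()
        # A module declaration has exactly two tokens: the marker and the class name.
        if len(parts) == 2 and parts[0] == "#BioModule":
            modules.append(parts[1])
    return modules


# The five declarations listed in chinaKrakenFullDB.properties (other lines omitted).
config_text = """\
#BioModule biolockj.module.implicit.RegisterNumReads
#BioModule biolockj.module.classifier.wgs.KrakenClassifier
#BioModule biolockj.module.report.taxa.NormalizeTaxaTables
#BioModule biolockj.module.report.r.R_PlotPvalHistograms
#BioModule biolockj.module.report.r.R_PlotOtus
"""

modules = list_biomodules(config_text)
for position, class_name in enumerate(modules, start=1):
    # Print only the short class name, e.g. KrakenClassifier.
    print(position, class_name.rsplit(".", 1)[-1])
```

The sketch lists only the explicitly configured modules; the implicit modules in the table above are added by BioLockJ itself and do not appear in the Config file.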

Key properties:

| Line# | Property | Description |
| --- | --- | --- |
| 08 | cluster.jobHeader | Each script will run on 1 node, 16 cores, and 128GB RAM for up to 30 minutes |
| 10 | pipeline.defaultProps | Default config file that defines most properties (in this case copperhead.properties) |
| 12 | input.dirPaths | Directory path containing 40 gzipped whole genome sequencing (WGS) fastq files |
| 18 | metadata.filePath | Metadata file path: chinaMetadata.tsv |
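The pipeline.defaultProps property means that a project Config can inherit most of its values from another file, which may in turn name its own default. The following Python sketch shows one way such a chain could be resolved. It is an illustration under stated assumptions, not BioLockJ's implementation: the function `resolve_properties` is hypothetical, the Config files are modeled as in-memory dictionaries keyed by bare file name, and it assumes that a value in a nearer file overrides the same key in its default.

```python
def resolve_properties(name, configs):
    """Merge a Config with its chain of defaults; values in nearer files win."""
    chain = []
    seen = set()
    while name is not None:
        if name in seen:
            raise ValueError(f"circular pipeline.defaultProps reference: {name}")
        seen.add(name)
        chain.append(configs[name])
        # Follow the link to the next default file, if there is one.
        name = configs[name].get("pipeline.defaultProps")
    merged = {}
    for props in reversed(chain):  # apply the most distant defaults first
        merged.update(props)
    merged.pop("pipeline.defaultProps", None)  # the link itself is not a setting
    return merged


# Only properties mentioned in this walkthrough are modeled here.
configs = {
    "chinaKrakenFullDB.properties": {
        "pipeline.defaultProps": "copperhead.properties",
        "metadata.filePath": "chinaMetadata.tsv",
    },
    "copperhead.properties": {"pipeline.defaultProps": "standard.properties"},
    "standard.properties": {"input.suffixFw": "_R1"},
}

resolved = resolve_properties("chinaKrakenFullDB.properties", configs)
print(resolved["input.suffixFw"])
```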

BioLockJ must associate sequence files in input.dirPaths with the correct metadata row. This is done by matching sequence file names to the 1st column in the metadata file. If the Sample ID is not found in your file names, the file names must be updated. Use the following properties to ignore a file prefix or suffix when matching the sample IDs.

input.suffixFw
input.suffixRv
input.trimPrefix
input.trimSuffix

Sample IDs from the 1st column of the metadata file: 081A, 082A, 083A, etc.
Sequence file names: 081A_R1.fq.gz, 082A_R1.fq.gz, 083A_R1.fq.gz, etc.

The default Config file, copperhead.properties, has its own default Config file, standard.properties, which defines the property input.suffixFw=_R1. As a result, all characters starting with (and including) “_R1” are ignored when matching the file name to the metadata sample ID.
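The matching rule can be sketched in a few lines of Python. This is an illustration of the behavior described above, not BioLockJ's code: the function names are hypothetical, only input.suffixFw is modeled, and `unknown.fq.gz` is a made-up file name added to show what an unmatched file looks like.

```python
def sample_id_from_file_name(file_name, suffix_fw="_R1"):
    """Drop everything from the first occurrence of suffix_fw onward."""
    index = file_name.find(suffix_fw)
    return file_name if index == -1 else file_name[:index]


def match_files_to_metadata(file_names, metadata_ids, suffix_fw="_R1"):
    """Pair each sequence file with a metadata sample ID, collecting files with no match."""
    matched, unmatched = {}, []
    for file_name in file_names:
        sample_id = sample_id_from_file_name(file_name, suffix_fw)
        if sample_id in metadata_ids:
            matched[sample_id] = file_name
        else:
            unmatched.append(file_name)
    return matched, unmatched


metadata_ids = {"081A", "082A", "083A"}
file_names = ["081A_R1.fq.gz", "082A_R1.fq.gz", "083A_R1.fq.gz", "unknown.fq.gz"]

matched, unmatched = match_files_to_metadata(file_names, metadata_ids)
print(matched)
print(unmatched)
```

Any file that ends up in `unmatched` corresponds to the case described earlier, where the file name must be updated or a trim property must be set before the pipeline can associate it with a metadata row.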

Step 2: Run BioLockJ Pipeline

  > biolockj ~/chinaKrakenFullDB.properties

Look in the BioLockJ pipeline output directory defined by $BLJ_PROJ for a new pipeline directory named after the property file + today’s date: ~/projects/chinaKrakenFullDB_2018Apr09
The 5 configured modules have run in order, with the addition of 2 implicit modules (1st and last) which are added to all pipelines automatically.
The biolockjComplete file indicates the pipeline ran successfully.
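A small Python sketch of the same check follows. The helper names are hypothetical, and the date format is inferred from the single example above (year, three-letter month, two-digit day); BioLockJ may format other dates differently.

```python
import os
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def expected_pipeline_dir(config_path, run_date, project_root):
    """Build <project_root>/<config name>_<YYYYMonDD> for a given run date."""
    base = os.path.splitext(os.path.basename(config_path))[0]
    stamp = f"{run_date.year}{MONTHS[run_date.month - 1]}{run_date.day:02d}"
    return os.path.join(project_root, f"{base}_{stamp}")


def pipeline_succeeded(pipeline_dir):
    """True if the biolockjComplete flag file exists in the pipeline directory."""
    return os.path.isfile(os.path.join(pipeline_dir, "biolockjComplete"))


# Fall back to ~/projects when $BLJ_PROJ is not set in this environment.
project_root = os.environ.get("BLJ_PROJ", os.path.expanduser("~/projects"))
pipeline_dir = expected_pipeline_dir(
    "~/chinaKrakenFullDB.properties", date(2018, 4, 9), project_root
)
print(pipeline_dir, pipeline_succeeded(pipeline_dir))
```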

Step 3: Review Pipeline Summary

Run the blj_summary command to review the pipeline execution summary.
```
> blj_summary
```

Step 4: Download R Reports

Run the blj_download command to get the command needed to download the analysis.
```
> blj_download
> rsync
```

Step 5: Analyze R Reports

Open downloadDir on your local filesystem to review the analysis. This directory contains:

| Output | Description |
| --- | --- |
| /temp | Directory where R log files are saved if the R script runs locally. |
| /tables | Directory containing the OTU tables. |
| /local | Directory where R script output is saved if the R script runs locally and r.debug=Y. |
| `*.RData` | The saved R sessions for R modules run if r.saveRData=Y. |
| chinaKrakenFullDB.log | The pipeline Java log file. |
| `MAIN_*.R` | Each R script for each module that generated reports has been updated to run on your local filesystem. |
| `*.tsv` files | Spreadsheets containing p-value and R^2 statistics for each OTU in the taxonomy level. |
| `*.pdf` files | P-value histograms, and bar-charts or scatterplots for each OTU in the taxonomy level. |
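When the download directory holds many files, the name patterns in the table can be used to sort them by kind. The Python sketch below does this with the standard `fnmatch` module. The function `describe_output` and the short descriptions are illustrative only, and `MAIN_example.R` and `session.RData` are made-up file names.

```python
from fnmatch import fnmatchcase

# Patterns taken from the output table; descriptions are shortened for display.
PATTERNS = [
    ("MAIN_*.R", "R script"),
    ("*.RData", "saved R session"),
    ("*.log", "pipeline Java log"),
    ("*.tsv", "statistics spreadsheet"),
    ("*.pdf", "plot report"),
]


def describe_output(file_name):
    """Return the description of the first pattern that matches file_name."""
    for pattern, description in PATTERNS:
        if fnmatchcase(file_name, pattern):
            return description
    return "other"


for name in ["chinaKrakenFullDB.log", "chinaKrakenFullDB_Log10_genus.pdf",
             "MAIN_example.R", "session.RData"]:
    print(name, "->", describe_output(name))
```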

Each R module generates one report for each configured report.taxonomyLevel.

Open chinaKrakenFullDB_Log10_genus.pdf

The report begins with the unadjusted P-Value Distributions:
Since r.numHistogramBreaks=20, the 1st bar represents the p-values < 0.05. The ruralUrban attribute appears significant, as indicated by the high number of p-values < 0.05.
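The arithmetic behind the 1st bar is simply 1 / 20 = 0.05: twenty equal-width bins over the range 0 to 1. The Python sketch below counts p-values per bin to make this concrete. The p-values are invented for illustration and are not taken from the China dataset, and the treatment of values that fall exactly on a bin boundary is an assumption that may differ from how R draws its histogram.

```python
def histogram_counts(p_values, num_breaks=20):
    """Count p-values in num_breaks equal-width bins covering [0, 1]."""
    counts = [0] * num_breaks
    for p in p_values:
        # Clamp so that p == 1.0 lands in the last bin instead of overflowing.
        index = min(int(p * num_breaks), num_breaks - 1)
        counts[index] += 1
    return counts


# Hypothetical p-values, skewed toward small values like a significant attribute.
p_values = [0.001, 0.004, 0.02, 0.03, 0.049, 0.12, 0.37, 0.58, 0.81, 0.96]

bin_width = 1 / 20
counts = histogram_counts(p_values, num_breaks=20)
print("bin width:", bin_width)
print("p-values in 1st bar:", counts[0], "of", len(p_values))
```

A tall 1st bar relative to the rest, as in this made-up sample, is the pattern the report shows for an attribute with many small p-values.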
For each OTU, a bar-chart or scatterplot is output, with the adjusted parametric and non-parametric p-values shown in the plot header.
The p-value format is defined by r.pValFormat.
The p-adjust method is defined by rStats.pAdjustMethod.
P-values that meet the r.pvalCutoff threshold are highlighted with r.colorHighlight.
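To illustrate what adjusting and highlighting p-values involves, here is a Python sketch of the Benjamini-Hochberg procedure, which is one of the methods offered by R's `p.adjust`. This walkthrough does not say which method rStats.pAdjustMethod is set to, so Benjamini-Hochberg is an assumption, as are the cutoff of 0.05, the strict less-than comparison, and the sample p-values.

```python
def adjust_bh(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.001, 0.008, 0.039, 0.041, 0.6]   # hypothetical unadjusted p-values
cutoff = 0.05                             # hypothetical r.pvalCutoff value

adjusted = adjust_bh(raw)
highlighted = [i for i, p in enumerate(adjusted) if p < cutoff]
print(adjusted)
print("indexes to highlight:", highlighted)
```

In this sample the third and fourth values are below 0.05 before adjustment but not after, so only the first two would be highlighted.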
