Merged
9 changes: 6 additions & 3 deletions README.md
@@ -29,15 +29,18 @@

![](./assets/pipeline_light.svg)

Supported profilers:
Supported profilers and current status:

1. [**HUMANn v3**](https://huttenhower.sph.harvard.edu/humann/) — functional profiling via MetaPhlAn + HUMANn 3 (`--run_humann_v3`)
2. [**HUMANn v4**](https://huttenhower.sph.harvard.edu/humann/) — functional profiling via MetaPhlAn + HUMANn 4 (`--run_humann_v4`)
3. [**FMH FunProfiler**](https://github.com/dib-lab/fmh_funprofiler) — sketch-based functional profiling (`--run_fmhfunprofiler`)
4. [**RGI**](https://github.com/arpcard/rgi) — antimicrobial resistance gene identification (`--run_rgi`, available)
5. [**mifaser**](https://bromberglab.org/project/mifaser/) — functional profiling via mifaser (`--run_mifaser`, available)
6. [**DIAMOND**](https://github.com/bbuchfink/diamond) — alignment with DIAMOND blastx (`--run_diamond`, available)
7. [**eggNOG-mapper**](https://academic.oup.com/mbe/article/38/12/5825/6379734) — functional annotation, orthology assignments and domain prediction (`--run_eggnogmapper`, available)
6. [**DIAMOND**](https://github.com/bbuchfink/diamond) — alignment with DIAMOND blastx (`--run_diamond`, work in progress / beta)
7. [**eggNOG-mapper**](https://academic.oup.com/mbe/article/38/12/5825/6379734) — functional annotation, orthology assignments and domain prediction (`--run_eggnogmapper`, work in progress / beta)

> [!WARNING]
> DIAMOND and eggNOG-mapper support is currently in beta and should be treated as work in progress. These modules are still being validated in the full pipeline, including database handling, output behavior, and downstream reporting. Use them with caution, expect potential issues, and independently review results before using them for production analyses or interpretation.

## Usage

10 changes: 8 additions & 2 deletions docs/output.md
@@ -14,8 +14,8 @@ The pipeline processes data using the following steps:
- [HUMANn v3 / v4](#humann-v3--v4) - Functional profiling via MetaPhlAn + HUMANn
- [FMH FunProfiler](#fmh-funprofiler) - Sketch-based functional profiling
- [mifaser](#mifaser) - Read-level functional profiling
- [DIAMOND blastx](#diamond-blastx) - Translated alignment against a protein database
- [EggNOG-mapper](#eggnog-mapper) - Functional annotation via orthology assignment
- [DIAMOND blastx](#diamond-blastx) - Translated alignment against a protein database (work in progress / beta)
- [EggNOG-mapper](#eggnog-mapper) - Functional annotation via orthology assignment (work in progress / beta)
- [RGI BWT](#rgi-bwt) - Antimicrobial resistance gene identification
- [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
@@ -80,6 +80,9 @@ Enabled with `--run_mifaser`. Maps reads to functional databases at the protein

Enabled with `--run_diamond`. Performs fast translated alignment of metagenomic reads against a protein reference database. Each read is aligned in all six reading frames and only significant hits are reported.

> [!WARNING]
> DIAMOND support is currently in beta and should be treated as work in progress. The module is still being validated in the full pipeline, including database handling, output behavior, and downstream reporting. Use with caution and independently review results before production use or interpretation.

<details markdown="1">
<summary>Output files</summary>

@@ -97,6 +100,9 @@ Requires a pre-built `.dmnd` database (see [usage docs](usage.md#diamond-blastx)

Enabled with `--run_eggnogmapper`. Assigns functional annotations to sequences by mapping them to orthologous groups in the EggNOG database.

> [!WARNING]
> EggNOG-mapper support is currently in beta and should be treated as work in progress. The module is still being validated in the full pipeline, including database handling, output behavior, and downstream reporting. Use with caution and independently review results before production use or interpretation.

<details markdown="1">
<summary>Output files</summary>

55 changes: 34 additions & 21 deletions docs/usage.md
@@ -41,6 +41,23 @@ SAMPLE3,RUN1,OXFORD_NANOPORE,/data/sample3_nanopore.fastq.gz,,

In this example, `SAMPLE1` has two runs which will be merged before profiling. `SAMPLE2` is single-end short reads. `SAMPLE3` is Oxford Nanopore long reads.

## Enabling profilers

At least one profiler must be enabled via command-line flags. The pipeline will only run the profilers you explicitly turn on:

| Flag | Profiler | Status |
| ---------------------- | --------------- | ----------------------- |
| `--run_humann_v3` | HUMANn v3 | Available |
| `--run_humann_v4` | HUMANn v4 | Available |
| `--run_fmhfunprofiler` | FMH FunProfiler | Available |
| `--run_mifaser` | mifaser | Available |
| `--run_diamond` | DIAMOND | Work in progress / beta |
| `--run_eggnogmapper` | EggNOG-mapper | Work in progress / beta |
| `--run_rgi` | RGI BWT | Available |

> [!IMPORTANT]
> Each `--run_` flag requires a matching database entry in the `--databases` CSV. Database rows for tools that are not enabled will be ignored.
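
The flag-to-database requirement above can be checked before launching a run. The following is a minimal sketch, not part of the pipeline: `missing_database_rows` is a hypothetical helper, and it assumes that the `tool` value in the sheet equals the flag name without the `--run_` prefix (for example `--run_diamond` and `diamond`).

```python
import csv
import io


def missing_database_rows(enabled_tools, databases_csv_text):
    """Return the enabled tools that have no row in the databases sheet.

    Assumption: each enabled tool is named exactly as in the `tool` column.
    """
    reader = csv.DictReader(io.StringIO(databases_csv_text))
    tools_with_rows = {row["tool"].strip() for row in reader}
    return sorted(set(enabled_tools) - tools_with_rows)


# Example sheet with rows for mifaser and RGI only.
sheet = """tool,db_name,db_entity,db_params,db_type,db_path
mifaser,GS-24-all,,,short,/data/databases/mifaser/GS-24-all
rgi,card_v3,,,,/data/databases/card
"""

# Mirrors `--run_mifaser --run_diamond`: DIAMOND is enabled but has no row.
print(missing_database_rows(["mifaser", "diamond"], sheet))  # ['diamond']
```

Rows for tools that are not enabled are not reported here, which matches the note that such rows are ignored.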

## Databases input

```bash
@@ -49,6 +66,8 @@ In this example, `SAMPLE1` has two runs which will be merged before profiling. `

The databases sheet is a comma-separated file that specifies which databases to use for each profiler. Only tools enabled via `--run_<tool>` flags will use the corresponding database entries.

Use the `db_name` column to record the database release or version used for the run, for example `uniref90_v3`, `eggnog_v5`, `card_v3`, or `GS-24-all`.
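
To see which release labels a sheet records, the `db_name` values can be collected per tool. This is a rough sketch using only the column names from the sheet header; `database_labels` is a hypothetical helper and not something the pipeline provides.

```python
import csv
import io
from collections import defaultdict


def database_labels(databases_csv_text):
    """Map each tool to the sorted list of distinct db_name labels in the sheet."""
    labels = defaultdict(set)
    for row in csv.DictReader(io.StringIO(databases_csv_text)):
        labels[row["tool"]].add(row["db_name"])
    return {tool: sorted(names) for tool, names in labels.items()}


sheet = """tool,db_name,db_entity,db_params,db_type,db_path
humann_v3,uniref90_v3,humann_metaphlan,,,/data/databases/metaphlan_db
humann_v3,uniref90_v3,humann_utility,,,/data/databases/utility_mapping
rgi,card_v3,,,,/data/databases/card
"""

print(database_labels(sheet))
# {'humann_v3': ['uniref90_v3'], 'rgi': ['card_v3']}
```

Several rows that share a `db_name` collapse to one label, so multi-component databases are listed once per tool.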

| Column | Required | Description |
| ----------- | -------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `tool` | Yes | Profiler name. Must be one of: `humann_v3`, `humann_v4`, `fmhfunprofiler`, `mifaser`, `diamond`, `rgi`, `eggnogmapper`. |
@@ -60,7 +79,7 @@ The databases sheet is a comma-separated file that specifies which databases to

### HUMANn databases

HUMANn requires four database components per named database, each as a separate row with the same `db_name`:
HUMANn requires four database components per named database, each as a separate row with the same `db_name`. The example below uses a HUMANn v3-compatible UniRef90 database set; replace `uniref90_v3` with the exact release or version used in your analysis.

```csv
tool,db_name,db_entity,db_params,db_type,db_path
@@ -72,7 +91,7 @@ humann_v3,uniref90_v3,humann_utility,,,/data/databases/utility_mapping

### FMH FunProfiler databases

FMH FunProfiler requires a single sketch database:
FMH FunProfiler requires a single sketch database. The example below uses a KEGG-derived sketch database labeled `kegg_v1`; replace this with the exact sketch/database version used in your analysis.

```csv
tool,db_name,db_entity,db_params,db_type,db_path
@@ -81,7 +100,10 @@ fmhfunprofiler,kegg_v1,,,short;long,/data/databases/fmhfunprofiler_kegg.sig.zip

### EggNOG-mapper databases

EggNOG-mapper requires two database entries per named database: the search database and the EggNOG data directory. The `db_params` field of the `eggnogmapper_db` row must specify the search mode (e.g. `diamond`, `mmseqs`, `hmmer`).
EggNOG-mapper requires two database entries per named database: the search database and the EggNOG data directory. The `db_params` field of the `eggnogmapper_db` row must specify the search mode (e.g. `diamond`, `mmseqs`, `hmmer`). The example below uses an EggNOG v5 database label; replace `eggnog_v5` with the exact EggNOG database release used in your analysis.

> [!WARNING]
> EggNOG-mapper support is currently in beta and should be treated as work in progress. Database handling, output behavior, and downstream reporting are still being validated in the full pipeline, so use with caution and independently review results before production use or interpretation.
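
The two-row requirement described above can be sketched as a pre-flight check. This is illustrative only: `check_eggnog_rows` is a hypothetical helper, the paths are placeholders, and the set of search modes contains just the examples named in the text, so the pipeline may accept others.

```python
import csv
import io

# Entity names as they appear in the databases sheet for EggNOG-mapper.
REQUIRED_ENTITIES = {"eggnogmapper_db", "eggnogmapper_data_dir"}
# Assumption: only the modes given as examples in the docs; the real set may be larger.
EXAMPLE_MODES = {"diamond", "mmseqs", "hmmer"}


def check_eggnog_rows(databases_csv_text):
    """Return a list of problems found in the eggnogmapper rows, per db_name."""
    by_name = {}
    for row in csv.DictReader(io.StringIO(databases_csv_text)):
        if row["tool"] == "eggnogmapper":
            by_name.setdefault(row["db_name"], {})[row["db_entity"]] = row

    problems = []
    for name, entities in sorted(by_name.items()):
        for missing in sorted(REQUIRED_ENTITIES - set(entities)):
            problems.append(f"{name}: missing {missing} row")
        db_row = entities.get("eggnogmapper_db")
        if db_row is not None and db_row["db_params"] not in EXAMPLE_MODES:
            problems.append(f"{name}: db_params is not one of the example search modes")
    return problems


# Placeholder paths; replace with the real database locations.
complete = """tool,db_name,db_entity,db_params,db_type,db_path
eggnogmapper,eggnog_v5,eggnogmapper_db,diamond,,/path/to/search_db
eggnogmapper,eggnog_v5,eggnogmapper_data_dir,,,/path/to/data_dir
"""
incomplete = """tool,db_name,db_entity,db_params,db_type,db_path
eggnogmapper,eggnog_v5,eggnogmapper_db,diamond,,/path/to/search_db
"""

print(check_eggnog_rows(complete))    # []
print(check_eggnog_rows(incomplete))  # ['eggnog_v5: missing eggnogmapper_data_dir row']
```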

```csv
tool,db_name,db_entity,db_params,db_type,db_path
@@ -97,15 +119,17 @@ eggnogmapper,eggnog_v5,eggnogmapper_data_dir,,,/data/databases/eggnog_mapper/dat

#### Database preparation

Download a pre-built mifaser database (e.g. GS-21 or GS-580) from the [mifaser website](https://bromberglab.org/project/mifaser/). The `db_path` should point to the directory containing the database files.
Download a pre-built mifaser database (e.g. GS-21, GS-24-all, or GS-580) from the [mifaser website](https://bromberglab.org/project/mifaser/). The `db_path` should point to the directory containing the database files and `db_name` should record the downloaded database version.

```csv
tool,db_name,db_entity,db_params,db_type,db_path
mifaser,gs21,,,short,/data/databases/mifaser/GS-21
mifaser,GS-24-all,,,short,/data/databases/mifaser/GS-24-all
```

### Full example databases sheet

This example uses versioned database names to make the database releases traceable in the run outputs. Replace these names and paths with the exact database releases you downloaded.

```csv
tool,db_name,db_entity,db_params,db_type,db_path
humann_v3,uniref90_v3,humann_metaphlan,,,/data/databases/metaphlan_db
@@ -125,7 +149,7 @@ fmhfunprofiler,kegg_v1,,,short;long,/data/databases/fmhfunprofiler_kegg.sig.zip

#### Database preparation

Download the CARD database and extract it to a directory:
Download the CARD database and extract it to a directory. The example CSV below labels the database as `card_v3`; replace this with the exact CARD release used in your analysis.

```bash
wget https://card.mcmaster.ca/latest/data
@@ -147,9 +171,12 @@ rgi,card_v3,,,,/data/databases/card

[DIAMOND](https://github.com/bbuchfink/diamond/wiki/) is a high-throughput sequence aligner for translated (nucleotide-vs-protein) alignment. Enable it with `--run_diamond`.

> [!WARNING]
> DIAMOND support is currently in beta and should be treated as work in progress. Database handling, output behavior, and downstream reporting are still being validated in the full pipeline, so use with caution and independently review results before production use or interpretation.

#### Database preparation

The database supplied in the `--databases` CSV must already be in DIAMOND binary format (`.dmnd`). Build it from a protein FASTA using `diamond makedb`:
The database supplied in the `--databases` CSV must already be in DIAMOND binary format (`.dmnd`). Build it from a versioned protein FASTA using `diamond makedb`, and use `db_name` to record the source database and release.

```bash
diamond makedb --in proteins.faa --db proteins
@@ -186,20 +213,6 @@ work # Directory containing the nextflow working files
# Other nextflow hidden files, e.g. history of pipeline runs and old logs.
```

### Enabling profilers

At least one profiler must be enabled via command-line flags. The pipeline will only run the profilers you explicitly turn on:

| Flag | Profiler | Status |
| ---------------------- | --------------- | --------- |
| `--run_humann_v3` | HUMANn v3 | Available |
| `--run_humann_v4` | HUMANn v4 | Available |
| `--run_fmhfunprofiler` | FMH FunProfiler | Available |
| `--run_mifaser` | mifaser | Available |
| `--run_diamond` | diamond | Available |
| `--run_eggnogmapper` | EggNOG-mapper | Available |
| `--run_rgi` | RGI BWT | Available |

### Parameters

If you wish to repeatedly use the same parameters for multiple runs, rather than specifying each flag in the command, you can specify these in a params file.