34 commits
39b6159
Updated to v2
Aug 3, 2023
bd66be8
Updates to V2 with new PacBio tool version to work with Revio data
Sep 27, 2023
e66e710
Updated path to test data
Sep 27, 2023
81cdc91
Updated path to test file in config file
Sep 27, 2023
e656b34
Included a config with reduced CPU requirements for github
Sep 28, 2023
40a6024
Included actions
Sep 28, 2023
5e1178e
Modified the CPU requirements
Sep 28, 2023
755074a
Updated conda path
Sep 28, 2023
94b3c27
Updated conda info in modules
Sep 28, 2023
a3dbf3b
Updated actions check
Sep 28, 2023
7694dad
Removed LongQC as it is dependant on local installation
Sep 28, 2023
65e14be
Updated Action test
Sep 28, 2023
11e6582
Updated CPU requirements for hifiasm
Sep 28, 2023
b3466b9
Updated memory requirement for Hifiasm test
Sep 28, 2023
30d43f6
Updated faidx module
Sep 28, 2023
7a9f20e
Update github-actions-demo.yml
scorreard Sep 28, 2023
cfce7f4
Update github-actions-demo.yml
scorreard Sep 28, 2023
699c3e8
Update github-actions-demo.yml
scorreard Sep 28, 2023
4b81f27
Update github-actions-demo.yml
scorreard Sep 28, 2023
b92d462
Update github-actions-demo.yml
scorreard Sep 28, 2023
8d4876f
Update github-actions-demo.yml
scorreard Sep 28, 2023
f68438f
Update github-actions-demo.yml
scorreard Sep 28, 2023
0bcde69
Update github-actions-demo.yml
scorreard Sep 28, 2023
8154a46
Update github-actions-demo.yml
scorreard Sep 28, 2023
cc24020
Update nextflow.config
scorreard Sep 28, 2023
fc5e36c
Update nextflow_github_test.config
scorreard Sep 28, 2023
e85ac25
Update github-actions-demo.yml
scorreard Sep 28, 2023
cff1b0d
Update README.md
scorreard Oct 3, 2023
5487ce4
Update README.md
scorreard Oct 3, 2023
92465ef
Add files via upload
scorreard Oct 3, 2023
4dfeecd
Create test.inage
scorreard Oct 3, 2023
1ad9d53
Add files via upload
scorreard Oct 3, 2023
79107ce
Delete res/test.inage
scorreard Oct 3, 2023
e00af69
Update README.md
scorreard Oct 3, 2023
17 changes: 17 additions & 0 deletions .github/workflows/github-actions-demo.yml
@@ -0,0 +1,17 @@
name: test CBP nextflow pipeline
run-name: ${{ github.actor }} is testing the Canadian Biogenome Project pipeline
on: [push]
jobs:
  Explore-GitHub-Actions:
    runs-on: ubuntu-latest
    steps:
      - run: echo "🎉 The job was automatically triggered by a ${{ github.event_name }} event."
      - run: echo "🐧 This job is now running on a ${{ runner.os }} server hosted by GitHub!"
      - run: echo "🔎 The name of your branch is ${{ github.ref }} and your repository is ${{ github.repository }}."
      - name: List files in the repository
        run: |
          ls ${{ github.workspace }}
      - uses: actions/checkout@v3
      - uses: nf-core/setup-nextflow@v1
      - run: nextflow run bcgsc/Canadian_Biogenome_Project -latest -r V2 -profile conda -c nextflow_github_test.config
      - run: echo "🍏 This job's status is ${{ job.status }}."
18 changes: 18 additions & 0 deletions .gitignore
@@ -0,0 +1,18 @@
/work/*
/work
/assembly/*
/assembly
/blobtools/*
/blobtools
/hic_scaffolding/*
/hic_scaffolding
/preprocessing/*
/preprocessing
/purge_dups/*
/purge_dups
/QC/*
/QC
.n*
.git/
/V2
/V2/*
Binary file added CBP_workflow.png
42 changes: 26 additions & 16 deletions README.md
@@ -7,12 +7,21 @@ In short, each step of the pipeline is included in a module. Most of the modules

Many of the modules available in this pipeline were developed by members of the nf-core/genomeassembler group. If you want to participate, feel free to join the community.

## **Table of Contents**
* **[Input data](#input-data)**
* **[Output data](#output-files)**
* **[Process](#process)**
* [Running the pipeline with test data](#running-the-pipeline-with-test-data)
* [Running the pipeline with your own data](#running-the-pipeline-with-your-own-data)
* **[Credits](#credits)**
* **[Details on the test dataset](#details-on-the-test-dataset)**



## Input data
The pipeline was developped to take as input PacBio ccs files (bam) and Hi-C files (fastq.gz). The pipeline also support the inclusion of nanopore data and short-reads for polishing.
The pipeline was developed to take as input PacBio files (bam, from Sequel II or Revio machines) and Hi-C files (fastq.gz). The pipeline also supports the inclusion of nanopore data and short reads for polishing.

The pipeline also require information related to the specie of interest such as genome size or ploidy. This information can be found on GoaT (https://goat.genomehubs.org).
The pipeline also requires the species' NCBI Taxonomy ID, which can be found on GoaT (https://goat.genomehubs.org) or on NCBI.
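
As a rough sanity check before launching a run, the input expectations above could be sketched in shell as follows. Both `has_expected_extension` and `is_taxid` are hypothetical helpers, not part of the pipeline. The sketch assumes that PacBio inputs end in `.bam`, that Hi-C inputs end in `.fastq.gz`, and that a Taxonomy ID is a non-empty string of digits.

```shell
# Hypothetical helper (not part of the pipeline): check that a file name has
# the extension the README gives for that input kind
# (pacbio -> .bam, hic -> .fastq.gz).
has_expected_extension() {
  kind="$1"
  file="$2"
  case "$kind:$file" in
    pacbio:*.bam)   return 0 ;;
    hic:*.fastq.gz) return 0 ;;
    *)              return 1 ;;
  esac
}

# Hypothetical helper: assume an NCBI Taxonomy ID is a non-empty string of digits.
is_taxid() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *)           return 0 ;;
  esac
}
```

For example, `has_expected_extension pacbio sample.bam && is_taxid 9606` succeeds, while a PacBio input named `reads.fastq.gz` is rejected.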



@@ -22,40 +31,39 @@ The pipeline generates many files and intermediate files, most are self explanat


## Process
An overview of the pipeline is visible on the following subway map. Some parts of the pipeline may have been commented out in this version as they relied on locaaly installed software. The code is still available in case you also want to locally install the software and try it out.
An overview of the pipeline is shown in the following subway map. Some parts of the pipeline may have been commented out in this version because they rely on locally installed software. The code is still available in case you want to install the software locally and try it out.

By default, the pipeline uses hifiasm with PacBio data for the assembly and, if Hi-C data is available, YAHS for the scaffolding.
Other assemblers and scaffolders are available within the pipeline; to switch, edit the nextflow.config file.

Software used that would require local installation:

- LongQC

- MitoHifi
- [LongQC](https://github.com/yfukasawa/LongQC)
- [MitoHifi](https://github.com/marcelauliano/MitoHiFi)
- [Juicer](https://github.com/aidenlab/juicer)

- Juicer

Software that relies on locally downloaded files / databases :

- Busco

- Kraken
- [Busco](https://busco.ezlab.org/busco_userguide.html#download-and-automated-update)
- [Kraken](http://ccb.jhu.edu/software/kraken/)

<p align="center">
<img title="The Canadian Biogenome Project Workflow" src="https://github.com/bcgsc/Canadian_Biogenome_Project/CBP_workflow.png" width=50%>
<img title="The Canadian Biogenome Project Workflow" src="res/CBP_workflow.png" width=50%>
</p>
<p align="center">
Figure: Overview of the Canadian Biogenome Project assembly pipeline
</p>


## Running the pipeline with test data (will work once the repo is public)
To run this pipeline, you need nextflow, conda and singularity installed on your system.

## Running the pipeline with test data
To run this pipeline, you need nextflow and either conda or singularity installed on your system.

A test dataset is available in this repo so that you can test the pipeline with a single command:

```
nextflow run bcgsc/Canadian_Biogenome_Project -latest -r dev
nextflow run bcgsc/Canadian_Biogenome_Project -latest -r V2 -profile conda
```
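
Before launching the command above, it can help to confirm that the prerequisites are on your `PATH`. The following is a minimal sketch, assuming the executables are named `nextflow`, `conda` and `singularity`. `have_tool` and `check_prereqs` are hypothetical helper names, not part of the pipeline.

```shell
# Hypothetical helper: succeed if the named executable is found on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Succeed (and print "ok") when nextflow plus at least one of
# conda / singularity is available; otherwise report what is missing.
check_prereqs() {
  if ! have_tool nextflow; then
    echo "missing: nextflow"
    return 1
  fi
  if have_tool conda || have_tool singularity; then
    echo "ok"
    return 0
  fi
  echo "missing: conda or singularity"
  return 1
}
```

A possible use is `check_prereqs && nextflow run bcgsc/Canadian_Biogenome_Project -latest -r V2 -profile conda`, so that the run only starts when the tools are found.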

The outputs are organized in several self-explanatory subfolders.
@@ -87,16 +95,18 @@ nextflow run main.nf -profile singularity

## Credits

The pipeline was originnally written by @scorreard with the help and input from :
The pipeline was originally written by [@scorreard](https://github.com/scorreard) with help and input from:

- Members of the Jones lab (Canada's Michael Smith Genome Sciences Centre, Vancouver, Canada).

- Members of the Earth Biogenome Project and other affiliated projects.
- Special thanks to [@Glenn Chang](https://github.com/Glenn032787) for reviewing this repo.

- Members of the Earth Biogenome Project and other affiliated projects.
- Members of the nf-core / nextflow community.




## Details on the test dataset

The PacBio data is a subset of covid sequences obtained with these command lines:
File renamed without changes.
File renamed without changes.
File renamed without changes.