Commit 0f9ca2f

Merge branch 'dev' into nf-core-template-merge-3.4.1
2 parents 6c9ea55 + 32a583b commit 0f9ca2f

318 files changed

Lines changed: 95528 additions & 624 deletions

.github/workflows/nf-test.yml

Lines changed: 11 additions & 0 deletions
@@ -1,5 +1,16 @@
 name: Run nf-test
 on:
+  push:
+    branches:
+      - dev
+      - master
+      - main
+    paths-ignore:
+      - "docs/**"
+      - "**/meta.yml"
+      - "**/*.md"
+      - "**/*.png"
+      - "**/*.svg"
   pull_request:
     paths-ignore:
       - "docs/**"
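
To make the effect of the new `push` trigger concrete: GitHub Actions skips a workflow only when every changed path matches a `paths-ignore` pattern, and runs it as soon as one changed path does not. The sketch below approximates that rule in Python, assuming the `push` block uses the five patterns listed above. `glob_to_regex`, `is_ignored`, and `workflow_runs` are hypothetical helpers written for illustration, and the glob translation handles only `*` and `**`, not the full GitHub filter syntax.

```python
import re

# Patterns copied from the `paths-ignore` list under `push` in the diff above.
PATHS_IGNORE = ["docs/**", "**/meta.yml", "**/*.md", "**/*.png", "**/*.svg"]


def glob_to_regex(pattern):
    """Translate a simplified path glob into a compiled regex.

    Assumption: `**/` matches zero or more directories, `**` matches anything,
    and `*` matches anything except `/`. Other glob syntax is not handled.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


IGNORE_REGEXES = [glob_to_regex(p) for p in PATHS_IGNORE]


def is_ignored(path):
    """True if the path matches at least one ignore pattern."""
    return any(rx.match(path) for rx in IGNORE_REGEXES)


def workflow_runs(changed_paths):
    """The workflow runs if any changed path is NOT ignored."""
    return any(not is_ignored(p) for p in changed_paths)


print(workflow_runs(["docs/usage.md", "README.md"]))
print(workflow_runs(["main.nf", "README.md"]))
```

Under these assumptions, a push that touches only documentation or images would not trigger nf-test, while a push that also touches `main.nf` would.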

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -7,3 +7,5 @@ testing/
 testing*
 *.pyc
 null/
+.nf-test/
+.nf-test.log

.prettierignore

Lines changed: 1 addition & 0 deletions
@@ -12,3 +12,4 @@ testing*
 bin/
 .nf-test/
 ro-crate-metadata.json
+modules/local/report/create/app/dependencies/*

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

-## v0.0.1dev - [date]
+## v0.0.1dev - [2024-10-17]

 Initial release of nf-core/tfactivity, created with the [nf-core](https://nf-co.re/) template.

CITATIONS.md

Lines changed: 44 additions & 0 deletions
@@ -10,6 +10,50 @@

 ## Pipeline tools

+- [DESeq2](https://doi.org/10.1186/s13059-014-0550-8)
+
+> Love, M.I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15, 550 (2014).
+
+- [STARE](https://doi.org/10.1093/bioinformatics/btad062)
+
+> Dennis Hecker, Fatemeh Behjati Ardakani, Alexander Karollus, Julien Gagneur, Marcel H Schulz, The adapted Activity-By-Contact model for enhancer–gene assignment and its application to single-cell data, Bioinformatics, Volume 39, Issue 2, February 2023
+
+- [Bedtools](https://doi.org/10.1093%2Fbioinformatics%2Fbtq033)
+
+> Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010 Mar 15;26(6):841-2. doi: 10.1093/bioinformatics/btq033. Epub 2010 Jan 28. PMID: 20110278; PMCID: PMC2832824.
+
+- [GTFtools](https://doi.org/10.1093/bioinformatics/btac561)
+
+> Hong-Dong Li, Cui-Xiang Lin, Jiantao Zheng, GTFtools: a software package for analyzing various features of gene models, Bioinformatics, Volume 38, Issue 20, 15 October 2022, Pages 4806–4808,
+
+- [ChromHMM](https://doi.org/10.1038/nprot.2017.124)
+
+> Ernst, J., Kellis, M. Chromatin-state discovery and genome annotation with ChromHMM. Nat Protoc 12, 2478–2492 (2017)
+
+- [DYNAMITE](https://doi.org/10.1093/bioinformatics/bty856)
+
+> Florian Schmidt, Fabian Kern, Peter Ebert, Nina Baumgarten, Marcel H Schulz, TEPIC 2—an extended framework for transcription factor binding prediction and integrative epigenomic analysis, Bioinformatics, Volume 35, Issue 9, May 2019, Pages 1608–160
+
+- [Biopython](https://doi.org/10.1093/bioinformatics/btp163)
+
+> Peter J. A. Cock, Tiago Antao, Jeffrey T. Chang, Brad A. Chapman, Cymon J. Cox, Andrew Dalke, Iddo Friedberg, Thomas Hamelryck, Frank Kauff, Bartek Wilczynski, Michiel J. L. de Hoon, Biopython: freely available Python tools for computational molecular biology and bioinformatics, Bioinformatics, Volume 25, Issue 11, June 2009, Pages 1422–1423
+
+- [FIMO](https://doi.org/10.1093/bioinformatics/btr064)
+
+> Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011 Apr 1;27(7):1017-8
+
+- [JASPAR](https://doi.org/10.1093/nar/gkad1059)
+
+> Rauluseviciute I, Riudavets-Puig R, Blanc-Mathieu R, Castro-Mondragon JA, Ferenc K, Kumar V, Lemma RB, Lucas J, Chèneby J, Baranasic D, Khan A, Fornes O, Gundersen S, Johansen M, Hovig E, Lenhard B, Sandelin A, Wasserman WW, Parcy F, Mathelier A JASPAR 2024: 20th anniversary of the open-access database of transcription factor binding profiles Nucleic Acids Res. 2024 Jan 5;52(D1):D174-D182
+
+- [universalmotif](https://doi.org/10.21105/joss.07012)
+
+> Tremblay, B. J., (2024). universalmotif: An R package for biological motif analysis. Journal of Open Source Software, 9(100), 701
+
+- [SNEEP](https://doi.org/10.1016/j.isci.2024.109765)
+
+> Baumgarten N, Ebert P, Schmidt F, Kern F, Schulz MH. A statistical approach for identifying single nucleotide variants that affect transcription factor binding. iScience, Volume 27, Issue 5, 109765
+
 ## Software packaging/containerisation tools

 - [Anaconda](https://anaconda.com)

README.md

Lines changed: 67 additions & 24 deletions
@@ -21,47 +21,83 @@

 ## Introduction

-**nf-core/tfactivity** is a bioinformatics pipeline that ...
+**nf-core/tfactivity** is a bioinformatics pipeline that can identify the most differentially active transcription factors (TFs) between multiple conditions. It takes a count matrix and open chromatin data (ATAC-seq, DNase-seq, HM-ChIP-seq) as input. It produces a ranking of transcription factors.

-<!-- TODO nf-core:
-Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the
-major pipeline sections and the types of output it produces. You're giving an overview to someone new
-to nf-core here, in 15-20 seconds. For an example, see https://github.com/nf-core/rnaseq/blob/master/README.md#introduction
--->
+![Metro map](docs/images/metromap.png)

-<!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
-workflows use the "tube map" design for that. See https://nf-co.re/docs/guidelines/graphic_design/workflow_diagrams#examples for examples. -->
-<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->
+1. Identify accessible regions (can perform footprinting between close ChIP-seq peaks or take ATAC-seq peaks)
+2. Calculate affinity scores for combinations of transcription factors and target genes (TGs) using [STARE](https://doi.org/10.1093/bioinformatics/btad062)
+3. Identify differentially expressed genes between conditions
+4. Utilize linear regression to identify the transcription factors that are most likely to be responsible for the differential gene expression
+5. Calculate the TF-TG score based on:
+   1. Differential expression of the target genes
+   2. Affinity of the transcription factors to the target genes
+   3. The regression coefficient of the transcription factors
+6. Perform a Mann-Whitney U test and create a ranking of the transcription factors
+
+A more biological visualization of the workflow can be found here:
+
+> [!NOTE]
+> The following image was created for the TF-Prioritizer publication. Parts of the workflow have been adapted for the nf-core pipeline, but the general idea is still valid.
+
+![TF-Prioritizer workflow](docs/images/tfprio.jpeg)

 ## Usage

 > [!NOTE]
 > If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.

-<!-- TODO nf-core: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
-Explain what rows and columns represent. For instance (please edit as appropriate):
+The pipeline supports processing of previously called peaks from ATAC-seq, DNase-seq, or histone modification ChIP-seq data. The peaks can then either be used as-is or be subjected to footprinting analysis. Additionally, BAM files can be provided in a separate samplesheet, which will be used to predict enhancer regions.

-First, prepare a samplesheet with your input data that looks as follows:
+```csv title="samplesheet.csv"
+sample,condition,assay,peak_file
+condition1_H3K27ac_1,condition1,H3K27ac,condition1_H3K27ac_1.broadPeak
+condition1_H3K27ac_2,condition1,H3K27ac,condition1_H3K27ac_2.broadPeak
+condition1_H3K4me3,condition1,H3K4me3,condition1_H3K4me3.broadPeak
+condition2_H3K27ac,condition2,H3K27ac,condition2_H3K27ac.broadPeak
+condition3_H3K27ac,condition3,H3K27ac,condition3_H3K27ac.broadPeak
+condition3_H3K4me3,condition3,H3K4me3,condition3_H3K4me3.broadPeak
+```

-`samplesheet.csv`:
+Each row represents a peak file. The `sample` column should contain a unique identifier for each peak file. The `peak_file` column should contain the path to the peak file. Peak files need to be in a format that is compatible with the `bed` format. Only the first three columns of the `bed` format are used.

-```csv
-sample,fastq_1,fastq_2
-CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
+```csv title="samplesheet_bam.csv"
+sample,condition,assay,signal,control
+condition1_H3K27ac_1,condition1,H3K27ac,condition1_H3K27ac_1.bam,condition1_control.bam
+condition1_H3K27ac_2,condition1,H3K27ac,condition1_H3K27ac_2.bam,condition1_control.bam
+condition1_H3K4me3,condition1,H3K4me3,condition1_H3K4me3.bam,condition1_control.bam
+condition2_H3K27ac,condition2,H3K27ac,condition2_H3K27ac.bam,condition2_control.bam
+condition3_H3K27ac,condition3,H3K27ac,condition3_H3K27ac.bam,condition3_control.bam
+condition3_H3K4me3,condition3,H3K4me3,condition3_H3K4me3.bam,condition3_control.bam
 ```

-Each row represents a fastq file (single-end) or a pair of fastq files (paired end).
+The first three columns are the same as in the peak file samplesheet. The `signal` column should contain the path to the signal BAM file. The `control` column should contain the path to the control BAM file.

--->
+Second, you need a raw count matrix (e.g. from [nf-core/rnaseq](https://nf-co.re/rnaseq)) with gene IDs as rows and samples as columns. You also need a design matrix that specifies the conditions of the samples in the count matrix. The design matrix should look as follows:

-Now, you can run the pipeline using:
+```csv title="design_matrix.csv"
+sample,condition
+sample1,condition1
+sample2,condition1
+sample3,condition2
+sample4,condition3
+```
+
+The `sample` column should match the columns in the expression matrix. The `condition` column needs to match the `condition` column in the samplesheet. Additionally, batches can be added to the design matrix and will be considered in the differential expression analysis.

-<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->
+:::tip
+There is an alternative way of providing expression values. Instead of providing a single count matrix for all samples, you can provide a gene list and one count file per sample. Details can be found in the [usage documentation](https://nf-co.re/tfactivity/usage).
+:::
+
+Now, you can run the pipeline using:

 ```bash
 nextflow run nf-core/tfactivity \
    -profile <docker/singularity/.../institute> \
    --input samplesheet.csv \
+   --genome GRCh38 \
+   --counts <EXPRESSION_MATRIX> \
+   --counts_design design_matrix.csv \
    --outdir <OUTDIR>
 ```
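
The README text in this hunk states three consistency rules: `sample` IDs in the peak samplesheet are unique, `condition` labels agree between the samplesheet and the design matrix, and design-matrix samples appear as columns of the count matrix. A minimal Python sketch of a pre-flight check for those rules follows. It is not part of the pipeline; `read_rows` and `check_inputs` are hypothetical helpers, and it assumes that "match" means the two files use the same set of condition labels.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text.strip())))


def check_inputs(samplesheet_csv, design_csv, count_columns):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    peaks = read_rows(samplesheet_csv)
    design = read_rows(design_csv)

    # Rule 1: every peak file needs a unique `sample` identifier.
    seen, duplicates = set(), set()
    for row in peaks:
        (duplicates if row["sample"] in seen else seen).add(row["sample"])
    if duplicates:
        problems.append(f"duplicate sample IDs: {sorted(duplicates)}")

    # Rule 2 (assumed meaning of "match"): same set of condition labels.
    peak_conditions = {row["condition"] for row in peaks}
    design_conditions = {row["condition"] for row in design}
    for cond in sorted(peak_conditions - design_conditions):
        problems.append(f"condition '{cond}' has peaks but no expression samples")
    for cond in sorted(design_conditions - peak_conditions):
        problems.append(f"condition '{cond}' has expression samples but no peaks")

    # Rule 3: design-matrix samples must be columns of the count matrix.
    for row in design:
        if row["sample"] not in count_columns:
            problems.append(
                f"design sample '{row['sample']}' is not a count matrix column"
            )
    return problems


# Example data shortened from the README samplesheet and design matrix.
SAMPLESHEET = """
sample,condition,assay,peak_file
condition1_H3K27ac_1,condition1,H3K27ac,condition1_H3K27ac_1.broadPeak
condition2_H3K27ac,condition2,H3K27ac,condition2_H3K27ac.broadPeak
condition3_H3K27ac,condition3,H3K27ac,condition3_H3K27ac.broadPeak
"""

DESIGN = """
sample,condition
sample1,condition1
sample2,condition1
sample3,condition2
sample4,condition3
"""

print(check_inputs(SAMPLESHEET, DESIGN, ["sample1", "sample2", "sample3", "sample4"]))
```

Running such a check before `nextflow run` could catch a mislabelled condition early, though the pipeline's own input validation remains the authority on what is accepted.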

@@ -78,11 +114,12 @@ For more details about the output files and reports, please refer to the

 ## Credits

-nf-core/tfactivity was originally written by Nico Trummer.
+nf-core/tfactivity was originally written by [Nico Trummer](https://github.com/nictru).

 We thank the following people for their extensive assistance in the development of this pipeline:

-<!-- TODO nf-core: If applicable, make list of people who have also contributed -->
+- [Markus Hoffmann](https://scholar.google.com/citations?user=_qXUS28AAAAJ) (project and scientific management)
+- [Leon Hafner](https://www.linkedin.com/in/leon-hafner/) (implementations)

 ## Contributions and Support

@@ -92,11 +129,17 @@ For further information or help, don't hesitate to get in touch on the [Slack `#

 ## Citations

+nf-core/tfactivity is based on the previously published "TF-Prioritizer" pipeline. As long as there is no dedicated nf-core/tfactivity publication, please cite the following paper:
+
+> **TF-Prioritizer: a Java pipeline to prioritize condition-specific transcription factors**
+>
+> Markus Hoffmann, Nico Trummer, Leon Schwartz, Jakub Jankowski, Hye Kyung Lee, Lina-Liv Willruth, Olga Lazareva, Kevin Yuan, Nina Baumgarten, Florian Schmidt, Jan Baumbach, Marcel H Schulz, David B Blumenthal, Lothar Hennighausen & Markus List
+>
+> GigaScience, Volume 12, 2023, giad026, https://doi.org/10.1093/gigascience/giad026
+
 <!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
 <!-- If you use nf-core/tfactivity for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->

-<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->
-
 An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.

 You can cite the `nf-core` publication as follows:
