nf-core
diff --git a/.github/workflows/nf-test.yml b/.github/workflows/nf-test.yml
Lines changed: 11 additions & 0 deletions
diff --git a/.gitignore b/.gitignore
Lines changed: 2 additions & 0 deletions
diff --git a/.prettierignore b/.prettierignore
Lines changed: 1 addition & 0 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 1 addition & 1 deletion
diff --git a/CITATIONS.md b/CITATIONS.md
Lines changed: 44 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 67 additions & 24 deletions
@@ -1,5 +1,16 @@
 name: Run nf-test
 on:
+  push:
+    branches:
+      - dev
+      - master
+      - main
+    paths-ignore:
+      - "docs/**"
+      - "**/meta.yml"
+      - "**/*.md"
+      - "**/*.png"
+      - "**/*.svg"
   pull_request:
     paths-ignore:
       - "docs/**"
 
@@ -7,3 +7,5 @@ testing/
 testing*
 *.pyc
 null/
+.nf-test/
+.nf-test.log
@@ -12,3 +12,4 @@ testing*
 bin/
 .nf-test/
 ro-crate-metadata.json
+modules/local/report/create/app/dependencies/*
@@ -3,7 +3,7 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## v0.0.1dev - [date]
+## v0.0.1dev - [2024-10-17]
 
 Initial release of nf-core/tfactivity, created with the [nf-core](https://nf-co.re/) template.
 
 
@@ -10,6 +10,50 @@
 
 ## Pipeline tools
 
+- [DESeq2](https://doi.org/10.1186/s13059-014-0550-8)
+
+  > Love, M.I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15, 550 (2014).
+
+- [STARE](https://doi.org/10.1093/bioinformatics/btad062)
+
+  > Dennis Hecker, Fatemeh Behjati Ardakani, Alexander Karollus, Julien Gagneur, Marcel H Schulz, The adapted Activity-By-Contact model for enhancer–gene assignment and its application to single-cell data, Bioinformatics, Volume 39, Issue 2, February 2023
+
+- [Bedtools](https://doi.org/10.1093%2Fbioinformatics%2Fbtq033)
+
+  > Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010 Mar 15;26(6):841-2. doi: 10.1093/bioinformatics/btq033. Epub 2010 Jan 28. PMID: 20110278; PMCID: PMC2832824.
+
+- [GTFtools](https://doi.org/10.1093/bioinformatics/btac561)
+
+  > Hong-Dong Li, Cui-Xiang Lin, Jiantao Zheng, GTFtools: a software package for analyzing various features of gene models, Bioinformatics, Volume 38, Issue 20, 15 October 2022, Pages 4806–4808,
+
+- [ChromHMM](https://doi.org/10.1038/nprot.2017.124)
+
+  > Ernst, J., Kellis, M. Chromatin-state discovery and genome annotation with ChromHMM. Nat Protoc 12, 2478–2492 (2017)
+
+- [DYNAMITE](https://doi.org/10.1093/bioinformatics/bty856)
+
+  > Florian Schmidt, Fabian Kern, Peter Ebert, Nina Baumgarten, Marcel H Schulz, TEPIC 2—an extended framework for transcription factor binding prediction and integrative epigenomic analysis, Bioinformatics, Volume 35, Issue 9, May 2019, Pages 1608–160
+
+- [Biopython](https://doi.org/10.1093/bioinformatics/btp163)
+
+  > Peter J. A. Cock, Tiago Antao, Jeffrey T. Chang, Brad A. Chapman, Cymon J. Cox, Andrew Dalke, Iddo Friedberg, Thomas Hamelryck, Frank Kauff, Bartek Wilczynski, Michiel J. L. de Hoon, Biopython: freely available Python tools for computational molecular biology and bioinformatics, Bioinformatics, Volume 25, Issue 11, June 2009, Pages 1422–1423
+
+- [FIMO](https://doi.org/10.1093/bioinformatics/btr064)
+
+  > Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011 Apr 1;27(7):1017-8
+
+- [JASPAR](https://doi.org/10.1093/nar/gkad1059)
+
+  > Rauluseviciute I, Riudavets-Puig R, Blanc-Mathieu R, Castro-Mondragon JA, Ferenc K, Kumar V, Lemma RB, Lucas J, Chèneby J, Baranasic D, Khan A, Fornes O, Gundersen S, Johansen M, Hovig E, Lenhard B, Sandelin A, Wasserman WW, Parcy F, Mathelier A JASPAR 2024: 20th anniversary of the open-access database of transcription factor binding profiles Nucleic Acids Res. 2024 Jan 5;52(D1):D174-D182
+
+- [universalmotif](https://doi.org/10.21105/joss.07012)
+
+  > Tremblay, B. J., (2024). universalmotif: An R package for biological motif analysis. Journal of Open Source Software, 9(100), 701
+
+- [SNEEP](https://doi.org/10.1016/j.isci.2024.109765)
+
+  > Baumgarten N, Ebert P, Schmidt F, Kern F, Schulz MH. A statistical approach for identifying single nucleotide variants that affect transcription factor binding. iScience, Volume 27, Issue 5, 109765
+
 ## Software packaging/containerisation tools
 
 - [Anaconda](https://anaconda.com)
 
@@ -21,47 +21,83 @@
 
 ## Introduction
 
-**nf-core/tfactivity** is a bioinformatics pipeline that ...
+**nf-core/tfactivity** is a bioinformatics pipeline that can identify the most differentially active transcription factors (TFs) between multiple conditions. It takes a count matrix and open chromatin data (ATAC-seq, DNase-seq, HM-ChIP-seq) as input. It produces a ranking of transcription factors.
 
-<!-- TODO nf-core:
-   Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the
-   major pipeline sections and the types of output it produces. You're giving an overview to someone new
-   to nf-core here, in 15-20 seconds. For an example, see https://github.com/nf-core/rnaseq/blob/master/README.md#introduction
--->
+![Metro map](docs/images/metromap.png)
 
-<!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
-     workflows use the "tube map" design for that. See https://nf-co.re/docs/guidelines/graphic_design/workflow_diagrams#examples for examples.   -->
-<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->
+1. Identify accessible regions (can perform footprinting between close ChIP-seq peaks or take ATAC-seq peaks)
+2. Calculate affinity scores for combinations of transcription factors and target genes (TGs) using [STARE](https://doi.org/10.1093/bioinformatics/btad062)
+3. Identify differentially expressed genes between conditions
+4. Utilize linear regression to identify the transcription factors that are most likely to be responsible for the differential gene expression
+5. Calculate the TF-TG score based on:
+   1. Differential expression of the target genes
+   2. Affinity of the transcription factors to the target genes
+   3. The regression coefficient of the transcription factors
+6. Perform a Mann-Whitney U test and create a ranking of the transcription factors
+
+A more biological visualization of the workflow can be found here:
+
+> [!NOTE]
+> The following image was created for the TF-Prioritizer publication. Parts of the workflow have been adapted for the nf-core pipeline, but the general idea is still valid.
+
+![TF-Prioritizer workflow](docs/images/tfprio.jpeg)
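
Step 6 of the list above (Mann-Whitney U test and ranking) can be illustrated with a minimal Python sketch. It is not the pipeline's implementation. It assumes, hypothetically, that each TF comes with a list of TF-TG scores that is compared against a shared background list, and that TFs are ranked by the normalised U statistic. The names `mann_whitney_u` and `rank_tfs` and the example scores are invented for illustration. A full test would also derive a p-value, which is omitted here.

```python
from typing import Dict, List, Tuple


def mann_whitney_u(sample: List[float], background: List[float]) -> float:
    """Return the U statistic of `sample` versus `background`.

    U counts the pairs (s, b) with s > b; tied pairs contribute 0.5.
    """
    u = 0.0
    for s in sample:
        for b in background:
            if s > b:
                u += 1.0
            elif s == b:
                u += 0.5
    return u


def rank_tfs(
    tf_scores: Dict[str, List[float]], background: List[float]
) -> List[Tuple[str, float]]:
    """Rank TFs by U / (n_sample * n_background), highest first.

    Assumption for this sketch: a larger normalised U means the TF's
    scores tend to exceed the background. TFs without scores are skipped.
    """
    ranking = []
    for tf, scores in tf_scores.items():
        if not scores or not background:
            continue  # nothing to compare
        u = mann_whitney_u(scores, background)
        ranking.append((tf, u / (len(scores) * len(background))))
    return sorted(ranking, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    background = [0.1, 0.2, 0.3, 0.4]
    tf_scores = {"TF_A": [0.5, 0.6], "TF_B": [0.05, 0.25]}
    for tf, effect in rank_tfs(tf_scores, background):
        print(tf, effect)
```

The quadratic pairwise loop keeps the definition of U visible; for real data sizes a rank-based implementation from a statistics library would be the more practical choice.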
 
 ## Usage
 
 > [!NOTE]
 > If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
 
-<!-- TODO nf-core: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
-     Explain what rows and columns represent. For instance (please edit as appropriate):
+The pipeline supports processing of previously called peaks from ATAC-seq, DNase-seq, or histone modification ChIP-seq data. The peaks can then either be used as-is or be subjected to footprinting analysis. Additionally, BAM files can be provided in a separate samplesheet, which will be used to predict enhancer regions.
 
-First, prepare a samplesheet with your input data that looks as follows:
+```csv title="samplesheet.csv"
+sample,condition,assay,peak_file
+condition1_H3K27ac_1,condition1,H3K27ac,condition1_H3K27ac_1.broadPeak
+condition1_H3K27ac_2,condition1,H3K27ac,condition1_H3K27ac_2.broadPeak
+condition1_H3K4me3,condition1,H3K4me3,condition1_H3K4me3.broadPeak
+condition2_H3K27ac,condition2,H3K27ac,condition2_H3K27ac.broadPeak
+condition3_H3K27ac,condition3,H3K27ac,condition3_H3K27ac.broadPeak
+condition3_H3K4me3,condition3,H3K4me3,condition3_H3K4me3.broadPeak
+```
 
-`samplesheet.csv`:
+Each row represents a peak file. The `sample` column should contain a unique identifier for each peak file. The `peak_file` column should contain the path to the peak file. Peak files must be compatible with the `bed` format; only its first three columns (chromosome, start, end) are used.
 
-```csv
-sample,fastq_1,fastq_2
-CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
+```csv title="samplesheet_bam.csv"
+sample,condition,assay,signal,control
+condition1_H3K27ac_1,condition1,H3K27ac,condition1_H3K27ac_1.bam,condition1_control.bam
+condition1_H3K27ac_2,condition1,H3K27ac,condition1_H3K27ac_2.bam,condition1_control.bam
+condition1_H3K4me3,condition1,H3K4me3,condition1_H3K4me3.bam,condition1_control.bam
+condition2_H3K27ac,condition2,H3K27ac,condition2_H3K27ac.bam,condition2_control.bam
+condition3_H3K27ac,condition3,H3K27ac,condition3_H3K27ac.bam,condition3_control.bam
+condition3_H3K4me3,condition3,H3K4me3,condition3_H3K4me3.bam,condition3_control.bam
 ```
 
-Each row represents a fastq file (single-end) or a pair of fastq files (paired end).
+The first three columns are the same as in the peak file samplesheet. The `signal` column should contain the path to the signal BAM file. The `control` column should contain the path to the control BAM file.
 
--->
+In addition to the samplesheet, you need a raw count matrix (e.g. from [nf-core/rnaseq](https://nf-co.re/rnaseq)) with gene IDs as rows and samples as columns. You also need a design matrix that specifies the condition of each sample in the count matrix. The design matrix should look as follows:
 
-Now, you can run the pipeline using:
+```csv title="design_matrix.csv"
+sample,condition
+sample1,condition1
+sample2,condition1
+sample3,condition2
+sample4,condition3
+```
+
+The `sample` column should match the columns in the expression matrix. The `condition` column needs to match the `condition` column in the samplesheet. Additionally, batches can be added to the design matrix and will be considered in the differential expression analysis.
 
-<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->
+:::tip
+There is an alternative way of providing expression values. Instead of providing a single count matrix for all samples, you can provide a gene list and one count file per sample. Details can be found in the [usage documentation](https://nf-co.re/tfactivity/usage).
+:::
+
+Now, you can run the pipeline using:
 
 ```bash
 nextflow run nf-core/tfactivity \
    -profile <docker/singularity/.../institute> \
    --input samplesheet.csv \
+   --genome GRCh38 \
+   --counts <EXPRESSION_MATRIX> \
+   --counts_design design_matrix.csv \
    --outdir <OUTDIR>
 ```
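
Before launching the command above, the consistency rules stated for the input files can be checked with a short script. The following Python sketch is an illustration under assumptions and is not part of the pipeline. The function names `read_column` and `check_inputs` are hypothetical. The count matrix is represented only by its list of sample column names, because its exact file layout is not specified here. Checking the `condition` values in both directions is one possible reading of "needs to match".

```python
import csv
import io
from typing import List


def read_column(csv_text: str, column: str) -> List[str]:
    """Return all values of one named column from CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column] for row in reader]


def check_inputs(
    samplesheet_csv: str, design_csv: str, count_columns: List[str]
) -> List[str]:
    """Return human-readable problems; an empty list means the inputs agree.

    count_columns: sample column names of the count matrix
    (assumed to exclude the gene ID column).
    """
    problems = []
    design_samples = read_column(design_csv, "sample")
    design_conditions = set(read_column(design_csv, "condition"))
    sheet_conditions = set(read_column(samplesheet_csv, "condition"))

    # Every design-matrix sample must be a column of the count matrix.
    for sample in design_samples:
        if sample not in count_columns:
            problems.append(
                f"design sample '{sample}' is not a column of the count matrix"
            )
    # Conditions must agree between design matrix and samplesheet.
    for condition in sorted(design_conditions - sheet_conditions):
        problems.append(
            f"design condition '{condition}' is missing from the samplesheet"
        )
    for condition in sorted(sheet_conditions - design_conditions):
        problems.append(
            f"samplesheet condition '{condition}' is missing from the design matrix"
        )
    return problems


if __name__ == "__main__":
    # Shortened, hypothetical inputs in the formats shown above.
    samplesheet = (
        "sample,condition,assay,peak_file\n"
        "condition1_H3K27ac_1,condition1,H3K27ac,condition1_H3K27ac_1.broadPeak\n"
        "condition2_H3K27ac,condition2,H3K27ac,condition2_H3K27ac.broadPeak\n"
    )
    design = "sample,condition\nsample1,condition1\nsample3,condition2\n"
    print(check_inputs(samplesheet, design, ["sample1", "sample3"]))
```

Returning a list of problems instead of raising on the first mismatch lets all inconsistencies be reported in a single pass.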
 
@@ -78,11 +114,12 @@ For more details about the output files and reports, please refer to the
 
 ## Credits
 
-nf-core/tfactivity was originally written by Nico Trummer.
+nf-core/tfactivity was originally written by [Nico Trummer](https://github.com/nictru).
 
 We thank the following people for their extensive assistance in the development of this pipeline:
 
-<!-- TODO nf-core: If applicable, make list of people who have also contributed -->
+- [Markus Hoffmann](https://scholar.google.com/citations?user=_qXUS28AAAAJ) (project and scientific management)
+- [Leon Hafner](https://www.linkedin.com/in/leon-hafner/) (implementations)
 
 ## Contributions and Support
 
@@ -92,11 +129,17 @@ For further information or help, don't hesitate to get in touch on the [Slack `#
 
 ## Citations
 
+nf-core/tfactivity is based on the previously published "TF-Prioritizer" pipeline. As long as there is no dedicated nf-core/tfactivity publication, please cite the following paper:
+
+> **TF-Prioritizer: a Java pipeline to prioritize condition-specific transcription factors**
+>
+> Markus Hoffmann, Nico Trummer, Leon Schwartz, Jakub Jankowski, Hye Kyung Lee, Lina-Liv Willruth, Olga Lazareva, Kevin Yuan, Nina Baumgarten, Florian Schmidt, Jan Baumbach, Marcel H Schulz, David B Blumenthal, Lothar Hennighausen & Markus List
+>
+> GigaScience, Volume 12, 2023, giad026, https://doi.org/10.1093/gigascience/giad026
+
 <!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
 <!-- If you use nf-core/tfactivity for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->
 
-<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->
-
 An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.
 
 You can cite the `nf-core` publication as follows: