> Dennis Hecker, Fatemeh Behjati Ardakani, Alexander Karollus, Julien Gagneur, Marcel H Schulz, The adapted Activity-By-Contact model for enhancer–gene assignment and its application to single-cell data, Bioinformatics, Volume 39, Issue 2, February 2023.

> Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010 Mar 15;26(6):841-2. doi: 10.1093/bioinformatics/btq033. Epub 2010 Jan 28. PMID: 20110278; PMCID: PMC2832824.

> Hong-Dong Li, Cui-Xiang Lin, Jiantao Zheng, GTFtools: a software package for analyzing various features of gene models, Bioinformatics, Volume 38, Issue 20, 15 October 2022, Pages 4806–4808.

> Peter J. A. Cock, Tiago Antao, Jeffrey T. Chang, Brad A. Chapman, Cymon J. Cox, Andrew Dalke, Iddo Friedberg, Thomas Hamelryck, Frank Kauff, Bartek Wilczynski, Michiel J. L. de Hoon, Biopython: freely available Python tools for computational molecular biology and bioinformatics, Bioinformatics, Volume 25, Issue 11, June 2009, Pages 1422–1423.

> Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011 Apr 1;27(7):1017-8.

- [JASPAR](https://doi.org/10.1093/nar/gkad1059)

> Rauluseviciute I, Riudavets-Puig R, Blanc-Mathieu R, Castro-Mondragon JA, Ferenc K, Kumar V, Lemma RB, Lucas J, Chèneby J, Baranasic D, Khan A, Fornes O, Gundersen S, Johansen M, Hovig E, Lenhard B, Sandelin A, Wasserman WW, Parcy F, Mathelier A. JASPAR 2024: 20th anniversary of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 2024 Jan 5;52(D1):D174-D182.
## Introduction
**nf-core/tfactivity** is a bioinformatics pipeline that identifies the most differentially active transcription factors (TFs) between multiple conditions. It takes a gene expression count matrix and open chromatin data (ATAC-seq, DNase-seq, or histone modification ChIP-seq) as input and produces a ranking of transcription factors.

1. Identify accessible regions, either by footprinting between closely spaced ChIP-seq peaks or by using ATAC-seq peaks directly
2. Calculate affinity scores for combinations of transcription factors and target genes (TGs) using [STARE](https://doi.org/10.1093/bioinformatics/btad062)
3. Identify differentially expressed genes between conditions
4. Use linear regression to identify the transcription factors that are most likely to be responsible for the differential gene expression
5. Calculate the TF-TG score based on:
   1. Differential expression of the target genes
   2. Affinity of the transcription factors to the target genes
   3. The regression coefficient of the transcription factors
6. Perform a Mann-Whitney U test and create a ranking of the transcription factors
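The ranking step names a Mann-Whitney U test without saying which score distributions are compared. As a hedged illustration only, the sketch below computes the U statistic for two groups of scores, giving tied values their average rank. Treating one TF's TF-TG scores as one group and a background set of scores as the other is an assumption, and the function names are hypothetical. This is not the pipeline's implementation; a real analysis would normally use a library routine such as `scipy.stats.mannwhitneyu`, which also returns a p-value.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j map to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """U of group_a vs group_b: pairs with a > b count 1, ties count 0.5."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    rank_sum_a = sum(ranks[:n_a])
    return rank_sum_a - n_a * (n_a + 1) / 2


# Hypothetical TF-TG scores for one TF and for a background set of scores.
tf_scores = [0.9, 0.8, 0.75]
background_scores = [0.1, 0.3, 0.8]
print(mann_whitney_u(tf_scores, background_scores))  # prints 7.5
```

A large U relative to `len(group_a) * len(group_b)` means the first group's scores tend to be higher, which is the kind of signal a ranking could be built on.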
A more biological visualization of the workflow can be found here:

> [!NOTE]
> The following image was created for the TF-Prioritizer publication. Parts of the workflow have been adapted for the nf-core pipeline, but the general idea is still valid.

> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.

The pipeline supports processing of previously called peaks from ATAC-seq, DNase-seq, or histone modification ChIP-seq data. The peaks can then either be used as-is or be subjected to footprinting analysis. Additionally, BAM files can be provided in a separate samplesheet; these BAM files are used to predict enhancer regions.
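The README does not specify how footprinting is done. One simple reading of footprinting between closely spaced peaks, used here purely as an assumption, is that the gap separating two neighbouring peaks on the same chromosome is kept as a candidate TF binding region if it is no wider than some maximum distance. The sketch below implements only that idea on `(chrom, start, end)` tuples; `footprints_between_peaks` and `max_gap` are hypothetical names, not pipeline functions or parameters.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style coordinates


def footprints_between_peaks(peaks: List[Interval], max_gap: int) -> List[Interval]:
    """Hypothetical sketch: report gaps of 1..max_gap bp between neighbouring peaks."""
    result: List[Interval] = []
    prev_chrom, prev_end = None, 0
    for chrom, start, end in sorted(peaks):  # sort by chromosome, then start
        if chrom == prev_chrom:
            gap = start - prev_end
            if 0 < gap <= max_gap:
                result.append((chrom, prev_end, start))
            # Track the furthest end seen so nested or overlapping peaks are handled.
            prev_end = max(prev_end, end)
        else:
            prev_chrom, prev_end = chrom, end
    return result


peaks = [
    ("chr1", 100, 200),
    ("chr1", 250, 400),
    ("chr1", 1000, 1100),
    ("chr2", 50, 80),
    ("chr2", 90, 120),
]
print(footprints_between_peaks(peaks, max_gap=100))  # prints [('chr1', 200, 250), ('chr2', 80, 90)]
```

Keeping the running maximum end, rather than only the previous peak's end, avoids reporting a "gap" that actually lies inside a longer peak.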
First, prepare a samplesheet that lists your peak files. Each row represents a peak file. The `sample` column should contain a unique identifier for each peak file, and the `peak_file` column should contain the path to the peak file. Peak files need to be in a format that is compatible with the `bed` format; only the first three columns (chromosome, start, end) are used.
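The two peak samplesheet rules above (unique `sample` identifiers, and only the first three BED columns being used) can be sketched as follows. This assumes a comma-separated samplesheet whose header includes `sample` and `peak_file`; other columns are ignored here, skipping `#`, `track`, and `browser` lines is a convenience choice, and both function names are hypothetical rather than part of the pipeline.

```python
import csv
import io
from typing import Dict, List, Tuple


def read_peak_samplesheet(text: str) -> Dict[str, str]:
    """Map each `sample` ID to its `peak_file` path, rejecting duplicate IDs."""
    peak_files: Dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(text)):
        sample = row["sample"]
        if sample in peak_files:
            raise ValueError(f"duplicate sample identifier: {sample}")
        peak_files[sample] = row["peak_file"]
    return peak_files


def read_bed_intervals(text: str) -> List[Tuple[str, int, int]]:
    """Reduce BED-compatible lines to (chrom, start, end); extra columns are ignored."""
    intervals = []
    for line in text.splitlines():
        # Skip blank lines plus comment, track, and browser header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


# A narrowPeak-style line: only the first three fields survive.
print(read_bed_intervals("chr1\t100\t200\tpeak1\t500\t.\t5.2\n"))  # prints [('chr1', 100, 200)]
```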
If you provide BAM files, the first three columns of the BAM samplesheet are the same as in the peak file samplesheet. The `signal` column should contain the path to the signal BAM file, and the `control` column should contain the path to the control BAM file.

Second, you need a raw count matrix (e.g. from [nf-core/rnaseq](https://nf-co.re/rnaseq)) with gene IDs as rows and samples as columns. You also need a design matrix that specifies the condition of each sample in the count matrix. The design matrix should look as follows:
```csv title="design_matrix.csv"
sample,condition
sample1,condition1
sample2,condition1
sample3,condition2
sample4,condition3
```
The `sample` column should match the column names (sample IDs) of the count matrix. The `condition` column needs to match the `condition` column in the samplesheet. Additionally, batch information can be added to the design matrix and will be taken into account in the differential expression analysis.
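These matching rules can be checked before launching the pipeline. The sketch below assumes the count matrix header is a gene ID column followed by one column per sample, and that the set of conditions from the peak samplesheet is already known; it reports design-matrix samples missing from the count matrix, count-matrix samples missing from the design matrix, and conditions absent from the samplesheet. `check_design` is a hypothetical helper, not the pipeline's own validation, and any extra design columns (such as batch information) are simply ignored.

```python
import csv
import io
from typing import List, Set


def check_design(design_csv: str, count_header: List[str], samplesheet_conditions: Set[str]) -> List[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems: List[str] = []
    # Assumption: the first count-matrix column holds gene IDs, the rest are samples.
    count_samples = set(count_header[1:])
    design_samples = set()
    for row in csv.DictReader(io.StringIO(design_csv)):
        design_samples.add(row["sample"])
        if row["sample"] not in count_samples:
            problems.append(f"sample {row['sample']} is not a column of the count matrix")
        if row["condition"] not in samplesheet_conditions:
            problems.append(f"condition {row['condition']} is not in the samplesheet")
    # Report count-matrix samples that the design matrix does not describe.
    for sample in sorted(count_samples - design_samples):
        problems.append(f"count matrix column {sample} is missing from the design matrix")
    return problems


design = """sample,condition
sample1,condition1
sample2,condition1
sample3,condition2
sample4,condition3
"""
header = ["gene_id", "sample1", "sample2", "sample3", "sample4"]
print(check_design(design, header, {"condition1", "condition2", "condition3"}))  # prints []
```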
:::tip
There is an alternative way of providing expression values. Instead of providing a single count matrix for all samples, you can provide a gene list and one count file per sample. Details can be found in the [usage documentation](https://nf-co.re/tfactivity/usage).
:::

Now, you can run the pipeline using:
```bash
nextflow run nf-core/tfactivity \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --genome GRCh38 \
   --counts <EXPRESSION_MATRIX> \
   --counts_design design_matrix.csv \
   --outdir <OUTDIR>
```
## Credits
80
116
81
-
nf-core/tfactivity was originally written by Nico Trummer.
117
+
nf-core/tfactivity was originally written by [Nico Trummer](https://github.com/nictru).
82
118
83
119
We thank the following people for their extensive assistance in the development of this pipeline:
84
120
85
-
<!-- TODO nf-core: If applicable, make list of people who have also contributed -->
121
+
-[Markus Hoffmann](https://scholar.google.com/citations?user=_qXUS28AAAAJ) (project and scientific management)
## Citations
94
131
132
+
nf-core/tfactivity is based on the previously published "TF-Prioritizer" pipeline. As long as there is no dedicated nf-core/tfactivity publication, please cite the following paper:
133
+
134
+
> **TF-Prioritizer: a Java pipeline to prioritize condition-specific transcription factors**
135
+
>
136
+
> Markus Hoffmann, Nico Trummer, Leon Schwartz, Jakub Jankowski, Hye Kyung Lee, Lina-Liv Willruth, Olga Lazareva, Kevin Yuan, Nina Baumgarten, Florian Schmidt, Jan Baumbach, Marcel H Schulz, David B Blumenthal, Lothar Hennighausen & Markus List
<!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
<!-- If you use nf-core/tfactivity for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->
An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.

You can cite the `nf-core` publication as follows: