From 3c1aed28bd335f176a070f4aa643120776b17406 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]"
Date: Fri, 13 Mar 2026 03:51:27 +0000
Subject: [PATCH 1/2] =?UTF-8?q?chore:=20=F0=9F=A4=96=20sync=20copilot=20in?= =?UTF-8?q?structions=20-=202026-03-13?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .github/copilot-instructions.md | 164 ++++++++++++++++++++++++++++++++
 1 file changed, 164 insertions(+)
 create mode 100644 .github/copilot-instructions.md

diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
new file mode 100644
index 0000000..ab79ac9
--- /dev/null
+++ b/.github/copilot-instructions.md
@@ -0,0 +1,164 @@
+# Copilot Instructions for CCBR Repositories
+
+## Reviewer guidance (what to look for in PRs)
+
+- Reviewers must validate enforcement rules: no secrets, container specified, and reproducibility pins.
+- If code is AI-generated, reviewers must ensure the author documents what was changed and why, and that the PR is labeled `generated-by-AI`.
+- Reviewers should verify license headers and ownership metadata (for example, `CODEOWNERS`) are present.
+- Reviewers must read the code and verify that it adheres to the project's coding standards, guidelines, and software-engineering best practices.
+
+## CI & enforcement suggestions (automatable)
+
+1. **PR template**: include optional AI-assistance disclosure fields (model used, high-level prompt intent, manual review confirmation).
+2. **Pre-merge check (GitHub Action)**: verify `.github/copilot-instructions.md` is present in the repository and that new pipeline files include a `# CRAFT:` header.
+3. **Lint jobs**: `ruff` for Python, `shellcheck` for shell, `lintr` for R, and `nf-core lint` or Snakemake lint checks where applicable.
+4. **Secrets scan**: run `TruffleHog` or `Gitleaks` on PRs to detect accidental credentials.
+5. 
**AI usage label**: if AI usage is declared, an Action should add the `generated-by-AI` label (creating it if it does not exist); the PR body should end with the italicized Markdown line _Generated using AI_, and any associated commit messages should end with the plain footer line `Generated using AI`.
+
+_Sample GH Action check (concept): if AI usage is declared, require an AI-assistance disclosure field in the PR body._
+
+## Security & compliance (mandatory)
+
+- Developers must not send PHI or sensitive NIH internal identifiers to unapproved external AI services; use synthetic examples.
+- Repository content must only be sent to model providers approved by NCI/NIH policy (for example, Copilot for Business or approved internal proxies).
+- For AI-assisted actions, teams must keep an auditable record including: user, repository, action, timestamp, model name, and endpoint.
+- If using a server wrapper (Option C), logs must include the minimum metadata above and follow institutional retention policy.
+- If policy forbids external model use for internal code, teams must use approved local/internal LLM workflows.
+
+## Operational notes (practical)
+
+- `copilot-instructions.md` should remain concise and prescriptive; keep only high-value rules and edge-case examples.
+- Developers should include the CRAFT block in edited files when requesting substantial generated code, to improve context quality.
+- Copilot must ask the user for permission before deleting any file, unless Copilot created the file for a temporary run or test.
+- Copilot must not edit any files outside the current open workspace.
+
+## Code authoring guidance
+
+- Code must not include hard-coded secrets, credentials, or sensitive absolute paths on disk.
+- Code should be designed for modularity, reusability, and maintainability. It should ideally be platform-agnostic, with special support for running on the Biowulf HPC.
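A short sketch of the hard-coded-paths rule above. This helper is hypothetical (not part of any CCBR tool): it resolves a reference-data directory from an environment variable (the name `REF_DIR` is illustrative) instead of baking in an absolute path, so the same script runs on Biowulf, FRCE, or a laptop unchanged.

```python
import os
from pathlib import Path


def resolve_reference_dir() -> Path:
    """Return the reference-data directory from the environment.

    Reads the illustrative REF_DIR environment variable instead of
    hard-coding an absolute path, keeping the script platform-agnostic.
    """
    ref_dir = os.environ.get("REF_DIR")
    if ref_dir is None:
        # Specific exception type with a descriptive message, not a bare print.
        raise RuntimeError("REF_DIR is not set; export it before running")
    return Path(ref_dir)
```

The same pattern applies to credentials: read them from the environment or an approved secrets store, never from the source tree.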
+- Use `pre-commit` to enforce code style and linting during the commit process.
+
+### Pipelines
+
+- Authors must review existing CCBR pipelines first: .
+- New pipelines should follow established CCBR conventions for folder layout, rule/process naming, config structure, and test patterns.
+- Pipelines must define container images and pin tool/image versions for reproducibility.
+- Contributions should include a test dataset and a documented example command.
+
+#### Snakemake
+
+- In general, new pipelines should be created with Nextflow rather than Snakemake, unless there is a compelling reason to use Snakemake.
+- Generate new pipelines from the CCBR_SnakemakeTemplate repo:
+- For Snakemake, run `snakemake --lint` and a dry run before PR submission.
+
+#### Nextflow
+
+- Generate new pipelines from the CCBR_NextflowTemplate repo:
+- For Nextflow pipelines, authors must follow nf-core patterns and references: .
+- Nextflow code must use DSL2 only (DSL1 is not allowed).
+- For Nextflow, run `nf-core lint` (or equivalent checks) before PR submission.
+- Where possible, reuse modules and subworkflows from CCBR/nf-modules or nf-core/modules.
+- New modules and subworkflows should be tested with `nf-test`.
+
+### Python scripts and packages
+
+- Python scripts must include module and function/class docstrings.
+- Where a standard CLI framework is adopted, Python CLIs should use `click` or `typer` for consistency with existing components.
+- Scripts must support `--help` and document required/optional arguments.
+- Python code must follow [PEP 8](https://peps.python.org/pep-0008/), use `snake_case`, and include type hints for public functions.
+- Scripts must raise descriptive error messages on failure and emit warnings when applicable. Prefer raising an exception over printing an error message or returning an error code.
+- Python code should pass `ruff`.
+- Each script must include a documented example usage in comments or a README.
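The Python bullets above can be made concrete with a minimal `click` sketch; the command name, options, and output are illustrative only, not an existing CCBR script:

```python
import click


@click.command()
@click.option("--input-file", required=True, help="Path to the input TSV file.")
@click.option("--threads", default=4, show_default=True, help="Number of worker threads.")
def main(input_file: str, threads: int) -> None:
    """Summarize an input TSV file (illustrative example, not a real CCBR tool).

    `--help` is generated automatically by click, and every option is documented.
    """
    if threads < 1:
        # Raise a descriptive, specific exception rather than printing an error.
        raise click.BadParameter("--threads must be >= 1")
    click.echo(f"processing {input_file} with {threads} thread(s)")
```

Snake_case names, type hints on the public function, a module-level docstring in a real script, and a documented example invocation in the README round out the conventions.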
+- Tests should be written with `pytest`. Other testing frameworks may be used if justified.
+- Do not catch bare exceptions; always specify the exception type.
+- Only include one return statement, at the end of the function.
+
+### R scripts and packages
+
+- R scripts must include function and class docstrings via roxygen2.
+- CLIs must be defined using the `argparse` package.
+- CLIs must support `--help` and document required/optional arguments.
+- R code should pass `lintr` and `air`.
+- Tests should be written with `testthat`.
+- Packages should pass `devtools::check()`.
+- R code should adhere to the [tidyverse style guide](https://style.tidyverse.org/).
+- Only include one return statement, at the end of the function, if a return statement is used at all. Explicit returns are preferred but not required for R functions.
+
+## AI-generated commit messages (Conventional Commits)
+
+- Commit messages must follow [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) (as enforced in `CONTRIBUTING.md`).
+- Generate messages from staged changes only (`git diff --staged`); do not include unrelated work.
+- Commits should be atomic: one logical change per commit.
+- If mixed changes are present, split them into multiple logical commits; the number of commits does not need to equal the number of files changed.
+- Subject format must be: `type(optional-scope): short imperative summary` (<=72 chars), e.g., `fix(profile): update release table parser`.
+- Add a body only when needed to explain **why** and any notable impact; never include secrets, tokens, PHI, or large diffs.
+- For AI-assisted commits, add this final italicized footer line in the commit message body: _commit message is ai-generated_
+
+Suggested prompt for AI tools:
+
+```text
+Create a Conventional Commit message from this staged diff.
+Rules:
+1) Use one of: feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert.
+2) Keep subject <= 72 chars, imperative mood, no trailing period.
+3) Include optional scope when clear.
+4) Add a short body only if needed (why/impact), wrapped at ~72 chars.
+5) Output only the final commit message.
+```
+
+## Pull Requests
+
+When opening a pull request, use the repository's pull request template (usually `.github/PULL_REQUEST_TEMPLATE.md`).
+Different repos have different PR templates depending on their needs.
+Ensure that the pull request follows the repository's PR template and includes all required information.
+Do not allow the developer to proceed with opening a PR if they have not filled out all sections of the template.
+Before a PR can be moved from draft to "ready for review", all of the relevant checklist items must be checked, and any irrelevant checklist items should be crossed out.
+
+When new features, bug fixes, or other behavioral changes are introduced to the code, unit tests must be added or updated to cover the new or changed functionality.
+
+If there are any API or other user-facing changes, the documentation must be updated both inline (via docstrings) and in the long-form docs in the `docs/` or `vignettes/` directory.
+
+When a repo contains a build workflow (i.e. a workflow file in `.github/workflows` starting with `build` or named `R-CMD-check`), the build workflow must pass before the PR can be approved.
+
+### Changelog
+
+The changelog for the repository should be maintained in a `CHANGELOG.md` file (or `NEWS.md` for R packages) at the root of the repository.
+Each pull request that introduces user-facing changes must include a concise entry with the PR number and the author's username tagged.
+Developer-only changes (i.e. updates to CI workflows, development notes, etc.) should never be included in the changelog.
+Example:
+
+```
+## development version
+
+- Fix bug in `detect_absolute_paths()` to ignore comments. (#123, @username)
+```
+
+## Onboarding checklist for new developers
+
+- [ ] Read `.github/CONTRIBUTING.md` and `.github/copilot-instructions.md`.
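As a footnote to the Conventional Commits section above, the subject-line rules are mechanical enough to check in CI. A minimal sketch follows; the helper and regex are hypothetical (the type list is taken from the prompt above):

```python
import re

# Allowed types from the Conventional Commits prompt above.
SUBJECT_RE = re.compile(
    r"^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)"
    r"(\([a-z0-9._-]+\))?: \S"
)


def check_subject(subject: str) -> bool:
    """Return True if `subject` follows the Conventional Commit subject rules.

    Checks the type/optional-scope prefix, the <=72 character limit,
    and the no-trailing-period rule.
    """
    ok = (
        len(subject) <= 72
        and not subject.endswith(".")
        and SUBJECT_RE.match(subject) is not None
    )
    return ok
```

A pre-merge Action could run this over each commit subject in the PR and fail with a pointer to `CONTRIBUTING.md`.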
+- [ ] Configure VSCode workspace to open `copilot-instructions.md` by default (so Copilot Chat sees it). +- [ ] Install pre-commit and run `pre-commit install`. + +## Appendix: VSCode snippet (drop into `.vscode/snippets/craft.code-snippets`) + +```json +{ + "Insert CRAFT prompt": { + "prefix": "craft", + "body": [ + "/* C: Context: Repo=${workspaceFolderBasename}; bioinformatics pipelines; NIH HPC (Biowulf/Helix); containers: quay.io/ccbr */", + "/* R: Rules: no PHI, no secrets, containerize, pin versions, follow style */", + "/* F: Flow: inputs/ -> results/, conf/, tests/ */", + "/* T: Tests: provide a one-line TEST_CMD and expected output */", + "", + "A: $1" + ], + "description": "Insert CRAFT prompt and place cursor at Actions" + } +} +``` From c43b4fab70516d5b5a99f3ad96239c3a49914f0d Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Fri, 13 Mar 2026 03:52:48 +0000 Subject: [PATCH 2/2] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- .tests/README.md | 1 - .tests/pairs.tsv | 2 +- README.md | 2 +- bin/ascat.R | 10 +- bin/assess_significance.R | 8 +- bin/combineAllSampleCompareResults.R | 1 - bin/flowcell_lane.py | 76 ++-- bin/lofreq_convert.sh | 2 +- bin/makeGraph.R | 40 +-- bin/make_freec_genome.pl | 4 +- bin/make_freec_genome_paired.pl | 2 +- bin/sexscript.sh | 2 +- bin/split_Bed_into_equal_regions.py | 214 ++++++----- bin/strelka_convert.py | 127 ++++--- conf/ci_stub.config | 2 +- conf/fastq_screen.conf | 8 +- conf/frce.config | 2 +- conf/genomes.config | 40 +-- docker/annotate_cnvsv/Dockerfile | 16 +- docker/annotate_cnvsv/build.sh | 4 +- docker/cnv/Dockerfile | 16 +- docker/cnv/build.sh | 2 +- docker/ffpe/Dockerfile | 30 +- docker/ffpe/build.sh | 3 +- docker/lofreq/Dockerfile | 12 +- docker/lofreq/build.sh | 4 +- docker/logan_base/Dockerfile | 50 +-- docker/logan_base/argparse.bash | 2 +- docker/logan_base/build.sh | 2 +- 
docker/qc/Dockerfile | 23 +- docker/qc/build.sh | 4 +- docker/sv/Dockerfile | 21 +- docker/sv/build.sh | 4 +- docs/index.md | 1 - docs/user-guide/pipeline.md | 57 +-- docs/user-guide/tool_comparisons.md | 57 ++- modules/local/annotsv.nf | 2 +- modules/local/ascat.nf | 4 +- modules/local/bcftools_stats.nf | 2 +- modules/local/bwamem/bwamem2.nf | 16 +- modules/local/cnvkit.nf | 1 - modules/local/combinefilter.nf | 15 +- modules/local/deepsomatic.nf | 7 +- modules/local/deepvariant.nf | 6 +- modules/local/fastp/fastp.nf | 4 +- modules/local/fastq_screen.nf | 2 +- modules/local/ffpe.nf | 19 +- modules/local/freec.nf | 6 +- modules/local/gridss.nf | 4 +- modules/local/kraken.nf | 2 +- modules/local/lancet2/lancet2.nf | 4 +- modules/local/lofreq.nf | 2 - modules/local/manta.nf | 2 +- modules/local/mutect2.nf | 3 - modules/local/octopus.nf | 13 +- modules/local/purple.nf | 11 +- modules/local/qc.nf | 2 +- modules/local/qualimap.nf | 2 +- modules/local/sage.nf | 4 +- modules/local/sequenza.nf | 4 +- modules/local/splitbed.nf | 6 +- modules/local/strelka.nf | 2 +- modules/local/trim_align.nf | 9 +- modules/local/vardict.nf | 6 +- modules/local/varscan.nf | 3 +- modules/local/vcftools.nf | 2 +- setup.py | 2 +- subworkflows/local/workflows.nf | 499 +++++++++++++------------- subworkflows/local/workflows_tonly.nf | 299 ++++++++------- 69 files changed, 925 insertions(+), 891 deletions(-) diff --git a/.tests/README.md b/.tests/README.md index 540c385..a6ad604 100644 --- a/.tests/README.md +++ b/.tests/README.md @@ -3,4 +3,3 @@ These input files are used for continuous integration purposes, specificially to dry run the pipeline whenever commits have been made to the main, master, or unified branches. 
**Please Note:** Each of the provided FastQ files and BAM files have only headers and will not work for the LOGAN pipeline - diff --git a/.tests/pairs.tsv b/.tests/pairs.tsv index fdccfb3..67824a2 100644 --- a/.tests/pairs.tsv +++ b/.tests/pairs.tsv @@ -1,2 +1,2 @@ Tumor Normal -WGS_NC_T WGS_NC_N +WGS_NC_T WGS_NC_N diff --git a/README.md b/README.md index 807de47..78f2b1f 100644 --- a/README.md +++ b/README.md @@ -154,7 +154,7 @@ Example of Tumor_Normal calling mode # Step 0: Set up sinteractive --mem=8g -N 1 -n 4 -module load ccbrpipeliner # v8 +module load ccbrpipeliner # v8 # set up directories diff --git a/bin/ascat.R b/bin/ascat.R index a275194..784239a 100755 --- a/bin/ascat.R +++ b/bin/ascat.R @@ -51,14 +51,14 @@ ascat.prepareHTS( normalBAF_file = sprintf("%s_BAF.txt",normal_name), BED_file=bed) -ascat.bc = ascat.loadData(Tumor_LogR_file = sprintf("%s_LogR.txt",tumor_name), - Tumor_BAF_file = sprintf("%s_BAF.txt",tumor_name), - Germline_LogR_file = sprintf("%s_LogR.txt",normal_name), Germline_BAF_file = sprintf("%s_BAF.txt",normal_name), +ascat.bc = ascat.loadData(Tumor_LogR_file = sprintf("%s_LogR.txt",tumor_name), + Tumor_BAF_file = sprintf("%s_BAF.txt",tumor_name), + Germline_LogR_file = sprintf("%s_LogR.txt",normal_name), Germline_BAF_file = sprintf("%s_BAF.txt",normal_name), gender = gender, genomeVersion = genome) ascat.plotRawData(ascat.bc, img.prefix = "Before_correction_") -ascat.bc = ascat.correctLogR(ascat.bc, - GCcontentfile = sprintf("%s/GC_G1000/GC_G1000_%s.txt",genomebasedir,genome), +ascat.bc = ascat.correctLogR(ascat.bc, + GCcontentfile = sprintf("%s/GC_G1000/GC_G1000_%s.txt",genomebasedir,genome), replictimingfile = sprintf("%s/RT_G1000/RT_G1000_%s.txt",genomebasedir,genome)) ascat.plotRawData(ascat.bc, img.prefix = "After_correction_") ascat.bc = ascat.aspcf(ascat.bc) diff --git a/bin/assess_significance.R b/bin/assess_significance.R index 7f7216c..b416031 100755 --- a/bin/assess_significance.R +++ b/bin/assess_significance.R @@ -12,7 
+12,7 @@ cnvs<- data.frame(dataTable) ratio$Ratio[which(ratio$Ratio==-1)]=NA -cnvs.bed=GRanges(cnvs[,1],IRanges(cnvs[,2],cnvs[,3])) +cnvs.bed=GRanges(cnvs[,1],IRanges(cnvs[,2],cnvs[,3])) ratio.bed=GRanges(ratio$Chromosome,IRanges(ratio$Start,ratio$Start),score=ratio$Ratio) overlaps <- subsetByOverlaps(ratio.bed,cnvs.bed) @@ -46,13 +46,13 @@ ifelse(resultks == "try-error",kscore <- c(kscore, "NA"),kscore <- c(kscore, ks. cnvs = cbind(cnvs, as.numeric(wscore), as.numeric(kscore)) if (numberOfCol==7) { - names(cnvs)=c("chr","start","end","copy number","status","WilcoxonRankSumTestPvalue","KolmogorovSmirnovPvalue") + names(cnvs)=c("chr","start","end","copy number","status","WilcoxonRankSumTestPvalue","KolmogorovSmirnovPvalue") } if (numberOfCol==9) { - names(cnvs)=c("chr","start","end","copy number","status","genotype","uncertainty","WilcoxonRankSumTestPvalue","KolmogorovSmirnovPvalue") + names(cnvs)=c("chr","start","end","copy number","status","genotype","uncertainty","WilcoxonRankSumTestPvalue","KolmogorovSmirnovPvalue") } if (numberOfCol==11) { - names(cnvs)=c("chr","start","end","copy number","status","genotype","uncertainty","somatic/germline","precentageOfGermline","WilcoxonRankSumTestPvalue","KolmogorovSmirnovPvalue") + names(cnvs)=c("chr","start","end","copy number","status","genotype","uncertainty","somatic/germline","precentageOfGermline","WilcoxonRankSumTestPvalue","KolmogorovSmirnovPvalue") } write.table(cnvs, file=paste(args[4],".p.value.txt",sep=""),sep="\t",quote=F,row.names=F) diff --git a/bin/combineAllSampleCompareResults.R b/bin/combineAllSampleCompareResults.R index 6b646b6..83a3590 100755 --- a/bin/combineAllSampleCompareResults.R +++ b/bin/combineAllSampleCompareResults.R @@ -78,4 +78,3 @@ colnames(finalPredPairs)<-c("Sample1","Sample2","Som:relatedness","Som:hom_conco #mergedDF<-merge(x=finalPredPairs,y=finalpredictedPairsVerifyBAMID,by = "Sample1",all = TRUE) #write.table(mergedDF[,c(1:4,6)],file = user.input.3,sep = "\t",quote = FALSE,row.names 
= FALSE) write.table(finalPredPairs,file = user.input.3,sep = "\t",quote = FALSE,row.names = FALSE) - diff --git a/bin/flowcell_lane.py b/bin/flowcell_lane.py index 0839b05..f93d68a 100755 --- a/bin/flowcell_lane.py +++ b/bin/flowcell_lane.py @@ -29,12 +29,17 @@ # +SRR6755966.1 1 length=101 # CC@FFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJHIJJJJI -def usage(message = '', exitcode = 0): + +def usage(message="", exitcode=0): """Displays help and usage information. If provided invalid usage returns non-zero exit-code. Additional message can be displayed with the 'message' parameter. """ - print('Usage: python {} sampleName.R1.fastq.gz sampleName > sampleName.flowcell_lanes.txt'.format(sys.argv[0])) + print( + "Usage: python {} sampleName.R1.fastq.gz sampleName > sampleName.flowcell_lanes.txt".format( + sys.argv[0] + ) + ) if message: print(message) sys.exit(exitcode) @@ -49,11 +54,11 @@ def get_flowcell_lane(sequence_identifer): IDs in its sequence indentifer. For more information visit: https://en.wikipedia.org/wiki/FASTQ_format """ - id_list = sequence_identifer.strip().split(':') + id_list = sequence_identifer.strip().split(":") if len(id_list) < 7: # No Flowcell IDs in this format # Return next instrument id instead (next best thing) - if sequence_identifer.startswith('@SRR'): + if sequence_identifer.startswith("@SRR"): # SRA format or downloaded SRA FastQ file # SRA format 1: contains machine and lane information # @SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36 @@ -66,20 +71,20 @@ def get_flowcell_lane(sequence_identifer): except IndexError: # SRA format 2 id1 = id_list[0].split()[0].split(".")[0] - id2 = id1.lstrip('@') - return id1,id2 + id2 = id1.lstrip("@") + return id1, id2 else: # Casava < 1.8 (fastq format) # @HWUSI-EAS100R:6:73:941:1973#0/1 - return id_list[0],id_list[1] + return id_list[0], id_list[1] else: # Casava >= 1.8 # Normal FastQ format # @J00170:88:HNYVJBBXX:8:1101:6390:1244 1:N:0:ACTTGA - return id_list[2],id_list[3] + return id_list[2], 
id_list[3] -def md5sum(filename, blocksize = 65536): +def md5sum(filename, blocksize=65536): """Gets md5checksum of a file in memory-safe manner. The file is read in blocks defined by the blocksize parameter. This is a safer option to reading the entire file into memory if the file is very large. @@ -93,7 +98,7 @@ def md5sum(filename, blocksize = 65536): import hashlib hasher = hashlib.md5() - with open(filename, 'rb') as fh: + with open(filename, "rb") as fh: buf = fh.read(blocksize) while len(buf) > 0: hasher.update(buf) @@ -102,13 +107,15 @@ def md5sum(filename, blocksize = 65536): return hasher.hexdigest() -if __name__ == '__main__': - +if __name__ == "__main__": # Check Usage - if '-h' in sys.argv or '--help' in sys.argv or '-help' in sys.argv: - usage(exitcode = 0) + if "-h" in sys.argv or "--help" in sys.argv or "-help" in sys.argv: + usage(exitcode=0) elif len(sys.argv) != 3: - usage(message = 'Error: failed to provide all required positional arguments!', exitcode = 1) + usage( + message="Error: failed to provide all required positional arguments!", + exitcode=1, + ) # Get file name and sample name prefix filename = sys.argv[1] @@ -117,23 +124,34 @@ def md5sum(filename, blocksize = 65536): md5 = md5sum(filename) # Get Flowcell and Lane information - handle = gzip.open if filename.endswith('.gz') else open - meta = {'flowcell': [], 'lane': [], 'flowcell_lane': []} + handle = gzip.open if filename.endswith(".gz") else open + meta = {"flowcell": [], "lane": [], "flowcell_lane": []} i = 0 # keeps track of line number - with handle(filename, 'rt') as file: - print('sample_name\ttotal_read_pairs\tflowcell_ids\tlanes\tflowcell_lanes\tmd5_checksum') + with handle(filename, "rt") as file: + print( + "sample_name\ttotal_read_pairs\tflowcell_ids\tlanes\tflowcell_lanes\tmd5_checksum" + ) for line in file: line = line.strip() - if i%4 == 0: # read id or sequence identifer + if i % 4 == 0: # read id or sequence identifer fc, lane = get_flowcell_lane(line) - fc = 
fc.lstrip('@') - fc_lane = "{}_{}".format(fc,lane) - if fc not in meta['flowcell']: - meta['flowcell'].append(fc) - if lane not in meta['lane']: - meta['lane'].append(lane) - if fc_lane not in meta['flowcell_lane']: - meta['flowcell_lane'].append(fc_lane) + fc = fc.lstrip("@") + fc_lane = "{}_{}".format(fc, lane) + if fc not in meta["flowcell"]: + meta["flowcell"].append(fc) + if lane not in meta["lane"]: + meta["lane"].append(lane) + if fc_lane not in meta["flowcell_lane"]: + meta["flowcell_lane"].append(fc_lane) i += 1 - print("{}\t{}\t{}\t{}\t{}\t{}".format(sample, int(i/4),",".join(sorted(meta['flowcell'])),",".join(sorted(meta['lane'])),",".join(sorted(meta['flowcell_lane'])), md5)) \ No newline at end of file + print( + "{}\t{}\t{}\t{}\t{}\t{}".format( + sample, + int(i / 4), + ",".join(sorted(meta["flowcell"])), + ",".join(sorted(meta["lane"])), + ",".join(sorted(meta["flowcell_lane"])), + md5, + ) + ) diff --git a/bin/lofreq_convert.sh b/bin/lofreq_convert.sh index 1d5edda..42d544b 100755 --- a/bin/lofreq_convert.sh +++ b/bin/lofreq_convert.sh @@ -29,4 +29,4 @@ zcat "${INPUT_FILE}" \ my @data = map { chomp; [ split /=|;/ ] } $_; $NEW_ROW = "$_\tDP:DP4\t$data[0][1]:$data[0][7]\n"; print $NEW_ROW; - }' \ No newline at end of file + }' diff --git a/bin/makeGraph.R b/bin/makeGraph.R index 9938587..88849d4 100755 --- a/bin/makeGraph.R +++ b/bin/makeGraph.R @@ -19,21 +19,21 @@ for (i in c(1:22,'X','Y')) { plot(ratio$Start[tt],log2(ratio$Ratio[tt]),xlab = paste ("position, chr",i),ylab = "normalized copy number profile (log2)",pch = ".",col = colors()[88]) tt <- which(ratio$Chromosome==i & ratio$CopyNumber>ploidy ) points(ratio$Start[tt],log2(ratio$Ratio[tt]),pch = ".",col = colors()[136]) - - + + tt <- which(ratio$Chromosome==i & ratio$CopyNumberploidy ) points(ratio$Start[tt],ratio$Ratio[tt]*ploidy,pch = ".",col = colors()[136]) - - tt <- which(ratio$Chromosome==i & ratio$Ratio==maxLevelToPlot & ratio$CopyNumber>ploidy) + + tt <- which(ratio$Chromosome==i & 
ratio$Ratio==maxLevelToPlot & ratio$CopyNumber>ploidy) points(ratio$Start[tt],ratio$Ratio[tt]*ploidy,pch = ".",col = colors()[136],cex=4) - + tt <- which(ratio$Chromosome==i & ratio$CopyNumber5) { lBAF <-BAF[tt,] plot(lBAF$Position,lBAF$BAF,ylim = c(-0.1,1.1),xlab = paste ("position, chr",i),ylab = "BAF",pch = ".",col = colors()[1]) - tt <- which(lBAF$A==0.5) + tt <- which(lBAF$A==0.5) points(lBAF$Position[tt],lBAF$BAF[tt],pch = ".",col = colors()[92]) tt <- which(lBAF$A!=0.5 & lBAF$A>=0) points(lBAF$Position[tt],lBAF$BAF[tt],pch = ".",col = colors()[62]) @@ -106,24 +106,24 @@ if (length(args)>5) { if (length(lBAF$A)>4) { for (j in c(2:(length(lBAF$A)-pres-1))) { - if (lBAF$A[j]==lBAF$A[j+pres]) { - tt[length(tt)+1] <- j + if (lBAF$A[j]==lBAF$A[j+pres]) { + tt[length(tt)+1] <- j } } points(lBAF$Position[tt],lBAF$A[tt],pch = ".",col = colors()[24],cex=4) - points(lBAF$Position[tt],lBAF$B[tt],pch = ".",col = colors()[24],cex=4) + points(lBAF$Position[tt],lBAF$B[tt],pch = ".",col = colors()[24],cex=4) } tt <- 1 pres <- 1 if (length(lBAF$FittedA)>4) { for (j in c(2:(length(lBAF$FittedA)-pres-1))) { - if (lBAF$FittedA[j]==lBAF$FittedA[j+pres]) { - tt[length(tt)+1] <- j + if (lBAF$FittedA[j]==lBAF$FittedA[j+pres]) { + tt[length(tt)+1] <- j } } points(lBAF$Position[tt],lBAF$FittedA[tt],pch = ".",col = colors()[463],cex=4) - points(lBAF$Position[tt],lBAF$FittedB[tt],pch = ".",col = colors()[463],cex=4) + points(lBAF$Position[tt],lBAF$FittedB[tt],pch = ".",col = colors()[463],cex=4) } } diff --git a/bin/make_freec_genome.pl b/bin/make_freec_genome.pl index 28aec4c..7745b26 100755 --- a/bin/make_freec_genome.pl +++ b/bin/make_freec_genome.pl @@ -25,9 +25,9 @@ print C "chrFiles = $chrFiles\n"; print C "minimalSubclonePresence = 20\nmaxThreads = 4\n"; print C "outputDir = $ARGV[0]\n\n"; - + print C '[sample]' . 
"\n\n"; - + print C "mateFile = $tumormateFile\n"; print C "inputFormat = BAM\nmateOrientation = FR\n\n"; diff --git a/bin/make_freec_genome_paired.pl b/bin/make_freec_genome_paired.pl index e1861bb..be18317 100755 --- a/bin/make_freec_genome_paired.pl +++ b/bin/make_freec_genome_paired.pl @@ -41,4 +41,4 @@ print C "makePileup = $makePileup\n"; print C "fastaFile = $fastaFile\n"; print C "minimalCoveragePerPosition = 5\nminimalQualityPerPosition = 5\n"; -print C "SNPfile = $SNPfile"; \ No newline at end of file +print C "SNPfile = $SNPfile"; diff --git a/bin/sexscript.sh b/bin/sexscript.sh index 8e94555..a2cf08c 100755 --- a/bin/sexscript.sh +++ b/bin/sexscript.sh @@ -15,4 +15,4 @@ echo "F" else echo "M" - fi \ No newline at end of file + fi diff --git a/bin/split_Bed_into_equal_regions.py b/bin/split_Bed_into_equal_regions.py index 49c682f..bc7b3c8 100755 --- a/bin/split_Bed_into_equal_regions.py +++ b/bin/split_Bed_into_equal_regions.py @@ -8,194 +8,216 @@ def run(): - # argparse Stuff - parser = argparse.ArgumentParser(description='Given an input bed file, this program will output a number of bed files, each will have same number of total base pairs. This routine is used to parallelize SomaticSeq tasks. One limitation, however, is that some regions of the genome have much higher coverage than others. This is the reason some regions run much slower than others.', formatter_class=argparse.ArgumentDefaultsHelpFormatter) - + parser = argparse.ArgumentParser( + description="Given an input bed file, this program will output a number of bed files, each will have same number of total base pairs. This routine is used to parallelize SomaticSeq tasks. One limitation, however, is that some regions of the genome have much higher coverage than others. 
This is the reason some regions run much slower than others.", + formatter_class=argparse.ArgumentDefaultsHelpFormatter, + ) + # Variant Call Type, i.e., snp or indel - parser.add_argument('-infile', '--input-file', type=str, help='Input merged BED file', required=True, default=None) - parser.add_argument('-num', '--num-of-files', type=int, help='1', required=False, default=1) - parser.add_argument('-outfiles', '--output-files', type=str, help='Output BED file', required=False, default=sys.stdout) - - + parser.add_argument( + "-infile", + "--input-file", + type=str, + help="Input merged BED file", + required=True, + default=None, + ) + parser.add_argument( + "-num", "--num-of-files", type=int, help="1", required=False, default=1 + ) + parser.add_argument( + "-outfiles", + "--output-files", + type=str, + help="Output BED file", + required=False, + default=sys.stdout, + ) + # Parse the arguments: args = parser.parse_args() - - infile = args.input_file + + infile = args.input_file outfiles = args.output_files - num = args.num_of_files - + num = args.num_of_files + return infile, outfiles, num def fai2bed(fai, bedout): - - with open(fai) as fai, open(bedout, 'w') as bed: - + with open(fai) as fai, open(bedout, "w") as bed: fai_i = fai.readline().rstrip() - + while fai_i: - fai_item = fai_i.split('\t') - bed.write( '{}\t{}\t{}\n'.format(fai_item[0], '0', fai_item[1] ) ) + fai_item = fai_i.split("\t") + bed.write("{}\t{}\t{}\n".format(fai_item[0], "0", fai_item[1])) fai_i = fai.readline().rstrip() - + return bedout def split(infile, outfiles, num): - outfilesWritten = [] - - out_basename = os.path.basename(outfiles) + + out_basename = os.path.basename(outfiles) out_directory = os.path.dirname(outfiles) - + if not out_directory: out_directory = os.curdir - + with open(infile) as bedin: - line_i = bedin.readline().rstrip() - - while re.match(r'track|browser|#', line_i): + + while re.match(r"track|browser|#", line_i): line_i = bedin.readline().rstrip() - + total_region_size 
= 0
     original_regions = []
     while line_i:
-
-        items = line_i.split('\t')
-
-        chr_i = items[0]
+        items = line_i.split("\t")
+
+        chr_i = items[0]
         start_i = int(items[1])
-        end_i = int(items[2])
-
-        total_region_size = total_region_size + ( end_i - start_i )
-        original_regions.append( (chr_i, start_i, end_i) )
-
+        end_i = int(items[2])
+
+        total_region_size = total_region_size + (end_i - start_i)
+        original_regions.append((chr_i, start_i, end_i))
+
         line_i = bedin.readline().rstrip()
-
+
     # For each bed file, this is the total base pairs in that file
-    size_per_file = math.ceil( total_region_size / num )
-
+    size_per_file = math.ceil(total_region_size / num)
+
     # Go through every original region and split
     current_size = 0
     current_region = []
     ith_split = 1
     for region_i in original_regions:
-
-        chr_i = region_i[0]
+        chr_i = region_i[0]
         start_i = region_i[1]
-        end_i = region_i[2]
-
+        end_i = region_i[2]
+
         # If the "now size" is still less than size/file requirement
         if current_size + (end_i - start_i) <= size_per_file:
-
             # Need to collect more to fulfill the size/file requirement, so append to current_region list
-            current_region.append( '{}\t{}\t{}\n'.format(chr_i, start_i, end_i) )
+            current_region.append("{}\t{}\t{}\n".format(chr_i, start_i, end_i))
             current_size = current_size + (end_i - start_i)
-
+
         # If the "now size" exceeds the size/file requirement, need to start splitting:
         elif current_size + (end_i - start_i) > size_per_file:
-
             # Split a big region into a smaller regino, such that the size of "current_region" is equal to the size/file requirement:
             breakpoint_i = size_per_file + start_i - current_size
-            breakpoint_i= int(breakpoint_i)
+            breakpoint_i = int(breakpoint_i)

             # Write these regions out, , reset "current_region," then add 1 to ith_split afterward to keep track:
-            outfilesWritten.append( '{}{}{}.{}'.format(out_directory, os.sep, ith_split, out_basename) )
-            with open( '{}{}{}.{}'.format(out_directory, os.sep, ith_split, out_basename), 'w' ) as ith_out:
+            outfilesWritten.append(
+                "{}{}{}.{}".format(out_directory, os.sep, ith_split, out_basename)
+            )
+            with open(
+                "{}{}{}.{}".format(out_directory, os.sep, ith_split, out_basename), "w"
+            ) as ith_out:
                 for line_i in current_region:
-                    ith_out.write( line_i )
-
+                    ith_out.write(line_i)
+
                 # Make sure it doesn't write a 0-bp region:
                 if breakpoint_i > start_i:
-                    ith_out.write( '{}\t{}\t{}\n'.format( chr_i, start_i, breakpoint_i ) )
+                    ith_out.write("{}\t{}\t{}\n".format(chr_i, start_i, breakpoint_i))
             ith_split += 1
             current_region = []
-
+
             # The remaining, is the end position of the original region and the previous breakpoint:
             remaining_length = end_i - breakpoint_i
             remaining_region = (chr_i, breakpoint_i, end_i)
-
-            # If the remnant of the previous region is less than the size/file requirement, simply make it "current_region" and then move on:
+
+            # If the remnant of the previous region is less than the size/file requirement, simply make it "current_region" and then move on:
             if remaining_length <= size_per_file:
-                current_region.append( '{}\t{}\t{}\n'.format(chr_i, breakpoint_i, end_i) )
+                current_region.append("{}\t{}\t{}\n".format(chr_i, breakpoint_i, end_i))
                 current_size = remaining_length
-
+
             # If the renmant of the previuos region exceed the size/file requirement, it needs to be properly split until it's small enough:
             elif remaining_length > size_per_file:
-
                 # Each sub-region, if large enough, will have its own file output:
                 while (end_i - breakpoint_i) > size_per_file:
-
                     end_j = breakpoint_i + size_per_file
-                    end_j=int(end_j)
-
-                    outfilesWritten.append( '{}{}{}.{}'.format(out_directory, os.sep, ith_split, out_basename) )
-                    with open( '{}{}{}.{}'.format(out_directory, os.sep, ith_split, out_basename), 'w' ) as ith_out:
-
+                    end_j = int(end_j)
+
+                    outfilesWritten.append(
+                        "{}{}{}.{}".format(
+                            out_directory, os.sep, ith_split, out_basename
+                        )
+                    )
+                    with open(
+                        "{}{}{}.{}".format(
+                            out_directory, os.sep, ith_split, out_basename
+                        ),
+                        "w",
+                    ) as ith_out:
                         if end_j > breakpoint_i:
-                            ith_out.write( '{}\t{}\t{}\n'.format( chr_i, breakpoint_i, end_j ) )
+                            ith_out.write(
+                                "{}\t{}\t{}\n".format(chr_i, breakpoint_i, end_j)
+                            )
                     ith_split += 1
-
+
                     breakpoint_i = end_j
-
+
                 # After every sub-region has its own bed file, the remnant is added to "current_region" to deal with the next line of the "original_regions"
-                current_region.append( '{}\t{}\t{}\n'.format(chr_i, breakpoint_i, end_i) )
+                current_region.append("{}\t{}\t{}\n".format(chr_i, breakpoint_i, end_i))
                 current_size = end_i - breakpoint_i
-
+
     # The final region to write out:
-    ithOutName = '{}{}{}.{}'.format(out_directory, os.sep, ith_split, out_basename)
-    outfilesWritten.append( ithOutName )
-    with open( ithOutName, 'w' ) as ith_out:
+    ithOutName = "{}{}{}.{}".format(out_directory, os.sep, ith_split, out_basename)
+    outfilesWritten.append(ithOutName)
+    with open(ithOutName, "w") as ith_out:
         for line_i in current_region:
-            ith_out.write( line_i )
-
-    return outfilesWritten
-
-
+            ith_out.write(line_i)
+    return outfilesWritten


 def split_vcf_file(vcf_file, work_dir=os.curdir, num=1):
-
     num_lines = 0
     with open_textfile(vcf_file) as vcf:
         line_i = vcf.readline()
         header = []
-        while line_i.startswith('#'):
+        while line_i.startswith("#"):
             header.append(line_i)
             line_i = vcf.readline()
         while line_i:
             num_lines += 1
             line_i = vcf.readline()
-    lines_per_file = math.ceil( float(num_lines)/num )
+    lines_per_file = math.ceil(float(num_lines) / num)

     with open_textfile(vcf_file) as vcf:
-
-        outnames = [ os.curdir + os.sep + str(i) + '_' + re.sub(r'.vcf(.gz)?', '', os.path.basename(vcf_file)) + '.vcf' for i in range(num) ]
-        outhandles = [open(i, 'w') for i in outnames]
+        outnames = [
+            os.curdir
+            + os.sep
+            + str(i)
+            + "_"
+            + re.sub(r".vcf(.gz)?", "", os.path.basename(vcf_file))
+            + ".vcf"
+            for i in range(num)
+        ]
+        outhandles = [open(i, "w") for i in outnames]
         [write_header(header, i) for i in outhandles]
-
+
         line_i = vcf.readline()
-
-        while line_i.startswith('#'):
+
+        while line_i.startswith("#"):
             line_i = vcf.readline()
-
-        while line_i:
+        while line_i:
             i = 0
             n = 0
             while line_i:
-
-                outhandles[n].write( line_i )
+                outhandles[n].write(line_i)
                 i += 1
-
+
                 if i == lines_per_file:
                     i = 0
                     n += 1
-
+
                 line_i = vcf.readline()

     [i.close() for i in outhandles]

@@ -203,8 +225,6 @@ def split_vcf_file(vcf_file, work_dir=os.curdir, num=1):

     return outnames

-
-
-if __name__ == '__main__':
+if __name__ == "__main__":
     infile, outfiles, num = run()
     split(infile, outfiles, num)
diff --git a/bin/strelka_convert.py b/bin/strelka_convert.py
index 4b5d4fe..46cca68 100755
--- a/bin/strelka_convert.py
+++ b/bin/strelka_convert.py
@@ -1,9 +1,10 @@
 #!/usr/bin/env python
 import numpy as np
 import cyvcf2
-import sys
-import gzip
-import os
+import sys
+import gzip
+import os
+

 ##Adapted from https://github.com/bcbio/bcbio-nextgen/blob/72f42706faa5cfe4f0680119bf148e0bdf2b78ba/bcbio/variation/strelka2.py#L30
 def _tumor_normal_genotypes(ref, alt, info):
@@ -23,6 +24,7 @@ def _tumor_normal_genotypes(ref, alt, info):
     fname, coords: not currently used, for debugging purposes
     """
     known_names = set(["het", "hom", "ref", "conflict"])
+
     def name_to_gt(val):
         if val.lower() == "het":
             return "0/1"
@@ -34,6 +36,7 @@ def name_to_gt(val):
         # Non-standard representations, het is our best imperfect representation
         # print(fname, coords, ref, alt, info, val)
         return "0/1"
+
     def alleles_to_gt(val):
         gt_indices = {gt.upper(): i for i, gt in enumerate([ref] + alt)}
         tumor_gts = [gt_indices[x.upper()] for x in val if x in gt_indices]
@@ -47,6 +50,7 @@ def alleles_to_gt(val):
         else:
             tumor_gt = name_to_gt(val)
         return tumor_gt
+
     nt_val = [x.split("=")[-1] for x in info if x.startswith("NT=")][0]
     normal_gt = name_to_gt(nt_val)
     sgt_val = [x.split("=")[-1] for x in info if x.startswith("SGT=")]
@@ -58,7 +62,7 @@ def alleles_to_gt(val):
     return normal_gt, tumor_gt


-def _af_annotate_and_filter(in_file,out_file):
+def _af_annotate_and_filter(in_file, out_file):
     """Populating FORMAT/AF, and dropping variants with AF
\n')
-                out_handle.write(line)
-            elif line.startswith("#CHROM"):
-                assert added_gt
-                out_handle.write(line)
-            elif line.startswith("#"):
-                out_handle.write(line)
-            else:
-                parts = line.rstrip().split("\t")
-                normal_gt,tumor_gt = _tumor_normal_genotypes(parts[3], parts[4].split(","),
-                                                             parts[7].split(";"))
-                parts[8] = "GT:%s" % parts[8]
-                parts[9] = "%s:%s" % (normal_gt, parts[9])
-                parts[10] = "%s:%s" % (tumor_gt, parts[10])
-                out_handle.write("\t".join(parts) + "\n")
+    ##Set genotypes now
+    out_file = os.path.basename(in_file).replace(".vcf.gz", "-fixed.vcf")
+    # open_fn = gzip.open if is_gzipped(in_file) else open
+    with gzip.open(in_file, "rt") as in_handle:
+        with open(out_file, "wt") as out_handle:
+            added_gt = False
+            for line in in_handle:
+                if line.startswith("##FORMAT") and not added_gt:
+                    added_gt = True
+                    out_handle.write(
+                        '##FORMAT=\n'
+                    )
+                    out_handle.write(line)
+                elif line.startswith("#CHROM"):
+                    assert added_gt
+                    out_handle.write(line)
+                elif line.startswith("#"):
+                    out_handle.write(line)
+                else:
+                    parts = line.rstrip().split("\t")
+                    normal_gt, tumor_gt = _tumor_normal_genotypes(
+                        parts[3], parts[4].split(","), parts[7].split(";")
+                    )
+                    parts[8] = "GT:%s" % parts[8]
+                    parts[9] = "%s:%s" % (normal_gt, parts[9])
+                    parts[10] = "%s:%s" % (tumor_gt, parts[10])
+                    out_handle.write("\t".join(parts) + "\n")
+

-if __name__ == '__main__':
+if __name__ == "__main__":
     filename = sys.argv[1]
     outname = sys.argv[2]
     _add_gt(filename)
     newname = os.path.basename(filename).replace(".vcf.gz", "-fixed.vcf")
-    _af_annotate_and_filter(newname,outname)
+    _af_annotate_and_filter(newname, outname)
     os.remove(newname)
-
diff --git a/conf/ci_stub.config b/conf/ci_stub.config
index 6d16a2f..7e50fc7 100644
--- a/conf/ci_stub.config
+++ b/conf/ci_stub.config
@@ -29,4 +29,4 @@ process {
     scratch = false
 }

-stubRun = true
\ No newline at end of file
+stubRun = true
diff --git a/conf/fastq_screen.conf b/conf/fastq_screen.conf
index e2c5be8..0d5cbf3 100644
--- a/conf/fastq_screen.conf
+++ b/conf/fastq_screen.conf
@@ -3,7 +3,7 @@
 ###########
 ## Bowtie #
 ###########
-## If the bowtie binary is not in your PATH then you can
+## If the bowtie binary is not in your PATH then you can
 ## set this value to tell the program where to find it.
 ## Uncomment the line below and set the appropriate location
 ##
@@ -39,9 +39,9 @@ THREADS 8
 ## This section allows you to configure multiple databases
 ## to search against in your screen. For each database
 ## you need to provide a database name (which can't contain
-## spaces) and the location of the bowtie indices which
+## spaces) and the location of the bowtie indices which
 ## you created for that database.
-##
+##
 ## The default entries shown below are only suggested examples
 ## you can add as many DATABASE sections as you like, and you
 ## can comment out or remove as many of the existing entries
@@ -94,4 +94,4 @@ DATABASE Vectors /fdb/fastq_screen/FastQ_Screen_Genomes/Vectors/Vectors
 ############
 ## Adapters - sequence derived from the FastQC contaminats file
 ## www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
-DATABASE Adapters /fdb/fastq_screen/FastQ_Screen_Genomes/Adapters/Contaminants
\ No newline at end of file
+DATABASE Adapters /fdb/fastq_screen/FastQ_Screen_Genomes/Adapters/Contaminants
diff --git a/conf/frce.config b/conf/frce.config
index bd0614c..37db7b2 100644
--- a/conf/frce.config
+++ b/conf/frce.config
@@ -28,4 +28,4 @@ process {

     // for running pipeline on group sharing data directory, this can avoid inconsistent files timestamps
     cache = 'lenient'
-}
\ No newline at end of file
+}
diff --git a/conf/genomes.config b/conf/genomes.config
index 07660c1..cee08bd 100644
--- a/conf/genomes.config
+++ b/conf/genomes.config
@@ -7,14 +7,14 @@ params {
         genomedict= "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/bwamem/GRCh38.d1.vd1.dict"
         wgsregion = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/resources_broad_hg38_v0_wgs_calling_regions.hg38.interval_list"
         intervals= "${projectDir}/assets/hg38_v0_wgs_calling_regions.hg38.bed"
-        fullinterval = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/genomes/hg38_main.bed"
-        INDELREF = "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GATK_resource_bundle/ALL.wgs.1000G_phase3.GRCh38.ncbi_remapper.20150424.shapeit2_indels.vcf.gz" //ALL.wgs.1000G_phase3.GRCh38.ncbi_remapper.20150424.shapeit2_indels.vcf.gz"
+        fullinterval = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/genomes/hg38_main.bed"
+        INDELREF = "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GATK_resource_bundle/ALL.wgs.1000G_phase3.GRCh38.ncbi_remapper.20150424.shapeit2_indels.vcf.gz" //ALL.wgs.1000G_phase3.GRCh38.ncbi_remapper.20150424.shapeit2_indels.vcf.gz"
         KNOWNINDELS = "-known /data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GATK_resource_bundle/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz -known /data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GATK_resource_bundle/ALL.wgs.1000G_phase3.GRCh38.ncbi_remapper.20150424.shapeit2_indels.vcf.gz"
         KNOWNRECAL = '--known-sites /data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GATK_resource_bundle/dbsnp_138.hg38.vcf.gz --known-sites /data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GATK_resource_bundle/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --known-sites /data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GATK_resource_bundle/ALL.wgs.1000G_phase3.GRCh38.ncbi_remapper.20150424.shapeit2_indels.vcf.gz'
         dbsnp = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/GATK_GRCh38.d1.vd1/dbsnp_138.hg38.vcf.gz"
-        gnomad = '--germline-resource /data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/GATK_GRCh38.d1.vd1/somatic-hg38-af-only-gnomad.hg38.vcf.gz'
-        tonly_PON = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PON/gatk4_mutect2_4136_pon.vcf.gz"
-        PON = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PON/MuTect2.PON.5210.vcf.gz"
+        gnomad = '--germline-resource /data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/GATK_GRCh38.d1.vd1/somatic-hg38-af-only-gnomad.hg38.vcf.gz'
+        tonly_PON = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PON/gatk4_mutect2_4136_pon.vcf.gz"
+        PON = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PON/MuTect2.PON.5210.vcf.gz"
         germline_resource = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/GATK_GRCh38.d1.vd1/somatic-hg38-af-only-gnomad.hg38.vcf.gz"
         KRAKENBACDB = "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/kraken/20180907_standard_kraken2"
         snpeff_genome = "GRCh38.86"
@@ -60,15 +60,15 @@ params {
         //CNVKIT
         REFFLAT = "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/cnvkit/refFlat.txt"
         ACCESS = "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/cnvkit/access-10kb.hg38.bed"
-    }
-
+    }
+
     'hg19' {
         genome = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg19/genome/bwamem2/hg19.with_extra.fa"
         genomefai = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg19/genome/bwamem2/hg19.with_extra.fa.fai"
         bwagenome = "/data/CCBR_Pipeliner/db/PipeDB/lib/hs37d5.fa"
         genomedict= "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg19/genome/bwamem2/hg19.with_extra.dict"
         intervals= "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg19/hg19_noblacklist_maincontig.bed"
-        INDELREF = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg19/GATKbundle/Mills_and_1000G_gold_standard.indels.hg19.vcf.gz"
+        INDELREF = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg19/GATKbundle/Mills_and_1000G_gold_standard.indels.hg19.vcf.gz"
         KNOWNINDELS = "-known /data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg19/GATKbundle/Mills_and_1000G_gold_standard.indels.hg19.vcf.gz -known /data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg19/GATKbundle/1000G_phase1.indels.hg19.vcf.gz"
         KNOWNRECAL = '--known-sites /fdb/GATK_resource_bundle/hg19-2.8/dbsnp_138.hg19.excluding_sites_after_129.vcf.gz --known-sites /data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg19/GATKbundle/Mills_and_1000G_gold_standard.indels.hg19.vcf.gz'
         dbsnp = "/fdb/GATK_resource_bundle/hg19-2.8/dbsnp_138.hg19.vcf.gz"
@@ -120,8 +120,8 @@ params {
     }

     'mm10' {
-        genome = "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/mm10/genome/bwamem2index/genome.fa"
-        genomefai = "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/mm10/genome/bwamem2index/genome.fa.fai"
+        genome = "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/mm10/genome/bwamem2index/genome.fa"
+        genomefai = "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/mm10/genome/bwamem2index/genome.fa.fai"
         bwagenome= "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/mm10/genome/bwaindex/genome.fa"
         genomedict= "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/mm10/genome/bwamem2index/genome.dict"
         intervals="/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/mm10/genome/bwamem2index/mm10_wgsregions.bed"
@@ -167,13 +167,13 @@ params {
         genomedict= "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/genome/Homo_sapiens_assembly38.dict"
         wgsregion = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/resources_broad_hg38_v0_wgs_calling_regions.hg38.interval_list"
         intervals= "${projectDir}/assets/hg38_v0_wgs_calling_regions.hg38.bed"
-        fullinterval = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/genomes/hg38_main.bed"
-        INDELREF = "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GATK_resource_bundle/ALL.wgs.1000G_phase3.GRCh38.ncbi_remapper.20150424.shapeit2_indels.vcf.gz" //ALL.wgs.1000G_phase3.GRCh38.ncbi_remapper.20150424.shapeit2_indels.vcf.gz"
+        fullinterval = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/genomes/hg38_main.bed"
+        INDELREF = "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GATK_resource_bundle/ALL.wgs.1000G_phase3.GRCh38.ncbi_remapper.20150424.shapeit2_indels.vcf.gz" //ALL.wgs.1000G_phase3.GRCh38.ncbi_remapper.20150424.shapeit2_indels.vcf.gz"
         KNOWNINDELS = "-known /data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GATK_resource_bundle/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz -known /data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GATK_resource_bundle/ALL.wgs.1000G_phase3.GRCh38.ncbi_remapper.20150424.shapeit2_indels.vcf.gz"
         KNOWNRECAL = '--known-sites /data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GATK_resource_bundle/dbsnp_138.hg38.vcf.gz --known-sites /data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GATK_resource_bundle/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --known-sites /data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GATK_resource_bundle/ALL.wgs.1000G_phase3.GRCh38.ncbi_remapper.20150424.shapeit2_indels.vcf.gz'
         dbsnp = "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GATK_resource_bundle/dbsnp_138.hg38.vcf.gz"
-        gnomad = '--germline-resource /data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GNOMAD/somatic-hg38-af-only-gnomad.hg38.vcf.gz'
-        tonly_PON = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PON/gatk4_mutect2_4136_pon.vcf.gz"
+        gnomad = '--germline-resource /data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GNOMAD/somatic-hg38-af-only-gnomad.hg38.vcf.gz'
+        tonly_PON = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PON/gatk4_mutect2_4136_pon.vcf.gz"
         PON = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PON/MuTect2.PON.5210.vcf.gz"
         germline_resource = "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GNOMAD/somatic-hg38-af-only-gnomad.hg38.vcf.gz"
         KRAKENBACDB = "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/kraken/20180907_standard_kraken2"
@@ -215,7 +215,7 @@ params {
         //CNVKIT
         REFFLAT = "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/cnvkit/refFlat.txt"
         ACCESS = "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/cnvkit/access-10kb.hg38.bed"
-    }
+    }
     'hg38_noalt' {
         genome = "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/genome_noalt/bwamem2/GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta"
         genomefai = "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/genome_noalt/bwamem2/GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta.fai"
@@ -223,14 +223,14 @@ params {
         bwagenome= "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/genome_noalt/bwa/GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta"
         wgsregion = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/resources_broad_hg38_v0_wgs_calling_regions.hg38.interval_list"
         intervals= "${projectDir}/assets/hg38_v0_wgs_calling_regions.hg38.bed"
-        fullinterval = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/genomes/hg38_main.bed"
-        INDELREF = "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GATK_resource_bundle/ALL.wgs.1000G_phase3.GRCh38.ncbi_remapper.20150424.shapeit2_indels.vcf.gz" //ALL.wgs.1000G_phase3.GRCh38.ncbi_remapper.20150424.shapeit2_indels.vcf.gz"
+        fullinterval = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/genomes/hg38_main.bed"
+        INDELREF = "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GATK_resource_bundle/ALL.wgs.1000G_phase3.GRCh38.ncbi_remapper.20150424.shapeit2_indels.vcf.gz" //ALL.wgs.1000G_phase3.GRCh38.ncbi_remapper.20150424.shapeit2_indels.vcf.gz"
         KNOWNINDELS = "-known /data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GATK_resource_bundle/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz -known /data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GATK_resource_bundle/ALL.wgs.1000G_phase3.GRCh38.ncbi_remapper.20150424.shapeit2_indels.vcf.gz"
         KNOWNRECAL = '--known-sites /data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GATK_resource_bundle/dbsnp_138.hg38.vcf.gz --known-sites /data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GATK_resource_bundle/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --known-sites /data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GATK_resource_bundle/ALL.wgs.1000G_phase3.GRCh38.ncbi_remapper.20150424.shapeit2_indels.vcf.gz'
         dbsnp = "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GATK_resource_bundle/dbsnp_138.hg38.vcf.gz"
-        gnomad = '--germline-resource /data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GNOMAD/somatic-hg38-af-only-gnomad.hg38.vcf.gz'
-        tonly_PON = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PON/gatk4_mutect2_4136_pon.vcf.gz"
-        PON = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PON/MuTect2.PON.5210.vcf.gz"
+        gnomad = '--germline-resource /data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GNOMAD/somatic-hg38-af-only-gnomad.hg38.vcf.gz'
+        tonly_PON = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PON/gatk4_mutect2_4136_pon.vcf.gz"
+        PON = "/data/CCBR_Pipeliner/Pipelines/LOGAN/resources/hg38/PON/MuTect2.PON.5210.vcf.gz"
         germline_resource = "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/GNOMAD/somatic-hg38-af-only-gnomad.hg38.vcf.gz"
         KRAKENBACDB = "/data/CCBR_Pipeliner/Pipelines/XAVIER/resources/hg38/kraken/20180907_standard_kraken2"
         snpeff_genome = "GRCh38.86"
diff --git a/docker/annotate_cnvsv/Dockerfile b/docker/annotate_cnvsv/Dockerfile
index 4a3b4c5..d5d901b 100644
--- a/docker/annotate_cnvsv/Dockerfile
+++ b/docker/annotate_cnvsv/Dockerfile
@@ -10,16 +10,16 @@ ENV REPONAME=${REPONAME}

 LABEL maintainer

-# Create Container filesystem specific
-# working directory and opt directories
+# Create Container filesystem specific
+# working directory and opt directories
 RUN apt-get update \
     && apt-get -y upgrade \
     && DEBIAN_FRONTEND=noninteractive apt-get install -y \
     tclsh
-WORKDIR /opt2
+WORKDIR /opt2

-###Create AnnotSV
+###Create AnnotSV
 RUN wget https://github.com/lgmgeo/AnnotSV/archive/refs/tags/v3.4.2.tar.gz \
     && tar -xvzf /opt2/v3.4.2.tar.gz \
     && rm /opt2/v3.4.2.tar.gz
@@ -29,12 +29,12 @@ ENV PATH="/opt2/AnnotSV-3.4.2/bin:$PATH"
 ##Update the resources for ClassifyCNV
 RUN wget https://github.com/Genotek/ClassifyCNV/archive/refs/tags/v1.1.1.tar.gz \
     && tar -xvzf /opt2/v1.1.1.tar.gz \
-    && rm /opt2/v1.1.1.tar.gz
-    #&& chmod a+rx /opt2/ClassifyCNV-1.1.1/update_clingen.sh
+    && rm /opt2/v1.1.1.tar.gz
+    #&& chmod a+rx /opt2/ClassifyCNV-1.1.1/update_clingen.sh
 ENV PATH="/opt2/ClassifyCNV-1.1.1/:$PATH"
 #RUN update_clingen.sh

-##Survivor
+##Survivor
 RUN wget https://github.com/fritzsedlazeck/SURVIVOR/archive/refs/tags/v1.0.6.tar.gz \
    && tar -xvzf /opt2/v1.0.6.tar.gz \
     && rm /opt2/v1.0.6.tar.gz \
@@ -44,4 +44,4 @@ ENV PATH="/opt2/SURVIVOR-1.0.6/Debug:$PATH"

 COPY Dockerfile /opt2/Dockerfile_${REPONAME}.${BUILD_TAG}
-RUN chmod a+r /opt2/Dockerfile_${REPONAME}.${BUILD_TAG}
\ No newline at end of file
+RUN chmod a+r /opt2/Dockerfile_${REPONAME}.${BUILD_TAG}
diff --git a/docker/annotate_cnvsv/build.sh b/docker/annotate_cnvsv/build.sh
index e98686f..a8b2dfe 100644
--- a/docker/annotate_cnvsv/build.sh
+++ b/docker/annotate_cnvsv/build.sh
@@ -1,10 +1,8 @@
 ##BUILD cnv/sv
-docker build --platform linux/amd64 --tag ccbr_annotate_cnvsv:v0.0.2 -f Dockerfile .
+docker build --platform linux/amd64 --tag ccbr_annotate_cnvsv:v0.0.2 -f Dockerfile .

 docker tag ccbr_annotate_cnvsv:v0.0.2 dnousome/ccbr_annotate_cnvsv:v0.0.2
 docker tag ccbr_annotate_cnvsv:v0.0.2 dnousome/ccbr_annotate_cnvsv:latest

 docker push dnousome/ccbr_annotate_cnvsv:v0.0.2
 docker push dnousome/ccbr_annotate_cnvsv:latest
-
-
diff --git a/docker/cnv/Dockerfile b/docker/cnv/Dockerfile
index 340fe72..fcdb526 100644
--- a/docker/cnv/Dockerfile
+++ b/docker/cnv/Dockerfile
@@ -2,16 +2,16 @@ FROM --platform=linux/amd64 nciccbr/ccbr_ubuntu_base_20.04:v6

 LABEL maintainer=

-WORKDIR /opt2
+WORKDIR /opt2

-RUN apt-get update
+RUN apt-get update
 RUN DEBIAN_FRONTEND=noninteractive apt-get install -y \
     gnupg \
     dirmngr \
     ca-certificates \
     apt-transport-https \
-    software-properties-common
+    software-properties-common

 #Install R for ASCAT
 RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 \
@@ -19,17 +19,17 @@ RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD5
     && add-apt-repository --enable-source --yes 'ppa:c2d4u.team/c2d4u4.0+' \
     && apt-get -y install r-base r-base-core r-recommended r-base-dev \
     && apt-get -y install libcurl4-openssl-dev libssl-dev libboost-dev libxml2-dev \
-    && apt-get -y install r-cran-biocmanager r-cran-devtools r-bioc-genomicranges
+    && apt-get -y install r-cran-biocmanager r-cran-devtools r-bioc-genomicranges

-#ASCAT
-RUN Rscript -e 'devtools::install_github("VanLoo-lab/ascat/ASCAT")'
+#ASCAT
+RUN Rscript -e 'devtools::install_github("VanLoo-lab/ascat/ASCAT")'
 RUN Rscript -e 'install.packages(c("argparse"), repos="http://cran.r-project.org")'
 RUN Rscript -e 'BiocManager::install("DNAcopy")'

 #Allelecounter
 RUN git clone https://github.com/cancerit/alleleCount ac \
     && cd ac \
-&& ./setup.sh /opt2/alleleCount
+&& ./setup.sh /opt2/alleleCount
 ENV PATH="/opt2/alleleCount/bin:$PATH"
 ENV LD_LIBARY_PATH="/opt2/alleleCount/lib"
@@ -42,5 +42,3 @@ RUN git clone https://github.com/etal/cnvkit \
 ##Clean up folders
 WORKDIR /opt2
 RUN rm -R ac
-
-
diff --git a/docker/cnv/build.sh b/docker/cnv/build.sh
index bd707fb..a42659b 100644
--- a/docker/cnv/build.sh
+++ b/docker/cnv/build.sh
@@ -9,4 +9,4 @@ docker push dnousome/ccbr_logan_cnv:latest

 #singularity pull dnousome-ccbr_logan_cnv-v0.0.1.img docker://dnousome/ccbr_logan_cnv:v0.0.1

-#docker run -it ccbr_logan_cnv:v0.0.1
+#docker run -it ccbr_logan_cnv:v0.0.1
diff --git a/docker/ffpe/Dockerfile b/docker/ffpe/Dockerfile
index 8ef615d..88bf86b 100644
--- a/docker/ffpe/Dockerfile
+++ b/docker/ffpe/Dockerfile
@@ -10,36 +10,36 @@ ENV REPONAME=${REPONAME}

 LABEL maintainer

-# Create Container filesystem specific
-# working directory and opt directories
-WORKDIR /opt2
+# Create Container filesystem specific
+# working directory and opt directories
+WORKDIR /opt2

 # This section installs system packages required for your project
 # If you need extra system packages add them here.
 # python/3.8.0 and python/2.7.16 (strelka and manta)
 RUN apt-get update \
-    && apt-get -y upgrade
+    && apt-get -y upgrade

-# Common bioinformatics tools
-# bwa/0.7.17-4 bowtie/1.2.3 bowtie2/2.3.5.1
-# bedtools/2.27.1 bedops/2.4.37
-# vcftools/0.1.16
-# Previous tools already installed
-# tabix/1.10.2
+# Common bioinformatics tools
+# bwa/0.7.17-4 bowtie/1.2.3 bowtie2/2.3.5.1
+# bedtools/2.27.1 bedops/2.4.37
+# vcftools/0.1.16
+# Previous tools already installed
+# tabix/1.10.2

 # Install SOB
-RUN wget https://github.com/mikdio/SOBDetector/releases/download/v1.0.4/SOBDetector_v1.0.4.jar
+RUN wget https://github.com/mikdio/SOBDetector/releases/download/v1.0.4/SOBDetector_v1.0.4.jar
 ENV SOB_JAR="/opt2/SOBDetector_v1.0.4.jar"

 WORKDIR /data2

 # Clean-up step to reduce size
-# and install GNU awk to calculate mean and standard
-# deviation, ensures backward compatibility with
+# and install GNU awk to calculate mean and standard
+# deviation, ensures backward compatibility with
 # biowulf installation of awk is a pointer to gawk,
-# and install pandoc (>= 1.12.3 required for Rmarkdown)
+# and install pandoc (>= 1.12.3 required for Rmarkdown)
 RUN DEBIAN_FRONTEND=noninteractive apt-get install -y \
     gawk \
     pandoc \
     && apt-get clean && apt-get purge \
-    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
\ No newline at end of file
+    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
diff --git a/docker/ffpe/build.sh b/docker/ffpe/build.sh
index 806887c..afede26 100644
--- a/docker/ffpe/build.sh
+++ b/docker/ffpe/build.sh
@@ -1,5 +1,5 @@
 #Build.sh
-docker build --platform=linux/amd64 --tag ccbr_logan_ffpe:v0.0.1 -f Dockerfile .
+docker build --platform=linux/amd64 --tag ccbr_logan_ffpe:v0.0.1 -f Dockerfile .

 #Test Docker Build
 #docker run -it ccbr_logan_ffpe:v0.0.1
@@ -10,4 +10,3 @@ docker tag ccbr_logan_ffpe:v0.0.1 dnousome/ccbr_logan_ffpe:latest

 docker push dnousome/ccbr_logan_ffpe:v0.0.1
 docker push dnousome/ccbr_logan_ffpe:latest
-
diff --git a/docker/lofreq/Dockerfile b/docker/lofreq/Dockerfile
index 15e56f5..1f94cc0 100644
--- a/docker/lofreq/Dockerfile
+++ b/docker/lofreq/Dockerfile
@@ -10,8 +10,8 @@ ENV REPONAME=${REPONAME}

 LABEL maintainer

-# Create Container filesystem specific
-# working directory and opt directories
+# Create Container filesystem specific
+# working directory and opt directories

 # This section installs system packages required for your project
 # If you need extra system packages add them here.
@@ -31,11 +31,11 @@ RUN apt-get update \
     libncurses5-dev \
     libssl-dev \
     python3-dev \
-    zlib1g-dev
+    zlib1g-dev

 RUN ln -s /usr/bin/python3.10 /usr/bin/python

-WORKDIR /opt2
+WORKDIR /opt2

 ARG htsversion=1.19
@@ -54,6 +54,6 @@ RUN git clone https://github.com/CSB5/lofreq \
     && ./bootstrap \
     && ./configure --with-htslib=/usr/local \
     && make \
-    && make install
+    && make install

-ENV LD_LIBRARY_PATH /usr/local/lib:$LD_LIBRARY_PATH
\ No newline at end of file
+ENV LD_LIBRARY_PATH /usr/local/lib:$LD_LIBRARY_PATH
diff --git a/docker/lofreq/build.sh b/docker/lofreq/build.sh
index 9df32f9..b83efba 100644
--- a/docker/lofreq/build.sh
+++ b/docker/lofreq/build.sh
@@ -1,5 +1,5 @@
-docker build --platform linux/amd64 --tag ccbr_lofreq:v0.0.1 -f Dockerfile .
+docker build --platform linux/amd64 --tag ccbr_lofreq:v0.0.1 -f Dockerfile .
 docker tag ccbr_lofreq:v0.0.1 dnousome/ccbr_lofreq:v0.0.1
 docker push dnousome/ccbr_lofreq:v0.0.1
-docker push dnousome/ccbr_lofreq:latest
\ No newline at end of file
+docker push dnousome/ccbr_lofreq:latest
diff --git a/docker/logan_base/Dockerfile b/docker/logan_base/Dockerfile
index 30ef39b..f4b4d4e 100644
--- a/docker/logan_base/Dockerfile
+++ b/docker/logan_base/Dockerfile
@@ -10,9 +10,9 @@ ENV REPONAME=${REPONAME}

 LABEL maintainer

-# Create Container filesystem specific
-# working directory and opt directories
-WORKDIR /opt2
+# Create Container filesystem specific
+# working directory and opt directories
+WORKDIR /opt2

 # This section installs system packages required for your project
 # If you need extra system packages add them here.
@@ -21,22 +21,22 @@ RUN apt-get update \
     && apt-get -y upgrade \
     && DEBIAN_FRONTEND=noninteractive apt-get install -y \
     bc \
-    openjdk-17-jdk
-
-# Common bioinformatics tools
-# bwa/0.7.17-4 bowtie/1.2.3 bowtie2/2.3.5.1
-# bedtools/2.27.1 bedops/2.4.37
-# vcftools/0.1.16
-# Previous tools already installed
-# tabix/1.10.2
+    openjdk-17-jdk
+
+# Common bioinformatics tools
+# bwa/0.7.17-4 bowtie/1.2.3 bowtie2/2.3.5.1
+# bedtools/2.27.1 bedops/2.4.37
+# vcftools/0.1.16
+# Previous tools already installed
+# tabix/1.10.2
 RUN DEBIAN_FRONTEND=noninteractive apt-get install -y \
     tabix \
     libhts-dev
-
+
 # Install BWA-MEM2 v2.2.1
 RUN wget https://github.com/bwa-mem2/bwa-mem2/releases/download/v2.2.1/bwa-mem2-2.2.1_x64-linux.tar.bz2 \
     && tar -xvjf /opt2/bwa-mem2-2.2.1_x64-linux.tar.bz2 \
-    && rm /opt2/bwa-mem2-2.2.1_x64-linux.tar.bz2
+    && rm /opt2/bwa-mem2-2.2.1_x64-linux.tar.bz2
 ENV PATH="/opt2/bwa-mem2-2.2.1_x64-linux:$PATH"

 # samtools/1.10 # bcftools/1.10.2 are dated in package
@@ -70,7 +70,7 @@ RUN wget https://github.com/biod/sambamba/releases/download/v0.8.1/sambamba-0.8.
     && chmod a+rx /opt2/sambamba

 # Install GATK4 (GATK/4.6.1.0)
-# Requires Java17
+# Requires Java17
 RUN wget https://github.com/broadinstitute/gatk/releases/download/4.6.1.0/gatk-4.6.1.0.zip \
     && unzip /opt2/gatk-4.6.1.0.zip \
     && rm /opt2/gatk-4.6.1.0.zip \
@@ -79,11 +79,11 @@ ENV PATH="/opt2/gatk-4.6.1.0:$PATH"

 # Picard
 RUN mkdir picard \
-    && wget -O picard/picard.jar https://github.com/broadinstitute/picard/releases/download/3.2.0/picard.jar
+    && wget -O picard/picard.jar https://github.com/broadinstitute/picard/releases/download/3.2.0/picard.jar
 ENV PICARDJARPATH="/opt2/picard"

 #Use DISCVRSeq For CombineVariants Replacement
-#RUN wget https://github.com/BimberLab/DISCVRSeq/releases/download/1.3.62/DISCVRSeq-1.3.62.jar
+#RUN wget https://github.com/BimberLab/DISCVRSeq/releases/download/1.3.62/DISCVRSeq-1.3.62.jar
 #ENV DISCVRSeq_JAR="/opt2/DISCVRSeq-1.3.62.jar"

 # Install last release of GATK3 (GATK/3.8-1)
@@ -111,8 +111,8 @@ RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD5
     && add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/' \
     && add-apt-repository --enable-source --yes 'ppa:c2d4u.team/c2d4u4.0+' \
     && apt-get -y install r-base r-base-core r-recommended r-base-dev \
-    && apt-get -y install libcurl4-openssl-dev libssl-dev libboost-dev libxml2-dev
-RUN apt-get -y install r-cran-tidyverse r-cran-plyr r-cran-knitr
+    && apt-get -y install libcurl4-openssl-dev libssl-dev libboost-dev libxml2-dev
+RUN apt-get -y install r-cran-tidyverse r-cran-plyr r-cran-knitr
 RUN apt-get -y install r-cran-plotly r-cran-rcolorbrewer r-cran-htmlwidgets r-cran-shiny r-cran-rmarkdown r-cran-crosstalk r-cran-dt r-cran-reshape2 r-cran-circlize r-cran-viridis r-cran-gridextra r-cran-rcurl r-cran-cowplot
 RUN apt-get -y install r-cran-biocmanager r-cran-devtools r-cran-snow r-bioc-limma r-bioc-edger r-bioc-complexheatmap r-bioc-genomicranges r-bioc-summarizedexperiment r-bioc-biocparallel
 RUN Rscript -e 'install.packages(c("argparse"), repos="http://cran.r-project.org")'
@@ -141,7 +141,7 @@ RUN wget https://github.com/BoevaLab/FREEC/archive/refs/tags/v11.6.zip \
     && cd /opt2/FREEC-11.6/src/ \
     && make
 ENV PATH="/opt2/FREEC-11.6/src:$PATH"
-WORKDIR /opt2
+WORKDIR /opt2

 # Install VarScan/v2.4.4
@@ -203,12 +203,12 @@ RUN wget http://opengene.org/fastp/fastp.0.24.0 \
     && chmod a+x fastp/fastp
 ENV PATH="/opt2/fastp:$PATH"

-#ASCAT
+#ASCAT
 RUN Rscript -e 'devtools::install_github("VanLoo-lab/ascat/ASCAT")'

 # SvABA
 RUN wget -O svaba_1.2.0 https://github.com/walaj/svaba/releases/download/v1.2.0/svaba \
-    && mkdir svaba \
+    && mkdir svaba \
     && mv svaba_1.2.0 svaba/svaba \
     && chmod a+x svaba/svaba
 ENV PATH="/opt2/svaba:$PATH"
@@ -247,12 +247,12 @@ COPY argparse.bash /opt2
 WORKDIR /data2

 # Clean-up step to reduce size
-# and install GNU awk to calculate mean and standard
-# deviation, ensures backward compatibility with
+# and install GNU awk to calculate mean and standard
+# deviation, ensures backward compatibility with
 # biowulf installation of awk is a pointer to gawk,
-# and install pandoc (>= 1.12.3 required for Rmarkdown)
+# and install pandoc (>= 1.12.3 required for Rmarkdown)
 RUN DEBIAN_FRONTEND=noninteractive apt-get install -y \
     gawk \
     pandoc \
     && apt-get clean && apt-get purge \
-    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
\ No newline at end of file
+    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
diff --git a/docker/logan_base/argparse.bash b/docker/logan_base/argparse.bash
index d6d317d..a9772f6 100644
--- a/docker/logan_base/argparse.bash
+++ b/docker/logan_base/argparse.bash
@@ -71,4 +71,4 @@ EOF
 echo "INFILE: \${INFILE}"
 echo "OUTFILE: \${OUTFILE}"
 FOO
-fi
\ No newline at end of file
+fi
diff --git a/docker/logan_base/build.sh b/docker/logan_base/build.sh
index 1c41e00..d79a5da 100644
--- a/docker/logan_base/build.sh
+++ b/docker/logan_base/build.sh
@@ -1,6 +1,6 @@
 #Build image
-docker build --platform linux/amd64 --tag ccbr_logan_base:v0.3.9 -f Dockerfile .
+docker build --platform linux/amd64 --tag ccbr_logan_base:v0.3.9 -f Dockerfile .

 docker tag ccbr_logan_base:v0.3.9 dnousome/ccbr_logan_base:v0.3.9
 docker tag ccbr_logan_base:v0.3.9 dnousome/ccbr_logan_base:latest
diff --git a/docker/qc/Dockerfile b/docker/qc/Dockerfile
index bd3515d..721bac6 100644
--- a/docker/qc/Dockerfile
+++ b/docker/qc/Dockerfile
@@ -10,9 +10,9 @@ ENV REPONAME=${REPONAME}

 LABEL maintainer=

-# Create Container filesystem specific
-# working directory and opt directories
-WORKDIR /opt2
+# Create Container filesystem specific
+# working directory and opt directories
+WORKDIR /opt2

 # This section installs system packages required for your project
 # If you need extra system packages add them here.
@@ -21,7 +21,7 @@ RUN apt-get update \
     && apt-get -y upgrade \
     && DEBIAN_FRONTEND=noninteractive apt-get install -y \
     bc \
-    libgd-perl
+    libgd-perl

 #FASTQ Screen 'fastq_screen/0.15.3:bowtie/2-2.5.3'
 RUN yes | perl -MCPAN -e "install GD"
@@ -29,7 +29,7 @@ RUN yes | perl -MCPAN -e "install GD::Graph"

 RUN wget https://github.com/StevenWingett/FastQ-Screen/archive/refs/tags/v0.15.3.tar.gz \
     && tar -xvzf /opt2/v0.15.3.tar.gz \
-    && rm /opt2/v0.15.3.tar.gz
+    && rm /opt2/v0.15.3.tar.gz
 ENV PATH="/opt2/FastQ-Screen-0.15.3:$PATH"

 ##FASTQC 'fastqc/0.12.1'
@@ -39,7 +39,7 @@ RUN wget https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.12.
 ENV PATH="/opt2/FastQC:$PATH"

 ##QUALIMAP 'qualimap/2.3'
-RUN wget https://bitbucket.org/kokonech/qualimap/downloads/qualimap_v2.3.zip \
+RUN wget https://bitbucket.org/kokonech/qualimap/downloads/qualimap_v2.3.zip \
     && unzip qualimap_v2.3.zip \
     && rm qualimap_v2.3.zip
 ENV PATH="/opt2/qualimap_v2.3:$PATH"
@@ -70,7 +70,7 @@ RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD5
     && add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/' \
     && add-apt-repository --enable-source --yes 'ppa:c2d4u.team/c2d4u4.0+' \
     && apt-get -y install r-base r-base-core r-recommended r-base-dev \
-    && apt-get -y install libcurl4-openssl-dev libssl-dev libboost-dev libxml2-dev
+    && apt-get -y install libcurl4-openssl-dev libssl-dev libboost-dev libxml2-dev
 RUN apt-get -y install r-cran-tidyverse r-cran-plyr
 RUN apt-get -y install r-cran-plotly r-cran-htmlwidgets r-cran-tidyr
@@ -88,20 +88,19 @@ RUN wget https://github.com/marbl/Krona/releases/download/v2.8.1/KronaTools-2.8.
     && cd KronaTools-2.8.1 \
     && ./install.pl --prefix . \
     && ./updateTaxonomy.sh \
-    && chmod 775 bin/ -R \
+    && chmod 775 bin/ -R \
     && chmod 775 lib/ -R \
     && chmod 775 src/ -R \
     && chmod 775 scripts/ -R \
     && cd /opt2 \
     && rm KronaTools-2.8.1.tar
 ENV PATH="/opt2/KronaTools-2.8.1/bin:$PATH"
-
+
 # Clean-up step to reduce size
-# and install GNU awk to calculate mean and standard
-# deviation, ensures backward compatibility with
+# and install GNU awk to calculate mean and standard
+# deviation, ensures backward compatibility with
 # biowulf installation of awk is a pointer to gawk,
 RUN DEBIAN_FRONTEND=noninteractive apt-get install -y \
     gawk \
     && apt-get clean && apt-get purge \
     && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
-
diff --git a/docker/qc/build.sh b/docker/qc/build.sh
index 815abaa..8e0065c 100644
--- a/docker/qc/build.sh
+++ b/docker/qc/build.sh
@@ -1,5 +1,5 @@
 #Build.sh
-docker build --platform=linux/amd64 --tag ccbr_logan_qc:v0.0.2 -f Dockerfile .
+docker build --platform=linux/amd64 --tag ccbr_logan_qc:v0.0.2 -f Dockerfile . docker tag ccbr_logan_qc:v0.0.2 dnousome/ccbr_logan_qc:v0.0.2 docker tag ccbr_logan_qc:v0.0.2 dnousome/ccbr_logan_qc:latest @@ -8,4 +8,4 @@ docker push dnousome/ccbr_logan_qc:v0.0.2 docker push dnousome/ccbr_logan_qc:latest ## -#docker run -it ccbr_logan_qc:v0.0.2 \ No newline at end of file +#docker run -it ccbr_logan_qc:v0.0.2 diff --git a/docker/sv/Dockerfile b/docker/sv/Dockerfile index df38225..5c58f4d 100644 --- a/docker/sv/Dockerfile +++ b/docker/sv/Dockerfile @@ -2,9 +2,9 @@ FROM --platform=linux/amd64 nciccbr/ccbr_ubuntu_base_20.04:v6 LABEL maintainer= -WORKDIR /opt2 +WORKDIR /opt2 -RUN apt-get update +RUN apt-get update RUN DEBIAN_FRONTEND=noninteractive apt-get install -y \ gnupg \ @@ -12,17 +12,17 @@ RUN DEBIAN_FRONTEND=noninteractive apt-get install -y \ ca-certificates \ apt-transport-https \ software-properties-common \ - openjdk-17-jdk + openjdk-17-jdk -# Create Container filesystem specific -# working directory and opt directories +# Create Container filesystem specific +# working directory and opt directories ##Install R RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 \ && add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/' \ && add-apt-repository --enable-source --yes 'ppa:c2d4u.team/c2d4u4.0+' \ && apt-get -y install r-base r-base-core r-recommended r-base-dev \ - && apt-get -y install libcurl4-openssl-dev libssl-dev libboost-dev libxml2-dev + && apt-get -y install libcurl4-openssl-dev libssl-dev libboost-dev libxml2-dev ENV PATH="/usr/bin/Rscript:$PATH" RUN wget https://github.com/samtools/htslib/releases/download/1.20/htslib-1.20.tar.bz2 \ @@ -40,25 +40,24 @@ RUN wget https://github.com/samtools/samtools/releases/download/1.20/samtools-1. 
ENV PATH="/opt2/samtools-1.20:$PATH" #Grab GRIDSS -RUN mkdir gridss +RUN mkdir gridss WORKDIR /opt2/gridss RUN wget https://github.com/PapenfussLab/gridss/releases/download/v2.13.2/gridss-2.13.2-gridss-jar-with-dependencies.jar \ && wget https://github.com/PapenfussLab/gridss/releases/download/v2.13.2/gridss \ && wget https://github.com/PapenfussLab/gridss/releases/download/v2.13.2/gridss.config.R \ && wget https://github.com/PapenfussLab/gridss/releases/download/v2.13.2/libgridss.R \ - && wget https://github.com/PapenfussLab/gridss/releases/download/v2.13.2/gridss_somatic_filter + && wget https://github.com/PapenfussLab/gridss/releases/download/v2.13.2/gridss_somatic_filter ENV GRIDSS_JAR=/opt/gridss2/gridss-2.13.2-gridss-jar-with-dependencies.jar RUN chmod +x /opt2/gridss/* \ && chmod +x /opt2/gridss/*.R -WORKDIR /opt2 +WORKDIR /opt2 ##Add GRIPSS for SOMATIC FILTERING RUN wget https://github.com/hartwigmedical/hmftools/releases/download/gripss-v2.3.4/gripss_v2.3.4.jar \ && mkdir hmftools \ - && mv gripss_v2.3.4.jar hmftools/gripss.jar + && mv gripss_v2.3.4.jar hmftools/gripss.jar ENV PATH="/opt2/gridss:/opt2/hmftools:$PATH" - diff --git a/docker/sv/build.sh b/docker/sv/build.sh index b1cf22e..4da9714 100644 --- a/docker/sv/build.sh +++ b/docker/sv/build.sh @@ -1,5 +1,5 @@ #Build.sh -docker build --platform=linux/amd64 --tag ccbr_logan_sv:v0.0.1 -f Dockerfile . +docker build --platform=linux/amd64 --tag ccbr_logan_sv:v0.0.1 -f Dockerfile . 
docker tag ccbr_logan_sv:v0.0.1 dnousome/ccbr_logan_sv:v0.0.1 docker tag ccbr_logan_sv:v0.0.1 dnousome/ccbr_logan_sv:latest @@ -10,4 +10,4 @@ docker push dnousome/ccbr_logan_sv:latest ## #docker run -it ccbr_logan_sv:v0.0.1 #gridss --jar /opt2/gridss/gridss-2.13.2-gridss-jar-with-dependencies.jar \ -#--reference test.fa --output t.vcf.gz s.bam \ No newline at end of file +#--reference test.fa --output t.vcf.gz s.bam diff --git a/docs/index.md b/docs/index.md index 0e0b6b4..612c7a5 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,2 +1 @@ --8<-- "README.md" - diff --git a/docs/user-guide/pipeline.md b/docs/user-guide/pipeline.md index 1b752be..9fb7828 100644 --- a/docs/user-guide/pipeline.md +++ b/docs/user-guide/pipeline.md @@ -1,63 +1,70 @@ -# How to run LOGAN +# How to run LOGAN ## Guide ### Input Files -LOGAN supports inputs of either -1) paired end fastq files + +LOGAN supports inputs of either + +1. Paired-end FASTQ files `--fastq_input`- A glob can be used to include all FASTQ files. Like `--fastq_input "*R{1,2}.fastq.gz"`. Globbing requires quotes. -2) Pre aligned BAM files with BAI indices +2. Pre-aligned BAM files with BAI indices `--bam_input`- A glob can be used to include all BAM files. Like `--bam_input "*.bam"`. Globbing requires quotes. -3) A sheet that indicates the sample name and either FASTQs or BAM file locations +3.
A sheet that indicates the sample name and either FASTQs or BAM file locations -`--fastq_file_input`- A headerless tab delimited sheet that has the sample name, R1, and R2 file locations +`--fastq_file_input`- A headerless tab-delimited sheet that has the sample name, R1, and R2 file locations Example + ```bash c130863309_TUMOR /data/nousomedr/c130863309_TUMOR.R1_001.fastq.gz /data/nousomedr/c130863309_TUMOR.R2_001.fastq.gz c130889189_PBMC /data/nousomedr/c130889189_PBMC.R1_001.fastq.gz /data/nousomedr/c130889189_PBMC.R2_001.fastq.gz ``` -`--bam_file_input` - A headerless Tab delimited sheet that has the sample name, bam, and bam index (bai) file locations +`--bam_file_input` - A headerless tab-delimited sheet that has the sample name, BAM, and BAM index (BAI) file locations Example + ```bash c130863309_TUMOR /data/nousomedr/c130863309_TUMOR.bam /data/nousomedr/c130863309_TUMOR.bam.bai c130889189_PBMC /data/nousomedr/c130889189_PBMC.bam /data/nousomedr/c130889189_PBMC.bam.bai ``` ### Genome + `--genome` - A flag to indicate which genome to run. hg38, hg19 and mm10 are supported. Example: `--genome hg38` to run the hg38 genome -`--genome hg19` and `--genome mm10` are also supported +`--genome hg19` and `--genome mm10` are also supported + +#### hg38 has options for either -#### hg38 has options for either -`--genome hg38` - Based off the GRCh38.d1.vd1.fa which is consistent with TCGA/GDC processing pipelines +`--genome hg38` - Based on GRCh38.d1.vd1.fa, which is consistent with TCGA/GDC processing pipelines `--genome hg38_sf` - Based on Homo_sapiens_assembly38.fasta, which is derived from the Broad Institute/NCI Sequencing Facility The biggest difference between the two is that GRCh38.d1.vd1.fa includes the GCA_000001405.15_GRCh38_no_alt_analysis_set, Sequence Decoys (GenBank Accession GCA_000786075), and Virus Sequences. Homo_sapiens_assembly38.fasta has HLA-specific contigs which may not be compatible with certain downstream tools.
### Operating Modes -#### 1. Paired Tumor/Normal Mode +#### 1. Paired Tumor/Normal Mode Required for Paired Tumor/Normal Mode -`--sample_sheet` In Paired mode a sample sheet must be provided with the basename of the Tumor and Normal samples. This sheet must be Tab separated with a header for Tumor and Normal. +`--sample_sheet` In Paired mode, a sample sheet must be provided with the basename of the Tumor and Normal samples. This sheet must be tab-separated with a header for Tumor and Normal. Example + ```bash Tumor Normal c130863309_TUMOR c130863309_PBMC c130889189_TUMOR c130889189_PBMC ``` -#### 2. Tumor only mode +#### 2. Tumor only mode No additional flags for a sample sheet are required, as all samples will be used to call variants @@ -73,9 +80,8 @@ Adding flags determines SNV (germline and/or somatic), SV, and/or CNV calling mo `--cnv` or `--copynumber`- Enables somatic CNV calling using FREEC, Sequenza, ASCAT, CNVKit, and Purple (hg19/hg38 only) - - #### Optional Arguments + `--callers` - Comma-separated argument for selecting only specified callers, the default is to use all. Example: `--callers mutect2,octopus` @@ -85,34 +91,35 @@ Example: `--cnvcallers purple` `--svcallers` - Comma-separated argument for selecting only specified SV callers, the default is to use all. Example: `--svcallers gridss` -`--ffpe` - Adds additional filtering for FFPE by detecting strand orientation bias using SOBDetector. +`--ffpe` - Adds additional filtering for FFPE by detecting strand orientation bias using SOBDetector. `--exome` - When using exome data, this flag limits calling to intervals provided in target bed to reduce time and to account for exome sequencing specific parameters. An intervals file is required. `--indelrealign` - Enables indel realignment using the GATK pipeline when running alignment steps. May be helpful for certain callers (VarScan, VarDict) that do not have local haplotype reassembly.
- ## Running LOGAN -Example of Tumor_Normal calling mode + +Example of Tumor_Normal calling mode + ```bash -# preview the logan jobs that will run +# preview the logan jobs that will run nextflow run LOGAN/main.nf --mode local -profile ci_stub --genome hg38 --sample_sheet samplesheet.tsv --outdir out --fastq_input "*R{1,2}.fastq.gz" -preview --vc --sv --cnv -# run a stub/dryrun of the logan jobs +# run a stub/dryrun of the logan jobs nextflow run LOGAN/main.nf --mode local -profile ci_stub --genome hg38 --sample_sheet samplesheet.tsv --outdir out --fastq_input "*R{1,2}.fastq.gz" -stub --vc --sv --cnv # launch a logan run on slurm with the test dataset -nextflow run LOGAN/main.nf --mode slurm -profile biowulf,slurm --genome hg38 --sample_sheet samplesheet.tsv --outdir out --fastq_input "*R{1,2}.fastq.gz" --vc --sv --cnv +nextflow run LOGAN/main.nf --mode slurm -profile biowulf,slurm --genome hg38 --sample_sheet samplesheet.tsv --outdir out --fastq_input "*R{1,2}.fastq.gz" --vc --sv --cnv ``` -Example of Tumor only calling mode +Example of Tumor only calling mode + ```bash -# preview the logan jobs that will run +# preview the logan jobs that will run nextflow run LOGAN/main.nf --mode local -profile ci_stub --genome hg38 --outdir out --fastq_input "*R{1,2}.fastq.gz" --callers octopus,mutect2 -preview --vc --sv --cnv -# run a stub/dryrun of the logan jobs +# run a stub/dryrun of the logan jobs nextflow run LOGAN/main.nf --mode local -profile ci_stub --genome hg38 --outdir out --fastq_input "*R{1,2}.fastq.gz" --callers octopus,mutect2 -stub --vc --sv --cnv # launch a logan run on slurm with the test dataset nextflow run LOGAN/main.nf --mode slurm -profile biowulf,slurm --genome hg38 --outdir out --fastq_input "*R{1,2}.fastq.gz" --callers octopus,mutect2 --vc --sv --cnv ``` - diff --git a/docs/user-guide/tool_comparisons.md b/docs/user-guide/tool_comparisons.md index cd4051a..0576c94 100644 --- a/docs/user-guide/tool_comparisons.md +++ 
b/docs/user-guide/tool_comparisons.md @@ -1,43 +1,42 @@ # LOGAN Tools and Tools Tested - ## SNV -| Tools |Pros | Cons | Used in Logan| -|----|---|---|--- -|Mutect2 |Part of GATK best practices| | x | -|Strelka | Fast| Paired only|x| -|Muse | Fast| Paired only, can't be parallelized|x| -|Lofreq | Low frequency variants| Slow,Paired only|x| -|Vardict | Fast | Lower accuracy|x| -|Varscan | Fast| Lower accuracy|x|| -|Octopus | Accurate| Slow,High memory|x| -|Deepsomatic|Relatively fast|Trained on human data|x| +| Tools | Pros | Cons | Used in Logan | +| ----------- | --------------------------- | ---------------------------------- | ------------- | +| Mutect2 | Part of GATK best practices | | x | +| Strelka | Fast | Paired only | x | +| Muse | Fast | Paired only, can't be parallelized | x | +| Lofreq | Low frequency variants | Slow, paired only | x | +| Vardict | Fast | Lower accuracy | x | +| Varscan | Fast | Lower accuracy | x | +| Octopus | Accurate | Slow, high memory | x | +| Deepsomatic | Relatively fast | Trained on human data | x | ## Structural Variants -| Tools |Pros | Cons | Approach| Used in Logan| -|----|---|---|---|---| -|Manta |Accurate, fast| |graph-based| x | -|SVABA | Deletion detection||local assembly+ multiple alignment|x| -|GRIDSS | Provides blacklist| Slow, part of HMFtools pipeline|Break end assembly (discordant +split)|x| + +| Tools | Pros | Cons | Approach | Used in Logan | +| ------ | ------------------ | ------------------------------- | -------------------------------------- | ------------- | +| Manta | Accurate, fast | | graph-based | x | +| SVABA | Deletion detection | | local assembly + multiple alignment | x | +| GRIDSS | Provides blacklist | Slow, part of HMFtools pipeline | Breakend assembly (discordant + split) | x | Manta, GridSS, and SvABA are based on read-pairs, split-reads, and local-assemblies.
References [Joe et al](https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-024-10239-9) ## Copy Number -| Tools |Pros | Cons | Used in Logan| -|----|---|---|---| -|Purple |Complete workflow|Doesn't support mm10, requires SV,SNV calls as well | x | -|Sequenza | Purity/Ploidy||x| -|FREEC | Fast | No Purity/Ploidy Estimatation|x| -|ASCAT | Fast, Purity/Ploidy| |x| -|CNVkit |Fast | No Purity/Ploidy Estimatation|x| -|PureCN|Tumor only|Needs Panel of Normals on Sequencing| - - +| Tools | Pros | Cons | Used in Logan | +| -------- | ------------------- | --------------------------------------------------- | ------------- | +| Purple | Complete workflow | Doesn't support mm10, requires SV, SNV calls as well | x | +| Sequenza | Purity/Ploidy | | x | +| FREEC | Fast | No Purity/Ploidy Estimation | x | +| ASCAT | Fast, Purity/Ploidy | | x | +| CNVkit | Fast | No Purity/Ploidy Estimation | x | +| PureCN | Tumor only | Needs Panel of Normals on Sequencing | | ## Germline -| Tools |Pros | Cons | Used in Logan| -|----|---|---|---| -|Deepvariant |Fast, most accurate| Model trained on human genomes (May not support mm10)| x| + +| Tools | Pros | Cons | Used in Logan | +| ----------- | ------------------- | ----------------------------------------------------- | ------------- | +| Deepvariant | Fast, most accurate | Model trained on human genomes (may not support mm10) | x | diff --git a/modules/local/annotsv.nf b/modules/local/annotsv.nf index 4cbb352..205ccc5 100644 --- a/modules/local/annotsv.nf +++ b/modules/local/annotsv.nf @@ -121,4 +121,4 @@ process annotsv_tonly { touch "${sv}/${tumorname}.tsv" touch "${sv}/${tumorname}.unannotated.tsv" """ -} \ No newline at end of file +} diff --git a/modules/local/ascat.nf b/modules/local/ascat.nf index 9ecba85..0250e52 100644 --- a/modules/local/ascat.nf +++ b/modules/local/ascat.nf @@ -15,7 +15,7 @@ process ascat_tn { val(normalname), path(normal), path(normalbai) output: - tuple val(tumorname),
path("After_correction_${tumorname}.germline.png"), path("After_correction_${tumorname}.tumour.png"), path("Before_correction_${tumorname}.germline.png"), @@ -66,7 +66,7 @@ process ascat_tn_exome { path(bed) output: - tuple val(tumorname), + tuple val(tumorname), path("After_correction_${tumorname}.germline.png"), path("After_correction_${tumorname}.tumour.png"), path("Before_correction_${tumorname}.germline.png"), diff --git a/modules/local/bcftools_stats.nf b/modules/local/bcftools_stats.nf index f235b0d..5366f61 100644 --- a/modules/local/bcftools_stats.nf +++ b/modules/local/bcftools_stats.nf @@ -30,4 +30,4 @@ process bcftools_stats { touch ${samplename}.germline.bcftools_stats.txt """ -} \ No newline at end of file +} diff --git a/modules/local/bwamem/bwamem2.nf b/modules/local/bwamem/bwamem2.nf index d30ddae..f257d8a 100644 --- a/modules/local/bwamem/bwamem2.nf +++ b/modules/local/bwamem/bwamem2.nf @@ -6,11 +6,11 @@ process bwamem2 { errorStrategy { task.exitStatus in [137,140,143] ? 'retry' : 'terminate' } maxRetries 2 - memory { + memory { if (task.attempt == 2) return '180 GB' else if (task.attempt == 3) return '200 GB' } - + input: tuple val(samplename), path("${samplename}.R1.trimmed.fastq.gz"), @@ -34,7 +34,7 @@ process bwamem2 { else BWA_BINARY="bwa-mem2" fi - + mkdir -p tmp \$BWA_BINARY mem -M \ -R '@RG\\tID:${samplename}\\tSM:${samplename}\\tPL:illumina\\tLB:${samplename}\\tPU:${samplename}\\tCN:hgsc\\tDS:wgs' \ @@ -58,18 +58,18 @@ process BWAMEM2_SPLIT { errorStrategy { task.exitStatus in [137,140,143] ? 
'retry' : 'terminate' } maxRetries 2 - memory { + memory { if (task.attempt == 2) return '48 GB' else if (task.attempt == 3) return '64 GB' } - + input: tuple val(samplename), path(reads), val(chunk) output: - tuple val(samplename), + tuple val(samplename), path("${samplename}_${chunk}.bam") script: @@ -97,7 +97,7 @@ process BWAMEM2_SPLIT { stub: """ - touch ${samplename}_${chunk}.bam + touch ${samplename}_${chunk}.bam """ } @@ -132,4 +132,4 @@ process COMBINE_ALIGNMENTS { touch "${samplename}.bam" "${samplename}.bam.bai" """ -} \ No newline at end of file +} diff --git a/modules/local/cnvkit.nf b/modules/local/cnvkit.nf index 7d969ec..4353bd2 100644 --- a/modules/local/cnvkit.nf +++ b/modules/local/cnvkit.nf @@ -134,4 +134,3 @@ process cnvkit_exome_tonly { """ } - diff --git a/modules/local/combinefilter.nf b/modules/local/combinefilter.nf index 452ca2f..7bd9265 100644 --- a/modules/local/combinefilter.nf +++ b/modules/local/combinefilter.nf @@ -34,9 +34,9 @@ process combineVariants { -O ${sample}.${vc}.markedtemp.vcf.gz \ -SD $GENOMEDICT \ -I $vcfin - - bcftools view ${sample}.${vc}.markedtemp.vcf.gz -s $samporder -Oz -o ${sample}.${vc}.marked.vcf.gz - bcftools index -t ${sample}.${vc}.marked.vcf.gz + + bcftools view ${sample}.${vc}.markedtemp.vcf.gz -s $samporder -Oz -o ${sample}.${vc}.marked.vcf.gz + bcftools index -t ${sample}.${vc}.marked.vcf.gz bcftools norm ${sample}.${vc}.marked.vcf.gz -m- --threads $task.cpus --check-ref s -f $GENOMEREF -O v |\ awk '{{gsub(/\\y[W|K|Y|R|S|M|B|D|H|V]\\y/,"N",\$4); OFS = "\t"; print}}' |\ @@ -121,7 +121,7 @@ process combineVariants_alternative { bcftools index ${vc}/${sample}.${vc}.norm.vcf.gz -t """ } - + stub: """ @@ -218,7 +218,7 @@ process somaticcombine { --filteredrecordsmergetype KEEP_IF_ANY_UNFILTERED \ -o ${tumorsample}_vs_${normal}_combined.vcf.gz \ $vcfin2 - + """ stub: @@ -272,10 +272,9 @@ process somaticcombine_tonly { vcfin1=[caller, vcfs].transpose().collect { a, b -> a + " " + b } vcfin2="-V:" + vcfin1.join(" 
-V:") callerin=caller.join(",")//.replaceAll("_tonly","") - + """ touch ${tumorsample}_combined_tonly.vcf.gz ${tumorsample}_combined_tonly.vcf.gz.tbi - """ + """ } - diff --git a/modules/local/deepsomatic.nf b/modules/local/deepsomatic.nf index 2fd0d3f..5a8d5a5 100644 --- a/modules/local/deepsomatic.nf +++ b/modules/local/deepsomatic.nf @@ -16,7 +16,7 @@ process deepsomatic_tn_step1 { errorStrategy { task.exitStatus == 1 ? 'ignore' : 'terminate' } input: - tuple val(tname), path(tbam), path(tbai), + tuple val(tname), path(tbam), path(tbai), val(nname), path(nbam), path(nbai), path(bed) @@ -57,7 +57,7 @@ process deepsomatic_tonly_step1 { errorStrategy { task.exitStatus == 1 ? 'ignore' : 'terminate' } input: - tuple val(tname), path(tbam), path(tbai), + tuple val(tname), path(tbam), path(tbai), path(bed) output: @@ -166,7 +166,7 @@ process deepsomatic_step3 { path("outds/*"), path(bed) output: - tuple val(samplename), path("${samplename}_${bed}.vcf.gz"), path("${samplename}_${bed}.vcf.gz.tbi") + tuple val(samplename), path("${samplename}_${bed}.vcf.gz"), path("${samplename}_${bed}.vcf.gz.tbi") script: @@ -210,4 +210,3 @@ process bcfconcat { """ } - diff --git a/modules/local/deepvariant.nf b/modules/local/deepvariant.nf index 6171cef..a7acd63 100644 --- a/modules/local/deepvariant.nf +++ b/modules/local/deepvariant.nf @@ -1,7 +1,7 @@ GENOMEREF=file(params.genomes[params.genome].genome) MODEL="/opt/models/wgs/" /* -//Note +//Note Duplicate marking may be performed, in our analyses there is almost no difference in accuracy except at lower (<20x) coverages. 
Finally, we recommend that you do not perform @@ -81,7 +81,7 @@ process deepvariant_step2 { process deepvariant_step3 { container "${params.containers.deepvariant}" label 'process_somaticcaller' - + input: tuple val(samplename), path(tfrecords), path(tfgvcf), path("outdv/*"), path(bed) @@ -210,4 +210,4 @@ process deepvariant_combined { """ -} \ No newline at end of file +} diff --git a/modules/local/fastp/fastp.nf b/modules/local/fastp/fastp.nf index 05625ba..8e32769 100644 --- a/modules/local/fastp/fastp.nf +++ b/modules/local/fastp/fastp.nf @@ -47,8 +47,8 @@ process fastp_split { tuple val(samplename), path("*_R{1,2}.trimmed.fastq.gz"), path("${samplename}.fastp.json"), path("${samplename}.fastp.html") - - + + script: """ diff --git a/modules/local/fastq_screen.nf b/modules/local/fastq_screen.nf index faf429e..1f7cd6f 100644 --- a/modules/local/fastq_screen.nf +++ b/modules/local/fastq_screen.nf @@ -38,4 +38,4 @@ process fastq_screen { touch ${samplename}.R1.trimmed_screen.txt ${samplename}.R2.trimmed_screen.html touch ${samplename}.R2.trimmed_screen.png ${samplename}.R2.trimmed_screen.txt """ -} \ No newline at end of file +} diff --git a/modules/local/ffpe.nf b/modules/local/ffpe.nf index 7293f4d..e03031e 100644 --- a/modules/local/ffpe.nf +++ b/modules/local/ffpe.nf @@ -19,17 +19,17 @@ process sobdetect_pass1 { --input-variants ${vcf} \ --input-bam ${bam} \ --output-variants ${sample}.pass1.sobdetect.vcf \ - --only-passed false + --only-passed false bcftools query \ -f '%INFO/numF1R2Alt\t%INFO/numF2R1Alt\t%INFO/numF1R2Ref\t%INFO/numF2R1Ref\t%INFO/numF1R2Other\t%INFO/numF2R1Other\t%INFO/SOB\n' \ ${sample}.sobdetect.vcf \ | awk '{if (\$1 != "."){tum_alt=\$1+\$2; tum_depth=\$1+\$2+\$3+\$4+\$5+\$6; if (tum_depth==0){tum_af=1} else {tum_af=tum_alt/tum_depth }; print tum_alt,tum_depth,tum_af,\$7}}' \ > ${sample}.info - + mv ${sample}.pass1.sobdetect.vcf ${vc}/pass1 mv ${sample}.info ${vc}/pass1 - + """ stub: @@ -64,12 +64,12 @@ process sobdetect_cohort_params { grep 
-v '^#' all_samples.info \ | awk '{ total1 += \$1; ss1 += \$1^2; total2 += \$2; ss2 += \$2^2; total3 += \$3; ss3 += \$3^2; total4 += \$4; ss4 += \$4^2 } END { print total1/NR,total2/NR,total3/NR,total4/NR; print sqrt(ss1/NR-(total1/NR)^2),sqrt(ss2/NR-(total2/NR)^2),sqrt(ss3/NR-(total3/NR)^3),sqrt(ss4/NR-(total4/NR)^2) }' > cohort_params.txt """ - + stub: """ touch all_samples.info cohort_params.txt """ -} +} process sobdetect_pass2 { container "${params.containers.ffpe}" @@ -77,9 +77,9 @@ process sobdetect_pass2 { input: tuple val(sample), path(vcf), path(bam), val(vc), path(sample_info), path(params_file) - + output: - tuple val(sample), + tuple val(sample), path("${vc}/pass2/${sample}.pass2.sobdetect.vcf"), path("${vc}/pass2/${sample}.info"), path("${vc}/pass2/${sample}_${vc}.artifact_filtered.vcf.gz"), @@ -136,7 +136,7 @@ process sobdetect_metrics { path (pass2_vcfs) output: - tuple path("variant_count_table.txt"), + tuple path("variant_count_table.txt"), path("all_metrics.txt") script: @@ -148,7 +148,7 @@ process sobdetect_metrics { P2FILES=(\$(echo ${pass2_vcfs})) for (( i=0; i<\${#P1FILES[@]}; i++ )); do MYID=\$(basename -s ".sobdetect.vcf" \${P1FILES[\$i]}) - + total_count=\$(grep -v ^# \${P1FILES[\$i]} | wc -l) || total_count=0 count_1p=\$(bcftools query -f '%INFO/pArtifact\n' \${P1FILES[\$i]} | awk '{if (\$1 != "." && \$1 < 0.05){print}}' | wc -l) count_2p=\$(bcftools query -f '%INFO/pArtifact\n' \${P2FILES[\$i]} | awk '{if (\$1 != "." && \$1 < 0.05){print}}' | wc -l) @@ -166,4 +166,3 @@ process sobdetect_metrics { """ } - diff --git a/modules/local/freec.nf b/modules/local/freec.nf index b36c9d0..8338f7f 100644 --- a/modules/local/freec.nf +++ b/modules/local/freec.nf @@ -114,7 +114,7 @@ process freec_paired_exome { shell: """ - python $REFORMATBED -i $CNVTARGETS + python $REFORMATBED -i $CNVTARGETS perl $FREECPAIR_SCRIPT \ . 
\ $FREECLENGTHS \ @@ -179,8 +179,8 @@ process freec { path("${tumorname}_ratio.txt.png") - shell: - + shell: + """ perl $FREEC_SCRIPT \ . \ diff --git a/modules/local/gridss.nf b/modules/local/gridss.nf index d38f5a8..381a5f2 100644 --- a/modules/local/gridss.nf +++ b/modules/local/gridss.nf @@ -16,7 +16,7 @@ process gridss_somatic { container "${params.containers.sv}" input: - tuple val(tumorname), path(tumor), path(tumorbai), + tuple val(tumorname), path(tumor), path(tumorbai), val(normalname), path(normal), path(normalbai) output: @@ -39,7 +39,7 @@ process gridss_somatic { --jvmheap 90g \ --otherjvmheap 64g \ -t $task.cpus \ - ${normal} ${tumor} + ${normal} ${tumor} mkdir -p ${tumorname}_vs_${normalname} diff --git a/modules/local/kraken.nf b/modules/local/kraken.nf index e42fe34..0065cbf 100644 --- a/modules/local/kraken.nf +++ b/modules/local/kraken.nf @@ -51,4 +51,4 @@ process kraken { touch ${samplename}.trimmed.kraken_bacteria.taxa.txt ${samplename}.trimmed.kraken_bacteria.krona.html """ -} \ No newline at end of file +} diff --git a/modules/local/lancet2/lancet2.nf b/modules/local/lancet2/lancet2.nf index 90f713f..25b5fb6 100644 --- a/modules/local/lancet2/lancet2.nf +++ b/modules/local/lancet2/lancet2.nf @@ -31,7 +31,7 @@ process lancet2_tn { python3 score_variants.py \ ${tumorname}_vs_${normalname}_${bed.simpleName}_temp.vcf.gz somatic_ebm.lancet_6ef7ba445a.v1.pkl > ${tumorname}_vs_${normalname}_${bed.simpleName}_scored.vcf - + bcftools view ${tumorname}_vs_${normalname}_${bed.simpleName}_scored.vcf -Oz -o ${tumorname}_vs_${normalname}_${bed.simpleName}_lancet.vcf.gz bcftools index -t ${tumorname}_vs_${normalname}_${bed.simpleName}_lancet.vcf.gz @@ -48,5 +48,3 @@ process lancet2_tn { """ } - - diff --git a/modules/local/lofreq.nf b/modules/local/lofreq.nf index b1df228..342f0e0 100644 --- a/modules/local/lofreq.nf +++ b/modules/local/lofreq.nf @@ -62,5 +62,3 @@ process lofreq_tn { """ } - - diff --git a/modules/local/manta.nf b/modules/local/manta.nf 
index ead15a3..65ee63c 100644
--- a/modules/local/manta.nf
+++ b/modules/local/manta.nf
@@ -5,7 +5,7 @@ process manta_somatic {
     label 'process_high'
 
     input:
-        tuple val(tumorname), path(tumorbam), path(tumorbai), 
+        tuple val(tumorname), path(tumorbam), path(tumorbai),
         val(normalname), path(normalbam), path(normalbai)
 
     output:
diff --git a/modules/local/mutect2.nf b/modules/local/mutect2.nf
index fc6e7f1..38e22c0 100644
--- a/modules/local/mutect2.nf
+++ b/modules/local/mutect2.nf
@@ -521,6 +521,3 @@ process mutect2filter_tonly {
     touch ${sample}.tonly.mut2.marked.vcf.gz.filteringStats.tsv
     """
 }
-
-
-
diff --git a/modules/local/octopus.nf b/modules/local/octopus.nf
index f6ac6d6..e791e9c 100644
--- a/modules/local/octopus.nf
+++ b/modules/local/octopus.nf
@@ -73,13 +73,13 @@ process bcftools_index_octopus {
 process octopus_convertvcf {
     container "${params.containers.logan}"
     label 'process_low'
-    
+
     input:
-        tuple val(tumor), val(normal), 
+        tuple val(tumor), val(normal),
         val(oct), path(vcf), path(vcfindex)
 
     output:
-        tuple val(tumor), val(normal), path("${tumor}.octopus.norm.vcf.gz"), 
+        tuple val(tumor), val(normal), path("${tumor}.octopus.norm.vcf.gz"),
         path("${tumor}.octopus.norm.vcf.gz.tbi")
@@ -120,7 +120,7 @@ process octopus_tonly {
     -t ${bed} \
     --threads ${task.cpus}\
     $SOMATIC_FOREST \
-    -o ${tumorname}_${bed.simpleName}.tonly.octopus.vcf.gz 
+    -o ${tumorname}_${bed.simpleName}.tonly.octopus.vcf.gz
     """
 
     stub:
@@ -135,12 +135,12 @@ process octopus_tonly {
 process octopus_convertvcf_tonly {
     container "${params.containers.logan}"
     label 'process_low'
-    
+
     input:
         tuple val(tumor), val(oct), path(vcf), path(vcfindex)
 
     output:
-        tuple val(tumor), path("${tumor}.octopus_tonly.norm.vcf.gz"), 
+        tuple val(tumor), path("${tumor}.octopus_tonly.norm.vcf.gz"),
         path("${tumor}.octopus_tonly.norm.vcf.gz.tbi")
@@ -157,4 +157,3 @@ process octopus_convertvcf_tonly {
     touch ${tumor}.octopus_tonly.norm.vcf.gz ${tumor}.octopus_tonly.norm.vcf.gz.tbi
     """
 }
-
diff --git a/modules/local/purple.nf b/modules/local/purple.nf
index c51b308..92f61f3 100644
--- a/modules/local/purple.nf
+++ b/modules/local/purple.nf
@@ -26,7 +26,7 @@ process amber_tonly {
 
     output:
         tuple val(tumorname), path("${tumorname}_amber")
-    
+
 
     script:
     """
@@ -60,7 +60,7 @@ process amber_tn {
     output:
         tuple val("${tumorname}_vs_${normalname}"), val(tumorname), val(normalname),
         path("${tumorname}_vs_${normalname}_amber")
-    
+
 
     script:
     """
@@ -201,7 +201,7 @@ process purple_novc {
         path(amberin), path(cobaltin)
 
     output:
-        tuple val(id), val(tumorname), val(normalname), 
+        tuple val(id), val(tumorname), val(normalname),
         path("${id}")
 
     script:
@@ -237,7 +237,7 @@ process purple_tonly {
     errorStrategy 'ignore'
 
     input:
-        tuple val(tumorname), 
+        tuple val(tumorname),
         path(amberin), path(cobaltin),
         path(somaticvcf), path(somaticvcfindex)
@@ -276,7 +276,7 @@ process purple_tonly_novc {
     container "${params.containers.logan}"
     label 'process_medium'
     errorStrategy 'ignore'
-    
+
     input:
         tuple val(tumorname), val(normalname),
         path(cobaltin), path(amberin)
@@ -307,4 +307,3 @@ process purple_tonly_novc {
     """
 }
 
-
diff --git a/modules/local/qc.nf b/modules/local/qc.nf
index 9f68850..d0bed7b 100644
--- a/modules/local/qc.nf
+++ b/modules/local/qc.nf
@@ -477,7 +477,7 @@ process somalier_extract {
         Mapped and pre-processed BAM file
     @Output:
         Exracted sites in (binary) somalier format
-    
+
     params:
         sites_vcf = config['references']['SOMALIER']['SITES_VCF'],
         genomeFasta = config['references']['GENOME'],
diff --git a/modules/local/qualimap.nf b/modules/local/qualimap.nf
index dfe7620..b413412 100644
--- a/modules/local/qualimap.nf
+++ b/modules/local/qualimap.nf
@@ -43,4 +43,4 @@ process qualimap_bamqc {
     """
     touch ${samplename}_genome_results.txt ${samplename}_qualimapReport.html
     """
-}
\ No newline at end of file
+}
diff --git a/modules/local/sage.nf b/modules/local/sage.nf
index 38f1595..e972492 100644
--- a/modules/local/sage.nf
+++ b/modules/local/sage.nf
@@ -24,7 +24,7 @@ process sage_tn {
         path("${tumorname}_vs_${normalname}.sage.vcf.gz"),
         path("${tumorname}_vs_${normalname}.sage.vcf.gz.tbi")
-    
+
 
     script:
     """
     java -Xms4G -Xmx32G -cp /opt2/hmftools/sage.jar \
@@ -54,7 +54,7 @@ process sage_tonly {
         tuple val(tumorname), path(tumorbam), path(tumorbai)
 
     output:
-        tuple val(tumorname), 
+        tuple val(tumorname),
         path("${tumorname}.tonly.sage.vcf.gz"),
         path("${tumorname}.tonly.sage.vcf.gz.tbi")
diff --git a/modules/local/sequenza.nf b/modules/local/sequenza.nf
index 93b74ff..675322a 100644
--- a/modules/local/sequenza.nf
+++ b/modules/local/sequenza.nf
@@ -110,11 +110,11 @@ process pileup_sequenza {
     errorStrategy 'ignore'
 
     input:
-        tuple val(pairid), val(name), 
+        tuple val(pairid), val(name),
         path(bam), path(bai), path(bed)
 
     output:
-        tuple val(pairid), path("${name}_${bed}.mpileup.gz"), path("${name}_${bed}.mpileup.gz.tbi") 
+        tuple val(pairid), path("${name}_${bed}.mpileup.gz"), path("${name}_${bed}.mpileup.gz.tbi")
 
     script:
     //Q20 is default in sequenza
diff --git a/modules/local/splitbed.nf b/modules/local/splitbed.nf
index 66ea180..e6b025d 100644
--- a/modules/local/splitbed.nf
+++ b/modules/local/splitbed.nf
@@ -4,7 +4,7 @@
 GENOMEFAI = file(params.genomes[params.genome].genomefai)
 
 
-// Split Bed Step to create the path 
+// Split Bed Step to create the path
 process splitinterval {
     container "${params.containers.logan}"
     label "process_single"
@@ -45,11 +45,11 @@ process matchbed {
 /*
 Code to convert beds to interval list
-#Subset current bed 
+#Subset current bed
 #hg38
 awk -F '\t' '{printf("%s\t0\t%s\n",$1,$2);}' genome.fa.fai
 bedtools subtract -a GRCh38.primary_assembly.genome.bed -b ../hg38.blacklist.bed > GRCh38.primary_assembly.genome.interval.bed
-gatk BedToIntervalList -I GRCh38.primary_assembly.genome.interval.bed -O \ 
+gatk BedToIntervalList -I GRCh38.primary_assembly.genome.interval.bed -O \
     GRCh38.primary_assembly.genome.interval_list -SD GRCh38.primary_assembly.genome.dict
 
 #hg19
diff --git a/modules/local/strelka.nf b/modules/local/strelka.nf
index 0b24fc5..ad2f0f9 100644
--- a/modules/local/strelka.nf
+++ b/modules/local/strelka.nf
@@ -70,7 +70,7 @@ process convert_strelka {
 
     output:
         tuple val(tumor), val(normal), val("strelka"),
-        path("${tumor}_vs_${normal}.filtered.strelka-fixed.vcf.gz"), 
+        path("${tumor}_vs_${normal}.filtered.strelka-fixed.vcf.gz"),
         path("${tumor}_vs_${normal}.filtered.strelka-fixed.vcf.gz.tbi")
diff --git a/modules/local/trim_align.nf b/modules/local/trim_align.nf
index 614f1a4..184c788 100644
--- a/modules/local/trim_align.nf
+++ b/modules/local/trim_align.nf
@@ -101,7 +101,7 @@ process bqsr {
 process gatherbqsr {
     container "${params.containers.logan}"
     label 'process_low'
-    
+
     input:
         tuple val(samplename), path(recalgroups)
@@ -199,7 +199,7 @@ process bamtocram_tonly {
     samtools view -@ $task.cpus -C -T $GENOMEREF -o ${id}.cram $tumor
     samtools index ${id}.cram -@ $task.cpus
     """
-    
+
     stub:
     """
     touch ${id}.cram ${id}.cram.crai
@@ -223,10 +223,9 @@ process samtools2fq {
     -1 ${id}.R1.fastq -2 ${id}.R2.fastq -0 /dev/null -s /dev/null \
     -n $bam
     """
-    
+
     stub:
     """
-    touch  ${id}.R1.fastq ${id}.R2.fastq 
+    touch ${id}.R1.fastq ${id}.R2.fastq
     """
 }
-
diff --git a/modules/local/vardict.nf b/modules/local/vardict.nf
index dcd5dc5..c04ec27 100644
--- a/modules/local/vardict.nf
+++ b/modules/local/vardict.nf
@@ -32,12 +32,12 @@ process vardict_tn {
     -S \
     -M \
     -f 0.01 > ${tumorname}_vs_${normalname}_${bed.simpleName}.vardict.vcf
-    
+
     bcftools filter \
     --exclude 'STATUS="Germline" | STATUS="LikelyLOH" | STATUS="AFDiff"' \
     ${tumorname}_vs_${normalname}_${bed.simpleName}.vardict.vcf \
-    > ${tumorname}_vs_${normalname}_${bed.simpleName}.vardict.filtered.vcf 
-    
+    > ${tumorname}_vs_${normalname}_${bed.simpleName}.vardict.filtered.vcf
+
     printf "${normal.Name}\t${normalname}\n${tumor.Name}\t${tumorname}\n" > sampname
 
     bcftools reheader -s sampname ${tumorname}_vs_${normalname}_${bed.simpleName}.vardict.filtered.vcf \
diff --git a/modules/local/varscan.nf b/modules/local/varscan.nf
index 71ecb15..d7eeec2 100644
--- a/modules/local/varscan.nf
+++ b/modules/local/varscan.nf
@@ -80,7 +80,7 @@ process varscan_tonly {
     varscan_cmd="varscan mpileup2cns <($pileup_cmd) $varscan_opts"
     eval "$varscan_cmd > !{tumor.simpleName}_!{bed.simpleName}.tonly.varscan.vcf_temp"
-    
+
     varscan filter !{tumor.simpleName}_!{bed.simpleName}.tonly.varscan.vcf_temp \
     --strand-filter 1 --min-reads2 5 --min-strands2 2 --min-var-freq 0.05 --p-value 0.01 --min-avg-qual 30 > \
     !{tumor.simpleName}_!{bed.simpleName}.tonly.varscan.vcf_temp1
@@ -101,4 +101,3 @@ process varscan_tonly {
     """
 }
 
-
diff --git a/modules/local/vcftools.nf b/modules/local/vcftools.nf
index 12c6107..eb0f0df 100644
--- a/modules/local/vcftools.nf
+++ b/modules/local/vcftools.nf
@@ -30,4 +30,4 @@ process vcftools {
     """
     touch variants_raw_variants.het
     """
-}
\ No newline at end of file
+}
diff --git a/setup.py b/setup.py
index 4a15e68..1abbd06 100644
--- a/setup.py
+++ b/setup.py
@@ -1,4 +1,4 @@
 import setuptools
 
 if __name__ == "__main__":
-    setuptools.setup()
\ No newline at end of file
+    setuptools.setup()
diff --git a/subworkflows/local/workflows.nf b/subworkflows/local/workflows.nf
index 3ea434d..47068ad 100644
--- a/subworkflows/local/workflows.nf
+++ b/subworkflows/local/workflows.nf
@@ -25,9 +25,9 @@ include {deepvariant_step1; deepvariant_step2; deepvariant_step3;
     deepvariant_combined; glnexus;
     bcfconcat as bcfconcat_vcf; bcfconcat as bcfconcat_gvcf} from '../../modules/local/deepvariant.nf'
-include {pileup_paired as pileup_paired_t; pileup_paired as pileup_paired_n; 
+include {pileup_paired as pileup_paired_t; pileup_paired as pileup_paired_n;
     learnreadorientationmodel;
-    mutect2; mutect2filter; contamination_paired; mergemut2stats; 
+    mutect2; mutect2filter; contamination_paired; mergemut2stats;
     mutect2_t_tonly; mutect2filter_tonly;
     contamination_tumoronly;
     learnreadorientationmodel_tonly;
@@ -36,7 +36,7 @@ include {pileup_paired as pileup_paired_t; pileup_paired as pileup_paired_n;
 include {sage_tn; sage_tonly} from '../../modules/local/sage.nf'
 include {vardict_tn; vardict_tonly} from '../../modules/local/vardict.nf'
 include {varscan_tn; varscan_tonly} from '../../modules/local/varscan.nf'
-include {octopus_tn; bcftools_index_octopus; 
+include {octopus_tn; bcftools_index_octopus;
     bcftools_index_octopus as bcftools_index_octopus_tonly;
     octopus_convertvcf; octopus_tonly; octopus_convertvcf_tonly} from '../../modules/local/octopus.nf'
 include {lofreq_tn} from '../../modules/local/lofreq.nf'
@@ -50,35 +50,35 @@ include {combineVariants as combineVariants_vardict; combineVariants as combineV
     combineVariants as combineVariants_varscan; combineVariants as combineVariants_varscan_tonly;
     combineVariants as combineVariants_sage; combineVariants as combineVariants_sage_tonly;
     combineVariants_alternative as combineVariants_deepsomatic; combineVariants_alternative as combineVariants_deepsomatic_tonly;
-    combineVariants_alternative as combineVariants_lofreq; 
+    combineVariants_alternative as combineVariants_lofreq;
     combineVariants as combineVariants_muse;
-    combineVariants_alternative as combineVariants_octopus; 
+    combineVariants_alternative as combineVariants_octopus;
     combineVariants_alternative as combineVariants_octopus_tonly;
     combinemafs_tn; somaticcombine; somaticcombine as somaticcombine_ffpe;
     combinemafs_tonly;somaticcombine_tonly; somaticcombine_tonly as somaticcombine_tonly_ffpe} from '../../modules/local/combinefilter.nf'
-include {sobdetect_pass1 as sobdetect_pass1_mutect2; sobdetect_pass2 as sobdetect_pass2_mutect2; 
+include {sobdetect_pass1 as sobdetect_pass1_mutect2; sobdetect_pass2 as sobdetect_pass2_mutect2;
     sobdetect_metrics as sobdetect_metrics_mutect2; sobdetect_cohort_params as sobdetect_cohort_params_mutect2;
-    sobdetect_pass1 as sobdetect_pass1_octopus; sobdetect_pass2 as sobdetect_pass2_octopus; 
+    sobdetect_pass1 as sobdetect_pass1_octopus; sobdetect_pass2 as sobdetect_pass2_octopus;
     sobdetect_metrics as sobdetect_metrics_octopus; sobdetect_cohort_params as sobdetect_cohort_params_octopus;
-    sobdetect_pass1 as sobdetect_pass1_strelka; sobdetect_pass2 as sobdetect_pass2_strelka; 
+    sobdetect_pass1 as sobdetect_pass1_strelka; sobdetect_pass2 as sobdetect_pass2_strelka;
     sobdetect_metrics as sobdetect_metrics_strelka; sobdetect_cohort_params as sobdetect_cohort_params_strelka;
-    sobdetect_pass1 as sobdetect_pass1_lofreq; sobdetect_pass2 as sobdetect_pass2_lofreq; 
+    sobdetect_pass1 as sobdetect_pass1_lofreq; sobdetect_pass2 as sobdetect_pass2_lofreq;
     sobdetect_metrics as sobdetect_metrics_lofreq; sobdetect_cohort_params as sobdetect_cohort_params_lofreq;
-    sobdetect_pass1 as sobdetect_pass1_muse; sobdetect_pass2 as sobdetect_pass2_muse; 
+    sobdetect_pass1 as sobdetect_pass1_muse; sobdetect_pass2 as sobdetect_pass2_muse;
     sobdetect_metrics as sobdetect_metrics_muse; sobdetect_cohort_params as sobdetect_cohort_params_muse;
-    sobdetect_pass1 as sobdetect_pass1_vardict; sobdetect_pass2 as sobdetect_pass2_vardict; 
+    sobdetect_pass1 as sobdetect_pass1_vardict; sobdetect_pass2 as sobdetect_pass2_vardict;
     sobdetect_metrics as sobdetect_metrics_vardict; sobdetect_cohort_params as sobdetect_cohort_params_vardict;
-    sobdetect_pass1 as sobdetect_pass1_varscan; sobdetect_pass2 as sobdetect_pass2_varscan; 
+    sobdetect_pass1 as sobdetect_pass1_varscan; sobdetect_pass2 as sobdetect_pass2_varscan;
     sobdetect_metrics as sobdetect_metrics_varscan; sobdetect_cohort_params as sobdetect_cohort_params_varscan;
     //Tumor Only
-    sobdetect_pass1 as sobdetect_pass1_mutect2_tonly; sobdetect_pass2 as sobdetect_pass2_mutect2_tonly; 
+    sobdetect_pass1 as sobdetect_pass1_mutect2_tonly; sobdetect_pass2 as sobdetect_pass2_mutect2_tonly;
     sobdetect_metrics as sobdetect_metrics_mutect2_tonly; sobdetect_cohort_params as sobdetect_cohort_params_mutect2_tonly;
-    sobdetect_pass1 as sobdetect_pass1_octopus_tonly; sobdetect_pass2 as sobdetect_pass2_octopus_tonly; 
+    sobdetect_pass1 as sobdetect_pass1_octopus_tonly; sobdetect_pass2 as sobdetect_pass2_octopus_tonly;
     sobdetect_metrics as sobdetect_metrics_octopus_tonly; sobdetect_cohort_params as sobdetect_cohort_params_octopus_tonly;
-    sobdetect_pass1 as sobdetect_pass1_vardict_tonly; sobdetect_pass2 as sobdetect_pass2_vardict_tonly; 
+    sobdetect_pass1 as sobdetect_pass1_vardict_tonly; sobdetect_pass2 as sobdetect_pass2_vardict_tonly;
     sobdetect_metrics as sobdetect_metrics_vardict_tonly; sobdetect_cohort_params as sobdetect_cohort_params_vardict_tonly;
-    sobdetect_pass1 as sobdetect_pass1_varscan_tonly; sobdetect_pass2 as sobdetect_pass2_varscan_tonly; 
+    sobdetect_pass1 as sobdetect_pass1_varscan_tonly; sobdetect_pass2 as sobdetect_pass2_varscan_tonly;
     sobdetect_metrics as sobdetect_metrics_varscan_tonly; sobdetect_cohort_params as sobdetect_cohort_params_varscan_tonly
     } from "../../modules/local/ffpe.nf"
@@ -87,9 +87,9 @@ include {annotvep_tn as annotvep_tn_mut2_ffpe; annotvep_tn as annotvep_tn_strelk
     annotvep_tn as annotvep_tn_varscan_ffpe; annotvep_tn as annotvep_tn_vardict_ffpe;
     annotvep_tn as annotvep_tn_octopus_ffpe; annotvep_tn as annotvep_tn_lofreq_ffpe;
     annotvep_tn as annotvep_tn_muse_ffpe; annotvep_tn as annotvep_tn_sage_ffpe; annotvep_tn as annotvep_tn_deepsomatic_ffpe;
-    annotvep_tn as annotvep_tn_combined_ffpe; 
+    annotvep_tn as annotvep_tn_combined_ffpe;
     annotvep_tonly as annotvep_tonly_varscan_ffpe; annotvep_tonly as annotvep_tonly_vardict_ffpe;
-    annotvep_tonly as annotvep_tonly_mut2_ffpe; annotvep_tonly as annotvep_tonly_octopus_ffpe; 
+    annotvep_tonly as annotvep_tonly_mut2_ffpe; annotvep_tonly as annotvep_tonly_octopus_ffpe;
     annotvep_tonly as annotvep_tonly_sage_ffpe; annotvep_tonly as annotvep_tonly_deepsomatic_ffpe;
     annotvep_tonly as annotvep_tonly_combined_ffpe} from '../../modules/local/annotvep.nf'
@@ -98,17 +98,17 @@ include {annotvep_tn as annotvep_tn_mut2; annotvep_tn as annotvep_tn_strelka;
     annotvep_tn as annotvep_tn_varscan; annotvep_tn as annotvep_tn_vardict; annotvep_tn as annotvep_tn_octopus;
     annotvep_tn as annotvep_tn_lofreq; annotvep_tn as annotvep_tn_muse;
     annotvep_tn as annotvep_tn_sage; annotvep_tn as annotvep_tn_deepsomatic;
-    annotvep_tn as annotvep_tn_combined; 
+    annotvep_tn as annotvep_tn_combined;
     annotvep_tonly as annotvep_tonly_varscan; annotvep_tonly as annotvep_tonly_vardict;
-    annotvep_tonly as annotvep_tonly_mut2; annotvep_tonly as annotvep_tonly_octopus; 
+    annotvep_tonly as annotvep_tonly_mut2; annotvep_tonly as annotvep_tonly_octopus;
     annotvep_tonly as annotvep_tonly_sage; annotvep_tonly as annotvep_tonly_deepsomatic;
     annotvep_tonly as annotvep_tonly_combined} from '../../modules/local/annotvep.nf'
 
 include {svaba_somatic} from '../../modules/local/svaba.nf'
 include {manta_somatic} from '../../modules/local/manta.nf'
 include {gridss_somatic} from '../../modules/local/gridss.nf'
-include {survivor_sv; 
-    gunzip as gunzip_manta; gunzip as gunzip_gridss; 
+include {survivor_sv;
+    gunzip as gunzip_manta; gunzip as gunzip_gridss;
     annotsv_tn as annotsv_survivor_tn
     annotsv_tn as annotsv_gridss; annotsv_tn as annotsv_svaba; annotsv_tn as annotsv_manta} from '../../modules/local/annotsv.nf'
@@ -177,23 +177,23 @@ workflow ALIGN {
 
     //Trim and split | align | combine/mark duplicates
     if (params.split_fastq>0){
-        fastp_split(fastqinput) 
-        | flatMap{samplename, fqs, json, html -> 
+        fastp_split(fastqinput)
+        | flatMap{samplename, fqs, json, html ->
             def pairsfq = fqs.collate(2)
-            pairsfq.collect { pair -> 
+            pairsfq.collect { pair ->
                 def chunkId = pair[0].getBaseName().toString().tokenize('_').first().tokenize('.')[0]
                 return [samplename, pair, chunkId]
             }
-        } 
-        | BWAMEM2_SPLIT 
-        | groupTuple 
-        | map { samplename,bam -> 
+        }
+        | BWAMEM2_SPLIT
+        | groupTuple
+        | map { samplename,bam ->
             tuple( samplename,
             bam.toSorted{ it -> (it.name =~ /${samplename}_(.*?).bam/)[0][1].toInteger() } )}
         | COMBINE_ALIGNMENTS
         alignment_out=COMBINE_ALIGNMENTS.out.bams
-        fastp_out=fastp_split.out 
-        | map{samplename, fqs, json,html -> 
-            fqs.collect {fq -> 
+        fastp_out=fastp_split.out
+        | map{samplename, fqs, json,html ->
+            fqs.collect {fq ->
                 return tuple(samplename,fq)
             }
         } | flatten()
@@ -202,36 +202,36 @@ workflow ALIGN {
         fastqinput | map{sample,fqs -> tuple(sample,fqs[0],fqs[1])}| bwamem2
         alignment_out=bwamem2.out
     }else{
-        fastp_out = fastp(fastqinput) | map{sample,f1,f2,json,html -> tuple(sample,f1,f2)} 
+        fastp_out = fastp(fastqinput) | map{sample,f1,f2,json,html -> tuple(sample,f1,f2)}
         bwamem2(fastp_out)
         alignment_out=bwamem2.out
     }
 
     //Indel Realignment
-    if (params.indelrealign){ 
-        bwaindelre = alignment_out | indelrealign 
+    if (params.indelrealign){
+        bwaindelre = alignment_out | indelrealign
         bqsrbambyinterval = bwaindelre.combine(splitinterval.out.flatten())
         bambyinterval = bwaindelre.combine(splitinterval.out.flatten())
         bqsr_ir(bqsrbambyinterval)
-        
-        bqsrs = bqsr_ir.out 
-        | groupTuple 
-        | map { samplename,beds -> 
-            tuple( samplename, beds.toSorted{ it -> (it.name =~ /${samplename}_(.*?).recal_data.grp/)[0][1].toInteger() } )} 
+
+        bqsrs = bqsr_ir.out
+        | groupTuple
+        | map { samplename,beds ->
+            tuple( samplename, beds.toSorted{ it -> (it.name =~ /${samplename}_(.*?).recal_data.grp/)[0][1].toInteger() } )}
         gatherbqsr(bqsrs)
 
         tobqsr=bwaindelre.combine(gatherbqsr.out,by:0)
         applybqsr(tobqsr)
-        bamwithsample=applybqsr.out.combine(sample_sheet,by:0).map{it.swap(3,0)}.combine(applybqsr.out,by:0).map{it.swap(3,0)} 
+        bamwithsample=applybqsr.out.combine(sample_sheet,by:0).map{it.swap(3,0)}.combine(applybqsr.out,by:0).map{it.swap(3,0)}
 
-    }else{ 
+    }else{
         bqsrbambyinterval=alignment_out.combine(splitinterval.out.flatten())
         bambyinterval=alignment_out.combine(splitinterval.out.flatten())
 
         bqsr(bqsrbambyinterval)
         bqsrs=bqsr.out | groupTuple
-        | map { samplename,beds -> 
+        | map { samplename,beds ->
             tuple( samplename,
             beds.toSorted{ it -> (it.name =~ /${samplename}_(.*?).recal_data.grp/)[0][1].toInteger() } )}
         gatherbqsr(bqsrs)
@@ -240,7 +240,7 @@ workflow ALIGN {
         applybqsr(tobqsr)
         bamwithsample=applybqsr.out.combine(sample_sheet,by:0).map{it.swap(3,0)}.combine(applybqsr.out,by:0).map{it.swap(3,0)}
 
-    
+
     }
 
     emit:
@@ -259,24 +259,24 @@ workflow GL {
     take:
         sample_sheet
        bambyinterval
-    
+
     main:
-    //Keep Only the NormalSamples 
-    bambyinterval_normonly=sample_sheet | map{t,n -> tuple(n)} | combine(bambyinterval,by:[0]) 
-    | unique{it -> it[0]+ '~' + it[3] } 
+    //Keep Only the NormalSamples
+    bambyinterval_normonly=sample_sheet | map{t,n -> tuple(n)} | combine(bambyinterval,by:[0])
+    | unique{it -> it[0]+ '~' + it[3] }
 
-    deepvariant_step1(bambyinterval_normonly) | deepvariant_step2 
+    deepvariant_step1(bambyinterval_normonly) | deepvariant_step2
     | deepvariant_step3 | groupTuple
-    | multiMap{samplename,vcf,vcf_tbi,gvcf,gvcf_tbi -> 
+    | multiMap{samplename,vcf,vcf_tbi,gvcf,gvcf_tbi ->
         vcf: tuple(samplename,vcf.toSorted{it -> (it.name =~ /${samplename}_(.*?).bed.vcf.gz/)[0][1].toInteger()},vcf_tbi,"vcf")
         gvcf: tuple(samplename,gvcf.toSorted{it -> (it.name =~ /${samplename}_(.*?).bed.gvcf.gz/)[0][1].toInteger()},gvcf_tbi,"gvcf")
     }
-    | set{dv_out} 
-    dv_out.vcf | bcfconcat_vcf 
-    dv_out.gvcf | bcfconcat_gvcf | map{sample,gvcf,index -> gvcf} 
-    | collect 
+    | set{dv_out}
+    dv_out.vcf | bcfconcat_vcf
+    dv_out.gvcf | bcfconcat_gvcf | map{sample,gvcf,index -> gvcf}
+    | collect
     | glnexus
-    deepvariant_out=bcfconcat_vcf.out | join(bcfconcat_gvcf.out) 
+    deepvariant_out=bcfconcat_vcf.out | join(bcfconcat_gvcf.out)
 
     emit:
         glnexusout=glnexus.out
@@ -294,48 +294,48 @@ workflow VC {
     main:
     //Create Pairing for TN (in case of dups)
     sample_sheet_paired=sample_sheet | map{tu,no -> tuple ("${tu}_vs_${no}",tu, no)} | view()
-    bambyinterval=bamwithsample.combine(splitout.flatten()) 
+    bambyinterval=bamwithsample.combine(splitout.flatten())
 
-    bambyinterval 
-    | multiMap {tumorname,tumor,tumorbai,normalname,normalbam,normalbai,bed -> 
+    bambyinterval
+    | multiMap {tumorname,tumor,tumorbai,normalname,normalbam,normalbai,bed ->
         t1: tuple(tumorname,tumor,tumorbai,bed)
         n1: tuple(normalname,normalbam,normalbai,bed)
     }
    | set{bambyinterval_tonly}
-    
+
     bambyinterval_t=bambyinterval_tonly.t1 |
-        concat(bambyinterval_tonly.n1) | unique() 
+        concat(bambyinterval_tonly.n1) | unique()
 
     //Prep Pileups
     call_list = params.callers.split(',') as List
     call_list_tonly = params.tonlycallers.split(',') as List
     call_list_tonly = call_list.intersect(call_list_tonly)
-    
+
     vc_all=Channel.empty()
     vc_tonly=Channel.empty()
 
     //Common for Mutect2/Varscan
-    if ("mutect2" in call_list | "varscan" in call_list){ 
-        bambyinterval | 
+    if ("mutect2" in call_list | "varscan" in call_list){
+        bambyinterval |
         map{tumorname,tumor,tumorbai,normalname,normal,normalbai,bed ->
             tuple(tumorname,tumor,tumorbai,bed,"tpileup")} | unique |
-        pileup_paired_t 
-        bambyinterval | 
+        pileup_paired_t
+        bambyinterval |
         map{tumorname,tumor,tumorbai,normalname,normal,normalbai,bed ->
             tuple(normalname,normal,normalbai,bed,"npileup")} | unique |
         pileup_paired_n
 
         pileup_paired_t.out | groupTuple |
-        multiMap { samplename, pileups -> 
+        multiMap { samplename, pileups ->
             tout: tuple( samplename,
             pileups.toSorted{ it -> (it.name =~ /${samplename}_(.*?).tpileup.table/)[0][1].toInteger() } )
             tonly: tuple( samplename,
             pileups.toSorted{ it -> (it.name =~ /${samplename}_(.*?).tpileup.table/)[0][1].toInteger() } )
         } | set{pileup_paired_tout}
-        
+
         pileup_paired_n.out | groupTuple |
-        multiMap { normalname, pileups -> 
+        multiMap { normalname, pileups ->
             nout: tuple (normalname,
             pileups.toSorted{ it -> (it.name =~ /${normalname}_(.*?).npileup.table/)[0][1].toInteger() } )
             nonly: tuple (normalname,
@@ -343,15 +343,15 @@ workflow VC {
         }
         | set{pileup_paired_nout}
 
-        pileup_paired_match=sample_sheet_paired |map{id,t,n-> tuple(t,id,n)} | combine(pileup_paired_tout.tout,by:0) | 
+        pileup_paired_match=sample_sheet_paired |map{id,t,n-> tuple(t,id,n)} | combine(pileup_paired_tout.tout,by:0) |
             map{it.swap(2,0)} | combine(pileup_paired_nout.nout,by:0) |map{no,id,tu,tpi,npi->tuple(tu,no,tpi,npi)}
-        
+
         //pileup_paired_match=pileup_paired_tout.tout.join(pileup_paired_nout.nout,by:[0,1])
         contamination_paired(pileup_paired_match)
 
         if (!params.no_tonly){
-            pileup_all=pileup_paired_tout.tonly | concat(pileup_paired_nout.nonly) 
-            contamination_tumoronly(pileup_all) 
+            pileup_all=pileup_paired_tout.tonly | concat(pileup_paired_nout.nonly)
+            contamination_tumoronly(pileup_all)
         }
     }
@@ -399,8 +399,8 @@
             }
             | set{mut2tonlyout}
 
-            learnreadorientationmodel_tonly(mut2tonlyout.mut2tout_lor) 
-            mergemut2stats_tonly(mut2tonlyout.mut2tonly_mstats) 
+            learnreadorientationmodel_tonly(mut2tonlyout.mut2tout_lor)
+            mergemut2stats_tonly(mut2tonlyout.mut2tonly_mstats)
 
             mutect2_in_tonly=mut2tonlyout.allmut2tonly
             | join(mergemut2stats_tonly.out)
@@ -411,10 +411,10 @@
            | map{tumor,markedvcf,markedindex,normvcf,normindex,stats,normal -> tuple(tumor,"mutect2_tonly",normvcf,normindex)}
             annotvep_tonly_mut2(mutect2_in_tonly)
 
-            vc_tonly = vc_tonly | concat(mutect2_in_tonly) 
+            vc_tonly = vc_tonly | concat(mutect2_in_tonly)
         }
-        
-        
+
+
     }
 
     if ("strelka" in call_list){
@@ -435,7 +435,7 @@
     if ("vardict" in call_list){
         //Vardict TN
         vardict_in=vardict_tn(bambyinterval) | groupTuple(by:[0,1])
-        | map{tumor,normal,vcf -> 
+        | map{tumor,normal,vcf ->
             tuple("${tumor}_vs_${normal}",vcf.toSorted{it -> (it.name =~ /${tumor}_vs_${normal}_(.*?).vardict.vcf/)[0][1].toInteger()},"vardict","")}
         | combineVariants_vardict | join(sample_sheet_paired)
         | map{sample,marked,markedindex,normvcf,normindex,tumor,normal ->tuple(tumor,normal,"vardict",normvcf,normindex)}
@@ -445,24 +445,24 @@
 
         //Vardict TOnly
         if (!params.no_tonly){
-            vardict_in_tonly=vardict_tonly(bambyinterval_t) 
+            vardict_in_tonly=vardict_tonly(bambyinterval_t)
             | groupTuple()
-            | map{tumor,vcf-> 
+            | map{tumor,vcf->
                 tuple(tumor,vcf.toSorted{it -> (it.name =~ /${tumor}_(.*?).tonly.vardict.vcf/)[0][1].toInteger()},"vardict_tonly","-i 'SBF<0.1 && QUAL >20 && INFO/DP >20'")}
             | combineVariants_vardict_tonly | join(sample_sheet)
             | map{tumor,marked,markedindex,normvcf,normindex,normal ->tuple(tumor,"vardict_tonly",normvcf,normindex)}
             annotvep_tonly_vardict(vardict_in_tonly)
-            vc_tonly=vc_tonly|concat(vardict_in_tonly) 
+            vc_tonly=vc_tonly|concat(vardict_in_tonly)
         }
     }
 
     if ("varscan" in call_list){
         //VarScan TN
-        varscan_in=bambyinterval.combine(contamination_paired.out,by:0) 
-        | varscan_tn | groupTuple(by:[0,1]) 
-        | map{tumor,normal,vcf -> 
+        varscan_in=bambyinterval.combine(contamination_paired.out,by:0)
+        | varscan_tn | groupTuple(by:[0,1])
+        | map{tumor,normal,vcf ->
            tuple("${tumor}_vs_${normal}",vcf.toSorted{it -> (it.name =~ /${tumor}_vs_${normal}_(.*?).varscan.vcf.gz/)[0][1].toInteger()},"varscan","-i 'SOMATIC==1'")}
         | combineVariants_varscan | join(sample_sheet_paired)
         | map{sample,marked,markedindex,normvcf,normindex,tumor,normal ->tuple(tumor,normal,"varscan",normvcf,normindex)}
@@ -472,40 +472,40 @@
 
         if (!params.no_tonly){
             //VarScan TOnly
-            varscan_in_tonly=bambyinterval_t.combine(contamination_tumoronly.out,by:0) 
-            | varscan_tonly | groupTuple 
-            | map{tumor,vcf-> tuple(tumor,vcf.toSorted{it -> (it.name =~ /${tumor}_(.*?).tonly.varscan.vcf.gz/)[0][1].toInteger()},"varscan_tonly","")} 
-            | combineVariants_varscan_tonly 
+            varscan_in_tonly=bambyinterval_t.combine(contamination_tumoronly.out,by:0)
+            | varscan_tonly | groupTuple
+            | map{tumor,vcf-> tuple(tumor,vcf.toSorted{it -> (it.name =~ /${tumor}_(.*?).tonly.varscan.vcf.gz/)[0][1].toInteger()},"varscan_tonly","")}
+            | combineVariants_varscan_tonly
             | join(sample_sheet)
             | map{tumor,marked,markedindex,normvcf,normindex,normal ->tuple(tumor,"varscan_tonly",normvcf,normindex)}
             annotvep_tonly_varscan(varscan_in_tonly)
-            vc_tonly=vc_tonly|concat(varscan_in_tonly) 
+            vc_tonly=vc_tonly|concat(varscan_in_tonly)
         }
-        
+
     }
 
     //SAGE TN
     if ("sage" in call_list){
         sage_in=sage_tn(bamwithsample)
         | map{tu,no,vcf,vcfindex-> tuple("${tu}_vs_${no}",vcf,"sage","")}
-        | combineVariants_sage 
+        | combineVariants_sage
         | join(sample_sheet_paired)
         | map{sample,marked,markedindex,normvcf,normindex,tumor,normal->tuple(tumor,normal,"sage",normvcf,normindex)}
         annotvep_tn_sage(sage_in)
         vc_all=vc_all | concat(sage_in)
 
-        if (!params.no_tonly){ 
-            sage_in_tonly=bamwithsample | map{tumor,tbam,tbai,norm,nbam,nbai -> tuple(tumor,tbam,tbai)} 
-            | sage_tonly 
-            | map{samplename,vcf,vcfindex->tuple(samplename,vcf,"sage_tonly","")} 
+        if (!params.no_tonly){
+            sage_in_tonly=bamwithsample | map{tumor,tbam,tbai,norm,nbam,nbai -> tuple(tumor,tbam,tbai)}
+            | sage_tonly
+            | map{samplename,vcf,vcfindex->tuple(samplename,vcf,"sage_tonly","")}
             | combineVariants_sage_tonly
-            | join(sample_sheet) 
+            | join(sample_sheet)
             | map{tumor,marked,markedindex,normvcf,normindex,normal ->tuple(tumor,"sage_tonly",normvcf,normindex)}
             annotvep_tonly_sage(sage_in_tonly)
-            vc_tonly=vc_tonly | concat(sage_in_tonly) 
+            vc_tonly=vc_tonly | concat(sage_in_tonly)
         }
     }
@@ -523,14 +523,14 @@
 
     //DeepSomatic TN
     if ("deepsomatic" in call_list){
-        deepsomatic_in = deepsomatic_tn_step1(bambyinterval) 
-        | map{tname,nname,tf,tfjson,bed -> tuple("${tname}_vs_${nname}",tf,tfjson,bed)} 
-        | deepsomatic_step2 
-        | deepsomatic_step3 | groupTuple 
-        | map{samplename,vcf,vcf_tbi -> 
+        deepsomatic_in = deepsomatic_tn_step1(bambyinterval)
+        | map{tname,nname,tf,tfjson,bed -> tuple("${tname}_vs_${nname}",tf,tfjson,bed)}
+        | deepsomatic_step2
+        | deepsomatic_step3 | groupTuple
+        | map{samplename,vcf,vcf_tbi ->
             tuple(samplename,vcf.toSorted{it -> (it.name =~ /${samplename}_(.*?).bed.vcf.gz/)[0][1].toInteger()},vcf_tbi,"deepsomatic","")
         }
-        | combineVariants_deepsomatic 
+        | combineVariants_deepsomatic
         | join(sample_sheet_paired)
         | map{sample,marked,markedindex,normvcf,normindex,tumor,normal->tuple(tumor,normal,"deepsomatic",normvcf,normindex)}
         annotvep_tn_deepsomatic(deepsomatic_in)
@@ -539,33 +539,33 @@
 
         //DeepSomatic TOnly
         if (!params.no_tonly){
-            deepsomatic_tonly_in = deepsomatic_tonly_step1(bambyinterval_t) 
-            | deepsomatic_tonly_step2 
-            | deepsomatic_tonly_step3 | groupTuple 
+            deepsomatic_tonly_in = deepsomatic_tonly_step1(bambyinterval_t)
+            | deepsomatic_tonly_step2
+            | deepsomatic_tonly_step3 | groupTuple
-            | map{samplename,vcf,vcf_tbi -> 
+            | map{samplename,vcf,vcf_tbi ->
                 tuple(samplename,vcf.toSorted{it -> (it.name =~ /${samplename}_(.*?).bed.vcf.gz/)[0][1].toInteger()},vcf_tbi,"deepsomatic_tonly","")
-            } 
+            }
-            | combineVariants_deepsomatic_tonly 
-            | join(sample_sheet) 
+            | combineVariants_deepsomatic_tonly
+            | join(sample_sheet)
             | map{tumor,marked,markedindex,normvcf,normindex,normal->tuple(tumor,"deepsomatic_tonly",normvcf,normindex)}
             annotvep_tonly_deepsomatic(deepsomatic_tonly_in)
-            vc_tonly=vc_tonly | concat(deepsomatic_tonly_in) 
-            
+            vc_tonly=vc_tonly | concat(deepsomatic_tonly_in)
+
         }
-        
+
     }
 
     //MuSE TN
-    if ("muse" in call_list){ 
+    if ("muse" in call_list){
         if (!params.exome){
-            bamwithsample | map{tuname,tumor,tbai,nname,normal,nbai -> tuple(tuname,tumor,tbai,nname,normal,nbai,"-G")} 
+            bamwithsample | map{tuname,tumor,tbai,nname,normal,nbai -> tuple(tuname,tumor,tbai,nname,normal,nbai,"-G")}
             | muse_tn
         }else if (params.exome){
-            bamwithsample | map{tuname,tumor,tbai,nname,normal,nbai -> tuple(tuname,tumor,tbai,nname,normal,nbai,"-E")} 
+            bamwithsample | map{tuname,tumor,tbai,nname,normal,nbai -> tuple(tuname,tumor,tbai,nname,normal,nbai,"-E")}
             | muse_tn
         }
         muse_in = muse_tn.out | map{tumor,normal,vcf-> tuple("${tumor}_vs_${normal}",vcf,"muse","")}
@@ -580,7 +580,7 @@
     if ("octopus" in call_list){
         octopus_in=octopus_tn(bambyinterval) | bcftools_index_octopus
         | groupTuple()
-        | map{samplename,vcf,vcfindex-> 
+        | map{samplename,vcf,vcfindex->
             def sortedVcf = vcf.toSorted{it->(it.name =~ /${samplename}_(.*).octopus.vcf.gz/)[0][1].toInteger()}.unique()
             def sortedIdx = vcfindex.toSorted{it->(it.name =~ /${samplename}_(.*).octopus.vcf.gz.tbi/)[0][1].toInteger()}.unique()
             tuple(samplename, sortedVcf, sortedIdx, "octopus", "")
@@ -589,33 +589,33 @@
         | map{samplename,marked,markedindex,normvcf,normindex -> tuple(samplename.split('_vs_')[0],samplename.split('_vs_')[1],"octopus",normvcf,normindex)}
         annotvep_tn_octopus(octopus_in)
 
-        octopus_in = octopus_in | octopus_convertvcf 
-        | map{tumor,normal,vcf,vcfindex ->tuple(tumor,normal,"octopus",vcf,vcfindex)} 
+        octopus_in = octopus_in | octopus_convertvcf
+        | map{tumor,normal,vcf,vcfindex ->tuple(tumor,normal,"octopus",vcf,vcfindex)}
         vc_all=vc_all|concat(octopus_in)
 
         //Octopus TOnly
-        if (!params.no_tonly){ 
+        if (!params.no_tonly){
             octopus_in_tonly=octopus_tonly(bambyinterval_t)
             | bcftools_index_octopus_tonly
-            | groupTuple() 
+            | groupTuple()
             | map{samplename,vcf,vcfindex->
                 def sortedVcf = vcf.toSorted{it->(it.name =~ /${samplename}_(.*).tonly.octopus.vcf.gz/)[0][1].toInteger()}.unique()
                 def sortedIdx = vcfindex.toSorted{it->(it.name =~ /${samplename}_(.*).tonly.octopus.vcf.gz.tbi/)[0][1].toInteger()}.unique()
                 tuple(samplename, sortedVcf, sortedIdx, "octopus_tonly", "")
-            } 
+            }
             | combineVariants_octopus_tonly
-            | join(sample_sheet) 
+            | join(sample_sheet)
             | map{tumor,marked,markedindex,normvcf,normindex,normal ->tuple(tumor,"octopus_tonly",normvcf,normindex)}
             annotvep_tonly_octopus(octopus_in_tonly)
             octopus_in_tonly_sc=octopus_in_tonly | octopus_convertvcf_tonly
-            | map{tumor,vcf,vcfindex ->tuple(tumor,"octopus_tonly",vcf,vcfindex)} 
-            vc_tonly=vc_tonly|concat(octopus_in_tonly_sc) 
+            | map{tumor,vcf,vcfindex ->tuple(tumor,"octopus_tonly",vcf,vcfindex)}
+            vc_tonly=vc_tonly|concat(octopus_in_tonly_sc)
         }
-        
+
     }
-    
-    
-    //FFPE Steps 
+
+
+    //FFPE Steps
     if(params.ffpe){
         vc_ffpe_paired=Channel.empty()
         vc_ffpe_tonly=Channel.empty()
@@ -624,45 +624,45 @@
         if('mutect2' in call_list){
             mutect2_p1=bamwithsample1 | join(mutect2_in,by:[0,1])
             | map{tumor,normal,tbam,tbai,nbam,nbai,vc,normvcf,tbi->tuple("${tumor}_vs_${normal}",normvcf,tbam,vc)}
-            | sobdetect_pass1_mutect2 
+            | sobdetect_pass1_mutect2
 
             mutect2_p1 | map{sample,vcf,info->info} | collect | sobdetect_cohort_params_mutect2
 
-            mutect2_p2 = bamwithsample1 
+            mutect2_p2 = bamwithsample1
             | join(mutect2_in,by:[0,1])
             | map{tumor,normal,tbam,tbai,nbam,nbai,vc,normvcf,tbi->tuple("${tumor}_vs_${normal}",normvcf,tbam,vc)}
-            | combine(sobdetect_cohort_params_mutect2.out) 
+            | combine(sobdetect_cohort_params_mutect2.out)
             | sobdetect_pass2_mutect2
 
-            mutect2_p1_vcfs=mutect2_p1 | map{sample,vcf,info->vcf} | collect 
+            mutect2_p1_vcfs=mutect2_p1 | map{sample,vcf,info->vcf} | collect
             mutect2_p2_vcfs=mutect2_p2 | map{sample,vcf,info,filtvcf,vcftbi->vcf} | collect
             sobdetect_metrics_mutect2(mutect2_p1_vcfs,mutect2_p2_vcfs)
-            
-            mutect2_ffpe_out=mutect2_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample.split("_vs_")[0],sample.split("_vs_")[1],"mutect2",filtvcf,vcftbi)} 
+
+            mutect2_ffpe_out=mutect2_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample.split("_vs_")[0],sample.split("_vs_")[1],"mutect2",filtvcf,vcftbi)}
             annotvep_tn_mut2_ffpe(mutect2_ffpe_out)
             vc_ffpe_paired=vc_ffpe_paired |concat(mutect2_ffpe_out)
 
-            if (!params.no_tonly){ 
-                mutect2_tonly_p1=bamwithsample1 | join(mutect2_in_tonly) 
+            if (!params.no_tonly){
+                mutect2_tonly_p1=bamwithsample1 | join(mutect2_in_tonly)
                | map{tumor,normal,tbam,tbai,nbam,nbai,vc,normvcf,tbi->tuple(tumor,normvcf,tbam,vc)}
                | sobdetect_pass1_mutect2_tonly
 
                mutect2_tonly_p1 | map{sample,vcf,info->info} | collect | sobdetect_cohort_params_mutect2_tonly
 
-                mutect2_tonly_p2 = bamwithsample1 
+                mutect2_tonly_p2 = bamwithsample1
                | join(mutect2_in_tonly)
                | map{tumor,normal,tbam,tbai,nbam,nbai,vc,normvcf,tbi->tuple(tumor,normvcf,tbam,vc)}
-                | combine(sobdetect_cohort_params_mutect2_tonly.out) 
+                | combine(sobdetect_cohort_params_mutect2_tonly.out)
                | sobdetect_pass2_mutect2_tonly
 
-                mutect2_tonly_p1_vcfs=mutect2_tonly_p1 | map{sample,vcf,info->vcf} |collect 
+                mutect2_tonly_p1_vcfs=mutect2_tonly_p1 | map{sample,vcf,info->vcf} |collect
                mutect2_tonly_p2_vcfs=mutect2_tonly_p2 | map{sample,vcf,info,filtvcf,vcftbi->vcf} |collect
                sobdetect_metrics_mutect2_tonly(mutect2_tonly_p1_vcfs,mutect2_tonly_p2_vcfs)
-                
-                mutect2_tonly_ffpe_out=mutect2_tonly_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample,"mutect2_tonly",filtvcf,vcftbi)} 
+
+                mutect2_tonly_ffpe_out=mutect2_tonly_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample,"mutect2_tonly",filtvcf,vcftbi)}
                annotvep_tonly_mut2_ffpe(mutect2_tonly_ffpe_out)
                vc_ffpe_tonly=vc_ffpe_tonly |concat(mutect2_tonly_ffpe_out)
-            
+
             }
         }
@@ -670,44 +670,44 @@
         if('octopus' in call_list){
             octopus_p1=bamwithsample1 | join(octopus_in,by:[0,1])
             | map{tumor,normal,tbam,tbai,nbam,nbai,vc,normvcf,tbi->tuple("${tumor}_vs_${normal}",normvcf,tbam,vc)}
-            | sobdetect_pass1_octopus 
+            | sobdetect_pass1_octopus
 
             octopus_p1 | map{sample,vcf,info->info} | collect | sobdetect_cohort_params_octopus
 
-            octopus_p2 = bamwithsample1 
+            octopus_p2 = bamwithsample1
             | join(octopus_in,by:[0,1])
             | map{tumor,normal,tbam,tbai,nbam,nbai,vc,normvcf,tbi->tuple("${tumor}_vs_${normal}",normvcf,tbam,vc)}
-            | combine(sobdetect_cohort_params_octopus.out) 
+            | combine(sobdetect_cohort_params_octopus.out)
            | sobdetect_pass2_octopus
 
-            octopus_p1_vcfs=octopus_p1 | map{sample,vcf,info->vcf} | collect 
+            octopus_p1_vcfs=octopus_p1 | map{sample,vcf,info->vcf} | collect
            octopus_p2_vcfs=octopus_p2 | map{sample,vcf,info,filtvcf,vcftbi->vcf} | collect
            sobdetect_metrics_octopus(octopus_p1_vcfs,octopus_p2_vcfs)
 
-            octopus_ffpe_out=octopus_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample.split("_vs_")[0],sample.split("_vs_")[1],"octopus",filtvcf,vcftbi)} 
+            octopus_ffpe_out=octopus_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample.split("_vs_")[0],sample.split("_vs_")[1],"octopus",filtvcf,vcftbi)}
            annotvep_tn_octopus_ffpe(octopus_ffpe_out)
            vc_ffpe_paired=vc_ffpe_paired |concat(octopus_ffpe_out)
 
-            if (!params.no_tonly){ 
-                octopus_tonly_p1=bamwithsample1 | join(octopus_in_tonly) 
+            if (!params.no_tonly){
+                octopus_tonly_p1=bamwithsample1 | join(octopus_in_tonly)
                | map{tumor,normal,tbam,tbai,nbam,nbai,vc,normvcf,tbi->tuple(tumor,normvcf,tbam,vc)}
                | sobdetect_pass1_octopus_tonly
 
                octopus_tonly_p1 | map{sample,vcf,info->info} | collect | sobdetect_cohort_params_octopus_tonly
 
-                octopus_tonly_p2 = bamwithsample1 
+                octopus_tonly_p2 = bamwithsample1
                | join(octopus_in_tonly)
                | map{tumor,normal,tbam,tbai,nbam,nbai,vc,normvcf,tbi->tuple(tumor,normvcf,tbam,vc)}
-                | combine(sobdetect_cohort_params_octopus_tonly.out) 
+                | combine(sobdetect_cohort_params_octopus_tonly.out)
                | sobdetect_pass2_octopus_tonly
 
-                octopus_tonly_p1_vcfs=octopus_tonly_p1 | map{sample,vcf,info->vcf} |collect 
+                octopus_tonly_p1_vcfs=octopus_tonly_p1 | map{sample,vcf,info->vcf} |collect
                octopus_tonly_p2_vcfs=octopus_tonly_p2 | map{sample,vcf,info,filtvcf,vcftbi->vcf} |collect
                sobdetect_metrics_octopus_tonly(octopus_tonly_p1_vcfs,octopus_tonly_p2_vcfs)
 
-                octopus_tonly_ffpe_out=octopus_tonly_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample,"octopus_tonly",filtvcf,vcftbi)} 
+                octopus_tonly_ffpe_out=octopus_tonly_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample,"octopus_tonly",filtvcf,vcftbi)}
                annotvep_tonly_octopus_ffpe(octopus_tonly_ffpe_out)
                vc_ffpe_tonly=vc_ffpe_tonly |concat(octopus_tonly_ffpe_out)
-            } 
+            }
         }
 
         if('strelka' in call_list){
@@ -717,17 +717,17 @@
            strelka_p1 | map{sample,vcf,info->info} | collect | sobdetect_cohort_params_strelka
 
-            strelka_p2 = bamwithsample1 
+            strelka_p2 = bamwithsample1
            | join(strelka_in,by:[0,1])
            | map{tumor,normal,tbam,tbai,nbam,nbai,vc,normvcf,tbi->tuple("${tumor}_vs_${normal}",normvcf,tbam,vc)}
-            | combine(sobdetect_cohort_params_strelka.out) 
+            | combine(sobdetect_cohort_params_strelka.out)
            | sobdetect_pass2_strelka
-            
-            strelka_p1_vcfs=strelka_p1 | map{sample,vcf,info->vcf} |collect 
+
+            strelka_p1_vcfs=strelka_p1 | map{sample,vcf,info->vcf} |collect
            strelka_p2_vcfs=strelka_p2 | map{sample,vcf,info,filtvcf,vcftbi->vcf} | collect
            sobdetect_metrics_strelka(strelka_p1_vcfs,strelka_p2_vcfs)
 
-            strelka_ffpe_out=strelka_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample.split("_vs_")[0],sample.split("_vs_")[1],"strelka",filtvcf,vcftbi)} 
+            strelka_ffpe_out=strelka_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample.split("_vs_")[0],sample.split("_vs_")[1],"strelka",filtvcf,vcftbi)}
            annotvep_tn_strelka_ffpe(strelka_ffpe_out)
            vc_ffpe_paired=vc_ffpe_paired |concat(strelka_ffpe_out)
@@ -736,20 +736,20 @@
        if('lofreq' in call_list){
            lofreq_p1=bamwithsample1 | join(lofreq_in,by:[0,1])
            | map{tumor,normal,tbam,tbai,nbam,nbai,vc,normvcf,tbi->tuple("${tumor}_vs_${normal}",normvcf,tbam,vc)}
-            | sobdetect_pass1_lofreq 
+            | sobdetect_pass1_lofreq
 
            lofreq_p1 | map{sample,vcf,info->info} | collect | sobdetect_cohort_params_lofreq
 
-            lofreq_p2 = bamwithsample1 
+            lofreq_p2 = bamwithsample1
            | join(lofreq_in,by:[0,1])
            | map{tumor,normal,tbam,tbai,nbam,nbai,vc,normvcf,tbi->tuple("${tumor}_vs_${normal}",normvcf,tbam,vc)}
-            | combine(sobdetect_cohort_params_lofreq.out) 
+            | combine(sobdetect_cohort_params_lofreq.out)
            | sobdetect_pass2_lofreq
 
-            lofreq_p1_vcfs=lofreq_p1 | map{sample,vcf,info->vcf} |collect 
+            lofreq_p1_vcfs=lofreq_p1 | map{sample,vcf,info->vcf} |collect
            lofreq_p2_vcfs=lofreq_p2 | map{sample,vcf,info,filtvcf,vcftbi->vcf} |collect
            sobdetect_metrics_lofreq(lofreq_p1_vcfs,lofreq_p2_vcfs)
 
-            lofreq_ffpe_out=lofreq_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample.split("_vs_")[0],sample.split("_vs_")[1],"lofreq",filtvcf,vcftbi)} 
+            lofreq_ffpe_out=lofreq_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample.split("_vs_")[0],sample.split("_vs_")[1],"lofreq",filtvcf,vcftbi)}
            annotvep_tn_lofreq_ffpe(lofreq_ffpe_out)
            vc_ffpe_paired=vc_ffpe_paired |concat(lofreq_ffpe_out)
        }
@@ -757,20 +757,20 @@
        if('muse' in call_list){
            muse_p1=bamwithsample1 | join(muse_in,by:[0,1])
            | map{tumor,normal,tbam,tbai,nbam,nbai,vc,normvcf,tbi->tuple("${tumor}_vs_${normal}",normvcf,tbam,vc)}
-            | sobdetect_pass1_muse 
+            | sobdetect_pass1_muse
 
            muse_p1 | map{sample,vcf,info->info} | collect |
sobdetect_cohort_params_muse - muse_p2 = bamwithsample1 + muse_p2 = bamwithsample1 | join(muse_in,by:[0,1]) | map{tumor,normal,tbam,tbai,nbam,nbai,vc,normvcf,tbi->tuple("${tumor}_vs_${normal}",normvcf,tbam,vc)} - | combine(sobdetect_cohort_params_muse.out) + | combine(sobdetect_cohort_params_muse.out) | sobdetect_pass2_muse - muse_p1_vcfs=muse_p1 | map{sample,vcf,info->vcf} |collect + muse_p1_vcfs=muse_p1 | map{sample,vcf,info->vcf} |collect muse_p2_vcfs=muse_p2 | map{sample,vcf,info,filtvcf,vcftbi->vcf} |collect sobdetect_metrics_muse(muse_p1_vcfs,muse_p2_vcfs) - muse_ffpe_out=muse_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample.split("_vs_")[0],sample.split("_vs_")[1],"muse",filtvcf,vcftbi)} + muse_ffpe_out=muse_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample.split("_vs_")[0],sample.split("_vs_")[1],"muse",filtvcf,vcftbi)} annotvep_tn_muse_ffpe(muse_ffpe_out) vc_ffpe_paired=vc_ffpe_paired |concat(muse_ffpe_out) } @@ -778,41 +778,41 @@ workflow VC { if('vardict' in call_list){ vardict_p1=bamwithsample1 | join(vardict_in,by:[0,1]) | map{tumor,normal,tbam,tbai,nbam,nbai,vc,normvcf,tbi->tuple("${tumor}_vs_${normal}",normvcf,tbam,vc)} - | sobdetect_pass1_vardict + | sobdetect_pass1_vardict vardict_p1 | map{sample,vcf,info->info} | collect | sobdetect_cohort_params_vardict - vardict_p2 = bamwithsample1 + vardict_p2 = bamwithsample1 | join(vardict_in,by:[0,1]) | map{tumor,normal,tbam,tbai,nbam,nbai,vc,normvcf,tbi->tuple("${tumor}_vs_${normal}",normvcf,tbam,vc)} - | combine(sobdetect_cohort_params_vardict.out) + | combine(sobdetect_cohort_params_vardict.out) | sobdetect_pass2_vardict - vardict_p1_vcfs=vardict_p1 | map{sample,vcf,info->vcf} |collect + vardict_p1_vcfs=vardict_p1 | map{sample,vcf,info->vcf} |collect vardict_p2_vcfs=vardict_p2 | map{sample,vcf,info,filtvcf,vcftbi->vcf} |collect sobdetect_metrics_vardict(vardict_p1_vcfs,vardict_p2_vcfs) - vardict_ffpe_out=vardict_p2 | 
map{sample,vcf,info,filtvcf,vcftbi->tuple(sample.split("_vs_")[0],sample.split("_vs_")[1],"vardict",filtvcf,vcftbi)} + vardict_ffpe_out=vardict_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample.split("_vs_")[0],sample.split("_vs_")[1],"vardict",filtvcf,vcftbi)} annotvep_tn_vardict_ffpe(vardict_ffpe_out) vc_ffpe_paired=vc_ffpe_paired |concat(vardict_ffpe_out) - if (!params.no_tonly){ - vardict_tonly_p1=bamwithsample1 | join(vardict_in_tonly) + if (!params.no_tonly){ + vardict_tonly_p1=bamwithsample1 | join(vardict_in_tonly) | map{tumor,normal,tbam,tbai,nbam,nbai,vc,normvcf,tbi->tuple(tumor,normvcf,tbam,vc)} | sobdetect_pass1_vardict_tonly vardict_tonly_p1 | map{sample,vcf,info->info} | collect | sobdetect_cohort_params_vardict_tonly - vardict_tonly_p2 = bamwithsample1 + vardict_tonly_p2 = bamwithsample1 | join(vardict_in_tonly) | map{tumor,normal,tbam,tbai,nbam,nbai,vc,normvcf,tbi->tuple(tumor,normvcf,tbam,vc)} - | combine(sobdetect_cohort_params_vardict_tonly.out) + | combine(sobdetect_cohort_params_vardict_tonly.out) | sobdetect_pass2_vardict_tonly - vardict_tonly_p1_vcfs=vardict_tonly_p1 | map{sample,vcf,info->vcf} |collect + vardict_tonly_p1_vcfs=vardict_tonly_p1 | map{sample,vcf,info->vcf} |collect vardict_tonly_p2_vcfs=vardict_tonly_p2 | map{sample,vcf,info,filtvcf,vcftbi->vcf} |collect sobdetect_metrics_vardict_tonly(vardict_tonly_p1_vcfs,vardict_tonly_p2_vcfs) - vardict_tonly_ffpe_out=vardict_tonly_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample,"vardict_tonly",filtvcf,vcftbi)} + vardict_tonly_ffpe_out=vardict_tonly_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample,"vardict_tonly",filtvcf,vcftbi)} annotvep_tonly_vardict_ffpe(vardict_tonly_ffpe_out) vc_ffpe_tonly=vc_ffpe_tonly |concat(vardict_tonly_ffpe_out) } @@ -821,41 +821,41 @@ workflow VC { if('varscan' in call_list){ varscan_p1=bamwithsample1 | join(varscan_in,by:[0,1]) | map{tumor,normal,tbam,tbai,nbam,nbai,vc,normvcf,tbi->tuple("${tumor}_vs_${normal}",normvcf,tbam,vc)} - | 
sobdetect_pass1_varscan + | sobdetect_pass1_varscan varscan_p1 | map{sample,vcf,info->info} | collect | sobdetect_cohort_params_varscan - varscan_p2 = bamwithsample1 + varscan_p2 = bamwithsample1 | join(varscan_in,by:[0,1]) | map{tumor,normal,tbam,tbai,nbam,nbai,vc,normvcf,tbi->tuple("${tumor}_vs_${normal}",normvcf,tbam,vc)} - | combine(sobdetect_cohort_params_varscan.out) + | combine(sobdetect_cohort_params_varscan.out) | sobdetect_pass2_varscan - varscan_p1_vcfs=varscan_p1 | map{sample,vcf,info->vcf} |collect + varscan_p1_vcfs=varscan_p1 | map{sample,vcf,info->vcf} |collect varscan_p2_vcfs=varscan_p2 | map{sample,vcf,info,filtvcf,vcftbi->vcf} |collect sobdetect_metrics_varscan(varscan_p1_vcfs,varscan_p2_vcfs) - varscan_ffpe_out=varscan_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample.split("_vs_")[0],sample.split("_vs_")[1],"varscan",filtvcf,vcftbi)} + varscan_ffpe_out=varscan_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample.split("_vs_")[0],sample.split("_vs_")[1],"varscan",filtvcf,vcftbi)} annotvep_tn_varscan_ffpe(varscan_ffpe_out) vc_ffpe_paired=vc_ffpe_paired | concat(varscan_ffpe_out) - if (!params.no_tonly){ - varscan_tonly_p1=bamwithsample1 | join(varscan_in_tonly) + if (!params.no_tonly){ + varscan_tonly_p1=bamwithsample1 | join(varscan_in_tonly) | map{tumor,normal,tbam,tbai,nbam,nbai,vc,normvcf,tbi->tuple(tumor,normvcf,tbam,vc)} | sobdetect_pass1_varscan_tonly varscan_tonly_p1 | map{sample,vcf,info->info} | collect | sobdetect_cohort_params_varscan_tonly - varscan_tonly_p2 = bamwithsample1 + varscan_tonly_p2 = bamwithsample1 | join(varscan_in_tonly) | map{tumor,normal,tbam,tbai,nbam,nbai,vc,normvcf,tbi->tuple(tumor,normvcf,tbam,vc)} - | combine(sobdetect_cohort_params_varscan_tonly.out) + | combine(sobdetect_cohort_params_varscan_tonly.out) | sobdetect_pass2_varscan_tonly - varscan_tonly_p1_vcfs=varscan_tonly_p1 | map{sample,vcf,info->vcf} |collect + varscan_tonly_p1_vcfs=varscan_tonly_p1 | map{sample,vcf,info->vcf} |collect 
varscan_tonly_p2_vcfs=varscan_tonly_p2 | map{sample,vcf,info,filtvcf,vcftbi->vcf} |collect sobdetect_metrics_varscan_tonly(varscan_tonly_p1_vcfs,varscan_tonly_p2_vcfs) - varscan_tonly_ffpe_out=varscan_tonly_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample,"varscan_tonly",filtvcf,vcftbi)} + varscan_tonly_ffpe_out=varscan_tonly_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample,"varscan_tonly",filtvcf,vcftbi)} annotvep_tonly_varscan_ffpe(varscan_tonly_ffpe_out) vc_ffpe_tonly=vc_ffpe_tonly |concat(varscan_tonly_ffpe_out) } @@ -873,21 +873,21 @@ workflow VC { | somaticcombine_ffpe | map{tumor,normal,vcf,index ->tuple(tumor,normal,"combined_ffpe",vcf,index)} | annotvep_tn_combined_ffpe - } + } if (!params.no_tonly & call_list_tonly.size()>1){ - vc_tonly | groupTuple() + vc_tonly | groupTuple() | somaticcombine_tonly | map{tumor,vcf,index ->tuple(tumor,"combined_tonly",vcf,index)} | annotvep_tonly_combined } if (!params.no_tonly & call_list_tonly.size()>1 & params.ffpe){ - vc_ffpe_tonly | groupTuple() + vc_ffpe_tonly | groupTuple() | somaticcombine_tonly_ffpe | map{tumor,vcf,index ->tuple(tumor,"combined_tonly_ffpe",vcf,index)} | annotvep_tonly_combined_ffpe } } - + if("sage" in call_list){ somaticcall_input=sage_in @@ -898,10 +898,10 @@ workflow VC { }else{ somaticcall_input=Channel.empty() } - + emit: somaticcall_input - + } @@ -924,7 +924,7 @@ workflow SV { //Manta if ("manta" in svcall_list){ manta_out=manta_somatic(bamwithsample) - manta_out_forsv=manta_out + manta_out_forsv=manta_out | map{tumor,normal,gsv,gsv_tbi,so_sv,so_sv_tbi,unfil_sv,unfil_sv_tbi,unfil_indel,unfil_indel_tbi -> tuple(tumor,so_sv,"manta")} | gunzip_manta //annotsv_manta(manta_out).ifEmpty("Empty SV input--No SV annotated") @@ -934,7 +934,7 @@ workflow SV { //GRIDSS if ("gridss" in svcall_list){ gridss_out=gridss_somatic(bamwithsample) - gridss_out_forsv=gridss_out + gridss_out_forsv=gridss_out | map{tumor,normal,vcf,index,bam,gripssvcf,gripsstbi,gripssfilt,filttbi -> 
tuple(tumor,gripssfilt,"gridss")} | gunzip_gridss svout=svout | concat(gridss_out_forsv) @@ -943,23 +943,23 @@ workflow SV { if (svcall_list.size()>1){ //Survivor svout | groupTuple - | survivor_sv - | annotsv_survivor_tn + | survivor_sv + | annotsv_survivor_tn | ifEmpty("Empty SV input--No SV annotated") } - + if("gridss" in svcall_list){ - somaticsv_input=gridss_out + somaticsv_input=gridss_out | map{tumor,normal,vcf,index,bam,gripssvcf,gripsstbi,gripssfilt,filttbi -> tuple(tumor,normal,vcf,index,gripssfilt,filttbi)} }else if("manta" in svcall_list){ - somaticsv_input=manta_out + somaticsv_input=manta_out | map{tumor,normal,gsv,gsv_tbi,so_sv,so_sv_tbi,unfil_sv,unfil_sv_tbi,unfil_indel,unfil_indel_tbi -> tuple(tumor,normal,unfil_sv,unfil_sv_tbi,so_sv,so_sv_tbi)} }else{ somaticsv_input=Channel.empty() } - + emit: somaticsv_input } @@ -967,14 +967,14 @@ workflow SV { workflow CNVmouse { take: bamwithsample - + main: cnvcall_list = params.cnvcallers.split(',') as List //Sequenza if ("sequenza" in cnvcall_list){ if (params.exome){ - windowsize=Channel.value(50) + windowsize=Channel.value(50) }else{ windowsize=Channel.value(200) } @@ -998,7 +998,7 @@ workflow CNVmouse { } //FREEC Unpaired Mode - bamwithsample + bamwithsample | map{tname,tumor,tbai,nname,norm,nbai->tuple(tname,tumor,tbai)} | freec } @@ -1019,7 +1019,7 @@ workflow CNVhuman { bamwithsample somaticcall_input - main: + main: if (params.intervals){ intervalbedin = Channel.fromPath(params.intervals,checkIfExists: true,type: 'file') }else{ @@ -1028,7 +1028,7 @@ workflow CNVhuman { cnvcall_list = params.cnvcallers.split(',') as List scinput = somaticcall_input|map{t1,n1,cal,vcf,ind -> tuple("${t1}_vs_${n1}",cal,vcf,ind)} - + //Drop Purple if using Exome if (params.exome && "purple" in cnvcall_list){ cnvcall_list.removeIf { it == 'purple' } @@ -1038,8 +1038,8 @@ workflow CNVhuman { //Purple bamwithsample | amber_tn bamwithsample | cobalt_tn - purplein=amber_tn.out.join(cobalt_tn.out,by:[0,1,2]) - 
purplein.join(scinput) + purplein=amber_tn.out.join(cobalt_tn.out,by:[0,1,2]) + purplein.join(scinput) | map{id,t1,n1,amber,cobalt,vc,vcf,vcfindex -> tuple(id,t1,n1,amber,cobalt,vcf,vcfindex)} | purple } @@ -1047,7 +1047,7 @@ workflow CNVhuman { //Sequenza if ("sequenza" in cnvcall_list){ if (params.exome){ - windowsize=Channel.value(50) + windowsize=Channel.value(50) }else{ windowsize=Channel.value(200) } @@ -1070,7 +1070,7 @@ workflow CNVhuman { bamwithsample | freec_paired } } - + //ASCAT if ("ascat" in cnvcall_list){ if(params.exome){ @@ -1080,7 +1080,7 @@ workflow CNVhuman { bamwithsample | ascat_tn } } - + //CNVKIT if ("cnvkit" in cnvcall_list){ if(params.exome){ @@ -1097,7 +1097,7 @@ workflow CNVhuman_novc { take: bamwithsample - main: + main: if (params.intervals){ intervalbedin = Channel.fromPath(params.intervals,checkIfExists: true,type: 'file') }else{ @@ -1113,9 +1113,9 @@ workflow CNVhuman_novc { if ("purple" in cnvcall_list){ //Purple - bamwithsample | amber_tn - bamwithsample | cobalt_tn - purplein=amber_tn.out |join(cobalt_tn.out) + bamwithsample | amber_tn + bamwithsample | cobalt_tn + purplein=amber_tn.out |join(cobalt_tn.out) purplein | map{id,t1,n1,amber,t2,n2,cobalt -> tuple(id,t1,n1,amber,cobalt)} | purple_novc } @@ -1123,7 +1123,7 @@ workflow CNVhuman_novc { if ("sequenza" in cnvcall_list){ if (params.exome){ - windowsize=Channel.value(50) + windowsize=Channel.value(50) }else{ windowsize=Channel.value(200) } @@ -1135,7 +1135,7 @@ workflow CNVhuman_novc { | combine(windowsize) | sequenza } - + if ("freec" in cnvcall_list){ //FREEC if(params.exome){ @@ -1156,7 +1156,7 @@ workflow CNVhuman_novc { bamwithsample | ascat_tn } } - + //CNVKIT if ("cnvkit" in cnvcall_list){ if(params.exome){ @@ -1343,20 +1343,20 @@ workflow QC_GL_BAM { conall=qualimap_out.concat(mosdepth_out, samtools_flagstats_out,bcftools_stats_out, gatk_varianteval_out,snpeff_out,vcftools_out,collectvariantcallmetrics_out, - somalier_analysis_out) + somalier_analysis_out) | flatten 
| toList multiqc(conall) } -//QC NOGL-BAMs +//QC NOGL-BAMs workflow QC_NOGL_BAM { take: bams fullinterval main: - //BQSR BAMs + //BQSR BAMs fastqc(bams) samtools_flagstats(bams) qualimap_bamqc(bams) @@ -1367,7 +1367,7 @@ workflow QC_NOGL_BAM { } - somalier_extract(bams) + somalier_extract(bams) som_in=somalier_extract.out.collect() if(params.genome.matches("hg38(.*)")| params.genome.matches("hg19(.*)")){ somalier_analysis_human(som_in) @@ -1377,15 +1377,15 @@ workflow QC_NOGL_BAM { somalier_analysis_mouse(som_in) somalier_analysis_out=somalier_analysis_mouse.out.collect() } - + //Prep for MultiQC input qualimap_out=qualimap_bamqc.out.map{genome,rep->tuple(genome,rep)}.collect() samtools_flagstats_out=samtools_flagstats.out.collect() conall=qualimap_out.concat( - samtools_flagstats_out,mosdepth_out, + samtools_flagstats_out,mosdepth_out, somalier_analysis_out).flatten().toList() - + multiqc(conall) } @@ -1400,8 +1400,8 @@ workflow INPUT_BAM { row.Tumor, row.Normal ) - } - } + } + } //Either BAM Input or File sheet input if(params.bam_input){ //Check if Index is .bai or .bam.bai @@ -1426,7 +1426,7 @@ workflow INPUT_BAM { .splitCsv(header: false, sep: "\t", strip:true) .map{ sample,bam,bai -> tuple(sample, file(bam),file(bai)) - } + } } if (params.intervals){ intervalbedin = Channel.fromPath(params.intervals,checkIfExists: true, type: 'file') @@ -1434,32 +1434,32 @@ workflow INPUT_BAM { intervalbedin = Channel.fromPath(params.genomes[params.genome].intervals,checkIfExists: true, type: 'file') } matchbed(intervalbedin) | splitinterval - - - if (params.indelrealign){ - bqsrs = baminputonly | indelrealign | combine(splitinterval.out.flatten()) - | bqsr_ir - | groupTuple - | map { samplename,beds -> - tuple( samplename, beds.toSorted{ it -> (it.name =~ /${samplename}_(.*?).recal_data.grp/)[0][1].toInteger() } )} + + + if (params.indelrealign){ + bqsrs = baminputonly | indelrealign | combine(splitinterval.out.flatten()) + | bqsr_ir + | groupTuple + | map { samplename,beds 
-> + tuple( samplename, beds.toSorted{ it -> (it.name =~ /${samplename}_(.*?).recal_data.grp/)[0][1].toInteger() } )} | gatherbqsr - baminput2=baminputonly.combine(bqsrs,by:0) + baminput2=baminputonly.combine(bqsrs,by:0) |applybqsr - bamwithsample=baminput2.combine(sample_sheet,by:0).map{it.swap(3,0)}.combine(baminputonly,by:0).map{it.swap(3,0)} - | view() + bamwithsample=baminput2.combine(sample_sheet,by:0).map{it.swap(3,0)}.combine(baminputonly,by:0).map{it.swap(3,0)} + | view() }else{ - bamwithsample=baminputonly.combine(sample_sheet,by:0).map{it.swap(3,0)}.combine(baminputonly,by:0).map{it.swap(3,0)} - | view() + bamwithsample=baminputonly.combine(sample_sheet,by:0).map{it.swap(3,0)}.combine(baminputonly,by:0).map{it.swap(3,0)} + | view() } bambyinterval_norm=bamwithsample - | map {tum,tubam,tbai,norm,norbam,norbai -> tuple(norm,norbam,norbai)} + | map {tum,tubam,tbai,norm,norbam,norbai -> tuple(norm,norbam,norbai)} bambyinterval_tum=bamwithsample - | map {tum,tubam,tbai,norm,norbam,norbai -> tuple(tum,tubam,tbai)} + | map {tum,tubam,tbai,norm,norbam,norbai -> tuple(tum,tubam,tbai)} bambyinterval=bambyinterval_tum | concat(bambyinterval_norm) | unique - | combine(splitinterval.out.flatten()) + | combine(splitinterval.out.flatten()) emit: bamwithsample @@ -1469,6 +1469,3 @@ workflow INPUT_BAM { allbam=bambyinterval_tum | concat(bambyinterval_norm) | unique fullinterval=matchbed.out } - - - diff --git a/subworkflows/local/workflows_tonly.nf b/subworkflows/local/workflows_tonly.nf index c17ddd9..f1965e4 100644 --- a/subworkflows/local/workflows_tonly.nf +++ b/subworkflows/local/workflows_tonly.nf @@ -1,4 +1,4 @@ -include {splitinterval; matchbed; matchbed as matchbed_ascat; +include {splitinterval; matchbed; matchbed as matchbed_ascat; matchbed as matchbed_cnvkit} from '../../modules/local/splitbed.nf' include {fc_lane} from '../../modules/local/fc_lane.nf' @@ -21,10 +21,10 @@ include {indelrealign; bqsr_ir; bqsr; gatherbqsr; applybqsr; samtoolsindex } from 
'../../modules/local/trim_align.nf' -include {pileup_paired as pileup_paired_t; pileup_paired as pileup_paired_n; +include {pileup_paired as pileup_paired_t; pileup_paired as pileup_paired_n; pileup_paired_tonly; learnreadorientationmodel; - mutect2; mutect2filter; contamination_paired; mergemut2stats; + mutect2; mutect2filter; contamination_paired; mergemut2stats; mutect2_t_tonly; mutect2filter_tonly; contamination_tumoronly; learnreadorientationmodel_tonly; @@ -32,7 +32,7 @@ include {pileup_paired as pileup_paired_t; pileup_paired as pileup_paired_n; include {sage_tn; sage_tonly} from '../../modules/local/sage.nf' include {vardict_tn; vardict_tonly} from '../../modules/local/vardict.nf' include {varscan_tn; varscan_tonly} from '../../modules/local/varscan.nf' -include {octopus_tn; bcftools_index_octopus; +include {octopus_tn; bcftools_index_octopus; bcftools_index_octopus as bcftools_index_octopus_tonly; octopus_convertvcf; octopus_tonly; octopus_convertvcf_tonly} from '../../modules/local/octopus.nf' include {deepsomatic_tonly_step1; deepsomatic_tonly_step2; @@ -44,9 +44,9 @@ include {combineVariants as combineVariants_vardict; combineVariants as combineV combineVariants_alternative; combineVariants_alternative as combineVariants_deepsomatic; combineVariants_alternative as combineVariants_deepsomatic_tonly; combineVariants as combineVariants_sage; combineVariants as combineVariants_sage_tonly; - combineVariants_alternative as combineVariants_lofreq; + combineVariants_alternative as combineVariants_lofreq; combineVariants as combineVariants_muse; - combineVariants_alternative as combineVariants_octopus; + combineVariants_alternative as combineVariants_octopus; combineVariants_alternative as combineVariants_octopus_tonly; combinemafs_tn; somaticcombine;somaticcombine as somaticcombine_ffpe; combinemafs_tonly;somaticcombine_tonly;somaticcombine_tonly as somaticcombine_tonly_ffpe} from '../../modules/local/combinefilter.nf' @@ -55,35 +55,35 @@ include {annotvep_tn 
as annotvep_tn_mut2; annotvep_tn as annotvep_tn_strelka; annotvep_tn as annotvep_tn_varscan; annotvep_tn as annotvep_tn_vardict; annotvep_tn as annotvep_tn_octopus; annotvep_tn as annotvep_tn_lofreq; annotvep_tn as annotvep_tn_muse; annotvep_tn as annotvep_tn_sage; annotvep_tn as annotvep_tn_deepsomatic; - annotvep_tn as annotvep_tn_combined; + annotvep_tn as annotvep_tn_combined; annotvep_tonly as annotvep_tonly_varscan; annotvep_tonly as annotvep_tonly_vardict; - annotvep_tonly as annotvep_tonly_mut2; annotvep_tonly as annotvep_tonly_octopus; + annotvep_tonly as annotvep_tonly_mut2; annotvep_tonly as annotvep_tonly_octopus; annotvep_tonly as annotvep_tonly_sage; annotvep_tonly as annotvep_tonly_deepsomatic; annotvep_tonly as annotvep_tonly_combined} from '../../modules/local/annotvep.nf' -include {sobdetect_pass1 as sobdetect_pass1_mutect2_tonly; sobdetect_pass2 as sobdetect_pass2_mutect2_tonly; +include {sobdetect_pass1 as sobdetect_pass1_mutect2_tonly; sobdetect_pass2 as sobdetect_pass2_mutect2_tonly; sobdetect_metrics as sobdetect_metrics_mutect2_tonly; sobdetect_cohort_params as sobdetect_cohort_params_mutect2_tonly; - sobdetect_pass1 as sobdetect_pass1_octopus_tonly; sobdetect_pass2 as sobdetect_pass2_octopus_tonly; + sobdetect_pass1 as sobdetect_pass1_octopus_tonly; sobdetect_pass2 as sobdetect_pass2_octopus_tonly; sobdetect_metrics as sobdetect_metrics_octopus_tonly; sobdetect_cohort_params as sobdetect_cohort_params_octopus_tonly; - sobdetect_pass1 as sobdetect_pass1_vardict_tonly; sobdetect_pass2 as sobdetect_pass2_vardict_tonly; + sobdetect_pass1 as sobdetect_pass1_vardict_tonly; sobdetect_pass2 as sobdetect_pass2_vardict_tonly; sobdetect_metrics as sobdetect_metrics_vardict_tonly; sobdetect_cohort_params as sobdetect_cohort_params_vardict_tonly; - sobdetect_pass1 as sobdetect_pass1_varscan_tonly; sobdetect_pass2 as sobdetect_pass2_varscan_tonly; + sobdetect_pass1 as sobdetect_pass1_varscan_tonly; sobdetect_pass2 as sobdetect_pass2_varscan_tonly; 
sobdetect_metrics as sobdetect_metrics_varscan_tonly; sobdetect_cohort_params as sobdetect_cohort_params_varscan_tonly } from "../../modules/local/ffpe.nf" include {annotvep_tonly as annotvep_tonly_varscan_ffpe; annotvep_tonly as annotvep_tonly_vardict_ffpe; - annotvep_tonly as annotvep_tonly_mut2_ffpe; annotvep_tonly as annotvep_tonly_octopus_ffpe; + annotvep_tonly as annotvep_tonly_mut2_ffpe; annotvep_tonly as annotvep_tonly_octopus_ffpe; annotvep_tonly as annotvep_tonly_sage_ffpe; annotvep_tonly as annotvep_tonly_deepsomatic_ffpe; annotvep_tonly as annotvep_tonly_combined_ffpe} from '../../modules/local/annotvep.nf' include {svaba_tonly} from '../../modules/local/svaba.nf' include {manta_tonly} from '../../modules/local/manta.nf' include {gridss_tonly} from '../../modules/local/gridss.nf' -include {survivor_sv; - gunzip as gunzip_manta; gunzip as gunzip_gridss; +include {survivor_sv; + gunzip as gunzip_manta; gunzip as gunzip_gridss; annotsv_tonly as annotsv_survivor_tonly; - annotsv_tonly as annotsv_svaba_tonly; - annotsv_tonly as annotsv_gridss_tonly; + annotsv_tonly as annotsv_svaba_tonly; + annotsv_tonly as annotsv_gridss_tonly; annotsv_tonly as annotsv_manta_tonly} from '../../modules/local/annotsv.nf' include {freec} from '../../modules/local/freec.nf' @@ -94,13 +94,13 @@ include {cnvkit_exome_tonly; cnvkit_tonly } from '../../modules/local/cnvkit.nf' //Workflows workflow INPUT_TONLY { if(params.fastq_input){ - fastqinput=Channel.fromFilePairs(params.fastq_input) + fastqinput=Channel.fromFilePairs(params.fastq_input) }else if(params.fastq_file_input){ fastqinput=Channel.fromPath(params.fastq_file_input) .splitCsv(header: false, sep: "\t", strip:true) - .map{ sample,fq1,fq2 -> - tuple(sample, tuple(file(fq1),file(fq2))) - } + .map{ sample,fq1,fq2 -> + tuple(sample, tuple(file(fq1),file(fq2))) + } } if(params.sample_sheet){ @@ -134,26 +134,26 @@ workflow ALIGN_TONLY { intervalbedin = Channel.fromPath(params.genomes[params.genome].intervals,checkIfExists: 
true,type: 'file') } matchbed(intervalbedin) | splitinterval - + //Trim and split | align | combine/mark duplicates if (params.split_fastq>0){ - fastp_split(fastqinput) - | flatMap{samplename, fqs, json, html -> + fastp_split(fastqinput) + | flatMap{samplename, fqs, json, html -> def pairsfq = fqs.collate(2) - pairsfq.collect { pair -> + pairsfq.collect { pair -> def chunkId = pair[0].getBaseName().toString().tokenize('_').first().tokenize('.')[0] return [samplename, pair, chunkId] } - } - | BWAMEM2_SPLIT - | groupTuple - | map { samplename,bam -> + } + | BWAMEM2_SPLIT + | groupTuple + | map { samplename,bam -> tuple( samplename, bam.toSorted{ it -> (it.name =~ /${samplename}_(.*?).bam/)[0][1].toInteger() } )} | COMBINE_ALIGNMENTS alignment_out=COMBINE_ALIGNMENTS.out.bams - fastp_out=fastp_split.out - | map{samplename, fqs, json,html -> - fqs.collect {fq -> + fastp_out=fastp_split.out + | map{samplename, fqs, json,html -> + fqs.collect {fq -> return tuple(samplename,fq) } } | flatten() @@ -162,22 +162,22 @@ workflow ALIGN_TONLY { fastqinput | map{sample,fqs -> tuple(sample,fqs[0],fqs[1])}| bwamem2 alignment_out=bwamem2.out }else{ - fastp_out = fastp(fastqinput) | map{sample,f1,f2,json,html -> tuple(sample,f1,f2)} + fastp_out = fastp(fastqinput) | map{sample,f1,f2,json,html -> tuple(sample,f1,f2)} bwamem2(fastp_out) alignment_out=bwamem2.out } - if (params.indelrealign){ - bwaindelre = alignment_out | indelrealign + if (params.indelrealign){ + bwaindelre = alignment_out | indelrealign bqsrbambyinterval=bwaindelre.combine(splitinterval.out.flatten()) bambyinterval=bwaindelre.combine(splitinterval.out.flatten()) bqsr_ir(bqsrbambyinterval) - - bqsrs = bqsr_ir.out - | groupTuple - | map { samplename,beds -> - tuple( samplename, beds.toSorted{ it -> (it.name =~ /${samplename}_(.*?).recal_data.grp/)[0][1].toInteger() } )} + + bqsrs = bqsr_ir.out + | groupTuple + | map { samplename,beds -> + tuple( samplename, beds.toSorted{ it -> (it.name =~ 
/${samplename}_(.*?).recal_data.grp/)[0][1].toInteger() } )} gatherbqsr(bqsrs) tobqsr=bwaindelre.combine(gatherbqsr.out,by:0) @@ -187,12 +187,12 @@ workflow ALIGN_TONLY { | map{samplename,tumor,tumorbai -> tuple( samplename,tumor,tumorbai)} bambyinterval=bamwithsample.combine(splitinterval.out.flatten()) - }else{ + }else{ bqsrbambyinterval=alignment_out.combine(splitinterval.out.flatten()) bqsr(bqsrbambyinterval) bqsrs=bqsr.out | groupTuple - | map { samplename,beds -> + | map { samplename,beds -> tuple( samplename, beds.toSorted{ it -> (it.name =~ /${samplename}_(.*?).recal_data.grp/)[0][1].toInteger() } )} gatherbqsr(bqsrs) @@ -241,7 +241,7 @@ workflow VC_TONLY { vc_tonly=Channel.empty() - if ("mutect2" in call_list | "varscan" in call_list){ + if ("mutect2" in call_list | "varscan" in call_list){ pileup_paired_tonly(bambyinterval) pileup_paired_tout=pileup_paired_tonly.out.groupTuple() .map{samplename,pileups-> tuple( samplename, @@ -252,17 +252,17 @@ workflow VC_TONLY { //Mutect2 if ("mutect2" in call_list){ - mutect2_t_tonly(bambyinterval) - + mutect2_t_tonly(bambyinterval) + mutect2_t_tonly.out.groupTuple() - | multiMap { tumor,vcfs,f1r2,stats -> + | multiMap { tumor,vcfs,f1r2,stats -> mut2tout_lor: tuple(tumor, f1r2.toSorted{ it -> (it.name =~ /${tumor}_(.*?).f1r2.tar.gz/)[0][1].toInteger() } ) mut2tonly_mstats: tuple( tumor, stats.toSorted{ it -> (it.name =~ /${tumor}_(.*?).tonly.mut2.vcf.gz.stats/)[0][1].toInteger() }) allmut2tonly: tuple(tumor, vcfs.toSorted{ it -> (it.name =~ /${tumor}_(.*?).tonly.mut2.vcf.gz/)[0][1].toInteger() } ) - } + } | set{mut2tonlyout} learnreadorientationmodel_tonly(mut2tonlyout.mut2tout_lor) @@ -271,20 +271,20 @@ workflow VC_TONLY { mut2tonly_filter=mut2tonlyout.allmut2tonly | join(mergemut2stats_tonly.out) | join(learnreadorientationmodel_tonly.out) - | join(contamination_tumoronly.out) + | join(contamination_tumoronly.out) - mutect2_in_tonly=mutect2filter_tonly(mut2tonly_filter) + 
mutect2_in_tonly=mutect2filter_tonly(mut2tonly_filter) | join(sample_sheet) - | map{tumor,markedvcf,markedindex,finalvcf,finalindex,stats -> tuple(tumor,"mutect2_tonly",finalvcf,finalindex)} + | map{tumor,markedvcf,markedindex,finalvcf,finalindex,stats -> tuple(tumor,"mutect2_tonly",finalvcf,finalindex)} annotvep_tonly_mut2(mutect2_in_tonly) - - vc_tonly=vc_tonly|concat(mutect2_in_tonly) + + vc_tonly=vc_tonly|concat(mutect2_in_tonly) } //VarDict if ("vardict" in call_list){ vardict_in_tonly=vardict_tonly(bambyinterval) | groupTuple() - | map{tumor,vcf-> + | map{tumor,vcf-> tuple(tumor,vcf.toSorted{it -> (it.name =~ /${tumor}_(.*?).tonly.vardict.vcf/)[0][1].toInteger()},"vardict_tonly","-i 'SBF<0.1 && QUAL >20 && INFO/DP >20'")} | combineVariants_vardict_tonly | join(sample_sheet) @@ -293,159 +293,159 @@ workflow VC_TONLY { vc_tonly=vc_tonly.concat(vardict_in_tonly) } - + //VarScan_tonly if ("varscan" in call_list){ varscan_in_tonly=bambyinterval.combine(contamination_tumoronly.out,by: 0) - | varscan_tonly | groupTuple() + | varscan_tonly | groupTuple() | map{tumor,vcf-> tuple(tumor,vcf.toSorted{it -> (it.name =~ /${tumor}_(.*?).tonly.varscan.vcf/)[0][1].toInteger()},"varscan_tonly","")} - | combineVariants_varscan_tonly + | combineVariants_varscan_tonly | join(sample_sheet) - | map{tumor,marked,markedindex,normvcf,normindex ->tuple(tumor,"varscan_tonly",normvcf,normindex)} + | map{tumor,marked,markedindex,normvcf,normindex ->tuple(tumor,"varscan_tonly",normvcf,normindex)} annotvep_tonly_varscan(varscan_in_tonly) vc_tonly=vc_tonly|concat(varscan_in_tonly) } - + //Octopus_tonly if ("octopus" in call_list){ octopus_in_tonly=bambyinterval | octopus_tonly | bcftools_index_octopus | groupTuple() - | map{tumor,vcf,vcfindex -> + | map{tumor,vcf,vcfindex -> def sortedVcf = vcf.toSorted{it -> it.name}.unique() def sortedIdx = vcfindex.toSorted{it -> it.name}.unique() tuple(tumor, sortedVcf, sortedIdx, "octopus_tonly", "") } | combineVariants_alternative | join(sample_sheet) 
-            | map{tumor,marked,markedindex,normvcf,normindex ->tuple(tumor,"octopus_tonly",normvcf,normindex)}
+            | map{tumor,marked,markedindex,normvcf,normindex ->tuple(tumor,"octopus_tonly",normvcf,normindex)}
         annotvep_tonly_octopus(octopus_in_tonly)
-        octopus_in_tonly_sc=octopus_in_tonly | octopus_convertvcf_tonly
-            | map{tumor,normvcf,normindex ->tuple(tumor,"octopus_tonly",normvcf,normindex)}
+        octopus_in_tonly_sc=octopus_in_tonly | octopus_convertvcf_tonly
+            | map{tumor,normvcf,normindex ->tuple(tumor,"octopus_tonly",normvcf,normindex)}
         vc_tonly=vc_tonly|concat(octopus_in_tonly_sc)
     }
     //DeepSomatic Tonly
     if ("deepsomatic" in call_list){
-        deepsomatic_tonly_in=deepsomatic_tonly_step1(bambyinterval)
-            | deepsomatic_tonly_step2
-            | deepsomatic_tonly_step3 | groupTuple
-            | map{samplename,vcf,vcf_tbi ->
+        deepsomatic_tonly_in=deepsomatic_tonly_step1(bambyinterval)
+            | deepsomatic_tonly_step2
+            | deepsomatic_tonly_step3 | groupTuple
+            | map{samplename,vcf,vcf_tbi ->
                 tuple(samplename,vcf.toSorted{it -> (it.name =~ /${samplename}_(.*?).bed.vcf.gz/)[0][1].toInteger()},vcf_tbi,"deepsomatic_tonly","")
-            }
-            | combineVariants_deepsomatic_tonly
-            | join(sample_sheet)
+            }
+            | combineVariants_deepsomatic_tonly
+            | join(sample_sheet)
             | map{tumor,marked,markedindex,normvcf,normindex->tuple(tumor,"deepsomatic_tonly",normvcf,normindex)}
         annotvep_tonly_deepsomatic(deepsomatic_tonly_in)
-        vc_tonly=vc_tonly | concat(deepsomatic_tonly_in)
-
+        vc_tonly=vc_tonly | concat(deepsomatic_tonly_in)
+
     }
-
+
    /*
    //SAGE
    if ("sage" in call_list){
        sage_in_tonly=sage_tonly(bamwithsample)
-            | groupTuple()
-            | map{samplename,vcf,vcfindex -> tuple(samplename,vcf,"sage_tonly")}
+            | groupTuple()
+            | map{samplename,vcf,vcfindex -> tuple(samplename,vcf,"sage_tonly")}
            | combineVariants_sage_tonly
-            | join(sample_sheet)
+            | join(sample_sheet)
            | map{tumor,marked,markedindex,normvcf,normindex ->tuple(tumor,"sage_tonly",normvcf,normindex)}
        annotvep_tonly_sage(sage_in_tonly)
-        vc_tonly=vc_tonly | concat(sage_in_tonly)
+        vc_tonly=vc_tonly | concat(sage_in_tonly)
    }
    */
-    //FFPE Steps
+    //FFPE Steps
    if(params.ffpe){
        vc_ffpe_tonly=Channel.empty()
        bamwithsample1=bamwithsample
        if('mutect2' in call_list){
-            mutect2_tonly_p1=bamwithsample1 | join(mutect2_in_tonly)
+            mutect2_tonly_p1=bamwithsample1 | join(mutect2_in_tonly)
                | map{tumor,tbam,tbai,vc,normvcf,tbi->tuple(tumor,normvcf,tbam,vc)}
                | sobdetect_pass1_mutect2_tonly
            mutect2_tonly_p1 | map{sample,vcf,info->info} | collect | sobdetect_cohort_params_mutect2_tonly
-            mutect2_tonly_p2 = bamwithsample1
+            mutect2_tonly_p2 = bamwithsample1
                | join(mutect2_in_tonly)
                | map{tumor,tbam,tbai,vc,normvcf,tbi->tuple(tumor,normvcf,tbam,vc)}
-                | combine(sobdetect_cohort_params_mutect2_tonly.out)
+                | combine(sobdetect_cohort_params_mutect2_tonly.out)
                | sobdetect_pass2_mutect2_tonly
-            mutect2_tonly_p1_vcfs=mutect2_tonly_p1 | map{sample,vcf,info->vcf} |collect
+            mutect2_tonly_p1_vcfs=mutect2_tonly_p1 | map{sample,vcf,info->vcf} |collect
            mutect2_tonly_p2_vcfs=mutect2_tonly_p2 | map{sample,vcf,info,filtvcf,vcftbi->vcf} |collect
            sobdetect_metrics_mutect2_tonly(mutect2_tonly_p1_vcfs,mutect2_tonly_p2_vcfs)
-            mutect2_tonly_ffpe_out=mutect2_tonly_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample,"mutect2_tonly",filtvcf,vcftbi)}
+            mutect2_tonly_ffpe_out=mutect2_tonly_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample,"mutect2_tonly",filtvcf,vcftbi)}
            annotvep_tonly_mut2_ffpe(mutect2_tonly_ffpe_out)
            vc_ffpe_tonly=vc_ffpe_tonly |concat(mutect2_tonly_ffpe_out)
        }
        if('octopus' in call_list){
-            octopus_tonly_p1=bamwithsample1 | join(octopus_in_tonly)
+            octopus_tonly_p1=bamwithsample1 | join(octopus_in_tonly)
                | map{tumor,tbam,tbai,vc,normvcf,tbi->tuple(tumor,normvcf,tbam,vc)}
                | sobdetect_pass1_octopus_tonly
            octopus_tonly_p1 | map{sample,vcf,info->info} | collect | sobdetect_cohort_params_octopus_tonly
-            octopus_tonly_p2 = bamwithsample1
+            octopus_tonly_p2 = bamwithsample1
                | join(octopus_in_tonly)
                | map{tumor,tbam,tbai,vc,normvcf,tbi->tuple(tumor,normvcf,tbam,vc)}
-                | combine(sobdetect_cohort_params_octopus_tonly.out)
+                | combine(sobdetect_cohort_params_octopus_tonly.out)
                | sobdetect_pass2_octopus_tonly
-            octopus_tonly_p1_vcfs=octopus_tonly_p1 | map{sample,vcf,info->vcf} |collect
+            octopus_tonly_p1_vcfs=octopus_tonly_p1 | map{sample,vcf,info->vcf} |collect
            octopus_tonly_p2_vcfs=octopus_tonly_p2 | map{sample,vcf,info,filtvcf,vcftbi->vcf} |collect
            sobdetect_metrics_octopus_tonly(octopus_tonly_p1_vcfs,octopus_tonly_p2_vcfs)
-            octopus_tonly_ffpe_out=octopus_tonly_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample,"octopus_tonly",filtvcf,vcftbi)}
+            octopus_tonly_ffpe_out=octopus_tonly_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample,"octopus_tonly",filtvcf,vcftbi)}
            annotvep_tonly_octopus_ffpe(octopus_tonly_ffpe_out)
            vc_ffpe_tonly=vc_ffpe_tonly |concat(octopus_tonly_ffpe_out)
        }
        if('vardict' in call_list){
-            vardict_tonly_p1=bamwithsample1 | join(vardict_in_tonly)
+            vardict_tonly_p1=bamwithsample1 | join(vardict_in_tonly)
                | map{tumor,tbam,tbai,vc,normvcf,tbi->tuple(tumor,normvcf,tbam,vc)}
                | sobdetect_pass1_vardict_tonly
            vardict_tonly_p1 | map{sample,vcf,info->info} | collect | sobdetect_cohort_params_vardict_tonly
-            vardict_tonly_p2 = bamwithsample1
+            vardict_tonly_p2 = bamwithsample1
                | join(vardict_in_tonly)
                | map{tumor,tbam,tbai,vc,normvcf,tbi->tuple(tumor,normvcf,tbam,vc)}
-                | combine(sobdetect_cohort_params_vardict_tonly.out)
+                | combine(sobdetect_cohort_params_vardict_tonly.out)
                | sobdetect_pass2_vardict_tonly
-            vardict_tonly_p1_vcfs=vardict_tonly_p1 | map{sample,vcf,info->vcf} |collect
+            vardict_tonly_p1_vcfs=vardict_tonly_p1 | map{sample,vcf,info->vcf} |collect
            vardict_tonly_p2_vcfs=vardict_tonly_p2 | map{sample,vcf,info,filtvcf,vcftbi->vcf} |collect
            sobdetect_metrics_vardict_tonly(vardict_tonly_p1_vcfs,vardict_tonly_p2_vcfs)
-            vardict_tonly_ffpe_out=vardict_tonly_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample,"vardict_tonly",filtvcf,vcftbi)}
+            vardict_tonly_ffpe_out=vardict_tonly_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample,"vardict_tonly",filtvcf,vcftbi)}
            annotvep_tonly_vardict_ffpe(vardict_tonly_ffpe_out)
            vc_ffpe_tonly=vc_ffpe_tonly |concat(vardict_tonly_ffpe_out)
        }
        if('varscan' in call_list){
-            varscan_tonly_p1=bamwithsample1 | join(varscan_in_tonly)
+            varscan_tonly_p1=bamwithsample1 | join(varscan_in_tonly)
                | map{tumor,tbam,tbai,vc,normvcf,tbi->tuple(tumor,normvcf,tbam,vc)}
                | sobdetect_pass1_varscan_tonly
            varscan_tonly_p1 | map{sample,vcf,info->info} | collect | sobdetect_cohort_params_varscan_tonly
-            varscan_tonly_p2 = bamwithsample1
+            varscan_tonly_p2 = bamwithsample1
                | join(varscan_in_tonly)
                | map{tumor,tbam,tbai,vc,normvcf,tbi->tuple(tumor,normvcf,tbam,vc)}
-                | combine(sobdetect_cohort_params_varscan_tonly.out)
+                | combine(sobdetect_cohort_params_varscan_tonly.out)
                | sobdetect_pass2_varscan_tonly
-            varscan_tonly_p1_vcfs=varscan_tonly_p1 | map{sample,vcf,info->vcf} |collect
+            varscan_tonly_p1_vcfs=varscan_tonly_p1 | map{sample,vcf,info->vcf} |collect
            varscan_tonly_p2_vcfs=varscan_tonly_p2 | map{sample,vcf,info,filtvcf,vcftbi->vcf} |collect
            sobdetect_metrics_varscan_tonly(varscan_tonly_p1_vcfs,varscan_tonly_p2_vcfs)
-            varscan_tonly_ffpe_out=varscan_tonly_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample,"varscan_tonly",filtvcf,vcftbi)}
+            varscan_tonly_ffpe_out=varscan_tonly_p2 | map{sample,vcf,info,filtvcf,vcftbi->tuple(sample,"varscan_tonly",filtvcf,vcftbi)}
            annotvep_tonly_varscan_ffpe(varscan_tonly_ffpe_out)
            vc_ffpe_tonly=vc_ffpe_tonly |concat(varscan_tonly_ffpe_out)
        }
    }
@@ -457,9 +457,9 @@ workflow VC_TONLY {
    if (call_list.size()>1){
        vc_tonly
-            | groupTuple()
-            | somaticcombine_tonly
-            | map{tumor,vcf,index ->tuple(tumor,"combined_tonly",vcf,index)}
+            | groupTuple()
+            | somaticcombine_tonly
+            | map{tumor,vcf,index ->tuple(tumor,"combined_tonly",vcf,index)}
            | annotvep_tonly_combined
    }
@@ -481,8 +481,8 @@ workflow VC_TONLY {
workflow SV_TONLY {
    take:
        bamwithsample
-
-    main:
+
+    main:
    svcall_list = params.svcallers.split(',') as List
    svout=Channel.empty()
@@ -491,15 +491,15 @@ workflow SV_TONLY {
        svaba_out=svaba_tonly(bamwithsample)
            .map{ tumor,bps,contigs,discord,alignments,so_indel,so_sv,unfil_so_indel,unfil_sv,log ->
-                tuple(tumor,so_sv,"svaba_tonly")}
+                tuple(tumor,so_sv,"svaba_tonly")}
        annotsv_svaba_tonly(svaba_out).ifEmpty("Empty SV input--No SV annotated")
        svout=svout | concat(svaba_out)
    }
    //Manta
    if ("manta" in svcall_list){
        manta_out=manta_tonly(bamwithsample)
-            .map{tumor, sv, svtbi, indel, indeltbi, tumorsv, tumorsvtbi->
-                tuple(tumor,tumorsv,"manta_tonly")}
+            .map{tumor, sv, svtbi, indel, indeltbi, tumorsv, tumorsvtbi->
+                tuple(tumor,tumorsv,"manta_tonly")}
        annotsv_manta_tonly(manta_out).ifEmpty("Empty SV input--No SV annotated")
        svout=svout | concat(manta_out)
    }
@@ -509,7 +509,7 @@ workflow SV_TONLY {
        gridss_out=gridss_tonly(bamwithsample)
        gridss_out_forsv=gridss_out
            | map{tumor,vcf,index,bam,gripssvcf,gripsstbi,gripssfilt,filttbi ->
-                tuple(tumor,gripssfilt,"gridss_tonly")} | gunzip_gridss
+                tuple(tumor,gripssfilt,"gridss_tonly")} | gunzip_gridss
        annotsv_gridss_tonly(gridss_out_forsv).ifEmpty("Empty SV input--No SV annotated")
        svout=svout | concat(gridss_out_forsv)
    }
@@ -517,24 +517,24 @@
    //Survivor
    if (svcall_list.size()>1){
        //Survivor
-        svout | groupTuple
-            | survivor_sv
-            | annotsv_survivor_tonly
+        svout | groupTuple
+            | survivor_sv
+            | annotsv_survivor_tonly
            | ifEmpty("Empty SV input--No SV annotated")
    }
    if("gridss" in svcall_list){
-        somaticsv_input=gridss_out
+        somaticsv_input=gridss_out
            | map{tumor,vcf,index,bam,gripssvcf,gripsstbi,gripssfilt,filttbi ->
                tuple(tumor,vcf,index,gripsstbi,gripssfilt,filttbi)}
    }else if("manta" in svcall_list){
-        somaticsv_input=manta_out
-            .map{tumor, sv, svtbi, indel, indeltbi, tumorsv, tumorsvtbi->
+        somaticsv_input=manta_out
+            .map{tumor, sv, svtbi, indel, indeltbi, tumorsv, tumorsvtbi->
                tuple(tumor,sv,svtbi,tumorsv,tumorsvtbi)}
    }else{
        somaticsv_input=Channel.empty()
    }
-
+
    emit:
        somaticsv_input
@@ -545,8 +545,8 @@ workflow SV_TONLY {
workflow CNVmouse_tonly {
    take:
        bamwithsample
-
-    main:
+
+    main:
    cnvcall_list = params.cnvcallers.split(',') as List
    if ("freec" in cnvcall_list){
@@ -569,12 +569,12 @@ workflow CNVhuman_tonly {
        bamwithsample
        somaticcall_input
-    main:
+    main:
    cnvcall_list = params.cnvcallers.split(',') as List
    if ("freec" in cnvcall_list){
        //FREEC-Unpaired only
-        bamwithsample | freec
+        bamwithsample | freec
    }
    if (params.exome && "purple" in cnvcall_list ){
@@ -586,12 +586,12 @@ workflow CNVhuman_tonly {
        bamwithsample | amber_tonly
        bamwithsample | cobalt_tonly
        purplein=amber_tonly.out.join(cobalt_tonly.out)
-        purplein.join(somaticcall_input)|
-            map{t1,amber,cobalt,vc,vcf,index -> tuple(t1,amber,cobalt,vcf,index)}
+        purplein.join(somaticcall_input)|
+            map{t1,amber,cobalt,vc,vcf,index -> tuple(t1,amber,cobalt,vcf,index)}
            | purple_tonly
    }
-    //CNVKIT
+    //CNVKIT
    if ("cnvkit" in cnvcall_list){
        if(params.exome){
            matchbed_cnvkit(intervalbedin)
@@ -600,21 +600,21 @@ workflow CNVhuman_tonly {
            bamwithsample | cnvkit_tonly
        }
    }
-
+
}
workflow CNVhuman_novc_tonly {
    take:
        bamwithsample
-    main:
+    main:
    cnvcall_list = params.cnvcallers.split(',') as List
    if ("freec" in cnvcall_list){
        //FREEC-Unpaired only
-        bamwithsample | freec
-    }
-
+        bamwithsample | freec
+    }
+
    if (params.exome && "purple" in cnvcall_list){
        cnvcall_list.removeIf { it == 'purple' }
    }
@@ -624,7 +624,7 @@ workflow CNVhuman_novc_tonly {
        bamwithsample | amber_tonly
        bamwithsample | cobalt_tonly
        purplein=amber_tonly.out.join(cobalt_tonly.out)
-            map{t1,amber,cobalt -> tuple(t1,amber,cobalt)}
+            map{t1,amber,cobalt -> tuple(t1,amber,cobalt)}
            | purple_tonly_novc
    }
@@ -645,14 +645,14 @@ workflow QC_TONLY {
        fastpout
        bqsrout
        fullinterval
-
+
    main:
    //QC Steps For Tumor-Only-No Germline Variant QC
    fc_lane(fastqin)
    fastq_screen(fastpout)
    kraken(fastqin)
-    //BQSR BAMs
+    //BQSR BAMs
    fastqc(bqsrout)
    samtools_flagstats(bqsrout)
    qualimap_bamqc(bqsrout)
@@ -663,20 +663,20 @@ workflow QC_TONLY {
    }
-    somalier_extract(bqsrout)
+    somalier_extract(bqsrout)
    som_in=somalier_extract.out.collect()
-    if(params.genome.matches("hg38(.*)")| params.genome.matches("hg19(.*)")){
+    if(params.genome.matches("hg38(.*)")| params.genome.matches("hg19(.*)")){
        somalier_analysis_human(som_in)
        somalier_analysis_out=somalier_analysis_human.out.collect()
    }
-    else if(params.genome.matches("mm10")){
+    else if(params.genome.matches("mm10")){
        somalier_analysis_mouse(som_in)
        somalier_analysis_out=somalier_analysis_mouse.out.collect()
    }
-
+
    //Prep for MultiQC input
    fclane_out=fc_lane.out.map{samplename,info->info}.collect()
-    fqs_out=fastq_screen.out.collect()
+    fqs_out=fastq_screen.out.collect()
    kraken_out=kraken.out.map{samplename,taxa,krona -> tuple(taxa,krona)}.collect()
    qualimap_out=qualimap_bamqc.out.map{genome,rep->tuple(genome,rep)}.collect()
@@ -684,21 +684,21 @@ workflow QC_TONLY {
    samtools_flagstats_out=samtools_flagstats.out.collect()
    conall=fclane_out.concat(fqs_out,kraken_out,qualimap_out,fastqc_out,
-        samtools_flagstats_out,mosdepth_out,
+        samtools_flagstats_out,mosdepth_out,
        somalier_analysis_out).flatten().toList()
-
+
    multiqc(conall)
}
-//QC Tumor Only-BAMs
+//QC Tumor Only-BAMs
workflow QC_TONLY_BAM {
    take:
        bams
        fullinterval
-
+
    main:
-    //BQSR BAMs
+    //BQSR BAMs
    fastqc(bams)
    samtools_flagstats(bams)
    qualimap_bamqc(bams)
@@ -709,7 +709,7 @@ workflow QC_TONLY_BAM {
    }
-    somalier_extract(bams)
+    somalier_extract(bams)
    som_in=somalier_extract.out.collect()
    if(params.genome.matches("hg38(.*)")| params.genome.matches("hg19(.*)")){
        somalier_analysis_human(som_in)
@@ -719,23 +719,23 @@
        somalier_analysis_mouse(som_in)
        somalier_analysis_out=somalier_analysis_mouse.out.collect()
    }
-
+
    //Prep for MultiQC input
    qualimap_out=qualimap_bamqc.out.map{genome,rep->tuple(genome,rep)}.collect()
    samtools_flagstats_out=samtools_flagstats.out.collect()
    conall=qualimap_out | concat(
-        samtools_flagstats_out,mosdepth_out,
-        somalier_analysis_out)
+        samtools_flagstats_out,mosdepth_out,
+        somalier_analysis_out)
        | flatten | toList
-
+
    multiqc(conall)
}
//Variant Calling from BAM only
workflow INPUT_TONLY_BAM {
    main:
-    //Either BAM Input or File sheet input
+    //Either BAM Input or File sheet input
    if(params.bam_input){
        bambai = params.bam_input + ".bai"
        baionly = bambai.replace(".bam", "")
@@ -744,7 +744,7 @@ workflow INPUT_TONLY_BAM {
        if (bamcheck1.size()>0){
            baminputonly=Channel.fromPath(params.bam_input)
-                | map{it-> tuple(it.simpleName,it,file("${it}.bai"))}
+                | map{it-> tuple(it.simpleName,it,file("${it}.bai"))}
        }else if (bamcheck2.size()>0){
            bai=Channel.from(bamcheck2).map{it -> tuple(it.simpleName,it)}
            baminputonly=Channel.fromPath(params.bam_input)
@@ -760,7 +760,7 @@ workflow INPUT_TONLY_BAM {
    }else if(params.bam_file_input) {
        baminputonly=Channel.fromPath(params.bam_file_input)
            .splitCsv(header: false, sep: "\t", strip:true)
-            .map{ sample,bam,bai ->
+            .map{ sample,bam,bai ->
                tuple(sample, file(bam),file(bai))
            }
@@ -773,7 +773,7 @@ workflow INPUT_TONLY_BAM {
        intervalbedin = Channel.fromPath(params.genomes[params.genome].intervals,checkIfExists: true,type: 'file')
    }
    matchbed(intervalbedin) | splitinterval
-
+
    bamwithsample=baminputonly
    emit:
@@ -781,7 +781,6 @@ workflow INPUT_TONLY_BAM {
    splitout=splitinterval.out
    sample_sheet
    fullinterval=matchbed.out
-
-
-}
+
+}