fix: workbench pipeline execution on Google Batch #1
Draft
samhornstein wants to merge 7 commits into
Fixes several issues preventing FloRes from running on Verily Workbench:
- wb/run.sh now passes -profile and -c flags to nextflow
- Adds params_google_batch.config with all gs:// paths for cloud execution
- Restores params.config to local-only defaults (no hardcoded google-batch executor)
- Moves process resource declarations into config/google_batch.config
- Parameterizes hardcoded bucket names with the GCS_REF_BUCKET env var
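Below is a minimal Python sketch of the two launcher behaviors listed above: building gs:// reference paths from the GCS_REF_BUCKET environment variable instead of hardcoding bucket names, and passing -profile and -c to nextflow explicitly. The real wb/run.sh is a shell script whose contents are not shown here. The profile name `google_batch`, the entry point `main.nf`, and the helper names are assumptions for illustration.

```python
import os
from typing import Mapping

# Assumed values: the PR names params_google_batch.config but not the
# profile name or the pipeline entry point.
PROFILE = "google_batch"
PARAMS_CONFIG = "params_google_batch.config"


def ref_path(relative: str, env: Mapping[str, str] = os.environ) -> str:
    """Build a gs:// reference path from GCS_REF_BUCKET (hypothetical helper)."""
    bucket = env.get("GCS_REF_BUCKET", "").strip()
    if not bucket:
        # Fail fast with a readable message instead of a malformed gs:// path.
        raise RuntimeError("GCS_REF_BUCKET is not set; export it before launching.")
    # Accept either 'my-bucket' or 'gs://my-bucket/'.
    bucket = bucket.removeprefix("gs://").strip("/")
    return f"gs://{bucket}/{relative.lstrip('/')}"


def nextflow_command(main: str = "main.nf", resume: bool = False) -> list[str]:
    """Assemble a nextflow invocation that passes -profile and -c explicitly."""
    cmd = ["nextflow", "run", main, "-profile", PROFILE, "-c", PARAMS_CONFIG]
    if resume:
        cmd.append("-resume")
    return cmd
```

With `GCS_REF_BUCKET=my-bucket`, `ref_path("db/kraken2")` returns `gs://my-bucket/db/kraken2` (the `db/kraken2` path is a placeholder). Checking the variable up front surfaces a missing setting before any Google Batch job is submitted.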
…ecution
Apply learnings from AMR workbench conversion: add fastp/bowtie2/make to
container, pin nextflow=24, COPY bin/ to /opt/amrplusplus/bin, replace all
$baseDir/bin/ refs with container paths, and use ${task.cpus} instead of
${threads} in bwa/trimmomatic modules.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
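One way to confirm that the path and thread-count replacements in this commit are complete is to scan the Nextflow modules for leftover patterns. The helper below is hypothetical and not part of the PR, and the `modules/` directory layout is an assumption. Using `${task.cpus}` instead of `${threads}` ties each tool's thread count to the `cpus` value Nextflow assigns to the task, so it follows any per-process resource override.

```python
import re
from pathlib import Path

# Patterns this commit replaces: script paths under $baseDir/bin/ and the
# ${threads} variable (superseded by ${task.cpus}).
STALE_PATTERNS = {
    "$baseDir/bin/ path": re.compile(r"\$\{?baseDir\}?/bin/"),
    "${threads} variable": re.compile(r"\$\{threads\}"),
}


def find_stale_refs(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern label) for every leftover reference."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in STALE_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


def scan_modules(root: str = "modules") -> dict[str, list[tuple[int, str]]]:
    """Scan every .nf file under root; only files with hits are reported."""
    report = {}
    for path in sorted(Path(root).rglob("*.nf")):
        hits = find_stale_refs(path.read_text())
        if hits:
            report[str(path)] = hits
    return report
```

An empty result from `scan_modules()` means no `$baseDir/bin/` or `${threads}` references remain in the scanned files.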
…rkbench
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add resource overrides for bowtie2_index and bwa index processes
(n2-highmem-8, 64GB) to prevent OOM kills on full genome builds
- Fix multiqc output naming for newer multiqc versions by adding
--outdir and --filename flags
- Add errorStrategy 'ignore' to runbracken for empty taxonomic levels
- Add Domain ('D') taxonomic level to kraken2_long_to_wide_update.py
- Use Nextflow-uploaded bin/ scripts instead of container-baked paths
in krakenresults process
- Fix GCS glob handling in host index loading by removing Paths.get()
- Add pre-built GRCh38 bowtie2 host index to skip 50-min build step
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
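The Domain fix can be pictured with a simplified parser for the standard six-column Kraken2 report (percentage, clade reads, direct reads, rank code, taxid, indented name) that pivots long per-sample reports into per-level wide tables. This is a sketch, not `kraken2_long_to_wide_update.py` itself: the `LEVELS` tuple and the function names are assumptions. It illustrates that a rank code missing from the recognized set, here `D`, never reaches a per-level table.

```python
from collections import defaultdict

# Hypothetical set of rank codes kept as taxonomic levels; 'D' (Domain)
# is the level this commit adds to the real script.
LEVELS = ("D", "K", "P", "C", "O", "F", "G", "S")


def parse_kraken2_report(lines):
    """Return {rank code: {taxon name: clade read count}} for kept levels."""
    by_level = defaultdict(dict)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip malformed or non-standard rows
        rank = fields[3]
        if rank in LEVELS:
            name = fields[5].strip()  # names are indented by rank depth
            by_level[rank][name] = int(fields[1])
    return by_level


def long_to_wide(samples):
    """Map {sample: report lines} to {level: {taxon: {sample: count}}}."""
    wide = defaultdict(lambda: defaultdict(dict))
    for sample, lines in samples.items():
        for level, taxa in parse_kraken2_report(lines).items():
            for taxon, count in taxa.items():
                wide[level][taxon][sample] = count
    return wide
```

Dropping `"D"` from `LEVELS` makes the `Bacteria` row in a typical report disappear from the output entirely, which is the kind of gap a level list must cover.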
- Use consistent container path for krakenresults script (/opt/amrplusplus/bin/ instead of $HOME/.nextflow-bin/)
- Gitignore wb.env and gcp.env (workspace-specific config)
- Add setup instructions to wb.env.template
- Add helpful error message when env file is missing
- Make runbracken errorStrategy targeted to exit code 1 only

Note: container image must be rebuilt to include the Domain 'D' fix in kraken2_long_to_wide_update.py for krakenresults to work.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
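The missing-env-file check can be sketched in Python, assuming a simple KEY=VALUE format with optional `export` prefixes and quotes. The PR's actual check is not shown here; this parser is illustrative and deliberately ignores shell features such as variable expansion. The function name is hypothetical.

```python
from pathlib import Path


def load_env_file(path: str, template: str = "wb.env.template") -> dict[str, str]:
    """Parse KEY=VALUE lines, pointing at the template if the file is absent."""
    env_path = Path(path)
    if not env_path.is_file():
        raise FileNotFoundError(
            f"{path} not found. Copy {template} to {path} and fill in your "
            "workspace-specific values (the file is gitignored)."
        )
    values = {}
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if sep:
            # Drop surrounding quotes, e.g. "my-bucket" -> my-bucket.
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values
```

Naming the template in the error message tells a new user exactly which file to copy, which matters once the real env files are gitignored and absent from fresh clones.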
Summary
Changes
- Add `bowtie2_index` and `bwa index` process overrides (n2-highmem-8, 64GB) to prevent OOM kills on full genome builds
- … `flores-workbench`, add `fastp`, `bowtie2`, `make`, bundle `bin/` scripts, pin `nextflow=24`
- Fix MultiQC output naming for newer MultiQC versions (`--outdir`/`--filename` flags)
- Targeted `runbracken` `errorStrategy` (ignore exit code 1 only) for empty taxonomic levels
- Add Domain (`D`) taxonomic level to `kraken2_long_to_wide_update.py`; use container-baked script paths consistently
- Fix GCS glob handling by removing `Paths.get()` in `fastq_host_removal.nf`; add pre-built index to `params_google_batch.config`
- Gitignore workspace-specific env files (`wb.env`, `gcp.env`); add setup instructions to templates
- Replace `${threads}` with `${task.cpus}` throughout

Test plan
- … `-resume`), 22 min, all 18 process types passed
- … `kraken2_long_to_wide_update.py`

🤖 Generated with Claude Code